CS 441: MODERN COMPUTER ARCHITECTURE
------------------------------------

Assignment #5
-------------

Due date: 05/03/99

1. The semantics of the atomic fetch_and_add operation is that it adds
   its second argument to the memory location in its first argument and
   returns the value of the memory location as it was before the
   addition. Consider the following solution to implementing a barrier
   using the fetch_and_add.

       BARRIER (B, N)
       {
           if ( fetch_and_add(B, 1) == N-1)
               B = 0;
           else
               while (B != 0);
       }

   The barrier should be capable of supporting the following code being
   run in parallel by N processors.

       while (condition)
       {
           Compute for a while
           BARRIER (BAR, N);
       }

   Is there a problem with the implementation of the barrier? If so,
   explain.

2. Consider a bus-based shared memory multiprocessor with the following
   characteristics:

   * The bus has 64 data lines and 32 address lines. (Each address is
     32 bits and instruction is also 32 bits.)
   * The processor clock is twice as fast as the bus clock.
   * Each processor has a split instruction/data cache (32 KB)
   * The cache block size is 32 bytes.
   * Snooping caches using an invalidate protocol with write back are
     employed (MSI)
   * The access time of main (shared) memory is 10 CPU cycles.

   The following are measurements made for a suite of programs running
   on this machine.

   * The average memory data traffic is constituted thus:
       * private reads are 70%
       * private writes are 20%
       * shared reads are 8%
       * shared writes are 2%
   * Cache hit rates are as follows:
       * 97% for private data
       * 95% for shared data
       * 98.5% for instructions
   * 40% of the instructions are either loads or stores
   * The CPI ignoring miss penalties is 2.0

   (a) What is the effective CPI after factoring cache misses?

   (b) What is the maximum number of processors that can be supported
       without saturating the bus?

   (c) Is there an advantage in using an MESI (Modified/Exclusive/
       Shared/Invalid) protocol instead?

Any reasonable assumptions you make should be clearly stated at the beginning of your answer.