CS 441: MODERN COMPUTER ARCHITECTURE
--------------------------------------

Assignment #2
-------------

Due date: 02/19/99

1. Problem 3.9 (Chap. 3, p. 218) of text (H&P).

2. (a) Compile the following segment of code so as to minimize the
       execution time in cycles (1 cycle is a stage delay in the DLX
       pipeline). What is the total execution time?

           double a[4001], k;
           int i;
           . . .
           for (i=0; i<4000; i++)
               a[i] = (a[i] + a[i+1])*k;    // S1

   (b) Suppose the * in statement S1 is replaced by /. Show the new
       program and calculate the new execution time.

   Assume the following:

   - No more than four unrolled loops may comprise the body of a single
     loop iteration in the transformed code.
   - The FP add unit takes 4 cycles and is pipelined (4 stages).
   - The FP multiply unit takes 5 cycles and is pipelined (5 stages).
   - The FP divide unit takes 6 cycles and is not pipelined.
   - All functional units can execute in parallel.
   - We have access to 16 (double precision) FP registers.
   - The scalar k is initially in register F0.
   - R1 initially holds the address of element a[0] and the address of
     a[4000] is 0.
   - Both the branch condition and the branch target address are
     computed in the second stage (ID) of the DLX pipe.
   - Forwarding from pipeline registers to previous stages is supported
     in hardware.
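
Note on the unrolling limit in problem 2: the following is a minimal source-level sketch of what "four unrolled loops in one iteration" could look like for statement S1. It is only an illustration in C, not the DLX program or schedule the problem asks for. The function names `s1_rolled` and `s1_unrolled4` are hypothetical, and the sketch assumes the trip count is a multiple of four (4000 is) and that the array holds one element more than the trip count.

```c
#include <assert.h>
#include <stddef.h>

/* Rolled form of statement S1 from the assignment.
   Assumes a has at least n + 1 elements. */
void s1_rolled(double *a, size_t n, double k)
{
    for (size_t i = 0; i < n; i++)
        a[i] = (a[i] + a[i + 1]) * k;
}

/* Hypothetical helper: the same loop unrolled by four.
   Assumes n is a multiple of 4 and a has at least n + 1 elements.
   Statement order is preserved, so each copy still reads the old
   value of its right-hand neighbour before that neighbour is
   overwritten by the next copy. */
void s1_unrolled4(double *a, size_t n, double k)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        a[i]     = (a[i]     + a[i + 1]) * k;
        a[i + 1] = (a[i + 1] + a[i + 2]) * k;
        a[i + 2] = (a[i + 2] + a[i + 3]) * k;
        a[i + 3] = (a[i + 3] + a[i + 4]) * k;
    }
}
```

Calling `s1_unrolled4(a, 4000, k)` on the array from problem 2 performs 1000 iterations of the unrolled body. In the compiled code, the loads, adds, multiplies (or divides) and stores of the four copies can then be interleaved to hide the functional unit latencies listed in the assumptions.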