logo资料库

computer architecture solution.pdf

第1页 / 共20页
第2页 / 共20页
第3页 / 共20页
第4页 / 共20页
第5页 / 共20页
第6页 / 共20页
第7页 / 共20页
第8页 / 共20页
资料共20页,剩余部分请下载后查看
2 ■ Solutions to Case Studies and Exercises Appendix A Solutions (by Rui Ma and Gregory D. Peterson) A.1 The first challenge of this exercise is to obtain the instruction mix. The instruction frequencies in Figure A.27 must add to 100, although gap and gcc add to 100.2 and 99.5 percent, respectively, because of rounding error. Because each total must in reality be 100, we should not attempt to scale the per instruction average frequen- cies by the shown totals of 100.2 and 99.5. However, in computing the average fre- quencies to one significant digit to the right of the decimal point, we should be careful to use an unbiased rounding scheme so that the total of the averaged fre- quencies is kept as close to 100 as possible. One such scheme is called round to even, which makes the least significant digit always even. For example, 0.15 rounds to 0.2, but 0.25 also rounds to 0.2. For a summation of terms, round to even will not accumulate an error as would, for example, rounding up where 0.15 rounds to 0.2 and 0.25 rounds to 0.3. For gap and gcc the average instruction frequencies are shown in Figure S.12. Instruction load store add sub mul compare load imm cond branch cond move jump call return shift and or xor other logical Average of gap and gcc % 25.8 11.8 20.0 2.0 0.8 4.4 3.6 10.7 0.5 0.8 1.1 1.1 2.4 4.4 8.2 2.0 0.2 Figure S.12 MIPS dynamic instruction mix average for gap and gcc. The exercise statement gives CPI information in terms of four major instruction categories, with two subcategories for conditional branches. To compute the Copyright © 2012 Elsevier, Inc. All rights reserved.
Appendix A Solutions ■ 3 average CPI we need to aggregate the instruction frequencies in Figure S.12 to match these categories. This is the second challenge, because it is easy to mis- categorize instructions. The four main categories are ALU, load/store, condi- tional branch, and jumps. ALU instructions are any that take operands from the set of registers and return a result to that set of registers. Load/store instructions access memory. Conditional branch instructions must be able to set the program counter to a new value based on a condition. Jump-type instructions set the pro- gram counter to a new value no matter what. With the above category definitions, the frequency of ALU instructions is the sum of the frequencies of the add, sub, mul, compare, load imm (remember, this instruction does not access memory, instead the value to load is encoded in a field within the instruction itself), cond move (implemented as an OR instruction between a controlling register and the register with the data to move to the desti- nation register), shift, and, or, xor, and other logical for a total of 48.5%. The fre- quency of load/store instructions is the sum of the frequencies of load and store for a total of 37.6%. The frequency of conditional branches is 10.7%. Finally, the frequency of jumps is the sum of the frequencies of the jump-type instructions, namely jump, call, and return, for a total of 3.0%. Now ∑= Instruction category frequency Clock cycles for category × ) ( + 0.367 ) 1.4( ) + ( 0.107 ) 0.6( ( ) 2.0( ) 1 0.6–( ) 1.5( ) ) + + ( 0.03 ) 1.2( ) + Effective CPI categories ( ) 1.0( 0.485 1.24 = = A.2 See the solution for A.1 (above) for a discussion regarding the solution methodol- ogy for this exercise. As with problem A.1, the frequency of ALU instructions is the sum of the fre- quencies of the add, sub, mul, compare, load imm (remember, this instruction does not access memory, instead the value to load is encoded in a field within the instruction itself ), cond move (implemented as an OR instruction between a con- trolling register and the register with the data to move to the destination register), shift, and, or, xor, and other logical for a total of 51.1%. The frequency of load/ store instructions is the sum of the frequencies of load and store for a total of 35.0%. The frequency of conditional branches is 11.0%. Finally, the frequency of jumps is the sum of the frequencies of the jump-type instructions, namely jump, call, and return, for a total of 2.8%. × 11.0% 60% 1.5 11.0% 40% 1.2 2.8% 1.23 35.0% 2.0 + × × × × × + × CPI = 1.0 51.1% 1.4 + + = Copyright © 2012 Elsevier, Inc. All rights reserved.
4 ■ Solutions to Case Studies and Exercises Instruction load store add sub mul compare load imm cond branch cond move jump call return shift and or xor other logical Average of gzip and perlbmk % 24.4 10.6 21.8 3.8 – 5.2 1.6 11.0 1.5 1.2 0.8 0.8 1.3 5.3 6.8 3.6 0.2 Figure S.13 MIPS dynamic instruction mix average for gzip and perlbmk. A.3 This exercise is similar to A.1 and A.2, but focuses on floating point intensive programs. Instruction Load Store Add Sub Mul Compare load imm cond branch cond move Jump Call Return Shift Figure S.14 Continued Average of gzip and perlbmk % 9.8 2.4 17.8 3.0 0.6 – 5.6 1.0 – – – – 1.0 Copyright © 2012 Elsevier, Inc. All rights reserved.
And Or Xor other logical load FP store FP add FP sub FP mul FP div FP move reg-reg FP compare FP cond mov FP other FP Appendix A Solutions ■ 5 0.9 4.1 – – 16.5 11.6 8.6 6.2 8.2 0.2 1.4 0.4 0.4 0.8 Figure S.14 MIPS dynamic instruction mix for lucas and swim. ALU instructions: (17.8% + 3.0% + 0.6% + 5.6% + 1.0% + 0.9% + 4.1%) = 33.0% Load-stores: (9.4% + 2.4%) = 11.8% Conditional branches: 1.0% Jumps: 0% FP add: (8.6% + 6.2%) = 14.8% Load-store FP: (16.5% + 11.6%) = 28.1% Other FP: (1.4% + 0.4% + 0.4% + 0.8%) = 3.0% CPI = 1.0 × 33.0% + 1.4 × 11.8% + 2.0 × 1.0% × 60% + 1.5 × 1.0% × 40% + 6.0 × 8.2% + 4.0 × 14.8% + 20 × 0.2% + 1.5 × 28.1% + 2.0 × 3.0% = 2.12 A.4 This exercise is similar to A.3. Instruction Load Store Add Sub Mul Compare load imm cond branch cond move Jump Figure S.15 Continued Average of applu and art % 16.0 1.4 30.2 1.2 1.2 3.7 6.8 7.0 0.2 – Copyright © 2012 Elsevier, Inc. All rights reserved.
6 ■ Solutions to Case Studies and Exercises Call Return Shift And Or Xor other logical load FP store FP add FP sub FP mul FP div FP move reg-reg FP compare FP cond mov FP other FP – – 0.4 – 1.0 1.6 – 11.7 4.4 3.4 1.4 6.4 0.4 0.8 0.4 0.3 – Figure S.15 MIPS dynamic instruction mix for applu and art. ALU instructions: (30.2% + 1.2% + 1.2% + 3.7% + 6.8% + 0.2% + 0.4% + 1.0% + 1.6%) = 46.3% Load-stores: (16.0% + 1.4%) = 17.4% Conditional branches: 7.0% Jumps: 0% FP add: (3.4% + 1.4%) = 4.8% Load-store FP: (11.7% + 4.4%) = 16.1% Other FP: (0.8% + 0.4% + 0.3%) = 1.5% CPI = 1.0 × 46.3% + 1.4 × 17.4% + 2.0 × 7.0% × 60% + 1.5 × 7.0% × 40% + 6.0 × 6.4% + 4.0 × 4.8% + 20 × 0.4% + 1.5 × 16.1% + 2.0 × 1.5% = 1.76 A.5 Take the code sequence one line at a time. 1. A = B + C 2. B = A + C = B + C + C 3. D = A – B = (B + C) – (B + C + C) = – C ;The operands here are given, not computed by the code, so copy propagation will not transform this statement. ;Here A is a computed value, so transform the code by substituting A = B + C to get ;Now no operand is computed ;Both operands are computed so substitute for both to get ;Simplify algebraically to get ;This is a given, not computed, operand Copyright © 2012 Elsevier, Inc. All rights reserved.
Appendix A Solutions ■ 7 Copy propagation has increased the work in statement 2 from one addition to two. It has changed the work in statement 3 from subtraction to negation, possi- bly a savings. The above suggests that writing optimizing compilers means incor- porating sophisticated trade-off analysis capability to control any optimizing steps, if the best results are to be achieved. A.6 No solution provided. A.7 a. The point of this exercise is to highlight the value of compiler optimizations. In this exercise registers are not used to hold updated values; values are stored to memory when updated and subsequently reloaded. Because all the addresses of all the variables (including all array elements) can fit in 16 bits, we can use immediate instructions to construct addresses. Figure S.16 shows one possible translation of the given C code fragment. ex_a_7: loop: DADD SW LD DSLL DADDI LD LD DADD LD DSLL DADDI SD LD DADDI SD LD DADDI BNEZ R1,R0,R0 7000(R0),R1 R1,7000(R0) R2,R1,#3 R3,R2,#3000 R4,0(R3) R5,5000(R0) R6,R4,R5 R1,7000(R0) R2,R1,#3 R7,R2,#1000 0(R7),R6 R1,7000(R0) R1,R1,#1 7000(R0),R1 R1,7000(R0) R8,R1,#-101 R8,loop ;R0 = 0, initialize i = 0 ;store i ;get value of i ;R2 = word offset of B[i] ;add base address of B to R2 ;load B[i] ;load C ;B[i] + C ;get value of i ;R2 = word offset of A[i] ;add base address of A to R2 ;A[i] ← B[i] + C ;get value of i ;increment i ;store i ;get value of i ;is counter at 101? ;if not 101, repeat Figure S.16 MIPS code to implement the C loop without using registers to hold updated values for future use or to pass values to a subsequent loop iteration. The number of instructions executed dynamically is the number of initializa- tion instructions plus the number of instructions in the loop times the number of iterations: Instructions executed = 2 + (16 × 101) = 1618 The number of memory-data references is a count of the load and store instructions executed: Memory-data references executed = 0 + (8 × 101) = 808 Copyright © 2012 Elsevier, Inc. All rights reserved.
8 ■ Solutions to Case Studies and Exercises Since MIPS instructions are 4 bytes in size, code size is the number of instructions times 4: Instruction bytes = 4 × 18 = 72 ex_b_7: movq movq movq $0x0,%rax $0x0,%rbp %rax,0x1b58(%rbp) # store i to location 7000 # rax = 0, initialize i = 0 # base pointer = 0 ($1b58) movq loop: movq mov shl 0x1388(%rbp),%rdx # load C from 5000 ($1388) 0x1b58(%rbp),%rax # get value of i %rax,%rbx $0x3,%rbx # rbx gets copy of i # rbx now has i * 8 movq 0x0bb8(%rbx),%rcx # load B[i] (3000 = $0bb8) add mov shl movq to rcx # B[i] + C %rdx,%cdx # rbx gets copy of i %rax,%rbx $0x3,%rbx # rbx now has i * 8 %rcx,0x03e8(%rbx) # A[i] ← B[i] + C (base address of A is 1000) movq 0x1b58(%rbp),%rax # get value of i add $0x1,%rax movq %rax,0x1b58(%rbp) # save i cmpq $0x0065,%rax jae loop # increment i # is counter at 101 ($0065)? # if not 101, repeat Figure S.17 x86 code to implement the C loop without using registers to hold updated values for future use or to pass values to a subsequent loop iteration. b. This problem is similar to part (a), but with x86-64 instructions instead. The number of instructions executed dynamically is the number of initializa- tion instructions plus the number of instructions in the loop times the number of iterations: Instructions executed = 3 + (13 × 101) = 1316 A.8 This problem focuses on the challenges related to instruction set encoding. The length of the instructions are 12 bits. There are 32 registers so the length of the register field will be 5 bits for each register operand. We use addr[11] to addr[0] to represent the 12 bits of the address as shown in the tables below. a. In the first case, we need to support 3 two-address instructions. These can be encoded as follows: 3 two-address instr. Other instructions addr[11:10] '00', '01', '10' '11' addr[9:5] '00000' to '11111' '00000' to '11111' addr[4:0] '00000' to '11111' '00000' to '11111' Copyright © 2012 Elsevier, Inc. All rights reserved.
Appendix A Solutions ■ 9 Hence, for the one-address and two-address instructions must be mapped to the remaining 10 bits with the upper two bits encoded as ‘11’. The one- address instructions are then encoded with the addr[9:5] field using ‘00000’ to ‘11101’ for the 30 instruction types, leaving the addr[4:0] field or the regis- ter operand. This leaves the patterns with ‘11’ followed by ‘11110’ in the upper seven bits and ‘00000’ to ‘11111’ in the lower five bits to encode 32 of the zero-address instructions. The remaining zero-address instructions can be encoded using ‘11’ followed by ‘11111’ in the upper seven bits and ‘0000’ to ‘01100’ in the lower five bits to encode the other 13 zero-address instructions. 3 two-address instr. 30 one-address instr. 45 zero-address instr. addr[11:10] '00', '01', '10' '11' '11' '11' addr[9:5] '00000' to '11111' '00000' to '11101' '11110' '11111' addr[4:0] '00000' to '11111' '00000' to '11111' '00000' to '11111' '00000' to '01100' Hence, it is possible to have the above instruction encodings. c. b. This scenario is similar to above, with the two-address instructions encoded with the upper two bits as ‘00’ to ‘01’. The one-address instructions can be encoded with the upper two bits as ‘11’ and using ‘00000’ to ‘11110’ to dif- ferentiate the 31 one-address instructions. The pattern ‘11’ and ‘11111’ in the upper seven bits is then used to encode the zero-address instructions, with the lower five bits to differentiate them. We can only encode 32 of these patterns, not the 35 that are required; hence, it is impossible to have these instruction encodings. In this part, we already have encoded three two-address instructions as above. Moreover, we have 24 zero-address instructions encoded as below with ‘00000’ in the addr[9:5] field and ‘00000’ to ‘10111’ in the addr[4:0] field. We would like to fit as many one-address instructions as we can. Note that we cannot take advantage of any of the unused encodings with ‘11’ and ‘00000’ in the upper seven bits because we would need to use the entire addr[4:0] field for the single address of the operand. Hence, we can use the encodings with ‘00001’ to ‘11111’ in addr[9:5] for the one-address instructions and save the last five bits for the register address. Because this includes 31 patterns, we can support up to 31 one-address instructions. Note that we could also add up to eight additional zero-address instructions if we wish as well. 3 two-address instr. 24 zero-address instr. X one-address instr. addr[11:10] '00', '01', '10' '11' '11' addr[9:5] '00000' to '11111' '00000' '00001' to '11111' addr[4:0] '00000' to '11111' '00000' to '10111' '00000' to '11111' Copyright © 2012 Elsevier, Inc. All rights reserved.
分享到:
收藏