Chapter 1  Solutions
1.1   Personal computer (includes workstation and laptop): Personal computers 
emphasize delivery of good performance to single users at low cost and usually 
execute third-party software.
Personal mobile device (PMD, includes tablets): PMDs are battery operated, 
have wireless connectivity to the Internet, and typically cost hundreds of 
dollars. As with PCs, users can download software (“apps”) to run on them. 
Unlike PCs, they usually lack a keyboard and mouse and rely instead on a 
touch-sensitive screen or even speech input.
Server: Computer used to run large problems and usually accessed via a network.
Warehouse scale computer: Thousands of processors forming a large cluster.
Supercomputer: Computer composed of hundreds to thousands of processors 
and terabytes of memory.
Embedded computer: Computer designed to run one application or one set 
of related applications and integrated into a single system.
1.2 
a.  Performance via Pipelining
b.  Dependability via Redundancy
c.  Performance via Prediction
d.  Make the Common Case Fast
e.  Hierarchy of Memories
f.  Performance via Parallelism
g.  Design for Moore’s Law
h.  Use Abstraction to Simplify Design
1.3   The program is compiled into an assembly language program, which is then 
assembled into a machine language program.
1.4 
a.  1280 × 1024 pixels = 1,310,720 pixels => 1,310,720 pixels × 3 bytes/pixel = 3,932,160 
bytes/frame.
b.  3,932,160 bytes × (8 bits/byte)/(100E6 bits/second) = 0.31 seconds
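The frame-buffer arithmetic in 1.4 can be checked with a short Python sketch. It assumes 3 bytes per pixel (24-bit color) and a 100 Mbit/s link, the two factors used in the solution; the helper names are illustrative.

```python
# Exercise 1.4: frame-buffer size and transfer time.
# Assumes 3 bytes/pixel (24-bit color) and a 100 Mbit/s link, as in the solution.


def frame_bytes(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Bytes needed to store one frame."""
    return width * height * bytes_per_pixel


def transfer_seconds(n_bytes: int, link_bits_per_s: float) -> float:
    """Time to send n_bytes over a link with the given bit rate."""
    return n_bytes * 8 / link_bits_per_s


size = frame_bytes(1280, 1024)
print(size)                                        # 3932160
print(f"{transfer_seconds(size, 100e6):.2f} s")    # 0.31 s
```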
1.5 
a.  performance of P1 (instructions/sec) = 3 × 10^9/1.5 = 2 × 10^9
performance of P2 (instructions/sec) = 2.5 × 10^9/1.0 = 2.5 × 10^9
performance of P3 (instructions/sec) = 4 × 10^9/2.2 = 1.8 × 10^9
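Performance in 1.5a is measured as instructions per second, that is, clock rate divided by CPI. A minimal Python check using the clock rates and CPIs that appear in the solution:

```python
# Exercise 1.5a: performance as instructions per second = clock rate / CPI.
processors = {"P1": (3.0e9, 1.5), "P2": (2.5e9, 1.0), "P3": (4.0e9, 2.2)}


def instructions_per_second(clock_hz: float, cpi: float) -> float:
    return clock_hz / cpi


for name, (clock_hz, cpi) in processors.items():
    print(f"{name}: {instructions_per_second(clock_hz, cpi):.2e} instructions/s")
```

P2 comes out ahead even though P3 has the fastest clock, because P3's higher CPI more than offsets its clock advantage.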
b.  cycles(P1) = 10 s × 3 × 10^9 Hz = 30 × 10^9 cycles
cycles(P2) = 10 s × 2.5 × 10^9 Hz = 25 × 10^9 cycles
cycles(P3) = 10 s × 4 × 10^9 Hz = 40 × 10^9 cycles
c.  No. instructions(P1) = 30 × 10^9/1.5 = 20 × 10^9
No. instructions(P2) = 25 × 10^9/1 = 25 × 10^9
No. instructions(P3) = 40 × 10^9/2.2 = 18.18 × 10^9
CPInew = CPIold × 1.2, then CPI(P1) = 1.8, CPI(P2) = 1.2, CPI(P3) = 2.64
The execution time must drop by 30%: time = 0.7 × 10 s = 7 s
f = No. instr. × CPI/time, then
f(P1) = 20 × 10^9 × 1.8/7 = 5.14 GHz
f(P2) = 25 × 10^9 × 1.2/7 = 4.29 GHz
f(P3) = 18.18 × 10^9 × 2.64/7 = 6.86 GHz
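Parts b and c chain together: cycles follow from the 10 s run time, the instruction count from cycles/CPI, and the required clock rate from the new CPI and the 7 s target. A Python sketch, with the 1.2 CPI factor and the 0.7 time factor taken from the solution; `required_clock_hz` is an illustrative name:

```python
# Exercise 1.5b-c: clock cycles, instruction count, and the clock rate needed
# when CPI rises by 20% and execution time must fall by 30%.
processors = {"P1": (3.0e9, 1.5), "P2": (2.5e9, 1.0), "P3": (4.0e9, 2.2)}
T_OLD_S = 10.0      # original execution time
CPI_FACTOR = 1.2    # CPI increases by 20%
TIME_FACTOR = 0.7   # execution time decreases by 30%


def required_clock_hz(clock_hz: float, cpi: float) -> float:
    cycles = clock_hz * T_OLD_S           # part b
    instructions = cycles / cpi           # part c: instruction count is unchanged
    t_new_s = TIME_FACTOR * T_OLD_S       # 7 s
    return instructions * (cpi * CPI_FACTOR) / t_new_s


for name, (clock_hz, cpi) in processors.items():
    print(f"{name}: {required_clock_hz(clock_hz, cpi) / 1e9:.2f} GHz")
```

Because the instruction count cancels, each answer is simply the old clock rate multiplied by 1.2/0.7 ≈ 1.71.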
1.6 
a.  Class A: 10^5 instr. Class B: 2 × 10^5 instr. Class C: 5 × 10^5 instr. 
Class D: 2 × 10^5 instr. (10^6 instructions in total)
Time = No. instr. × CPI/clock rate
Total time P1 = (10^5 × 1 + 2 × 10^5 × 2 + 5 × 10^5 × 3 + 2 × 10^5 × 3)/(2.5 × 
10^9) = 10.4 × 10^−4 s
Total time P2 = (10^5 × 2 + 2 × 10^5 × 2 + 5 × 10^5 × 2 + 2 × 10^5 × 2)/
(3 × 10^9) = 6.67 × 10^−4 s
CPI = Time × clock rate/No. instr.
CPI(P1) = 10.4 × 10^−4 × 2.5 × 10^9/10^6 = 2.6
CPI(P2) = 6.67 × 10^−4 × 3 × 10^9/10^6 = 2.0
b.  clock cycles(P1) = 10^5 × 1 + 2 × 10^5 × 2 + 5 × 10^5 × 3 + 2 × 10^5 × 3 
= 26 × 10^5
clock cycles(P2) = 10^5 × 2 + 2 × 10^5 × 2 + 5 × 10^5 × 2 + 2 × 10^5 × 2 
= 20 × 10^5
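The per-class bookkeeping in 1.6 is a weighted sum, which is easy to verify in code. The class counts, CPIs, and clock rates below are the ones used in the solution; the dictionary layout is only illustrative.

```python
# Exercise 1.6: clock cycles, execution time, and average CPI
# from per-class instruction counts and CPIs.
counts = {"A": 1e5, "B": 2e5, "C": 5e5, "D": 2e5}
machines = {
    "P1": {"clock_hz": 2.5e9, "cpi": {"A": 1, "B": 2, "C": 3, "D": 3}},
    "P2": {"clock_hz": 3.0e9, "cpi": {"A": 2, "B": 2, "C": 2, "D": 2}},
}


def clock_cycles(cpi_by_class: dict) -> float:
    return sum(counts[c] * cpi_by_class[c] for c in counts)


for name, m in machines.items():
    cycles = clock_cycles(m["cpi"])
    time_s = cycles / m["clock_hz"]
    avg_cpi = cycles / sum(counts.values())
    print(f"{name}: cycles = {cycles:.2e}, time = {time_s:.3e} s, CPI = {avg_cpi:.2f}")
```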
1.7 
a.  CPI = Texec × f/No. instr.
Compiler A CPI = 1.1
Compiler B CPI = 1.25
b.  fB/fA = (No. instr.(B) × CPI(B))/(No. instr.(A) × CPI(A)) = 1.37
c.  TA/Tnew = 1.67
TB/Tnew = 2.27
1.8 
1.8.1  C = 2 × DP/(V^2 × F), where DP is the dynamic power
Pentium 4: C = 3.2E−8 F
Core i5 Ivy Bridge: C = 2.9E−8 F
1.8.2  Static power as a fraction of total power:
Pentium 4: 10/100 = 10%
Core i5 Ivy Bridge: 30/70 = 42.9%
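Both 1.8.1 and 1.8.2 follow from the dynamic-power relation DP = ½ × C × V^2 × F. The Python sketch below takes the static and total power from 1.8.2 and the voltages and clock rates from 1.8.3; the dictionary layout and the helper name are illustrative.

```python
# Exercise 1.8.1-1.8.2: capacitive load and static-power share.
# Inputs: static/total power from 1.8.2, voltage and clock rate from 1.8.3.
chips = {
    "Pentium 4": {"static_w": 10, "total_w": 100, "volts": 1.25, "freq_hz": 3.6e9},
    "Core i5 Ivy Bridge": {"static_w": 30, "total_w": 70, "volts": 0.9, "freq_hz": 3.4e9},
}


def capacitive_load(dynamic_w: float, volts: float, freq_hz: float) -> float:
    # DP = 1/2 * C * V^2 * F  =>  C = 2 * DP / (V^2 * F)
    return 2 * dynamic_w / (volts ** 2 * freq_hz)


for name, c in chips.items():
    dynamic_w = c["total_w"] - c["static_w"]
    cap = capacitive_load(dynamic_w, c["volts"], c["freq_hz"])
    static_share = c["static_w"] / c["total_w"]
    print(f"{name}: C = {cap:.1e} F, static share = {static_share:.1%}")
```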
1.8.3  (Snew + Dnew)/(Sold + Dold) = 0.90
Dnew = ½ × C × Vnew^2 × F (the same relation used for C in 1.8.1)
Sold = Vold × I
Snew = Vnew × I
Therefore:
Vnew = [2 × Dnew/(C × F)]^(1/2)
Dnew = 0.90 × (Sold + Dold) − Snew
Snew = Vnew × (Sold/Vold)
Pentium 4:
Snew = Vnew × (10/1.25) = Vnew × 8
Dnew = 0.90 × 100 − Vnew × 8 = 90 − Vnew × 8
Vnew = [2 × (90 − Vnew × 8)/(3.2E−8 × 3.6E9)]^(1/2), i.e., 57.6 × Vnew^2 + 8 × Vnew − 90 = 0
Vnew ≈ 1.18 V
Core i5:
Snew = Vnew × (30/0.9) = Vnew × 33.3
Dnew = 0.90 × 70 − Vnew × 33.3 = 63 − Vnew × 33.3
Vnew = [2 × (63 − Vnew × 33.3)/(2.9E−8 × 3.4E9)]^(1/2), i.e., 49.3 × Vnew^2 + 33.3 × Vnew − 63 = 0
Vnew ≈ 0.84 V
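With the leakage current held fixed, static power is linear in Vnew while dynamic power is quadratic, so Vnew is the positive root of a quadratic. A minimal Python sketch, assuming the 10% total-power reduction and the same ½ × C × V^2 × F relation used for C in 1.8.1; `new_voltage` is a hypothetical helper name:

```python
import math

# Exercise 1.8.3: voltage that cuts total power by 10% with leakage current fixed.
# Dynamic power uses the same 1/2 * C * V^2 * F relation as 1.8.1.


def new_voltage(static_w: float, dynamic_w: float, v_old: float,
                freq_hz: float, reduction: float = 0.10) -> float:
    cap = 2 * dynamic_w / (v_old ** 2 * freq_hz)   # capacitive load, as in 1.8.1
    leak_current = static_w / v_old                # I = S_old / V_old, held constant
    target_w = (1 - reduction) * (static_w + dynamic_w)
    a = 0.5 * cap * freq_hz                        # dynamic power = a * V^2
    b = leak_current                               # static power  = b * V
    # Positive root of a*V^2 + b*V - target_w = 0
    return (-b + math.sqrt(b * b + 4 * a * target_w)) / (2 * a)


print(f"Pentium 4: {new_voltage(10, 90, 1.25, 3.6e9):.2f} V")
print(f"Core i5:   {new_voltage(30, 40, 0.9, 3.4e9):.2f} V")
```

With these inputs the sketch gives roughly 1.18 V for the Pentium 4 and 0.84 V for the Core i5.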
1.9 
1.9.1
p   # arith inst.   # L/S inst.   # branch inst.   cycles     ex. time (s)   speedup
1   2.56E9          1.28E9        2.56E8           7.94E10    39.7           1
2   1.83E9          9.14E8        2.56E8           5.67E10    28.3           1.4
4   9.14E8          4.57E8        2.56E8           2.83E10    14.2           2.8
8   4.57E8          2.29E8        2.56E8           1.42E10    7.10           5.6
1.9.2
p   ex. time (s)
1   41.0
2   29.3
4   14.6
8   7.33
1.9.3  3
1.10 
1.10.1  die area(15 cm) = wafer area/dies per wafer = π × 7.5^2/84 = 2.10 cm^2
yield(15 cm) = 1/(1 + (0.020 × 2.10/2))^2 = 0.9593
die area(20 cm) = wafer area/dies per wafer = π × 10^2/100 = 3.14 cm^2
yield(20 cm) = 1/(1 + (0.031 × 3.14/2))^2 = 0.9093
1.10.2  cost/die(15 cm) = 12/(84 × 0.9593) = 0.1489
cost/die(20 cm) = 15/(100 × 0.9093) = 0.1650
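The die-area, yield, and cost-per-die steps share one yield model, yield = 1/(1 + defects per area × die area/2)^2. A Python sketch using the wafer diameters, dies per wafer, defect densities, and wafer costs that appear in the solution; the helper names are illustrative:

```python
import math

# Exercise 1.10.1-1.10.2: die area, yield, and cost per die.
# yield = 1 / (1 + defects_per_area * die_area / 2) ** 2, as in the solution.


def die_area_cm2(wafer_diameter_cm: float, dies_per_wafer: int) -> float:
    return math.pi * (wafer_diameter_cm / 2) ** 2 / dies_per_wafer


def die_yield(defects_per_cm2: float, area_cm2: float) -> float:
    return 1 / (1 + defects_per_cm2 * area_cm2 / 2) ** 2


def cost_per_die(wafer_cost: float, dies_per_wafer: int, y: float) -> float:
    return wafer_cost / (dies_per_wafer * y)


wafers = [  # (diameter cm, dies/wafer, defects/cm^2, wafer cost)
    (15, 84, 0.020, 12),
    (20, 100, 0.031, 15),
]
for diameter, dies, defects, cost in wafers:
    area = die_area_cm2(diameter, dies)
    y = die_yield(defects, area)
    print(f"{diameter} cm: area = {area:.2f} cm^2, yield = {y:.4f}, "
          f"cost/die = {cost_per_die(cost, dies, y):.4f}")
```

The code does not round the die area to two decimals before computing the yield, so the fourth decimal of the yield can differ slightly from the hand calculation.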
1.10.3  die area(15 cm) = wafer area/dies per wafer = π × 7.5^2/(84 × 1.1) = 1.91 cm^2
yield(15 cm) = 1/(1 + (0.020 × 1.15 × 1.91/2))^2 = 0.9575
die area(20 cm) = wafer area/dies per wafer = π × 10^2/(100 × 1.1) = 2.86 cm^2
yield(20 cm) = 1/(1 + (0.031 × 1.15 × 2.86/2))^2 = 0.9053
1.10.4  defects per area(0.92) = (1 − y^0.5)/(y^0.5 × die_area/2) = (1 − 0.92^0.5)/
(0.92^0.5 × 2/2) = 0.043 defects/cm^2
defects per area(0.95) = (1 − y^0.5)/(y^0.5 × die_area/2) = (1 − 0.95^0.5)/
(0.95^0.5 × 2/2) = 0.026 defects/cm^2
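Part 1.10.4 runs the yield model backwards: taking the square root of the yield and solving for defects per area gives the formula used above. A Python sketch, assuming the 2 cm^2 die area implied by the `2/2` term; `defect_density` is an illustrative name:

```python
import math

# Exercise 1.10.4: invert yield = 1 / (1 + D * A / 2) ** 2 for the defect density D.


def defect_density(target_yield: float, die_area_cm2: float) -> float:
    root = math.sqrt(target_yield)
    return (1 - root) / (root * die_area_cm2 / 2)


for y in (0.92, 0.95):
    print(f"yield {y}: {defect_density(y, 2.0):.3f} defects/cm^2")
```

Substituting the result back into the forward yield formula recovers the target yield, which is a quick sanity check on the algebra.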
1.11 
1.11.1  CPI = clock rate × CPU time/instr. count
clock rate = 1/cycle time = 3 GHz
CPI(bzip2) = 3 × 10^9 × 750/(2389 × 10^9) = 0.94
1.11.2  SPEC ratio = ref. time/execution time
SPEC ratio(bzip2) = 9650/750 = 12.87
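Both 1.11.1 and 1.11.2 are one-line rearrangements. A Python sketch using the bzip2 figures from the solution (3 GHz, 750 s, 2389 × 10^9 instructions, 9650 s reference time); the helper names are illustrative:

```python
# Exercise 1.11.1-1.11.2: CPI and SPECratio for bzip2.


def cpi_from(cpu_time_s: float, clock_hz: float, instructions: float) -> float:
    # CPU time = instructions * CPI / clock rate, solved for CPI
    return cpu_time_s * clock_hz / instructions


def spec_ratio(reference_time_s: float, execution_time_s: float) -> float:
    return reference_time_s / execution_time_s


print(f"CPI(bzip2) = {cpi_from(750, 3e9, 2389e9):.2f}")
print(f"SPECratio(bzip2) = {spec_ratio(9650, 750):.2f}")
```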
1.11.3  CPU time = No. instr. × CPI/clock rate
If CPI and clock rate do not change, the CPU time increases by the same 
percentage as the number of instructions, that is, by 10%.
1.11.4  CPU time(before) = No. instr. × CPI/clock rate
CPU time(after) = 1.1 × No. instr. × 1.05 × CPI/clock rate
CPU time(after)/CPU time(before) = 1.1 × 1.05 = 1.155. Thus, CPU time 
increases by 15.5%.
1.11.5  SPECratio = reference time/CPU time
SPECratio(after)/SPECratio(before) = CPU time(before)/CPU time(after) = 
1/1.155 = 0.866. The SPECratio decreases by about 13.4%.
1.11.6  CPI = (CPU time × clock rate)/No. instr.
CPI = 700 × 4 × 10^9/(0.85 × 2389 × 10^9) = 1.38
1.11.7  Clock rate ratio = 4 GHz/3 GHz = 1.33
CPI @ 4 GHz = 1.38, CPI @ 3 GHz = 0.94, ratio = 1.46
They are different because, although the number of instructions has been 
reduced by 15%, the CPU time has been reduced by a smaller percentage (6.7%). 
From CPU time = No. instr. × CPI/clock rate, CPI ratio = clock rate ratio × 
CPU time ratio/No. instr. ratio = 1.33 × (700/750)/0.85 = 1.46.
1.11.8  700/750 = 0.933. CPU time reduction: 6.7%
1.11.9  No. instr. = CPU time × clock rate/CPI
No. instr. = 960 × 0.9 × 4 × 10^9/1.61 = 2147 × 10^9
1.11.10  Clock rate = No. instr. × CPI/CPU time.
Clock rate(new) = No. instr. × CPI/(0.9 × CPU time) = (1/0.9) × clock rate(old) = 
3.33 GHz
1.11.11  Clock rate = No. instr. × CPI/CPU time.
Clock rate(new) = No. instr. × 0.85 × CPI/(0.80 × CPU time) = (0.85/0.80) × 
clock rate(old) = 3.19 GHz
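Parts 1.11.6 through 1.11.11 all rearrange CPU time = No. instr. × CPI/clock rate for a different unknown. A Python sketch using the figures from the solution; for 1.11.10 and 1.11.11 it assumes, as the solution does, a 3 GHz starting clock and an unchanged instruction count. The helper names are illustrative:

```python
# Exercise 1.11.6-1.11.11: the CPU time equation T = N * CPI / f,
# rearranged for whichever quantity is unknown.


def cpi_from(cpu_time_s: float, clock_hz: float, instructions: float) -> float:
    return cpu_time_s * clock_hz / instructions


def scaled_clock(clock_old_hz: float, cpi_factor: float, time_factor: float) -> float:
    # f = N * CPI / T with N fixed, so f scales as cpi_factor / time_factor
    return clock_old_hz * cpi_factor / time_factor


cpi_old = cpi_from(750, 3e9, 2389e9)          # 1.11.1
cpi_new = cpi_from(700, 4e9, 0.85 * 2389e9)   # 1.11.6
print(f"CPI at 4 GHz = {cpi_new:.2f}, CPI ratio = {cpi_new / cpi_old:.2f}")
print(f"1.11.10: {scaled_clock(3e9, 1.0, 0.9) / 1e9:.2f} GHz")
print(f"1.11.11: {scaled_clock(3e9, 0.85, 0.80) / 1e9:.4f} GHz")
```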
1.12 
1.12.1  T(P1) = 5 × 10^9 × 0.9/(4 × 10^9) = 1.125 s
T(P2) = 10^9 × 0.75/(3 × 10^9) = 0.25 s
clock rate(P1) > clock rate(P2), but performance(P1) < performance(P2)
1.12.2  T(P1) = No. instr. × CPI/clock rate
T(P1) = 2.25 × 10^−1 s
Setting T(P2) = N × 0.75/(3 × 10^9) equal to T(P1) gives N = 9 × 10^8
1.12.3  MIPS = clock rate × 10^−6/CPI
MIPS(P1) = 4 × 10^9 × 10^−6/0.9 = 4.44 × 10^3
MIPS(P2) = 3 × 10^9 × 10^−6/0.75 = 4.0 × 10^3
MIPS(P1) > MIPS(P2), but performance(P1) < performance(P2) (from 1.12.1)
1.12.4  MFLOPS = No. FP operations × 10^−6/T
MFLOPS(P1) = 0.4 × 5E9 × 1E−6/1.125 = 1.78E3
MFLOPS(P2) = 0.4 × 1E9 × 1E−6/0.25 = 1.60E3
MFLOPS(P1) > MFLOPS(P2), but performance(P1) < performance(P2) 
(from 1.12.1)
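The contrast in 1.12 between clock rate, MIPS, MFLOPS, and actual execution time can be reproduced in a few lines. The instruction counts, CPIs, clock rates, and 40% FP fraction are those used in the solution; the dictionary layout and helper names are illustrative.

```python
# Exercise 1.12: execution time versus MIPS and MFLOPS for P1 and P2.
machines = {
    "P1": {"instructions": 5e9, "cpi": 0.9, "clock_hz": 4e9},
    "P2": {"instructions": 1e9, "cpi": 0.75, "clock_hz": 3e9},
}
FP_FRACTION = 0.4  # share of instructions that are FP operations


def exec_time_s(m: dict) -> float:
    return m["instructions"] * m["cpi"] / m["clock_hz"]


def mips(m: dict) -> float:
    return m["clock_hz"] * 1e-6 / m["cpi"]


def mflops(m: dict) -> float:
    return FP_FRACTION * m["instructions"] * 1e-6 / exec_time_s(m)


for name, m in machines.items():
    print(f"{name}: T = {exec_time_s(m):.3f} s, "
          f"MIPS = {mips(m):.0f}, MFLOPS = {mflops(m):.0f}")
```

P1 scores higher on both MIPS and MFLOPS yet takes 4.5 times as long as P2, which is why neither rate is a reliable stand-in for execution time.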
1.13 
1.13.1  Tfp = 70 × 0.8 = 56 s. Tnew = 56 + 85 + 55 + 40 = 236 s. Reduction: 5.6%
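All three parts of 1.13 are bookkeeping on the per-category times used in 1.13.1 (FP 70 s, INT 85 s, L/S 55 s, branch 40 s; 250 s in total). A minimal Python sketch; the helper names are illustrative:

```python
# Exercise 1.13: per-category execution times (s) from the solution.
times = {"fp": 70, "int": 85, "ls": 55, "branch": 40}
total = sum(times.values())  # 250 s


def time_after_speedup(category: str, factor: float) -> float:
    """Total time when one category's time is multiplied by `factor`."""
    return total - times[category] + times[category] * factor


def required_category_time(category: str, target_total: float) -> float:
    """Time the category must shrink to so that the total equals target_total."""
    return target_total - (total - times[category])


t_new = time_after_speedup("fp", 0.8)
print(f"FP 20% faster: {t_new} s, reduction {1 - t_new / total:.1%}")
print(f"INT must drop to {required_category_time('int', 0.8 * total)} s")
print(f"branch would need {required_category_time('branch', 0.8 * total)} s")
```

A negative required time, as in the branch case, means that a 20% reduction of the total cannot be reached by improving that category alone.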
1.13.2  Tnew = 250 × 0.8 = 200 s, Tfp + Tl/s + Tbranch = 165 s, Tint = 35 s. Reduction 
in INT time: 58.8% (85 s to 35 s)
1.13.3  Tnew = 250 × 0.8 = 200 s, Tfp + Tint + Tl/s = 210 s. NO: even with Tbranch 
reduced to 0 s, the remaining 210 s exceeds the 200 s target.
1.14 
1.14.1  Clock cycles = CPIfp × No. FP instr. + CPIint × No. INT instr. + CPIl/s × 
No. L/S instr. + CPIbranch × No. branch instr.
TCPU = clock cycles/clock rate = clock cycles/(2 × 10^9)
clock cycles = 512 × 10^6; TCPU = 0.256 s
To halve the number of clock cycles by improving the CPI of FP instructions:
CPIimproved fp × No. FP instr. + CPIint × No. INT instr. + CPIl/s × No. L/S 
instr. + CPIbranch × No. branch instr. = clock cycles/2
CPIimproved fp = (clock cycles/2 − (CPIint × No. INT instr. + CPIl/s × No. L/S 
instr. + CPIbranch × No. branch instr.))/No. FP instr.
CPIimproved fp = (256 − 462)/50 < 0 => not possible (cycle and instruction counts in millions)
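The check in 1.14.1 amounts to solving the cycle equation for one CPI. A Python sketch with counts in millions: the FP (50) and L/S (80) counts and all four CPIs come from the solution, while the INT (110) and branch (16) counts are not stated here and are inferred as the unique values consistent with its 512 million cycles and 0.171 s results. The helper names are illustrative.

```python
# Exercise 1.14.1: can a better FP CPI alone halve the cycle count?
# Counts in millions. FP and L/S counts and all CPIs come from the solution;
# the INT (110 M) and branch (16 M) counts are inferred from its cycle totals.
counts = {"fp": 50, "int": 110, "ls": 80, "branch": 16}
cpi = {"fp": 1, "int": 1, "ls": 4, "branch": 2}
CLOCK_HZ = 2e9


def total_cycles_m(cpi_table: dict) -> float:
    return sum(counts[k] * cpi_table[k] for k in counts)


def cpi_needed(category: str, target_cycles_m: float) -> float:
    """CPI the category must reach for the total to equal target_cycles_m."""
    others = total_cycles_m(cpi) - counts[category] * cpi[category]
    return (target_cycles_m - others) / counts[category]


base_m = total_cycles_m(cpi)
print(f"{base_m} M cycles, T = {base_m * 1e6 / CLOCK_HZ:.3f} s")
print(f"FP CPI needed: {cpi_needed('fp', base_m / 2):.2f}")  # negative: impossible
```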
1.14.2  Using the clock cycle data from 1.14.1.
To halve the number of clock cycles by improving the CPI of L/S instructions:
CPIfp × No. FP instr. + CPIint × No. INT instr. + CPIimproved l/s × No. L/S 
instr. + CPIbranch × No. branch instr. = clock cycles/2
CPIimproved l/s = (clock cycles/2 − (CPIfp × No. FP instr. + CPIint × No. INT 
instr. + CPIbranch × No. branch instr.))/No. L/S instr.
CPIimproved l/s = (256 − 192)/80 = 0.8
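The same cycle equation, solved for the L/S CPI instead. Counts are in millions; as before, the FP and L/S counts and the CPIs come from the solution, and the INT (110) and branch (16) counts are inferred from its 512 million cycle total and 0.171 s result. The helper name is illustrative.

```python
# Exercise 1.14.2: the L/S CPI needed to halve the cycle count.
# Counts in millions; INT (110 M) and branch (16 M) are inferred values.
counts = {"fp": 50, "int": 110, "ls": 80, "branch": 16}
cpi = {"fp": 1, "int": 1, "ls": 4, "branch": 2}


def cpi_needed(category: str, target_cycles_m: float) -> float:
    """CPI the category must reach for the total to equal target_cycles_m."""
    total = sum(counts[k] * cpi[k] for k in counts)          # 512 M cycles
    others = total - counts[category] * cpi[category]         # 192 M for L/S
    return (target_cycles_m - others) / counts[category]


print(f"L/S CPI needed: {cpi_needed('ls', 256):.2f}")
```

Non-L/S instructions account for 512 − 80 × 4 = 192 million cycles, so the L/S instructions may use at most 64 million cycles, which is a CPI of 0.8.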
1.14.3  Clock cycles = CPIfp × No. FP instr. + CPIint × No. INT instr. + CPIl/s × 
No. L/S instr. + CPIbranch × No. branch instr.
TCPU = clock cycles/clock rate = clock cycles/(2 × 10^9)
CPIint = 0.6 × 1 = 0.6; CPIfp = 0.6 × 1 = 0.6; CPIl/s = 0.7 × 4 = 2.8; 
CPIbranch = 0.7 × 2 = 1.4
TCPU (before improv.) = 0.256 s; TCPU (after improv.) = 0.171 s
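Part 1.14.3 scales every CPI and recomputes the CPU time at 2 GHz. A Python sketch with counts in millions; the FP and L/S counts, the CPIs, and the 0.6/0.7 scale factors come from the solution, and the INT (110) and branch (16) counts are inferred so that the totals match its results. The helper name is illustrative.

```python
# Exercise 1.14.3: CPU time before and after scaling every CPI.
# Counts in millions; the INT (110 M) and branch (16 M) counts are inferred.
counts = {"fp": 50, "int": 110, "ls": 80, "branch": 16}
cpi = {"fp": 1, "int": 1, "ls": 4, "branch": 2}
scale = {"fp": 0.6, "int": 0.6, "ls": 0.7, "branch": 0.7}
CLOCK_HZ = 2e9


def cpu_time_s(cpi_table: dict) -> float:
    cycles = sum(counts[k] * 1e6 * cpi_table[k] for k in counts)
    return cycles / CLOCK_HZ


improved = {k: cpi[k] * scale[k] for k in cpi}
print(f"before: {cpu_time_s(cpi):.3f} s, after: {cpu_time_s(improved):.3f} s")
```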
1.15 
processors   exec. time/processor   time w/overhead   speedup             actual speedup/ideal speedup
1            100
2            50                     54                100/54 = 1.85       1.85/2 = 0.93
4            25                     29                100/29 = 3.45       3.45/4 = 0.86
8            12.5                   16.5              100/16.5 = 6.06     6.06/8 = 0.76
16           6.25                   10.25             100/10.25 = 9.76    9.76/16 = 0.61
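The 1.15 table follows from a fixed overhead added to the ideally divided run time. The 4 s overhead is not stated explicitly; it is inferred from the table (54 − 50 = 29 − 25 = 4). A Python sketch under that assumption, where "efficiency" is the actual speedup divided by the ideal speedup p:

```python
# Exercise 1.15: speedup with a fixed overhead per run.
# 100 s single-processor time from the table; the 4 s overhead is inferred.
T1 = 100.0
OVERHEAD = 4.0


def time_with_overhead(p: int) -> float:
    return T1 / p + OVERHEAD


for p in (2, 4, 8, 16):
    speedup = T1 / time_with_overhead(p)
    print(f"{p:2d} processors: speedup {speedup:.2f}, efficiency {speedup / p:.2f}")
```

Because the overhead does not shrink with p, efficiency falls steadily as processors are added.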