Appendix L
Solutions to Case Study Exercises

L.1 Chapter 1 Solutions
L.2 Chapter 2 Solutions
L.3 Chapter 3 Solutions
L.4 Chapter 4 Solutions
L.5 Chapter 5 Solutions
L.6 Chapter 6 Solutions

L.1 Chapter 1 Solutions
Case Study 1: Chip Fabrication Cost
1.1
a.
\[ \text{Yield} = \left(1 + \frac{0.75 \times 1.99}{4.0}\right)^{-4} = 0.28 \]
b.
\[ \text{Yield} = \left(1 + \frac{0.75 \times 3.80}{4.0}\right)^{-4} = 0.12 \]
c. The Sun Niagara is substantially larger, since it places 8 cores on a chip rather
than 1.
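As a quick check on the arithmetic in 1.1, the yield model can be sketched in a few lines of Python. The function name `die_yield` and its parameter names are illustrative and do not come from the text; the exponent of 4.0 is the value used in the solutions above.

```python
def die_yield(defects_per_cm2: float, die_area_cm2: float, alpha: float = 4.0) -> float:
    """Yield = (1 + defects * area / alpha) ** -alpha, the model used in 1.1."""
    return (1.0 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)


# Sun Niagara figures from 1.1(b): 0.75 defects per cm^2 on a 3.80 cm^2 die.
niagara_yield = die_yield(0.75, 3.80)
print(f"Sun Niagara yield: {niagara_yield:.2f}")
```

Because the die area sits inside a fourth-power term, yield falls off quickly as the die grows, which is the effect part (c) points at.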
1.2
a.
\[ \text{Yield} = \left(1 + \frac{0.30 \times 3.89}{4.0}\right)^{-4} = 0.36 \]
\[ \text{Dies per wafer} = \frac{\pi \times (30/2)^{2}}{3.89} - \frac{\pi \times 30}{\sqrt{2 \times 3.89}} = 182 - 33.8 = 148 \]
\[ \text{Cost per die} = \frac{\$500}{148 \times 0.36} = \$9.38 \]
b.
\[ \text{Yield} = \left(1 + \frac{0.7 \times 1.86}{4.0}\right)^{-4} = 0.32 \]
\[ \text{Dies per wafer} = \frac{\pi \times (30/2)^{2}}{1.86} - \frac{\pi \times 30}{\sqrt{2 \times 1.86}} = 380 - 48.9 = 331 \]
\[ \text{Cost per die} = \frac{\$500}{331 \times 0.32} = \$4.72 \]
c. $9.38 × .4 = $3.75
d. Selling price = ($9.38 + $3.75) × 2 = $26.26
Profit = $26.26 – $4.72 = $21.54
e. Rate of sale = 3 × 500,000 = 1,500,000/month
Profit = 1,500,000 × $21.54 = $32,310,000
$1,000,000,000/$32,310,000 = 31 months
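The cost-per-die chain in 1.2 (yield, dies per wafer, cost) can be sketched as follows. The function names are hypothetical, and rounding the intermediate values (dies to the nearest integer, yield to two decimals) is an assumption made only so the sketch reproduces the figures as they are worked in the solution.

```python
import math


def die_yield(defects_per_cm2, die_area_cm2, alpha=4.0):
    """Yield model used throughout this case study."""
    return (1.0 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)


def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
    """Wafer area over die area, minus the dies lost along the wafer edge."""
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
    return wafer_area / die_area_cm2 - edge_loss


def cost_per_die(wafer_cost, wafer_diameter_cm, die_area_cm2, defects_per_cm2):
    # Assumption: round intermediates the way the worked solution does.
    dies = round(dies_per_wafer(wafer_diameter_cm, die_area_cm2))
    good_fraction = round(die_yield(defects_per_cm2, die_area_cm2), 2)
    return wafer_cost / (dies * good_fraction)


old_cost = cost_per_die(500, 30, 3.89, 0.30)  # 1.2(a)
new_cost = cost_per_die(500, 30, 1.86, 0.70)  # 1.2(b)
print(f"old die: ${old_cost:.2f}, new die: ${new_cost:.2f}")
```

Without the intermediate rounding the results shift by a few cents, so the rounding step matters when comparing against hand-worked numbers.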
1.3
a.
\[ \text{Yield} = \left(1 + \frac{0.75 \times 3.80/8}{4.0}\right)^{-4} = 0.71 \]
Prob of error = 1 – 0.71 = 0.29
b.
\[ \text{Prob of one defect} = 0.29 \times 0.71^{7} \times 8 = 0.21 \]
\[ \text{Prob of two defects} = 0.29^{2} \times 0.71^{6} \times 28 = 0.30 \]
Prob of one or two = 0.21 + 0.30 = 0.51
c.
\[ 0.71^{8} = .06 \]
(now we see why this method is inaccurate!)
d. 0.51/0.06 = 8.5
e. x × $150 + 8.5x × $100 – (9.5x × $80) – 9.5x × $1.50 = $200,000,000
x = 885,938 8-core chips, 8,416,390 chips total
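The probabilities in 1.3 are binomial terms over the 8 cores, and the break-even equation in part (e) is linear in x. A minimal sketch, assuming independent per-core defects; the constant and function names are illustrative, and the 8.5 ratio is taken as given from part (d).

```python
from math import comb

CORES = 8
P_GOOD = 0.71          # per-core yield from 1.3(a)
P_BAD = 1 - P_GOOD     # per-core probability of a defect


def prob_defective(k, n=CORES, p_bad=P_BAD):
    """Binomial probability that exactly k of n cores are defective."""
    return comb(n, k) * p_bad ** k * (1 - p_bad) ** (n - k)


p_one = prob_defective(1)
p_two = prob_defective(2)
p_none = prob_defective(0)

# Break-even from 1.3(e): x perfect chips plus ratio*x partially working chips.
RATIO = 8.5  # rounded figure from 1.3(d), used as-is
margin_per_x = 150 + RATIO * 100 - (1 + RATIO) * 80 - (1 + RATIO) * 1.50
break_even_x = 200_000_000 / margin_per_x

print(f"one: {p_one:.2f}, two: {p_two:.2f}, none: {p_none:.2f}")
print(f"break-even 8-core chips: {break_even_x:,.0f}")
```

The break-even count comes out near 886,000 chips; small differences from a hand calculation are down to rounding.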
Case Study 2: Power Consumption in Computer Systems
1.4
a. .70x = 79 + 2 × 3.7 + 2 × 7.9; x = 146
b. 4.0 W × .4 + 7.9 W × .6 = 6.34 W
c. The 7200 rpm drive takes 60 s to read/seek and 40 s idle for a particular job.
The 5400 rpm disk requires 4/3 × 60 s, or 80 s to do the same thing. Therefore,
it is idle 20% of the time.
1.5
a.
\[ \frac{14\ \text{kW}}{(79\ \text{W} + 2.3\ \text{W} + 7.0\ \text{W})} = 158 \]
b.
\[ \frac{14\ \text{kW}}{(79\ \text{W} + 2.3\ \text{W} + 2 \times 7.0\ \text{W})} = 146 \]
c.
\[ \text{Failure rate} = \frac{1}{9 \times 10^{6}} + 8 \times \frac{1}{4500} + \frac{1}{3 \times 10^{4}} = \frac{1 + 8 \times 2000 + 300}{9 \times 10^{6}} = \frac{16301}{9 \times 10^{6}} \]
\[ \text{MTTF} = \frac{1}{\text{Failure rate}} = \frac{9 \times 10^{6}}{16301} = 552\ \text{hours} \]
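The MTTF calculation in 1.5(c) sums the failure rates of the individual components and inverts the total. A minimal sketch, assuming independent failures; the function name `series_mttf` and the (count, MTTF) pair layout are illustrative choices, not from the text.

```python
def series_mttf(components):
    """MTTF of a system that fails when any component fails.

    components: iterable of (count, mttf_hours) pairs.
    """
    failure_rate = sum(count / mttf_hours for count, mttf_hours in components)
    return 1.0 / failure_rate


# Component MTTFs as they appear in 1.5(c): one at 9e6 hours,
# eight at 4500 hours, and one at 3e4 hours.
system = [(1, 9e6), (8, 4500), (1, 3e4)]
mttf_hours = series_mttf(system)
print(f"System MTTF: {mttf_hours:.0f} hours")
```

The eight 4500-hour components dominate the sum, so the system MTTF lands close to 4500/8.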
1.6
a. See Figure L.1.

                  SPECjbb   SPECweb
Sun Fire T2000    213       42.4
IBM x346          91.2      9.93

Figure L.1 Power/performance ratios.
b. Sun Fire T2000
c. More expensive servers can be more compact, allowing more computers to be
stored in the same amount of space. Because real estate is so expensive, this
is a huge concern. Also, power may not be the same for both systems. It can
cost more to purchase a chip that is optimized for lower power consumption.
1.7
a. 50%
b.
\[ \frac{\text{Power new}}{\text{Power old}} = \frac{(V \times 0.50)^{2} \times (F \times 0.50)}{V^{2} \times F} = 0.5^{3} = 0.125 \]
c. .70 = (1 – x) + x/2; x = 60%
d.
\[ \frac{\text{Power new}}{\text{Power old}} = \frac{(V \times 0.70)^{2} \times (F \times 0.50)}{V^{2} \times F} = 0.7^{2} \times 0.5 = 0.245 \]
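The ratios in 1.7 follow from dynamic power scaling with the square of voltage times frequency, and part (c) is a one-line solve. A minimal sketch; both function names are hypothetical.

```python
def dynamic_power_ratio(voltage_scale, frequency_scale):
    """New/old dynamic power when V and F are each scaled by the given factors."""
    return voltage_scale ** 2 * frequency_scale


def parallel_fraction_for(target_time, processors=2):
    """Solve target_time = (1 - x) + x / processors for x."""
    return (1 - target_time) / (1 - 1 / processors)


ratio_b = dynamic_power_ratio(0.50, 0.50)   # 1.7(b)
ratio_d = dynamic_power_ratio(0.70, 0.50)   # 1.7(d)
fraction_c = parallel_fraction_for(0.70)    # 1.7(c)
print(ratio_b, round(ratio_d, 3), round(fraction_c, 2))
```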
Case Study 3: The Cost of Reliability (and Failure) in Web Servers
1.8
a. 14 days × $1.4 million/day = $19.6 million
$4 billion – $19.6 million = $3.98 billion
b. Increase in total revenue: 4.8/3.9 = 1.23
In the fourth quarter, the rough estimate would be a loss of 1.23 × $19.6 million
= $24.1 million.
c. Losing $1.4 million × .50 = $700,000 per day. This pays for $700,000/$7,500
= 93 computers per day.
d. It depends on how the 2.6 million visitors are counted.
If the 2.6 million visitors are not unique, but are actually visitors each day
summed across a month: 2.6 million × 8.4 = 21.84 million transactions per
month. $5.38 × 21.84 million = $117 million per month.
If the 2.6 million visitors are assumed to visit every day: 2.6 million × 8.4 ×
31 = 677 million transactions per month. $5.38 × 677 million = $3.6 billion
per month, which is clearly not the case, or else their online service would not
make money.
1.9
a.
\[ \text{FIT} = 10^{9} / \text{MTTF} \]
\[ \text{MTTF} = 10^{9} / \text{FIT} = 10^{9} / 100 = 10{,}000{,}000 \]
b.
\[ \text{Availability} = \frac{\text{MTTF}}{\text{MTTF} + \text{MTTR}} = \frac{10^{7}}{10^{7} + 24} = \text{about } 100\% \]
1.10 Using the simplifying assumption that all failures are independent, we sum the
failure rates of all of the computers:
\[ \text{Failure rate} = 1000 \times 10^{-7} = 10^{-4} = \frac{10^{5}}{10^{9}} \]
\[ \text{FIT} = 10^{5}, \text{ therefore } \text{MTTF} = \frac{10^{9}}{10^{5}} = 10^{4} \]
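The conversions used in 1.9 and 1.10 can be sketched as small helpers. The function names are illustrative; FIT is taken as failures per 10^9 hours, and the cluster helper assumes independent, identical machines as the solution does.

```python
def fit_to_mttf(fit):
    """MTTF in hours from a FIT rate (failures per 1e9 hours)."""
    return 1e9 / fit


def availability(mttf_hours, mttr_hours):
    """Fraction of time the system is up."""
    return mttf_hours / (mttf_hours + mttr_hours)


def cluster_mttf(machines, mttf_each_hours):
    """MTTF of a group that counts as failed when any one machine fails."""
    return 1.0 / (machines / mttf_each_hours)


single_mttf = fit_to_mttf(100)                    # 1.9(a)
single_avail = availability(single_mttf, 24)      # 1.9(b)
group_mttf = cluster_mttf(1000, single_mttf)      # 1.10
print(single_mttf, f"{single_avail:.6f}", group_mttf)
```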
1.11
a. Assuming that we do not repair the computers, we wait for how long it takes
for 3,334 computers to fail.
3,334 × 10,000,000 = 33,340,000,000 hours
b. Total cost of the decision: $1,000 × 10,000 computers = $10 million
Expected benefit of the decision: Avoid a day of downtime for every
33,340,000,000 hours of uptime. This would save us $1.4 million every
3,858,000 years. This would definitely not be worth it.
Case Study 4: Performance
1.12
a. See Figure L.2.

Chip                  Memory performance   Dhrystone performance
Athlon 64 X2 4800+    1.14                 1.36
Pentium EE 840        1.08                 1.24
Pentium D 820         1                    1
Athlon 64 X2 3800+    0.98                 1.13
Pentium 4             0.91                 0.5
Athlon 64 3000+       0.98                 0.5
Pentium 4 570         1.17                 0.74
Processor X           2.33                 0.33

Figure L.2 Performance of several processors normalized to the Pentium D 820.
b. See Figure L.3.

Chip                  Arithmetic mean   Arithmetic mean of normalized
Athlon 64 X2 4800+    12070.5           1.25
Pentium EE 840        11060.5           1.16
Pentium D 820         9110              1
Athlon 64 X2 3800+    10035             1.05
Pentium 4             5176              0.95
Athlon 64 3000+       5290.5            0.95
Pentium 4 570         7355.5            0.77
Processor X           6000              1.33

Figure L.3 Arithmetic mean of several processors.
c. The arithmetic mean of the original performance shows that the Athlon 64 X2
4800+ is the fastest processor. 
The arithmetic mean of the normalized processors shows that Processor X is
the fastest processor.
d. Single processors: .05
Dual processors: 1.17
e. Solutions will vary.
f. Dual processors gain in CPU performance (exhibited by the Dhrystone
performance), but they do not necessarily increase in memory performance. This
makes sense because, although they are doubling the processing power, dual
processors do not change the memory hierarchy very much. Benchmarks that
exercise the memory often do not fit in the size of the cache, so doubling the
cache does not help the memory benchmarks substantially. In some applications,
however, they could gain substantially due to the increased cache available.
1.13
a. Pentium 4 570: .4 × 3,501 + .6 × 11,210 = 8,126
Athlon 64 X2 4800+: .4 × 3,423 + .6 × 20,718 = 13,800
b. 20,718/7,621 = 2.7
c. x × 3,501 + (1 – x) × 11,210 = x × 3,000 + (1 – x) × 15,220
x = .89
.89/.11 = 8x ratio of memory to processor computation
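The weighted scores in 1.13(a) and the crossover weight in 1.13(c) can be sketched as below. The function names are hypothetical; the benchmark scores are the ones quoted in the solution, and the closed form for x comes from rearranging the equation in part (c).

```python
def weighted_score(memory_weight, memory_score, dhrystone_score):
    """Weighted arithmetic mean of the memory and Dhrystone scores."""
    return memory_weight * memory_score + (1 - memory_weight) * dhrystone_score


def crossover_weight(mem_a, dhry_a, mem_b, dhry_b):
    """Memory weight x at which processors A and B score the same.

    Rearranged from x*mem_a + (1-x)*dhry_a = x*mem_b + (1-x)*dhry_b.
    """
    return (dhry_b - dhry_a) / ((mem_a - mem_b) + (dhry_b - dhry_a))


pentium_4_570 = weighted_score(0.4, 3501, 11210)   # 1.13(a)
athlon_x2 = weighted_score(0.4, 3423, 20718)       # 1.13(a)
x = crossover_weight(3501, 11210, 3000, 15220)     # 1.13(c)
print(round(pentium_4_570), round(athlon_x2), round(x, 2), round(x / (1 - x)))
```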
1.14
a. Amdahl’s Law:
\[ \frac{1}{.6 + .4/2} = 1.25\text{x speedup} \]
b. Amdahl’s Law:
\[ \frac{1}{.01 + .99/2} = 1.98\text{x speedup} \]
c. Amdahl’s Law:
\[ \frac{1}{.2 + .8 \times (.6 + .4/2)} = 1.19\text{x speedup} \]
d. Amdahl’s Law:
\[ \frac{1}{.8 + .2 \times (.01 + .99/2)} = 1.11\text{x speedup} \]
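All four parts of 1.14 apply the same formula, with parts (c) and (d) nesting it: the application only accounts for part of the total time, and only part of the application is sped up. A minimal sketch; the function name `amdahl_speedup` is illustrative.

```python
def amdahl_speedup(enhanced_fraction, enhancement_speedup):
    """Overall speedup when only enhanced_fraction of the time is sped up."""
    return 1.0 / ((1 - enhanced_fraction) + enhanced_fraction / enhancement_speedup)


part_a = amdahl_speedup(0.40, 2)
part_b = amdahl_speedup(0.99, 2)
# Parts (c) and (d): the application itself is the "enhanced" portion,
# and its own speedup comes from an inner application of the law.
part_c = amdahl_speedup(0.80, amdahl_speedup(0.40, 2))
part_d = amdahl_speedup(0.20, amdahl_speedup(0.99, 2))
print(f"{part_a:.2f} {part_b:.2f} {part_c:.2f} {part_d:.2f}")
```

Comparing (b) with (d) shows how a nearly perfect 1.98x application speedup shrinks to about 1.11x once the application is only 20% of the workload.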
L.2 Chapter 2 Solutions
Case Study 1: Exploring the Impact of Microarchitectural Techniques
2.1 The baseline performance (in cycles, per loop iteration) of the code sequence in
Figure 2.35, if no new instruction’s execution could be initiated until the previous
instruction’s execution had completed, is 37, as shown in Figure L.4. How did I
come up with that number? Each instruction requires one clock cycle of execution
(a clock cycle in which that instruction, and only that instruction, is occupying
the execution units; since every instruction must execute, the loop will take at
least that many clock cycles). To that base number, we add the extra latency
cycles. Don’t forget the branch shadow cycle.
Loop:    LD        F2,0(Rx)     1 + 3
         MULTD     F2,F0,F2     1 + 4
         DIVD      F8,F2,F0     1 + 10
         LD        F4,0(Ry)     1 + 3
         ADDD      F4,F0,F4     1 + 2
         ADDD      F10,F8,F2    1 + 2
         SD        F4,0(Ry)     1 + 1
         ADDI      Rx,Rx,#8     1
         ADDI      Ry,Ry,#8     1
         SUB       R20,R4,Rx    1
         BNZ       R20,Loop     1 + 1
                                ____
         cycles per loop iter   37
Figure L.4 Baseline performance (in cycles, per loop iteration) of the code sequence
in Figure 2.35.
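The total in Figure L.4 is one issue cycle per instruction plus each instruction's extra latency, summed over the loop body. A minimal sketch of that tally; the list layout is an illustrative choice, and the latencies are copied from the figure.

```python
# (instruction, extra latency cycles) for each line of Figure L.4.
LOOP_BODY = [
    ("LD    F2,0(Rx)", 3),
    ("MULTD F2,F0,F2", 4),
    ("DIVD  F8,F2,F0", 10),
    ("LD    F4,0(Ry)", 3),
    ("ADDD  F4,F0,F4", 2),
    ("ADDD  F10,F8,F2", 2),
    ("SD    F4,0(Ry)", 1),
    ("ADDI  Rx,Rx,#8", 0),
    ("ADDI  Ry,Ry,#8", 0),
    ("SUB   R20,R4,Rx", 0),
    ("BNZ   R20,Loop", 1),   # the extra cycle is the branch shadow
]

# With no overlap, every instruction costs its issue cycle plus its full latency.
baseline_cycles = sum(1 + extra for _, extra in LOOP_BODY)
print(f"cycles per loop iteration: {baseline_cycles}")
```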
2.2 How many cycles would the loop body in the code sequence in Figure 2.35
require if the pipeline detected true data dependencies and only stalled on those,
rather than blindly stalling everything just because one functional unit is busy?
The answer is 27, as shown in Figure L.5. Remember, the point of the extra
latency cycles is to allow an instruction to complete whatever actions it needs, in
order to produce its correct output. Until that output is ready, no dependent
instructions can be executed. So the first LD must stall the next instruction for
three clock cycles. The MULTD produces a result for its successor, and therefore
must stall 4 more clocks, and so on.