1
Solutions
Solution 1.1
1.1.1 Computer used to run large problems and usually accessed via a network:
5 supercomputers
1.1.2 10^15 or 2^50 bytes: 7 petabyte
1.1.3 Computer composed of hundreds to thousands of processors and terabytes
of memory: 3 servers
1.1.4 Today’s science fiction application that probably will be available in the near
future: 1 virtual worlds
1.1.5 A kind of memory called random access memory: 12 RAM
1.1.6 Part of a computer called central processor unit: 13 CPU
1.1.7 Thousands of processors forming a large cluster: 8 datacenters
1.1.8 A microprocessor containing several processors in the same chip:
10 multi-core processors
1.1.9 Desktop computer without screen or keyboard usually accessed via a network:
4 low-end servers
1.1.10 Currently the largest class of computer that runs one application or one
set of related applications: 9 embedded computers
1.1.11 Special language used to describe hardware components: 11 VHDL
1.1.12 Personal computer delivering good performance to single users at low
cost: 2 desktop computers
1.1.13 Program that translates statements in high-level language to assembly
language: 15 compiler
1.1.14 Program that translates symbolic instructions to binary instructions:
21 assembler
1.1.15 High-level language for business data processing: 25 COBOL
1.1.16 Binary language that the processor can understand: 19 machine language
1.1.17 Commands that the processors understand: 17 instruction
1.1.18 High-level language for scientific computation: 26 Fortran
1.1.19 Symbolic representation of machine instructions: 18 assembly language
1.1.20 Interface between user’s program and hardware providing a variety of
services and supervision functions: 14 operating system
1.1.21 Software/programs developed by the users: 24 application software
1.1.22 Binary digit (value 0 or 1): 16 bit
1.1.23 Software layer between the application software and the hardware that
includes the operating system and the compilers: 23 system software
1.1.24 High-level language used to write application and system software: 20 C
1.1.25 Portable language composed of words and algebraic expressions that
must be translated into assembly language before being run on a computer:
22 high-level language
1.1.26 10^12 or 2^40 bytes: 6 terabyte
Solution 1.2
1.2.1 8 bits × 3 colors = 24 bits/pixel (3 bytes), taken here as 4 bytes/pixel.
1280 × 800 pixels = 1,024,000 pixels. 1,024,000 pixels × 4 bytes/pixel =
4,096,000 bytes (approx. 4 Mbytes).
1.2.2 2 GB = 2000 Mbytes. No. frames = 2000 Mbytes/4 Mbytes = 500 frames.
1.2.3 Network speed: 1 gigabit network ==> 1 gigabit per second = 125 Mbytes/second.
File size: 256 Kbytes = 0.256 Mbytes. Time for 0.256 Mbytes = 0.256/125 s =
2.048 ms.
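The arithmetic in 1.2.1 to 1.2.3 can be checked with a short script. This is only a sketch: the variable names are illustrative, and the 4 bytes/pixel figure is the value the solution uses, not the 3 bytes that 24 bits of color strictly require.

```python
# Frame-buffer size, frame count and transfer time from Solutions 1.2.1-1.2.3.
BYTES_PER_PIXEL = 4            # value used in the solution (24 bits of color is 3 bytes)
WIDTH, HEIGHT = 1280, 800

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL   # bytes needed for one frame
frames_in_2gb = 2000 // 4                        # 2000 Mbytes / about 4 Mbytes per frame

link_mbytes_per_s = 1000 / 8                     # 1 gigabit/s expressed in Mbytes/s
transfer_ms = 0.256 / link_mbytes_per_s * 1000   # 256 Kbytes = 0.256 Mbytes

print(frame_bytes, frames_in_2gb, round(transfer_ms, 3))
```

Changing BYTES_PER_PIXEL to 3 gives the strict minimum frame-buffer size instead.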
1.2.4 2 microseconds from cache ==> 20 microseconds from DRAM.
20 microseconds from DRAM ==> 2 seconds from magnetic disk.
20 microseconds from DRAM ==> 2 ms from flash memory.
Solution 1.3
1.3.1 P2 has the highest performance
performance of P1 (instructions/sec) = 2 × 10^9/1.5 = 1.33 × 10^9
performance of P2 (instructions/sec) = 1.5 × 10^9/1.0 = 1.5 × 10^9
performance of P3 (instructions/sec) = 3 × 10^9/2.5 = 1.2 × 10^9
1.3.2 No. cycles = time × clock rate
cycles(P1) = 10 × 2 × 10^9 = 20 × 10^9 cycles
cycles(P2) = 10 × 1.5 × 10^9 = 15 × 10^9 cycles
cycles(P3) = 10 × 3 × 10^9 = 30 × 10^9 cycles
time = (No. instr. × CPI)/clock rate, then No. instructions = No. cycles/CPI
instructions(P1) = 20 × 10^9/1.5 = 13.33 × 10^9
instructions(P2) = 15 × 10^9/1 = 15 × 10^9
instructions(P3) = 30 × 10^9/2.5 = 12 × 10^9
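The three quantities in 1.3.1 and 1.3.2 all follow from clock rate, CPI and run time. A minimal sketch, assuming the clock rates and CPIs that appear in the arithmetic above (the dictionary layout and names are illustrative):

```python
# Instructions per second, cycle count and instruction count for a 10 s run.
TIME_S = 10
processors = {            # name: (clock rate in Hz, CPI), read off Solution 1.3.1
    "P1": (2.0e9, 1.5),
    "P2": (1.5e9, 1.0),
    "P3": (3.0e9, 2.5),
}

results = {}
for name, (clock_hz, cpi) in processors.items():
    ips = clock_hz / cpi              # performance in instructions per second
    cycles = TIME_S * clock_hz        # No. cycles = time x clock rate
    instructions = cycles / cpi       # No. instructions = No. cycles / CPI
    results[name] = (ips, cycles, instructions)
    print(f"{name}: {ips:.3g} instr/s, {cycles:.3g} cycles, {instructions:.4g} instr")

fastest = max(results, key=lambda n: results[n][0])
print("highest performance:", fastest)
```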
1.3.3 time_new = time_old × 0.7 = 7 s
CPI_new = CPI_old × 1.2, then CPI(P1) = 1.8, CPI(P2) = 1.2, CPI(P3) = 3
ƒ = No. instr. × CPI/time, then
ƒ(P1) = 13.33 × 10^9 × 1.8/7 = 3.42 GHz
ƒ(P2) = 15 × 10^9 × 1.2/7 = 2.57 GHz
ƒ(P3) = 12 × 10^9 × 3/7 = 5.14 GHz
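The required clock rates in 1.3.3 can be recomputed from ƒ = No. instr. × CPI/time. The sketch below assumes the instruction counts from 1.3.2 and a 20% CPI increase with a 30% shorter run time; the names are illustrative.

```python
# Clock rate needed to finish in 7 s when CPI grows by 20%.
TIME_NEW_S = 10 * 0.7
CPI_GROWTH = 1.2

# name: (instruction count from 1.3.2, original CPI)
workloads = {
    "P1": (20e9 / 1.5, 1.5),
    "P2": (15e9 / 1.0, 1.0),
    "P3": (30e9 / 2.5, 2.5),
}

f_new_ghz = {
    name: instr * cpi * CPI_GROWTH / TIME_NEW_S / 1e9
    for name, (instr, cpi) in workloads.items()
}
for name, ghz in f_new_ghz.items():
    print(f"{name}: {ghz:.3f} GHz")
```

Because the instruction count is unchanged, each result is simply the original clock rate multiplied by 1.2/0.7.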
1.3.4 IPC = 1/CPI = No. instr./(time × clock rate)
IPC(P1) = 1.42
IPC(P2) = 2
IPC(P3) = 3.33
1.3.5 Time_new/Time_old = 7/10 = 0.7. So ƒ_new = ƒ_old/0.7 = 1.5 GHz/0.7 = 2.14 GHz.
1.3.6 Time_new/Time_old = 9/10 = 0.9. So Instructions_new = Instructions_old × 0.9 =
30 × 10^9 × 0.9 = 27 × 10^9.
Solution 1.4
1.4.1 P2
Class A: 10^5 instr.
Class B: 2 × 10^5 instr.
Class C: 5 × 10^5 instr.
Class D: 2 × 10^5 instr.
Time = No. instr. × CPI/clock rate
P1: Time class A = 0.66 × 10^-4 s
Time class B = 2.66 × 10^-4 s
Time class C = 10 × 10^-4 s
Time class D = 5.33 × 10^-4 s
Total time P1 = 18.65 × 10^-4 s
P2: Time class A = 10^-4 s
Time class B = 2 × 10^-4 s
Time class C = 5 × 10^-4 s
Time class D = 3 × 10^-4 s
Total time P2 = 11 × 10^-4 s
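The per-class times can be reproduced from Time = No. instr. × CPI/clock rate. This sketch assumes the clock rates (1.5 GHz for P1, 2 GHz for P2) and per-class CPIs that appear in the arithmetic of 1.4.2 and 1.4.3; the data layout is illustrative. Summing without truncating the intermediate values gives about 18.67 × 10^-4 s for P1, slightly above the truncated total quoted above.

```python
# Execution time per processor for a 10^6-instruction program split into four classes.
counts = {"A": 1e5, "B": 2e5, "C": 5e5, "D": 2e5}
machines = {              # name: (clock rate in Hz, CPI per class)
    "P1": (1.5e9, {"A": 1, "B": 2, "C": 3, "D": 4}),
    "P2": (2.0e9, {"A": 2, "B": 2, "C": 2, "D": 3}),
}

totals = {}
for name, (clock_hz, cpi) in machines.items():
    cycles = sum(counts[c] * cpi[c] for c in counts)   # total clock cycles
    totals[name] = cycles / clock_hz                   # total time in seconds
    overall_cpi = cycles / sum(counts.values())        # cycles per instruction overall
    print(f"{name}: {cycles:.3g} cycles, {totals[name]:.3e} s, CPI {overall_cpi:.2f}")

print("faster:", min(totals, key=totals.get))
```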
1.4.2 CPI = time × clock rate/No. instr.
CPI(P1) = 18.65 × 10^-4 × 1.5 × 10^9/10^6 = 2.79
CPI(P2) = 11 × 10^-4 × 2 × 10^9/10^6 = 2.2
1.4.3
clock cycles(P1) = 10^5 × 1 + 2 × 10^5 × 2 + 5 × 10^5 × 3 + 2 × 10^5 × 4 = 28 × 10^5
clock cycles(P2) = 10^5 × 2 + 2 × 10^5 × 2 + 5 × 10^5 × 2 + 2 × 10^5 × 3 = 22 × 10^5
1.4.4
(500 × 1 + 50 × 5 + 100 × 5 + 50 × 2) × 0.5 × 10^-9 = 675 ns
1.4.5 CPI = time × clock rate/No. instr.
CPI = 675 × 10^-9 × 2 × 10^9/700 = 1.92
1.4.6
Time = (500 × 1 + 50 × 5 + 50 × 5 + 50 × 2) × 0.5 × 10^-9 = 550 ns
Speed-up = 675 ns/550 ns = 1.22
CPI = 550 × 10^-9 × 2 × 10^9/650 = 1.69 (the instruction count drops from 700 to 650)
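The calculations in 1.4.4 to 1.4.6 share one pattern: weight each instruction count by its CPI, then multiply by the cycle time. A minimal sketch, assuming a 0.5 ns cycle (2 GHz) and treating the class names (arith, store, load, branch) as illustrative labels for the four terms in the sums above. CPI is computed here as cycles divided by the instruction count of each mix, which is 700 for the first mix and 650 for the second.

```python
# Execution time and CPI for an instruction mix, before and after halving one class.
CYCLE_NS = 0.5
CPI = {"arith": 1, "store": 5, "load": 5, "branch": 2}

def run(mix):
    """Return (execution time in ns, average CPI) for a dict of instruction counts."""
    cycles = sum(mix[k] * CPI[k] for k in mix)
    return cycles * CYCLE_NS, cycles / sum(mix.values())

base = {"arith": 500, "store": 50, "load": 100, "branch": 50}
fewer_loads = dict(base, load=50)          # the 100-instruction class cut to 50

t_base, cpi_base = run(base)
t_new, cpi_new = run(fewer_loads)
speedup = t_base / t_new
print(t_base, round(cpi_base, 2), t_new, round(cpi_new, 2), round(speedup, 2))
```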
Solution 1.5
1.5.1
a. 1G, 0.75G inst/s
b. 1G, 1.5G inst/s
1.5.2
a. P2 is 1.33 times faster than P1
b. P1 is 1.03 times faster than P2
1.5.3
a. P2 is 1.31 times faster than P1
b. P1 is 1.00 times faster than P2
1.5.4
a. 2.05 µs
b. 1.93 µs
1.5.5
a. 0.71 µs
b. 0.86 µs
1.5.6
a. 1.30 times faster
b. 1.40 times faster
Solution 1.6
1.6.1
     Compiler A CPI   Compiler B CPI
a.   1.00             1.17
b.   0.80             0.58
1.6.2
a. 0.86
b. 1.37
1.6.3
     Compiler A speed-up   Compiler B speed-up
a.   1.52                  1.77
b.   1.21                  0.88
1.6.4
     P1 peak      P2 peak
a.   4G Inst/s    3G Inst/s
b.   4G Inst/s    3G Inst/s
1.6.5 Speed-up, P1 versus P2:
a. 0.967105263
b. 0.730263158
1.6.6
a. 6.204081633
b. 8.216216216
Solution 1.7
1.7.1
Geometric mean clock rate ratio = (1.28 × 1.56 × 2.64 × 3.03 × 10.00 × 1.80 ×
0.74)^(1/7) = 2.15
Geometric mean power ratio = (1.24 × 1.20 × 2.06 × 2.88 × 2.59 × 1.37 × 0.92)^(1/7) =
1.62
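The geometric mean of n ratios is the n-th root of their product. The sketch below recomputes both means from the ratios listed in 1.7.1; the helper name geometric_mean is illustrative, and math.prod requires Python 3.8 or later.

```python
import math

def geometric_mean(values):
    """n-th root of the product of n positive values."""
    return math.prod(values) ** (1 / len(values))

clock_ratios = [1.28, 1.56, 2.64, 3.03, 10.00, 1.80, 0.74]
power_ratios = [1.24, 1.20, 2.06, 2.88, 2.59, 1.37, 0.92]

gm_clock = geometric_mean(clock_ratios)
gm_power = geometric_mean(power_ratios)
print(round(gm_clock, 2), round(gm_power, 2))   # 2.15 1.62
```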
1.7.2
Largest clock rate ratio = 2000 MHz/200 MHz = 10 (Pentium Pro to Pentium 4
Willamette)
Largest power ratio = 29.1 W/10.1 W = 2.88 (Pentium to Pentium Pro)
1.7.3
Clock rate: 2.667 × 10^9/(12.5 × 10^6) = 213.4
Power: 95 W/3.3 W = 28.78
1.7.4 C = P/(V^2 × clock rate)
80286: C = 0.0105 × 10^-6 F
80386: C = 0.01025 × 10^-6 F
80486: C = 0.00784 × 10^-6 F
Pentium: C = 0.00612 × 10^-6 F
Pentium Pro: C = 0.0133 × 10^-6 F
Pentium 4 Willamette: C = 0.0122 × 10^-6 F
Pentium 4 Prescott: C = 0.0183 × 10^-6 F
Core 2: C = 0.0294 × 10^-6 F
1.7.5 3.3/1.75 = 1.89 (Pentium Pro to Pentium 4 Willamette)
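The capacitive loads in 1.7.4 follow from rearranging Power = C × V^2 × clock rate. As a hedged check, the sketch below recomputes two entries using only figures quoted elsewhere in Solution 1.7 (Pentium Pro: 29.1 W, 3.3 V, 200 MHz; Core 2: 95 W, 1.1 V, 2.667 GHz). The function name capacitive_load is illustrative.

```python
def capacitive_load(power_w, voltage_v, clock_hz):
    """Capacitive load in farads, from C = P / (V^2 * clock rate)."""
    return power_w / (voltage_v ** 2 * clock_hz)

c_pentium_pro = capacitive_load(29.1, 3.3, 200e6)
c_core2 = capacitive_load(95.0, 1.1, 2.667e9)

# Express the results in units of 10^-6 F to compare with the list in 1.7.4.
print(f"Pentium Pro: {c_pentium_pro * 1e6:.4f} x 10^-6 F")
print(f"Core 2:      {c_core2 * 1e6:.4f} x 10^-6 F")
```

The Pentium Pro result rounds to 0.0134 × 10^-6 F; the list above shows it truncated to 0.0133.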
1.7.6
Pentium to Pentium Pro: 3.3/5 = 0.66
Pentium Pro to Pentium 4 Willamette: 1.75/3.3 = 0.53
Pentium 4 Willamette to Pentium 4 Prescott: 1.25/1.75 = 0.71
Pentium 4 Prescott to Core 2: 1.1/1.25 = 0.88
Geometric mean = 0.68
Solution 1.8
1.8.1 Power1 = V^2 × clock rate × C. Power2 = 0.9 Power1
C2/C1 = 0.9 × 5^2 × 0.5 × 10^9/(3.3^2 × 1 × 10^9) = 1.03
1.8.2 Power2/Power1 = V2^2 × clock rate2/(V1^2 × clock rate1)
Power2/Power1 = 0.87 => Reduction of 13%
1.8.3
Power2 = V2^2 × 1 × 10^9 × 0.8 × C1 = 0.6 × Power1
Power1 = 5^2 × 0.5 × 10^9 × C1
V2^2 × 1 × 10^9 × 0.8 × C1 = 0.6 × 5^2 × 0.5 × 10^9 × C1
V2 = ((0.6 × 5^2 × 0.5 × 10^9)/(1 × 10^9 × 0.8))^(1/2) = 3.06 V
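Solutions 1.8.1 to 1.8.3 all rearrange Power = V^2 × clock rate × C. A minimal sketch, assuming the values used above (5 V at 0.5 GHz for the first processor, 3.3 V at 1 GHz for the second); the variable names are illustrative.

```python
import math

V1, F1 = 5.0, 0.5e9      # first processor: volts, Hz
V2, F2 = 3.3, 1.0e9      # second processor: volts, Hz

# 1.8.1: Power2 = 0.9 * Power1, solved for C2/C1.
cap_ratio = 0.9 * (V1 ** 2 * F1) / (V2 ** 2 * F2)

# 1.8.2: same capacitive load, so the power ratio depends only on V^2 * f.
power_ratio = (V2 ** 2 * F2) / (V1 ** 2 * F1)

# 1.8.3: C2 = 0.8 * C1 and Power2 = 0.6 * Power1, solved for the new voltage.
v2_target = math.sqrt(0.6 * V1 ** 2 * F1 / (F2 * 0.8))

print(round(cap_ratio, 2), round(power_ratio, 2), round(v2_target, 2))
```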
1.8.4 Power_new = 1 × C_old × V_old^2/(2^(1/4))^2 × clock rate × 2^(1/2) = Power_old. Thus,
power scales by 1.
1.8.5 1/2^(-1/2) = 2^(1/2)
1.8.6 Voltage = 1.1 × 1/2^(1/4) = 0.92 V. Clock rate = 2.667 × 2^(1/2) = 3.771 GHz
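The scaling argument in 1.8.4 and 1.8.6 can be checked numerically. This sketch assumes, as the results above imply, that voltage scales by 2^(-1/4), clock rate by 2^(1/2), and capacitance by 1; the constant names are illustrative.

```python
# Scaling factors assumed from Solutions 1.8.4 and 1.8.6.
CAP_SCALE = 1.0
VOLTAGE_SCALE = 2 ** -0.25
CLOCK_SCALE = 2 ** 0.5

# Power = C * V^2 * f, so the scale factors combine the same way.
power_scale = CAP_SCALE * VOLTAGE_SCALE ** 2 * CLOCK_SCALE

new_voltage_v = 1.1 * VOLTAGE_SCALE      # starting from 1.1 V
new_clock_ghz = 2.667 * CLOCK_SCALE      # starting from 2.667 GHz

print(round(power_scale, 6), round(new_voltage_v, 2), round(new_clock_ghz, 2))
```

The voltage term contributes 2^(-1/2) and the clock term 2^(1/2), which is why the power is unchanged.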
Solution 1.9
1.9.1
a. 1/49 × 100 = 2%
b. 45/120 × 100 = 37.5%
1.9.2
a. I_leak = 1/3.3 = 0.3 A
b. I_leak = 45/1.1 = 40.9 A
1.9.3
a. Power_st/Power_dyn = 1/49 = 0.02
b. Power_st/Power_dyn = 45/75 = 0.6
1.9.4 Power_st/Power_dyn = 0.6 => Power_st = 0.6 × Power_dyn
a. Power_st = 0.6 × 40 W = 24 W
b. Power_st = 0.6 × 30 W = 18 W
1.9.5
a. I_lk = 24/0.8 = 30 A
b. I_lk = 18/0.8 = 22.5 A
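The last two steps chain together: static power is a fixed fraction of dynamic power, and leakage current is static power divided by supply voltage. A minimal sketch, assuming the 0.6 ratio, the 0.8 V supply, and the 40 W and 30 W dynamic powers used in 1.9.4 and 1.9.5; the names are illustrative.

```python
# Leakage current from I = Power_static / V, with Power_static = 0.6 * Power_dynamic.
STATIC_TO_DYNAMIC = 0.6
SUPPLY_V = 0.8
dynamic_w = {"a": 40.0, "b": 30.0}

leakage_a = {}
for case, p_dyn in dynamic_w.items():
    p_static = STATIC_TO_DYNAMIC * p_dyn       # static power in watts
    leakage_a[case] = p_static / SUPPLY_V      # leakage current in amperes
    print(f"{case}. static {p_static:.1f} W, leakage {leakage_a[case]:.1f} A")
```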