1
Solutions
Solution 1.1
1.1.1 Computer used to run large problems and usually accessed via a network:
(3) servers
1.1.2 10^15 or 2^50 bytes: (7) petabyte
1.1.3 A class of computers composed of hundreds to thousands of processors and
terabytes of memory and having the highest performance and cost: (5) supercomputers
1.1.4 Today's science fiction application that will probably be available in the near
future: (1) virtual worlds
1.1.5 A kind of memory called random access memory: (12) RAM
1.1.6 Part of a computer called central processing unit: (13) CPU
1.1.7 Thousands of processors forming a large cluster: (8) data centers
1.1.8 Microprocessors containing several processors in the same chip: (10) multicore
processors
1.1.9 Desktop computer without a screen or keyboard usually accessed via a network:
(4) low-end servers
1.1.10 A computer used to run one predetermined application or collection
of software: (9) embedded computers
1.1.11 Special language used to describe hardware components: (11) VHDL
1.1.12 Personal computer delivering good performance to single users at low cost:
(2) desktop computers
1.1.13 Program that translates statements in high-level language to assembly
language: (15) compiler
Sol01-9780123747501.indd S1
9/5/11 11:24 AM
Chapter 1 Solutions
1.1.14 Program that translates symbolic instructions to binary instructions:
(21) assembler
1.1.15 High-level language for business data processing: (25) Cobol
1.1.16 Binary language that the processor can understand: (19) machine language
1.1.17 Commands that the processors understand: (17) instruction
1.1.18 High-level language for scientific computation: (26) Fortran
1.1.19 Symbolic representation of machine instructions: (18) assembly language
1.1.20 Interface between user’s program and hardware providing a variety of
services and supervision functions: (14) operating system
1.1.21 Software/programs developed by the users: (24) application software
1.1.22 Binary digit (value 0 or 1): (16) bit
1.1.23 Software layer between the application software and the hardware that
includes the operating system and the compilers: (23) system software
1.1.24 High-level language used to write application and system software: (20) C
1.1.25 Portable language composed of words and algebraic expressions that must
be translated into assembly language before being run on a computer: (22) high-level
language
1.1.26 10^12 or 2^40 bytes: (6) terabyte
Solution 1.2
1.2.1 8 bits × 3 colors = 24 bits/pixel = 3 bytes/pixel.
a. Configuration 1: 640 × 480 pixels = 307,200 pixels => 307,200 × 3 = 921,600 bytes/frame
   Configuration 2: 1280 × 1024 pixels = 1,310,720 pixels => 1,310,720 × 3 = 3,932,160 bytes/frame
b. Configuration 1: 1024 × 768 pixels = 786,432 pixels => 786,432 × 3 = 2,359,296 bytes/frame
   Configuration 2: 2560 × 1600 pixels = 4,096,000 pixels => 4,096,000 × 3 = 12,288,000 bytes/frame
1.2.2 No. frames = integer part of (capacity of main memory/bytes per frame)
a. Configuration 1: Main memory: 2 GB = 2,000 Mbytes. Frame: 640 × 480 × 3 = 921,600 bytes = 0.9216 Mbytes => No. frames = 2,170
   Configuration 2: Main memory: 4 GB = 4,000 Mbytes. Frame: 1280 × 1024 × 3 = 3,932,160 bytes = 3.93216 Mbytes => No. frames = 1,017
b. Configuration 1: Main memory: 2 GB = 2,000 Mbytes. Frame: 1024 × 768 × 3 = 2,359,296 bytes = 2.359296 Mbytes => No. frames = 847
   Configuration 2: Main memory: 4 GB = 4,000 Mbytes. Frame: 2560 × 1600 × 3 = 12,288,000 bytes = 12.288 Mbytes => No. frames = 325
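The frame-buffer arithmetic of 1.2.1 and 1.2.2 can be checked with a short script. This is only a sketch: the function names are illustrative, and it assumes decimal units (1 GB = 10^9 bytes, as the solution does) and that main memory holds nothing but frames.

```python
# Frame-buffer sizing sketch for 1.2.1 and 1.2.2.
# Assumption: decimal units, 1 GB = 10**9 bytes, memory holds only frames.
BYTES_PER_PIXEL = 3  # 8 bits x 3 colors


def frame_bytes(width, height):
    """Bytes needed to hold one full frame."""
    return width * height * BYTES_PER_PIXEL


def frames_in_memory(memory_bytes, width, height):
    """Whole frames that fit in main memory (integer part of the quotient)."""
    return memory_bytes // frame_bytes(width, height)


# (label, width, height, main memory in bytes)
CONFIGS = [
    ("a, configuration 1", 640, 480, 2 * 10**9),
    ("a, configuration 2", 1280, 1024, 4 * 10**9),
    ("b, configuration 1", 1024, 768, 2 * 10**9),
    ("b, configuration 2", 2560, 1600, 4 * 10**9),
]

for label, w, h, mem in CONFIGS:
    print(f"{label}: {frame_bytes(w, h):,} bytes/frame, "
          f"{frames_in_memory(mem, w, h):,} frames")
```

Using binary units (1 GB = 2^30 bytes) instead would raise each frame count slightly.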
1.2.3 File size: 256 Kbytes = 0.256 Mbytes.
Same solution for a) and b)
Confi guration 1: Network speed: 100 Mbit/sec = 12.5 Mbytes/sec. Time = 0.256/12.5 = 20.48 ms
Confi guration 2: Network speed: 1 Gbit/sec = 125 Mbytes/sec. Time = 0.256/125 = 2.048 ms
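The transfer times in 1.2.3 follow from dividing the file size in bits by the link rate. A minimal sketch, assuming decimal prefixes (256 Kbytes = 256,000 bytes) and ignoring protocol overhead; the function name is illustrative.

```python
# Network transfer time sketch for 1.2.3.
# Assumptions: decimal prefixes and no protocol overhead.
FILE_BYTES = 256_000  # 256 Kbytes


def transfer_time_ms(file_bytes, link_bits_per_sec):
    """Time to send file_bytes over a link of the given bit rate, in milliseconds."""
    return file_bytes * 8 / link_bits_per_sec * 1e3


t_100_mbit = transfer_time_ms(FILE_BYTES, 100e6)  # configuration 1
t_1_gbit = transfer_time_ms(FILE_BYTES, 1e9)      # configuration 2
print(f"100 Mbit/s: {t_100_mbit:.3f} ms, 1 Gbit/s: {t_1_gbit:.3f} ms")
```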
1.2.4
a. 2 microseconds from cache ⇒ 20 microseconds from DRAM.
b. 2 microseconds from cache ⇒ 20 microseconds from DRAM.
1.2.5
a. 2 microseconds from cache ⇒ 2 ms from Flash memory.
b. 2 microseconds from cache ⇒ 4.28 ms from Flash memory.
1.2.6
a. 2 microseconds from cache ⇒ 2 s from magnetic disk.
b. 2 microseconds from cache ⇒ 5.7 s from magnetic disk.
Solution 1.3
1.3.1 P2 has the highest performance.
Instr/sec = f/CPI
a. performance of P1 (instructions/sec) = 3 × 10^9/1.5 = 2 × 10^9
   performance of P2 (instructions/sec) = 2.5 × 10^9/1.0 = 2.5 × 10^9
   performance of P3 (instructions/sec) = 4 × 10^9/2.2 = 1.8 × 10^9
b. performance of P1 (instructions/sec) = 2 × 10^9/1.2 = 1.66 × 10^9
   performance of P2 (instructions/sec) = 3 × 10^9/0.8 = 3.75 × 10^9
   performance of P3 (instructions/sec) = 4 × 10^9/2 = 2 × 10^9
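The comparison in 1.3.1 amounts to evaluating f/CPI for each processor and taking the maximum. A small sketch using the clock rates and CPIs that appear in the expressions above; the dictionary layout and function name are illustrative choices.

```python
# Instructions per second = clock rate / CPI (sketch for 1.3.1).
# Each entry is (clock rate in Hz, CPI), taken from the expressions in the solution.
PART_A = {"P1": (3e9, 1.5), "P2": (2.5e9, 1.0), "P3": (4e9, 2.2)}
PART_B = {"P1": (2e9, 1.2), "P2": (3e9, 0.8), "P3": (4e9, 2.0)}


def instr_per_sec(clock_hz, cpi):
    """Sustained instruction rate for a given clock rate and average CPI."""
    return clock_hz / cpi


def fastest(processors):
    """Name of the processor with the highest instruction rate."""
    return max(processors, key=lambda name: instr_per_sec(*processors[name]))


for label, part in (("a", PART_A), ("b", PART_B)):
    rates = {name: instr_per_sec(*spec) for name, spec in part.items()}
    print(label, {name: f"{r:.3g}" for name, r in rates.items()}, "fastest:", fastest(part))
```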
1.3.2 No. cycles = time × clock rate
time = (No. instr × CPI)/clock rate, then No. instructions = No. cycles/CPI
a. cycles(P1) = 10 × 3 × 10^9 = 30 × 10^9
   cycles(P2) = 10 × 2.5 × 10^9 = 25 × 10^9
   cycles(P3) = 10 × 4 × 10^9 = 40 × 10^9
   No. instructions(P1) = 30 × 10^9/1.5 = 20 × 10^9
   No. instructions(P2) = 25 × 10^9/1 = 25 × 10^9
   No. instructions(P3) = 40 × 10^9/2.2 = 18.18 × 10^9
b. cycles(P1) = 10 × 2 × 10^9 = 20 × 10^9
   cycles(P2) = 10 × 3 × 10^9 = 30 × 10^9
   cycles(P3) = 10 × 4 × 10^9 = 40 × 10^9
   No. instructions(P1) = 20 × 10^9/1.2 = 16.66 × 10^9
   No. instructions(P2) = 30 × 10^9/0.8 = 37.5 × 10^9
   No. instructions(P3) = 40 × 10^9/2 = 20 × 10^9
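The two relations used in 1.3.2 can be sketched in code. The 10 s execution time is read off the expressions above; the function names are illustrative, and only part a is shown.

```python
# Cycle and instruction counts for a fixed execution time (sketch for 1.3.2, part a).
EXEC_TIME_S = 10  # execution time implied by the "10 x clock rate" terms in the solution

# (clock rate in Hz, CPI) for part a
PROCESSORS = {"P1": (3e9, 1.5), "P2": (2.5e9, 1.0), "P3": (4e9, 2.2)}


def cycles(time_s, clock_hz):
    """Clock cycles elapsed: time x clock rate."""
    return time_s * clock_hz


def instructions(time_s, clock_hz, cpi):
    """Instructions executed: cycles / CPI."""
    return cycles(time_s, clock_hz) / cpi


for name, (clock_hz, cpi) in PROCESSORS.items():
    print(f"{name}: {cycles(EXEC_TIME_S, clock_hz):.3g} cycles, "
          f"{instructions(EXEC_TIME_S, clock_hz, cpi):.4g} instructions")
```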
1.3.3 time_new = time_old × 0.7 = 7 s
a. CPI_new = CPI_old × 1.2, then CPI(P1) = 1.8, CPI(P2) = 1.2, CPI(P3) = 2.64
   f = No. instr × CPI/time, then
   f(P1) = 20 × 10^9 × 1.8/7 = 5.14 GHz
   f(P2) = 25 × 10^9 × 1.2/7 = 4.28 GHz
   f(P3) = 18.18 × 10^9 × 2.64/7 = 6.85 GHz
b. CPI_new = CPI_old × 1.2, then CPI(P1) = 1.44, CPI(P2) = 0.96, CPI(P3) = 2.4
   f = No. instr × CPI/time, then
   f(P1) = 16.66 × 10^9 × 1.44/7 = 3.42 GHz
   f(P2) = 37.5 × 10^9 × 0.96/7 = 5.14 GHz
   f(P3) = 20 × 10^9 × 2.4/7 = 6.85 GHz
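The required clock rate in 1.3.3 can be recomputed from the original clock rate and CPI without rounding intermediate values. The sketch below does this for part a; the helper name is hypothetical, and because it carries full precision its results may differ from hand-rounded figures in the last digit.

```python
# Clock rate needed for a 30% shorter run when CPI rises by 20% (sketch for 1.3.3, part a).
OLD_TIME_S = 10
NEW_TIME_S = OLD_TIME_S * 0.7
CPI_GROWTH = 1.2

# (original clock rate in Hz, original CPI) for part a
PROCESSORS = {"P1": (3e9, 1.5), "P2": (2.5e9, 1.0), "P3": (4e9, 2.2)}


def required_clock_hz(old_clock_hz, old_cpi):
    """f = No. instr x CPI_new / time_new, with No. instr fixed by the original run."""
    instr_count = OLD_TIME_S * old_clock_hz / old_cpi
    return instr_count * (old_cpi * CPI_GROWTH) / NEW_TIME_S


new_clocks = {name: required_clock_hz(*spec) for name, spec in PROCESSORS.items()}
for name, hz in new_clocks.items():
    print(f"{name}: {hz / 1e9:.3f} GHz")
```

Because the instruction count is unchanged, the expression reduces to f_new = f_old × 1.2/0.7, so every processor needs the same relative increase in clock rate.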
1.3.4 IPC = 1/CPI = No. instr/(time × clock rate)
a. IPC(P1) = 0.95
   IPC(P2) = 1.2
   IPC(P3) = 2.5
b. IPC(P1) = 2
   IPC(P2) = 1.25
   IPC(P3) = 0.89
1.3.5
a. Time_new/Time_old = 7/10 = 0.7. So f_new = f_old/0.7 = 2.5 GHz/0.7 = 3.57 GHz.
b. Time_new/Time_old = 5/8 = 0.625. So f_new = f_old/0.625 = 4.8 GHz.
1.3.6
a. Time_new/Time_old = 9/10 = 0.9. Then Instructions_new = Instructions_old × 0.9 = 30 × 10^9 × 0.9 = 27 × 10^9.
b. Time_new/Time_old = 7/8 = 0.875. Then Instructions_new = Instructions_old × 0.875 = 26.25 × 10^9.
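The scaling arguments in 1.3.5 and 1.3.6 both hold everything fixed except one factor. A minimal sketch with illustrative function names; the 3 GHz starting clock for 1.3.5 b is an assumption inferred from the stated result (4.8 GHz × 0.625), as is the 30 × 10^9 instruction count for 1.3.6 b (26.25 × 10^9/0.875).

```python
# Scaling sketches for 1.3.5 (clock rate) and 1.3.6 (instruction count).
# With CPI and instruction count fixed, time is inversely proportional to clock rate.
# With CPI and clock rate fixed, time is proportional to instruction count.


def scaled_clock_hz(old_clock_hz, old_time_s, new_time_s):
    """Clock rate needed to cut execution time from old_time_s to new_time_s."""
    return old_clock_hz * old_time_s / new_time_s


def scaled_instr_count(old_count, old_time_s, new_time_s):
    """Instruction count that yields new_time_s at unchanged CPI and clock rate."""
    return old_count * new_time_s / old_time_s


f_a = scaled_clock_hz(2.5e9, 10, 7)      # 1.3.5 a
f_b = scaled_clock_hz(3e9, 8, 5)         # 1.3.5 b, 3 GHz inferred (assumption)
n_a = scaled_instr_count(30e9, 10, 9)    # 1.3.6 a
n_b = scaled_instr_count(30e9, 8, 7)     # 1.3.6 b, 30e9 inferred (assumption)
print(f"{f_a / 1e9:.2f} GHz, {f_b / 1e9:.2f} GHz, {n_a:.4g}, {n_b:.4g}")
```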
Solution 1.4
1.4.1
Class A: 10^5 instr.
Class B: 2 × 10^5 instr.
Class C: 5 × 10^5 instr.
Class D: 2 × 10^5 instr.
Time = No. instr × CPI/clock rate
a. Total time P1 = (10^5 + 2 × 10^5 × 2 + 5 × 10^5 × 3 + 2 × 10^5 × 3)/(2.5 × 10^9) = 10.4 × 10^-4 s
   Total time P2 = (10^5 × 2 + 2 × 10^5 × 2 + 5 × 10^5 × 2 + 2 × 10^5 × 2)/(3 × 10^9) = 6.66 × 10^-4 s
b. Total time P1 = (10^5 × 2 + 2 × 10^5 × 1.5 + 5 × 10^5 × 2 + 2 × 10^5)/(2.5 × 10^9) = 6.8 × 10^-4 s
   Total time P2 = (10^5 + 2 × 10^5 × 2 + 5 × 10^5 + 2 × 10^5)/(3 × 10^9) = 4 × 10^-4 s
1.4.2 CPI = time × clock rate/No. instr
a. CPI (P1) = 10.4 × 10^-4 × 2.5 × 10^9/10^6 = 2.6
   CPI (P2) = 6.66 × 10^-4 × 3 × 10^9/10^6 = 2.0
b. CPI (P1) = 6.8 × 10^-4 × 2.5 × 10^9/10^6 = 1.7
   CPI (P2) = 4 × 10^-4 × 3 × 10^9/10^6 = 1.2
1.4.3
a. clock cycles (P1) = 10^5 × 1 + 2 × 10^5 × 2 + 5 × 10^5 × 3 + 2 × 10^5 × 3 = 26 × 10^5
   clock cycles (P2) = 10^5 × 2 + 2 × 10^5 × 2 + 5 × 10^5 × 2 + 2 × 10^5 × 2 = 20 × 10^5
b. clock cycles (P1) = 17 × 10^5
   clock cycles (P2) = 12 × 10^5
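Solutions 1.4.1 to 1.4.3 all derive from one weighted sum of per-class instruction counts and CPIs. The sketch below reproduces part a; the per-class CPIs are read off the expressions in 1.4.1, and the function names are illustrative.

```python
# Cycles, execution time, and average CPI from an instruction mix (sketch for 1.4.1-1.4.3, part a).
COUNTS = [100_000, 200_000, 500_000, 200_000]  # classes A, B, C, D

# (per-class CPIs, clock rate in Hz) as they appear in the part a expressions
P1 = ([1, 2, 3, 3], 2.5e9)
P2 = ([2, 2, 2, 2], 3e9)


def total_cycles(counts, cpis):
    """Sum over classes of instruction count x CPI."""
    return sum(n * c for n, c in zip(counts, cpis))


def exec_time_s(counts, cpis, clock_hz):
    """Execution time = total cycles / clock rate."""
    return total_cycles(counts, cpis) / clock_hz


def average_cpi(counts, cpis):
    """Global CPI = total cycles / total instructions."""
    return total_cycles(counts, cpis) / sum(counts)


for name, (cpis, clock_hz) in (("P1", P1), ("P2", P2)):
    print(f"{name}: {total_cycles(COUNTS, cpis):,} cycles, "
          f"{exec_time_s(COUNTS, cpis, clock_hz):.3e} s, CPI {average_cpi(COUNTS, cpis):.2f}")
```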
1.4.4
a. (650 × 1 + 100 × 5 + 600 × 5 + 50 × 2) × 0.5 × 10^-9 s = 2,125 ns
b. (750 × 1 + 250 × 5 + 500 × 5 + 500 × 2) × 0.5 × 10^-9 s = 2,750 ns
1.4.5 CPI = time × clock rate/No. instr
a. CPI = 2,125 × 10^-9 × 2 × 10^9/1,400 = 3.03
b. CPI = 2,750 × 10^-9 × 2 × 10^9/2,000 = 2.75
1.4.6
a. Time = (650 × 1 + 100 × 5 + 300 × 5 + 50 × 2) × 0.5 × 10^-9 s = 1,375 ns
   Speedup = 2,125 ns/1,375 ns = 1.54
   CPI = 1,375 × 10^-9 × 2 × 10^9/1,100 = 2.5
b. Time = (750 × 1 + 250 × 5 + 250 × 5 + 500 × 2) × 0.5 × 10^-9 s = 2,125 ns
   Speedup = 2,750 ns/2,125 ns = 1.29
   CPI = 2,125 × 10^-9 × 2 × 10^9/1,750 = 2.43
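The effect of halving one instruction class, as in 1.4.4 to 1.4.6, can be sketched by evaluating the same weighted sum on both mixes. Only part a is shown; the counts, CPIs, and 0.5 ns cycle time come from the expressions above, while the names are illustrative.

```python
# Execution time, CPI, and speedup when one instruction class is halved (sketch for 1.4.4-1.4.6, part a).
CYCLE_TIME_NS = 0.5          # 2 GHz clock
CPIS = [1, 5, 5, 2]          # per-class CPIs in the order used in the solution
BASE = [650, 100, 600, 50]   # original instruction counts
IMPROVED = [650, 100, 300, 50]  # third class halved


def exec_time_ns(counts):
    """Total cycles x cycle time, in nanoseconds."""
    return sum(n * c for n, c in zip(counts, CPIS)) * CYCLE_TIME_NS


def average_cpi(counts):
    """Total cycles / total instructions."""
    return sum(n * c for n, c in zip(counts, CPIS)) / sum(counts)


speedup = exec_time_ns(BASE) / exec_time_ns(IMPROVED)
print(f"base {exec_time_ns(BASE):.0f} ns (CPI {average_cpi(BASE):.2f}), "
      f"improved {exec_time_ns(IMPROVED):.0f} ns (CPI {average_cpi(IMPROVED):.2f}), "
      f"speedup {speedup:.3f}")
```

Removing high-CPI instructions lowers both the execution time and the average CPI, which is why the CPI falls even though no individual class became faster.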
Solution 1.5
1.5.1
a. P1: 2 × 10^9 inst/sec, P2: 2 × 10^9 inst/sec
b. P1: 2 × 10^9 inst/sec, P2: 3 × 10^9 inst/sec
1.5.2
a. T(P2)/T(P1) = 4/7; P2 is 1.75 times faster than P1
b. T(P2)/T(P1) = 4.66/5; P2 is 1.07 times faster than P1
1.5.3
a. T(P2)/T(P1) = 4.5/8; P2 is 1.77 times faster than P1
b. T(P2)/T(P1) = 5.33/5.5; P2 is 1.03 times faster than P1
1.5.4
a. 2.91 µs
b. 2.50 µs
1.5.5
a. 0.78 µs
b. 0.90 µs
1.5.6
a. T = 0.68 µs => 1.14 times faster
b. T = 0.75 µs => 1.20 times faster
Solution 1.6
1.6.1 CPI = T_exec × f/No. instr
     Compiler A CPI   Compiler B CPI
a.   1.8              1.5
b.   1.1              1.25
1.6.2 fA/fB = (No. instr(A) × CPI(A))/(No. instr(B) × CPI(B))
a. fA/fB = 1
b. fA/fB = 0.73
1.6.3
     Speedup vs. Compiler A   Speedup vs. Compiler B
a.   T_new/T_A = 0.36         T_new/T_B = 0.36
b.   T_new/T_A = 0.6          T_new/T_B = 0.44
1.6.4
     P1 Peak                  P2 Peak
a.   4 × 10^9 Inst/s          2 × 10^9 Inst/s
b.   4 × 10^9 Inst/s          3 × 10^9 Inst/s
1.6.5 Speedup, P1 versus P2:
a. T1/T2 = 1.9
b. T1/T2 = 1.5
1.6.6
a. 4.37 GHz
b. 6 GHz
Solution 1.7
1.7.1
Geometric mean clock rate ratio = (1.28 × 1.56 × 2.64 × 3.03 × 10.00 × 1.80 ×
0.74)^(1/7) = 2.15
Geometric mean power ratio = (1.24 × 1.20 × 2.06 × 2.88 × 2.59 × 1.37 × 0.92)^(1/7) = 1.62
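The geometric means in 1.7.1 are the seventh root of the product of the generation-to-generation ratios. A short sketch using the ratios listed above; `geometric_mean` is a helper defined here, not a reference to any particular library routine.

```python
# Geometric mean of successive-generation ratios (sketch for 1.7.1).
import math

CLOCK_RATIOS = [1.28, 1.56, 2.64, 3.03, 10.00, 1.80, 0.74]
POWER_RATIOS = [1.24, 1.20, 2.06, 2.88, 2.59, 1.37, 0.92]


def geometric_mean(ratios):
    """n-th root of the product of n ratios."""
    return math.prod(ratios) ** (1 / len(ratios))


gm_clock = geometric_mean(CLOCK_RATIOS)
gm_power = geometric_mean(POWER_RATIOS)
print(f"clock rate: {gm_clock:.2f}, power: {gm_power:.2f}")
```

The product of the clock rate ratios is the overall first-to-last ratio, so it should land close to the 213.36 computed in 1.7.3, up to rounding of the individual ratios.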
1.7.2
Largest clock rate ratio = 2000 MHz/200 MHz = 10 (Pentium Pro to Pentium 4
Willamette)
Largest power ratio = 29.1 W/10.1 W = 2.88 (Pentium to Pentium Pro)
1.7.3
Clock rate: 2.667 × 10^9/(12.5 × 10^6) = 213.36
Power: 95 W/3.3 W = 28.78
1.7.4 C = P/(V^2 × clock rate)
80286: C = 0.0105 × 10^-6
80386: C = 0.01025 × 10^-6
80486: C = 0.00784 × 10^-6
Pentium: C = 0.00612 × 10^-6
Pentium Pro: C = 0.0133 × 10^-6
Pentium 4 Willamette: C = 0.0122 × 10^-6
Pentium 4 Prescott: C = 0.0183 × 10^-6
Core 2: C = 0.0294 × 10^-6
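The capacitive loads in 1.7.4 come from solving the dynamic power relation for C. The sketch below evaluates it for the two processors whose power, voltage, and clock rate all appear elsewhere in this solution (Pentium Pro: 29.1 W, 3.3 V, 200 MHz; Core 2: 95 W, 1.1 V, 2.667 GHz); the function name is illustrative.

```python
# Capacitive load from dynamic power: C = P / (V^2 x clock rate) (sketch for 1.7.4).


def capacitive_load(power_w, voltage_v, clock_hz):
    """Effective switched capacitance in farads."""
    return power_w / (voltage_v ** 2 * clock_hz)


c_pentium_pro = capacitive_load(29.1, 3.3, 200e6)
c_core2 = capacitive_load(95.0, 1.1, 2.667e9)
print(f"Pentium Pro: {c_pentium_pro / 1e-6:.4f} x 10^-6, Core 2: {c_core2 / 1e-6:.4f} x 10^-6")
```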
1.7.5 3.3/1.75 = 1.89 (Pentium Pro to Pentium 4 Willamette)
1.7.6
Pentium to Pentium Pro: 3.3/5 = 0.66
Pentium Pro to Pentium 4 Willamette: 1.75/3.3 = 0.53
Pentium 4 Willamette to Pentium 4 Prescott: 1.25/1.75 = 0.71
Pentium 4 Prescott to Core 2: 1.1/1.25 = 0.88
Geometric mean = 0.68
Solution 1.8
1.8.1 Power = V^2 × clock rate × C. Power2 = 0.9 Power1
a. C2/C1 = 0.9 × 1.75^2 × 1.5 × 10^9/(1.2^2 × 2 × 10^9) = 1.43
b. C2/C1 = 0.9 × 1.1^2 × 3 × 10^9/(0.8^2 × 4 × 10^9) = 1.27
1.8.2 Power2/Power1 = V2^2 × clock rate2/(V1^2 × clock rate1)
a. Power2/Power1 = 0.62 => Reduction of 38%
b. Power2/Power1 = 0.7 => Reduction of 30%
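Both 1.8.1 and 1.8.2 rearrange the same relation, Power = V^2 × clock rate × C. A minimal sketch using the voltages and clock rates from the 1.8.1 expressions; the function names are illustrative, and 1.8.2 is taken to assume equal capacitive load on both processors.

```python
# Ratios derived from Power = V^2 x clock rate x C (sketch for 1.8.1 and 1.8.2).
# Each configuration is (V1, f1, V2, f2), read off the 1.8.1 expressions.
PART_A = (1.75, 1.5e9, 1.2, 2e9)
PART_B = (1.1, 3e9, 0.8, 4e9)


def capacitance_ratio(v1, f1, v2, f2, power_ratio=0.9):
    """C2/C1 when Power2 = power_ratio x Power1."""
    return power_ratio * (v1 ** 2 * f1) / (v2 ** 2 * f2)


def power_ratio_equal_c(v1, f1, v2, f2):
    """Power2/Power1 when both processors have the same capacitive load."""
    return (v2 ** 2 * f2) / (v1 ** 2 * f1)


for label, cfg in (("a", PART_A), ("b", PART_B)):
    print(f"{label}: C2/C1 = {capacitance_ratio(*cfg):.3f}, "
          f"Power2/Power1 = {power_ratio_equal_c(*cfg):.3f}")
```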