
Computer Organization and Design 4th Edition.pdf

Front Cover
MIPS Reference Data
In Praise of Computer Organization and Design: The Hardware/Software Interface, Revised Fourth Edition
Acknowledgments
Computer Organization and Design: The Hardware/Software Interface
Copyright Page
Dedication Page
Contents
Preface
About This Book
About the Other Book
Changes for the Fourth Edition
Instructor Support
Concluding Remarks
Acknowledgments for the Fourth Edition
1 Computer Abstractions and Technology
1.1 Introduction
Classes of Computing Applications and Their Characteristics
What You Can Learn in This Book
1.2 Below Your Program
From a High-Level Language to the Language of Hardware
1.3 Under the Covers
Anatomy of a Mouse
Through the Looking Glass
Opening the Box
A Safe Place for Data
Communicating with Other Computers
Technologies for Building Processors and Memory
1.4 Performance
Defining Performance
Measuring Performance
CPU Performance and Its Factors
Instruction Performance
The Classic CPU Performance Equation
1.5 The Power Wall
1.6 The Sea Change: The Switch from Uniprocessors to Multiprocessors
1.7 Real Stuff: Manufacturing and Benchmarking the AMD Opteron X4
SPEC CPU Benchmark
SPEC Power Benchmark
1.8 Fallacies and Pitfalls
1.9 Concluding Remarks
Road Map for This Book
1.10 Historical Perspective and Further Reading
1.11 Exercises
Exercise 1.1
Exercise 1.2
Exercise 1.3
Exercise 1.4
Exercise 1.5
Exercise 1.6
Exercise 1.7
Exercise 1.8
Exercise 1.9
Exercise 1.10
Exercise 1.11
Exercise 1.12
Exercise 1.13
Exercise 1.14
Exercise 1.15
Exercise 1.16
2 Instructions: Language of the Computer
2.1 Introduction
2.2 Operations of the Computer Hardware
2.3 Operands of the Computer Hardware
Memory Operands
Constant or Immediate Operands
2.4 Signed and Unsigned Numbers
Summary
2.5 Representing Instructions in the Computer
MIPS Fields
2.6 Logical Operations
2.7 Instructions for Making Decisions
Loops
Case/Switch Statement
2.8 Supporting Procedures in Computer Hardware
Using More Registers
Nested Procedures
Allocating Space for New Data on the Stack
Allocating Space for New Data on the Heap
2.9 Communicating with People
Characters and Strings in Java
2.10 MIPS Addressing for 32-Bit Immediates and Addresses
32-Bit Immediate Operands
Addressing in Branches and Jumps
MIPS Addressing Mode Summary
Decoding Machine Language
2.11 Parallelism and Instructions: Synchronization
2.12 Translating and Starting a Program
Compiler
Assembler
Linker
Loader
Dynamically Linked Libraries
Starting a Java Program
2.13 A C Sort Example to Put It All Together
The Procedure swap
Register Allocation for swap
Code for the Body of the Procedure swap
The Full swap Procedure
The Procedure sort
Register Allocation for sort
Code for the Body of the Procedure sort
The Procedure Call in sort
Passing Parameters in sort
Preserving Registers in sort
The Full Procedure sort
2.14 Arrays versus Pointers
Array Version of Clear
Pointer Version of Clear
Comparing the Two Versions of Clear
2.15 Advanced Material: Compiling C and Interpreting Java
2.16 Real Stuff: ARM Instructions
Addressing Modes
Compare and Conditional Branch
Unique Features of ARM
2.17 Real Stuff: x86 Instructions
Evolution of the Intel x86
x86 Registers and Data Addressing Modes
x86 Integer Operations
x86 Instruction Encoding
x86 Conclusion
2.18 Fallacies and Pitfalls
2.19 Concluding Remarks
2.20 Historical Perspective and Further Reading
2.21 Exercises
Exercise 2.1
Exercise 2.2
Exercise 2.3
Exercise 2.4
Exercise 2.5
Exercise 2.6
Exercise 2.7
Exercise 2.8
Exercise 2.9
Exercise 2.10
Exercise 2.11
Exercise 2.12
Exercise 2.13
Exercise 2.14
Exercise 2.15
Exercise 2.16
Exercise 2.17
Exercise 2.18
Exercise 2.19
Exercise 2.20
Exercise 2.21
Exercise 2.22
Exercise 2.23
Exercise 2.24
Exercise 2.25
Exercise 2.26
Exercise 2.27
Exercise 2.28
Exercise 2.29
Exercise 2.30
Exercise 2.31
Exercise 2.32
Exercise 2.33
Exercise 2.34
Exercise 2.35
Exercise 2.36
Exercise 2.37
Exercise 2.38
Exercise 2.39
Exercise 2.40
3 Arithmetic for Computers
3.1 Introduction
3.2 Addition and Subtraction
Arithmetic for Multimedia
Summary
3.3 Multiplication
Sequential Version of the Multiplication Algorithm and Hardware
Signed Multiplication
Faster Multiplication
Multiply in MIPS
Summary
3.4 Division
A Division Algorithm and Hardware
Signed Division
Faster Division
Divide in MIPS
Summary
3.5 Floating Point
Floating-Point Representation
Floating-Point Addition
Floating-Point Multiplication
Floating-Point Instructions in MIPS
Accurate Arithmetic
Summary
3.6 Parallelism and Computer Arithmetic: Associativity
3.7 Real Stuff: Floating Point in the x86
The x86 Floating-Point Architecture
The Intel Streaming SIMD Extension 2 (SSE2) Floating-Point Architecture
3.8 Fallacies and Pitfalls
3.9 Concluding Remarks
3.10 Historical Perspective and Further Reading
3.11 Exercises
Exercise 3.1
Exercise 3.2
Exercise 3.3
Exercise 3.4
Exercise 3.5
Exercise 3.6
Exercise 3.7
Exercise 3.8
Exercise 3.9
Exercise 3.10
Exercise 3.11
Exercise 3.12
Exercise 3.13
Exercise 3.14
Exercise 3.15
4 The Processor
4.1 Introduction
A Basic MIPS Implementation
An Overview of the Implementation
4.2 Logic Design Conventions
Clocking Methodology
4.3 Building a Datapath
Creating a Single Datapath
4.4 A Simple Implementation Scheme
The ALU Control
Designing the Main Control Unit
Operation of the Datapath
Finalizing Control
Why a Single-Cycle Implementation Is Not Used Today
4.5 An Overview of Pipelining
Designing Instruction Sets for Pipelining
Pipeline Hazards
Structural Hazards
Data Hazards
Control Hazards
Pipeline Overview Summary
4.6 Pipelined Datapath and Control
Graphically Representing Pipelines
Pipelined Control
4.7 Data Hazards: Forwarding versus Stalling
Data Hazards and Stalls
4.8 Control Hazards
Assume Branch Not Taken
Reducing the Delay of Branches
Dynamic Branch Prediction
Pipeline Summary
4.9 Exceptions
How Exceptions Are Handled in the MIPS Architecture
Exceptions in a Pipelined Implementation
4.10 Parallelism and Advanced Instruction-Level Parallelism
The Concept of Speculation
Static Multiple Issue
An Example: Static Multiple Issue with the MIPS ISA
Dynamic Multiple-Issue Processors
Dynamic Pipeline Scheduling
Power Efficiency and Advanced Pipelining
4.11 Real Stuff: the AMD Opteron X4 (Barcelona) Pipeline
4.12 Advanced Topic: an Introduction to Digital Design Using a Hardware Design Language to Describe and Model a Pipeline and More Pipelining Illustrations
4.13 Fallacies and Pitfalls
4.14 Concluding Remarks
4.15 Historical Perspective and Further Reading
4.16 Exercises
Exercise 4.1
Exercise 4.2
Exercise 4.3
Exercise 4.4
Exercise 4.5
Exercise 4.6
Exercise 4.7
Exercise 4.8
Exercise 4.9
Exercise 4.10
Exercise 4.11
Exercise 4.12
Exercise 4.13
Exercise 4.14
Exercise 4.15
Exercise 4.16
Exercise 4.17
Exercise 4.18
Exercise 4.19
Exercise 4.20
Exercise 4.21
Exercise 4.22
Exercise 4.23
Exercise 4.24
Exercise 4.25
Exercise 4.26
Exercise 4.27
Exercise 4.28
Exercise 4.29
Exercise 4.30
Exercise 4.31
Exercise 4.32
Exercise 4.33
Exercise 4.34
Exercise 4.35
Exercise 4.36
Exercise 4.37
Exercise 4.38
Exercise 4.39
5 Large and Fast: Exploiting Memory Hierarchy
5.1 Introduction
5.2 The Basics of Caches
Accessing a Cache
Handling Cache Misses
Handling Writes
An Example Cache: The Intrinsity FastMATH Processor
Designing the Memory System to Support Caches
Summary
5.3 Measuring and Improving Cache Performance
Reducing Cache Misses by More Flexible Placement of Blocks
Locating a Block in the Cache
Choosing Which Block to Replace
Reducing the Miss Penalty Using Multilevel Caches
Summary
5.4 Virtual Memory
Placing a Page and Finding It Again
Page Faults
What about Writes?
Making Address Translation Fast: the TLB
The Intrinsity FastMATH TLB
Integrating Virtual Memory, TLBs, and Caches
Implementing Protection with Virtual Memory
Handling TLB Misses and Page Faults
Summary
5.5 A Common Framework for Memory Hierarchies
Question 1: Where Can a Block Be Placed?
Question 2: How Is a Block Found?
Question 3: Which Block Should Be Replaced on a Cache Miss?
Question 4: What Happens on a Write?
The Three Cs: An Intuitive Model for Understanding the Behavior of Memory Hierarchies
5.6 Virtual Machines
Requirements of a Virtual Machine Monitor
(Lack of) Instruction Set Architecture Support for Virtual Machines
Protection and Instruction Set Architecture
5.7 Using a Finite-State Machine to Control a Simple Cache
A Simple Cache
Finite-State Machines
FSM for a Simple Cache Controller
5.8 Parallelism and Memory Hierarchies: Cache Coherence
Basic Schemes for Enforcing Coherence
Snooping Protocols
5.9 Advanced Material: Implementing Cache Controllers
5.10 Real Stuff: the AMD Opteron X4 (Barcelona) and Intel Nehalem Memory Hierarchies
The Memory Hierarchies of the Nehalem and Opteron
Techniques to Reduce Miss Penalties
5.11 Fallacies and Pitfalls
5.12 Concluding Remarks
5.13 Historical Perspective and Further Reading
5.14 Exercises
Exercise 5.1
Exercise 5.2
Exercise 5.3
Exercise 5.4
Exercise 5.5
Exercise 5.6
Exercise 5.7
Exercise 5.8
Exercise 5.9
Exercise 5.10
Exercise 5.11
Exercise 5.12
Exercise 5.13
Exercise 5.14
Exercise 5.15
Exercise 5.16
Exercise 5.17
Exercise 5.18
6 Storage and Other I/O Topics
6.1 Introduction
6.2 Dependability, Reliability, and Availability
6.3 Disk Storage
6.4 Flash Storage
6.5 Connecting Processors, Memory, and I/O Devices
Connection Basics
The I/O Interconnects of the x86 Processors
6.6 Interfacing I/O Devices to the Processor, Memory, and Operating System
Giving Commands to I/O Devices
Communicating with the Processor
Interrupt Priority Levels
Transferring the Data between a Device and Memory
Direct Memory Access and the Memory System
6.7 I/O Performance Measures: Examples from Disk and File Systems
Transaction Processing I/O Benchmarks
File System and Web I/O Benchmarks
6.8 Designing an I/O System
6.9 Parallelism and I/O: Redundant Arrays of Inexpensive Disks
No Redundancy (RAID 0)
Mirroring (RAID 1)
Error Detecting and Correcting Code (RAID 2)
Bit-Interleaved Parity (RAID 3)
Block-Interleaved Parity (RAID 4)
Distributed Block-Interleaved Parity (RAID 5)
P + Q Redundancy (RAID 6)
RAID Summary
6.10 Real Stuff: Sun Fire x4150 Server
6.11 Advanced Topics: Networks
6.12 Fallacies and Pitfalls
6.13 Concluding Remarks
6.14 Historical Perspective and Further Reading
6.15 Exercises
Exercise 6.1
Exercise 6.2
Exercise 6.3
Exercise 6.4
Exercise 6.5
Exercise 6.6
Exercise 6.7
Exercise 6.8
Exercise 6.9
Exercise 6.10
Exercise 6.11
Exercise 6.12
Exercise 6.13
Exercise 6.14
Exercise 6.15
Exercise 6.16
Exercise 6.17
Exercise 6.18
Exercise 6.19
Exercise 6.20
7 Multicores, Multiprocessors, and Clusters
7.1 Introduction
7.2 The Difficulty of Creating Parallel Processing Programs
7.3 Shared Memory Multiprocessors
7.4 Clusters and Other Message-Passing Multiprocessors
7.5 Hardware Multithreading
7.6 SISD, MIMD, SIMD, SPMD, and Vector
SIMD in x86: Multimedia Extensions
Vector
Vector versus Scalar
Vector versus Multimedia Extensions
7.7 Introduction to Graphics Processing Units
An Introduction to the NVIDIA GPU Architecture
Putting GPUs into Perspective
7.8 Introduction to Multiprocessor Network Topologies
Implementing Network Topologies
7.9 Multiprocessor Benchmarks
7.10 Roofline: A Simple Performance Model
The Roofline Model
Comparing Two Generations of Opterons
7.11 Real Stuff: Benchmarking Four Multicores Using the Roofline Model
Four Multicore Systems
Sparse Matrix
Structured Grid
Productivity
7.12 Fallacies and Pitfalls
7.13 Concluding Remarks
7.14 Historical Perspective and Further Reading
7.15 Exercises
Exercise 7.1
Exercise 7.2
Exercise 7.3
Exercise 7.4
Exercise 7.5
Exercise 7.6
Exercise 7.7
Exercise 7.8
Exercise 7.9
Exercise 7.10
Exercise 7.11
Exercise 7.12
Exercise 7.13
Exercise 7.14
Exercise 7.15
Exercise 7.16
Exercise 7.17
Exercise 7.18
Exercise 7.19
Exercise 7.20
Exercise 7.21
Exercise 7.22
Exercise 7.23
Appendix A. Graphics and Computing GPUs
A.1 Introduction
A Brief History of GPU Evolution
GPU Graphics Trends
Heterogeneous System
GPU Evolves into Scalable Parallel Processor
Why CUDA and GPU Computing?
GPU Unifies Graphics and Computing
GPU Visual Computing Applications
A.2 GPU System Architectures
Heterogeneous CPU–GPU System Architecture
The Historical PC (circa 1990)
Game Consoles
GPU Interfaces and Drivers
Graphics Logical Pipeline
Mapping Graphics Pipeline to Unified GPU Processors
Basic Unified GPU Architecture
Processor Array
A.3 Programming GPUs
Programming Real-Time Graphics
Logical Graphics Pipeline
Graphics Shader Programs
Pixel Shader Example
Programming Parallel Computing Applications
Data Parallel Problem Decomposition
Scalable Parallel Programming with CUDA
The CUDA Paradigm
Restrictions
Implications for Architecture
A.4 Multithreaded Multiprocessor Architecture
Massive Multithreading
Multiprocessor Architecture
Single-Instruction Multiple-Thread (SIMT)
SIMT Warp Execution and Divergence
Managing Threads and Thread Blocks
Thread Instructions
Instruction Set Architecture (ISA)
Memory Access Instructions
Barrier Synchronization for Thread Communication
Streaming Processor (SP)
Special Function Unit (SFU)
Comparing with Other Multiprocessors
Multithreaded Multiprocessor Conclusion
A.5 Parallel Memory System
DRAM Considerations
Caches
MMU
Memory Spaces
Global memory
Shared memory
Local Memory
Constant Memory
Texture Memory
Surfaces
Load/Store Access
ROP
A.6 Floating-point Arithmetic
Supported Formats
Basic Arithmetic
Specialized Arithmetic
Texture Operations
Performance
Double precision
A.7 Real Stuff: The NVIDIA GeForce 8800
Streaming Processor Array (SPA)
Texture/Processor Cluster (TPC)
Streaming Multiprocessor (SM)
Instruction Set
Streaming Processor (SP)
Special Function Unit (SFU)
Rasterization
Raster Operations Processor (ROP) and Memory System
Scalability
Performance
Dense Linear Algebra Performance
FFT Performance
Sorting Performance
A.8 Real Stuff: Mapping Applications to GPUs
Sparse Matrices
Caching in Shared memory
Scan and Reduction
Radix Sort
N-Body Applications on a GPU
N-Body Mathematics
Optimization for GPU Execution
Using Shared memory
Using Multiple Threads per Body
Performance Comparison
Results
A.9 Fallacies and Pitfalls
A.10 Concluding Remarks
Acknowledgments
A.11 Historical Perspective and Further Reading
Appendix B. Assemblers, Linkers, and the SPIM Simulator
B.1 Introduction
When to Use Assembly Language
Drawbacks of Assembly Language
B.2 Assemblers
Object File Format
Additional Facilities
B.3 Linkers
B.4 Loading
B.5 Memory Usage
B.6 Procedure Call Convention
Procedure Calls
Procedure Call Example
Another Procedure Call Example
B.7 Exceptions and Interrupts
B.8 Input and Output
B.9 SPIM
Simulation of a Virtual Machine
Getting Started with SPIM
Surprising Features
Byte Order
System Calls
B.10 MIPS R2000 Assembly Language
Addressing Modes
Assembler Syntax
Encoding MIPS Instructions
Instruction Format
Addition (with overflow)
Multiply (without overflow)
Arithmetic and Logical Instructions
Absolute value
Addition (with overflow)
Addition (without overflow)
Addition immediate (with overflow)
Addition immediate (without overflow)
AND
AND immediate
Count leading ones
Count leading zeros
Divide (with overflow)
Divide (without overflow)
Divide (with overflow)
Divide (without overflow)
Multiply
Unsigned multiply
Multiply (without overflow)
Multiply (with overflow)
Unsigned multiply (with overflow)
Multiply add
Unsigned multiply add
Multiply subtract
Unsigned multiply subtract
Negate value (with overflow)
Negate value (without overflow)
NOR
NOT
OR
OR immediate
Remainder
Unsigned remainder
Shift left logical
Shift left logical variable
Shift right arithmetic
Shift right arithmetic variable
Shift right logical
Shift right logical variable
Rotate left
Rotate right
Subtract (with overflow)
Subtract (without overflow)
Exclusive OR
XOR immediate
Constant-Manipulating Instructions
Load upper immediate
Load immediate
Comparison Instructions
Set less than
Set less than unsigned
Set less than immediate
Set less than unsigned immediate
Set equal
Set greater than equal
Set greater than equal unsigned
Set greater than
Set greater than unsigned
Set less than equal
Set less than equal unsigned
Set not equal
Branch Instructions
Branch instruction
Branch coprocessor false
Branch coprocessor true
Branch on equal
Branch on greater than equal zero
Branch on greater than equal zero and link
Branch on greater than zero
Branch on less than equal zero
Branch on less than and link
Branch on less than zero
Branch on not equal
Branch on equal zero
Branch on greater than equal
Branch on greater than equal unsigned
Branch on greater than
Branch on greater than unsigned
Branch on less than equal
Branch on less than equal unsigned
Branch on less than
Branch on less than unsigned
Branch on not equal zero
Jump Instructions
Jump
Jump and link
Jump and link register
Jump register
Trap Instructions
Trap if equal
Trap if equal immediate
Trap if not equal
Trap if not equal immediate
Trap if greater equal
Unsigned trap if greater equal
Trap if greater equal immediate
Unsigned trap if greater equal immediate
Trap if less than
Unsigned trap if less than
Trap if less than immediate
Unsigned trap if less than immediate
Load Instructions
Load address
Load byte
Load unsigned byte
Load halfword
Load unsigned halfword
Load word
Load word coprocessor 1
Load word left
Load word right
Load doubleword
Unaligned load halfword
Unaligned load halfword unsigned
Unaligned load word
Load linked
Store Instructions
Store byte
Store halfword
Store word
Store word coprocessor 1
Store double coprocessor 1
Store word left
Store word right
Store doubleword
Unaligned store halfword
Unaligned store word
Store conditional
Data Movement Instructions
Move
Move from hi
Move from lo
Move to hi
Move to lo
Move from coprocessor 0
Move from coprocessor 1
Move double from coprocessor 1
Move to coprocessor 0
Move to coprocessor 1
Move conditional not zero
Move conditional zero
Move conditional on FP false
Floating-Point Instructions
Floating-point absolute value double
Floating-point absolute value single
Floating-point addition double
Floating-point addition single
Floating-point ceiling to word
Compare equal double
Compare equal single
Compare less than equal double
Compare less than equal single
Compare less than double
Compare less than single
Convert single to double
Convert integer to double
Convert double to single
Convert integer to single
Convert double to integer
Convert single to integer
Floating-point divide double
Floating-point divide single
Floating-point floor to word
Load floating-point double
Load floating-point single
Move floating-point double
Move floating-point single
Move conditional floating-point double false
Move conditional floating-point single false
Move conditional floating-point double true
Move conditional floating-point single true
Move conditional floating-point double not zero
Move conditional floating-point single not zero
Move conditional floating-point double zero
Move conditional floating-point single zero
Floating-point multiply double
Floating-point multiply single
Negate double
Negate single
Floating-point round to word
Square root double
Square root single
Store floating-point double
Store floating-point single
Floating-point subtract double
Floating-point subtract single
Floating-point truncate to word
Exception and Interrupt Instructions
Exception return
System call
Break
No operation
B.11 Concluding Remarks
Further Reading
B.12 Exercises
Index
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
MIPS Reference Data Card ("Green Card")
1. Pull along perforation to separate card. 2. Fold bottom side (columns 3 and 4) together.

CORE INSTRUCTION SET
Name                     Mnemonic  Fmt  Operation (in Verilog)                                   Opcode/Funct (hex)
Add                      add       R    R[rd] = R[rs] + R[rt]                           (1)      0 / 20
Add Immediate            addi      I    R[rt] = R[rs] + SignExtImm                      (1,2)    8
Add Imm. Unsigned        addiu     I    R[rt] = R[rs] + SignExtImm                      (2)      9
Add Unsigned             addu      R    R[rd] = R[rs] + R[rt]                                    0 / 21
And                      and       R    R[rd] = R[rs] & R[rt]                                    0 / 24
And Immediate            andi      I    R[rt] = R[rs] & ZeroExtImm                      (3)      c
Branch On Equal          beq       I    if(R[rs]==R[rt]) PC=PC+4+BranchAddr             (4)      4
Branch On Not Equal      bne       I    if(R[rs]!=R[rt]) PC=PC+4+BranchAddr             (4)      5
Jump                     j         J    PC=JumpAddr                                     (5)      2
Jump And Link            jal       J    R[31]=PC+8; PC=JumpAddr                         (5)      3
Jump Register            jr        R    PC=R[rs]                                                 0 / 08
Load Byte Unsigned       lbu       I    R[rt]={24'b0, M[R[rs]+SignExtImm](7:0)}         (2)      24
Load Halfword Unsigned   lhu       I    R[rt]={16'b0, M[R[rs]+SignExtImm](15:0)}        (2)      25
Load Linked              ll        I    R[rt] = M[R[rs]+SignExtImm]                     (2,7)    30
Load Upper Imm.          lui       I    R[rt] = {imm, 16'b0}                                     f
Load Word                lw        I    R[rt] = M[R[rs]+SignExtImm]                     (2)      23
Nor                      nor       R    R[rd] = ~(R[rs] | R[rt])                                 0 / 27
Or                       or        R    R[rd] = R[rs] | R[rt]                                    0 / 25
Or Immediate             ori       I    R[rt] = R[rs] | ZeroExtImm                      (3)      d
Set Less Than            slt       R    R[rd] = (R[rs] < R[rt]) ? 1 : 0                          0 / 2a
Set Less Than Imm.       slti      I    R[rt] = (R[rs] < SignExtImm) ? 1 : 0            (2)      a
Set Less Than Imm. Uns.  sltiu     I    R[rt] = (R[rs] < SignExtImm) ? 1 : 0            (2,6)    b
Set Less Than Unsig.     sltu      R    R[rd] = (R[rs] < R[rt]) ? 1 : 0                 (6)      0 / 2b
Shift Left Logical       sll       R    R[rd] = R[rt] << shamt                                   0 / 00
Shift Right Logical      srl       R    R[rd] = R[rt] >> shamt                                   0 / 02
Store Byte               sb        I    M[R[rs]+SignExtImm](7:0) = R[rt](7:0)           (2)      28
Store Conditional        sc        I    M[R[rs]+SignExtImm] = R[rt]; R[rt] = (atomic) ? 1 : 0  (2,7)  38
Store Halfword           sh        I    M[R[rs]+SignExtImm](15:0) = R[rt](15:0)         (2)      29
Store Word               sw        I    M[R[rs]+SignExtImm] = R[rt]                     (2)      2b
Subtract                 sub       R    R[rd] = R[rs] - R[rt]                           (1)      0 / 22
Subtract Unsigned        subu      R    R[rd] = R[rs] - R[rt]                                    0 / 23

(1) May cause overflow exception
(2) SignExtImm = { 16{immediate[15]}, immediate }
(3) ZeroExtImm = { 16{1b'0}, immediate }
(4) BranchAddr = { 14{immediate[15]}, immediate, 2'b0 }
(5) JumpAddr = { PC+4[31:28], address, 2'b0 }
(6) Operands considered unsigned numbers (vs. 2's comp.)
(7) Atomic test&set pair; R[rt] = 1 if pair atomic, 0 if not atomic

BASIC INSTRUCTION FORMATS
R: opcode(31:26) | rs(25:21) | rt(20:16) | rd(15:11) | shamt(10:6) | funct(5:0)
I: opcode(31:26) | rs(25:21) | rt(20:16) | immediate(15:0)
J: opcode(31:26) | address(25:0)

ARITHMETIC CORE INSTRUCTION SET
Name                 Mnemonic  Fmt  Operation                                                Opcode/Fmt/Ft/Funct (hex)
Branch On FP True    bc1t      FI   if(FPcond) PC=PC+4+BranchAddr                   (4)      11/8/1/--
Branch On FP False   bc1f      FI   if(!FPcond) PC=PC+4+BranchAddr                  (4)      11/8/0/--
Divide               div       R    Lo=R[rs]/R[rt]; Hi=R[rs]%R[rt]                           0/--/--/1a
Divide Unsigned      divu      R    Lo=R[rs]/R[rt]; Hi=R[rs]%R[rt]                  (6)      0/--/--/1b
FP Add Single        add.s     FR   F[fd] = F[fs] + F[ft]                                    11/10/--/0
FP Add Double        add.d     FR   {F[fd],F[fd+1]} = {F[fs],F[fs+1]} + {F[ft],F[ft+1]}      11/11/--/0
FP Compare Single    c.x.s*    FR   FPcond = (F[fs] op F[ft]) ? 1 : 0                        11/10/--/y
FP Compare Double    c.x.d*    FR   FPcond = ({F[fs],F[fs+1]} op {F[ft],F[ft+1]}) ? 1 : 0    11/11/--/y
  * (x is eq, lt, or le) (op is ==, <, or <=) (y is 32, 3c, or 3e)
FP Divide Single     div.s     FR   F[fd] = F[fs] / F[ft]                                    11/10/--/3
FP Divide Double     div.d     FR   {F[fd],F[fd+1]} = {F[fs],F[fs+1]} / {F[ft],F[ft+1]}      11/11/--/3
FP Multiply Single   mul.s     FR   F[fd] = F[fs] * F[ft]                                    11/10/--/2
FP Multiply Double   mul.d     FR   {F[fd],F[fd+1]} = {F[fs],F[fs+1]} * {F[ft],F[ft+1]}      11/11/--/2
FP Subtract Single   sub.s     FR   F[fd] = F[fs] - F[ft]                                    11/10/--/1
FP Subtract Double   sub.d     FR   {F[fd],F[fd+1]} = {F[fs],F[fs+1]} - {F[ft],F[ft+1]}      11/11/--/1
Load FP Single       lwc1      I    F[rt] = M[R[rs]+SignExtImm]                     (2)      31/--/--/--
Load FP Double       ldc1      I    F[rt] = M[R[rs]+SignExtImm]; F[rt+1] = M[R[rs]+SignExtImm+4]  (2)  35/--/--/--
Move From Hi         mfhi      R    R[rd] = Hi                                               0/--/--/10
Move From Lo         mflo      R    R[rd] = Lo                                               0/--/--/12
Move From Control    mfc0      R    R[rd] = CR[rs]                                           10/0/--/0
Multiply             mult      R    {Hi,Lo} = R[rs] * R[rt]                                  0/--/--/18
Multiply Unsigned    multu     R    {Hi,Lo} = R[rs] * R[rt]                         (6)      0/--/--/19
Shift Right Arith.   sra       R    R[rd] = R[rt] >> shamt                                   0/--/--/3
Store FP Single      swc1      I    M[R[rs]+SignExtImm] = F[rt]                     (2)      39/--/--/--
Store FP Double      sdc1      I    M[R[rs]+SignExtImm] = F[rt]; M[R[rs]+SignExtImm+4] = F[rt+1]  (2)  3d/--/--/--

FLOATING-POINT INSTRUCTION FORMATS
FR: opcode(31:26) | fmt(25:21) | ft(20:16) | fs(15:11) | fd(10:6) | funct(5:0)
FI: opcode(31:26) | fmt(25:21) | ft(20:16) | immediate(15:0)

PSEUDOINSTRUCTION SET
Branch Less Than              blt   if(R[rs] <  R[rt]) PC = Label
Branch Greater Than           bgt   if(R[rs] >  R[rt]) PC = Label
Branch Less Than or Equal     ble   if(R[rs] <= R[rt]) PC = Label
Branch Greater Than or Equal  bge   if(R[rs] >= R[rt]) PC = Label
Load Immediate                li    R[rd] = immediate
Move                          move  R[rd] = R[rs]

REGISTER NAME, NUMBER, USE, CALL CONVENTION
Name     Number  Use                                                    Preserved across a call?
$zero    0       The Constant Value 0                                   N.A.
$at      1       Assembler Temporary                                    No
$v0-$v1  2-3     Values for Function Results and Expression Evaluation  No
$a0-$a3  4-7     Arguments                                              No
$t0-$t7  8-15    Temporaries                                            No
$s0-$s7  16-23   Saved Temporaries                                      Yes
$t8-$t9  24-25   Temporaries                                            No
$k0-$k1  26-27   Reserved for OS Kernel                                 No
$gp      28      Global Pointer                                         Yes
$sp      29      Stack Pointer                                          Yes
$fp      30      Frame Pointer                                          Yes
$ra      31      Return Address                                         Yes

IEEE 754 FLOATING-POINT STANDARD
(-1)^S × (1 + Fraction) × 2^(Exponent - Bias),
where Single Precision Bias = 127, Double Precision Bias = 1023.
Single Precision: S(31) | Exponent(30:23) | Fraction(22:0)
Double Precision: S(63) | Exponent(62:52) | Fraction(51:0)
IEEE 754 symbols (S.P. MAX = 255, D.P. MAX = 2047):
  Exponent 0,          Fraction 0        -> ±0
  Exponent 0,          Fraction ≠ 0      -> ±Denorm
  Exponent 1 to MAX-1, Fraction anything -> ±Fl. Pt. Num.
  Exponent MAX,        Fraction 0        -> ±∞
  Exponent MAX,        Fraction ≠ 0      -> NaN

MEMORY ALLOCATION
$sp = 7fff fffc hex -> Stack (grows toward lower addresses)
                       Dynamic Data
$gp = 1000 8000 hex -> Static Data (from 1000 0000 hex)
pc  = 0040 0000 hex -> Text
0 hex               -> Reserved

EXCEPTION CONTROL REGISTERS: CAUSE AND STATUS
Cause:  BD(31), Pending Interrupt(15:8), Exception Code(6:2)
Status: Interrupt Mask(15:8), UM(4), EL(1), IE(0)
BD = Branch Delay, UM = User Mode, EL = Exception Level, IE = Interrupt Enable

EXCEPTION CODES
 0  Int   Interrupt (hardware)
 4  AdEL  Address Error Exception (load or instruction fetch)
 5  AdES  Address Error Exception (store)
 6  IBE   Bus Error on Instruction Fetch
 7  DBE   Bus Error on Load or Store
 8  Sys   Syscall Exception
 9  Bp    Breakpoint Exception
10  RI    Reserved Instruction Exception
11  CpU   Coprocessor Unimplemented
12  Ov    Arithmetic Overflow Exception
13  Tr    Trap
15  FPE   Floating Point Exception

SIZE PREFIXES (10^x for Disk, Communication; 2^x for Memory)
Kilo   10^3,  2^10     milli  10^-3
Mega   10^6,  2^20     micro  10^-6
Giga   10^9,  2^30     nano   10^-9
Tera   10^12, 2^40     pico   10^-12
Peta   10^15, 2^50     femto  10^-15
Exa    10^18, 2^60     atto   10^-18
Zetta  10^21, 2^70     zepto  10^-21
Yotta  10^24, 2^80     yocto  10^-24
The symbol for each prefix is just its first letter, except μ is used for micro.

[Opcode/base-conversion/ASCII table, data-alignment table, and stack-frame diagram: not recoverable from text extraction.]

Copyright 2009 by Elsevier, Inc., All rights reserved. From Patterson and Hennessy, Computer Organization and Design, 4th ed.
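The basic instruction formats on the reference card pack fields into fixed bit positions of a 32-bit word. As an illustration (not part of the card), here is a minimal Python sketch that splits a word into R-format fields; the example word 0x012A4020 encodes add $t0, $t1, $t2 (opcode 0, funct 0x20).

```python
def decode_r_format(word):
    """Split a 32-bit MIPS instruction word into R-format fields:
    opcode(31:26), rs(25:21), rt(20:16), rd(15:11), shamt(10:6), funct(5:0)."""
    return {
        "opcode": (word >> 26) & 0x3F,  # 6-bit opcode
        "rs":     (word >> 21) & 0x1F,  # 5-bit source register
        "rt":     (word >> 16) & 0x1F,  # 5-bit second source register
        "rd":     (word >> 11) & 0x1F,  # 5-bit destination register
        "shamt":  (word >> 6)  & 0x1F,  # 5-bit shift amount
        "funct":  word         & 0x3F,  # 6-bit function code
    }

# add $t0, $t1, $t2  ->  rd=8 ($t0), rs=9 ($t1), rt=10 ($t2), funct=0x20
fields = decode_r_format(0x012A4020)
```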
In Praise of Computer Organization and Design: The Hardware/Software Interface, Revised Fourth Edition

"Patterson and Hennessy not only improve the pedagogy of the traditional material on pipelined processors and memory hierarchies, but also greatly expand the multiprocessor coverage to include emerging multicore processors and GPUs. The fourth edition of Computer Organization and Design sets a new benchmark against which all other architecture books must be compared."
—David A. Wood, University of Wisconsin-Madison

"Patterson and Hennessy have greatly improved what was already the gold standard of textbooks. In the rapidly evolving field of computer architecture, they have woven an impressive number of recent case studies and contemporary issues into a framework of time-tested fundamentals."
—Fred Chong, University of California at Santa Barbara

"Since the publication of the first edition in 1994, Computer Organization and Design has introduced a generation of computer science and engineering students to computer architecture. Now, many of those students have become leaders in the field. In academia, the tradition continues as faculty use the latest edition of the book that inspired them to engage the next generation. With the fourth edition, readers are prepared for the next era of computing."
—David I. August, Princeton University

"The new coverage of multiprocessors and parallelism lives up to the standards of this well-written classic. It provides well-motivated, gentle introductions to the new topics, as well as many details and examples drawn from current hardware."
—John Greiner, Rice University

"As computer hardware architecture moves from uniprocessor to multicores, the parallel programming environments used to take advantage of these cores will be a defining challenge to the success of these new systems. In the multicore systems, the interface between the hardware and software is of particular importance. This new edition of Computer Organization and Design is mandatory for any student who wishes to understand multicore architecture including the interface between programming it and its architecture."
—Jesse Fang, Director of Programming System Lab at Intel

"The fourth edition of Computer Organization and Design continues to improve the high standards set by the previous editions. The new content, on trends that are reshaping computer systems including multicores, Flash memory, GPUs, etc., makes this edition a must read—even for all of those who grew up on previous editions of the book."
—Parthasarathy Ranganathan, Principal Research Scientist, HP Labs
REVISED FOURTH EDITION
Computer Organization and Design
THE HARDWARE/SOFTWARE INTERFACE
ACKNOWLEDGMENTS

Figures 1.7, 1.8 Courtesy of Other World Computing (www.macsales.com).
Figures 1.9, 1.19, 5.37 Courtesy of AMD.
Figure 1.10 Courtesy of Storage Technology Corp.
Figures 1.10.1, 1.10.2, 4.15.2 Courtesy of the Charles Babbage Institute, University of Minnesota Libraries, Minneapolis.
Figures 1.10.3, 4.15.1, 4.15.3, 5.12.3, 6.14.2 Courtesy of IBM.
Figure 1.10.4 Courtesy of Cray Inc.
Figure 1.10.5 Courtesy of Apple Computer, Inc.
Figure 1.10.6 Courtesy of the Computer History Museum.
Figures 5.12.1, 5.12.2 Courtesy of Museum of Science, Boston.
Figure 5.12.4 Courtesy of MIPS Technologies, Inc.
Figure 6.4 © Peg Skorpinski.
Figure 6.14.1 Courtesy of the Computer Museum of America.
Figure 6.14.3 Courtesy of the Commercial Computing Museum.
Figures 6.15, 6.16, 6.17 Courtesy of Sun Microsystems, Inc.
Figure 7.13.1 Courtesy of NASA Ames Research Center.
REVISED FOURTH EDITION
Computer Organization and Design
THE HARDWARE/SOFTWARE INTERFACE

David A. Patterson, University of California, Berkeley
John L. Hennessy, Stanford University

With contributions by:
Perry Alexander, The University of Kansas
Peter J. Ashenden, Ashenden Designs Pty Ltd
Javier Bruguera, Universidade de Santiago de Compostela
Jichuan Chang, Hewlett-Packard
Matthew Farrens, University of California, Davis
David Kaeli, Northeastern University
Nicole Kaiyan, University of Adelaide
David Kirk, NVIDIA
James R. Larus, Microsoft Research
Jacob Leverich, Hewlett-Packard
Kevin Lim, Hewlett-Packard
John Nickolls, NVIDIA
John Oliver, Cal Poly, San Luis Obispo
Milos Prvulovic, Georgia Tech
Partha Ranganathan, Hewlett-Packard

AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO
Morgan Kaufmann is an imprint of Elsevier