
Computer Organization and Design 4th Edition.pdf

Front Cover
MIPS Reference Data
In Praise of Computer Organization and Design: The Hardware/Software Interface, Revised Fourth Edition
Acknowledgments
Computer Organization and Design: The Hardware/Software Interface
Copyright Page
Dedication Page
Contents
Preface
About This Book
About the Other Book
Changes for the Fourth Edition
Instructor Support
Concluding Remarks
Acknowledgments for the Fourth Edition
1 Computer Abstractions and Technology
1.1 Introduction
Classes of Computing Applications and Their Characteristics
What You Can Learn in This Book
1.2 Below Your Program
From a High-Level Language to the Language of Hardware
1.3 Under the Covers
Anatomy of a Mouse
Through the Looking Glass
Opening the Box
A Safe Place for Data
Communicating with Other Computers
Technologies for Building Processors and Memory
1.4 Performance
Defining Performance
Measuring Performance
CPU Performance and Its Factors
Instruction Performance
The Classic CPU Performance Equation
1.5 The Power Wall
1.6 The Sea Change: The Switch from Uniprocessors to Multiprocessors
1.7 Real Stuff: Manufacturing and Benchmarking the AMD Opteron X4
SPEC CPU Benchmark
SPEC Power Benchmark
1.8 Fallacies and Pitfalls
1.9 Concluding Remarks
Road Map for This Book
1.10 Historical Perspective and Further Reading
1.11 Exercises
Exercise 1.1
Exercise 1.2
Exercise 1.3
Exercise 1.4
Exercise 1.5
Exercise 1.6
Exercise 1.7
Exercise 1.8
Exercise 1.9
Exercise 1.10
Exercise 1.11
Exercise 1.12
Exercise 1.13
Exercise 1.14
Exercise 1.15
Exercise 1.16
2 Instructions: Language of the Computer
2.1 Introduction
2.2 Operations of the Computer Hardware
2.3 Operands of the Computer Hardware
Memory Operands
Constant or Immediate Operands
2.4 Signed and Unsigned Numbers
Summary
2.5 Representing Instructions in the Computer
MIPS Fields
2.6 Logical Operations
2.7 Instructions for Making Decisions
Loops
Case/Switch Statement
2.8 Supporting Procedures in Computer Hardware
Using More Registers
Nested Procedures
Allocating Space for New Data on the Stack
Allocating Space for New Data on the Heap
2.9 Communicating with People
Characters and Strings in Java
2.10 MIPS Addressing for 32-Bit Immediates and Addresses
32-Bit Immediate Operands
Addressing in Branches and Jumps
MIPS Addressing Mode Summary
Decoding Machine Language
2.11 Parallelism and Instructions: Synchronization
2.12 Translating and Starting a Program
Compiler
Assembler
Linker
Loader
Dynamically Linked Libraries
Starting a Java Program
2.13 A C Sort Example to Put It All Together
The Procedure swap
Register Allocation for swap
Code for the Body of the Procedure swap
The Full swap Procedure
The Procedure sort
Register Allocation for sort
Code for the Body of the Procedure sort
The Procedure Call in sort
Passing Parameters in sort
Preserving Registers in sort
The Full Procedure sort
2.14 Arrays versus Pointers
Array Version of Clear
Pointer Version of Clear
Comparing the Two Versions of Clear
2.15 Advanced Material: Compiling C and Interpreting Java
2.16 Real Stuff: ARM Instructions
Addressing Modes
Compare and Conditional Branch
Unique Features of ARM
2.17 Real Stuff: x86 Instructions
Evolution of the Intel x86
x86 Registers and Data Addressing Modes
x86 Integer Operations
x86 Instruction Encoding
x86 Conclusion
2.18 Fallacies and Pitfalls
2.19 Concluding Remarks
2.20 Historical Perspective and Further Reading
2.21 Exercises
Exercise 2.1
Exercise 2.2
Exercise 2.3
Exercise 2.4
Exercise 2.5
Exercise 2.6
Exercise 2.7
Exercise 2.8
Exercise 2.9
Exercise 2.10
Exercise 2.11
Exercise 2.12
Exercise 2.13
Exercise 2.14
Exercise 2.15
Exercise 2.16
Exercise 2.17
Exercise 2.18
Exercise 2.19
Exercise 2.20
Exercise 2.21
Exercise 2.22
Exercise 2.23
Exercise 2.24
Exercise 2.25
Exercise 2.26
Exercise 2.27
Exercise 2.28
Exercise 2.29
Exercise 2.30
Exercise 2.31
Exercise 2.32
Exercise 2.33
Exercise 2.34
Exercise 2.35
Exercise 2.36
Exercise 2.37
Exercise 2.38
Exercise 2.39
Exercise 2.40
3 Arithmetic for Computers
3.1 Introduction
3.2 Addition and Subtraction
Arithmetic for Multimedia
Summary
3.3 Multiplication
Sequential Version of the Multiplication Algorithm and Hardware
Signed Multiplication
Faster Multiplication
Multiply in MIPS
Summary
3.4 Division
A Division Algorithm and Hardware
Signed Division
Faster Division
Divide in MIPS
Summary
3.5 Floating Point
Floating-Point Representation
Floating-Point Addition
Floating-Point Multiplication
Floating-Point Instructions in MIPS
Accurate Arithmetic
Summary
3.6 Parallelism and Computer Arithmetic: Associativity
3.7 Real Stuff: Floating Point in the x86
The x86 Floating-Point Architecture
The Intel Streaming SIMD Extension 2 (SSE2) Floating-Point Architecture
3.8 Fallacies and Pitfalls
3.9 Concluding Remarks
3.10 Historical Perspective and Further Reading
3.11 Exercises
Exercise 3.1
Exercise 3.2
Exercise 3.3
Exercise 3.4
Exercise 3.5
Exercise 3.6
Exercise 3.7
Exercise 3.8
Exercise 3.9
Exercise 3.10
Exercise 3.11
Exercise 3.12
Exercise 3.13
Exercise 3.14
Exercise 3.15
4 The Processor
4.1 Introduction
A Basic MIPS Implementation
An Overview of the Implementation
4.2 Logic Design Conventions
Clocking Methodology
4.3 Building a Datapath
Creating a Single Datapath
4.4 A Simple Implementation Scheme
The ALU Control
Designing the Main Control Unit
Operation of the Datapath
Finalizing Control
Why a Single-Cycle Implementation Is Not Used Today
4.5 An Overview of Pipelining
Designing Instruction Sets for Pipelining
Pipeline Hazards
Structural Hazards
Data Hazards
Control Hazards
Pipeline Overview Summary
4.6 Pipelined Datapath and Control
Graphically Representing Pipelines
Pipelined Control
4.7 Data Hazards: Forwarding versus Stalling
Data Hazards and Stalls
4.8 Control Hazards
Assume Branch Not Taken
Reducing the Delay of Branches
Dynamic Branch Prediction
Pipeline Summary
4.9 Exceptions
How Exceptions Are Handled in the MIPS Architecture
Exceptions in a Pipelined Implementation
4.10 Parallelism and Advanced Instruction-Level Parallelism
The Concept of Speculation
Static Multiple Issue
An Example: Static Multiple Issue with the MIPS ISA
Dynamic Multiple-Issue Processors
Dynamic Pipeline Scheduling
Power Efficiency and Advanced Pipelining
4.11 Real Stuff: the AMD Opteron X4 (Barcelona) Pipeline
4.12 Advanced Topic: an Introduction to Digital Design Using a Hardware Design Language to Describe and Model a Pipeline and More Pipelining Illustrations
4.13 Fallacies and Pitfalls
4.14 Concluding Remarks
4.15 Historical Perspective and Further Reading
4.16 Exercises
Exercise 4.1
Exercise 4.2
Exercise 4.3
Exercise 4.4
Exercise 4.5
Exercise 4.6
Exercise 4.7
Exercise 4.8
Exercise 4.9
Exercise 4.10
Exercise 4.11
Exercise 4.12
Exercise 4.13
Exercise 4.14
Exercise 4.15
Exercise 4.16
Exercise 4.17
Exercise 4.18
Exercise 4.19
Exercise 4.20
Exercise 4.21
Exercise 4.22
Exercise 4.23
Exercise 4.24
Exercise 4.25
Exercise 4.26
Exercise 4.27
Exercise 4.28
Exercise 4.29
Exercise 4.30
Exercise 4.31
Exercise 4.32
Exercise 4.33
Exercise 4.34
Exercise 4.35
Exercise 4.36
Exercise 4.37
Exercise 4.38
Exercise 4.39
5 Large and Fast: Exploiting Memory Hierarchy
5.1 Introduction
5.2 The Basics of Caches
Accessing a Cache
Handling Cache Misses
Handling Writes
An Example Cache: The Intrinsity FastMATH Processor
Designing the Memory System to Support Caches
Summary
5.3 Measuring and Improving Cache Performance
Reducing Cache Misses by More Flexible Placement of Blocks
Locating a Block in the Cache
Choosing Which Block to Replace
Reducing the Miss Penalty Using Multilevel Caches
Summary
5.4 Virtual Memory
Placing a Page and Finding It Again
Page Faults
What about Writes?
Making Address Translation Fast: the TLB
The Intrinsity FastMATH TLB
Integrating Virtual Memory, TLBs, and Caches
Implementing Protection with Virtual Memory
Handling TLB Misses and Page Faults
Summary
5.5 A Common Framework for Memory Hierarchies
Question 1: Where Can a Block Be Placed?
Question 2: How Is a Block Found?
Question 3: Which Block Should Be Replaced on a Cache Miss?
Question 4: What Happens on a Write?
The Three Cs: An Intuitive Model for Understanding the Behavior of Memory Hierarchies
5.6 Virtual Machines
Requirements of a Virtual Machine Monitor
(Lack of) Instruction Set Architecture Support for Virtual Machines
Protection and Instruction Set Architecture
5.7 Using a Finite-State Machine to Control a Simple Cache
A Simple Cache
Finite-State Machines
FSM for a Simple Cache Controller
5.8 Parallelism and Memory Hierarchies: Cache Coherence
Basic Schemes for Enforcing Coherence
Snooping Protocols
5.9 Advanced Material: Implementing Cache Controllers
5.10 Real Stuff: the AMD Opteron X4 (Barcelona) and Intel Nehalem Memory Hierarchies
The Memory Hierarchies of the Nehalem and Opteron
Techniques to Reduce Miss Penalties
5.11 Fallacies and Pitfalls
5.12 Concluding Remarks
5.13 Historical Perspective and Further Reading
5.14 Exercises
Exercise 5.1
Exercise 5.2
Exercise 5.3
Exercise 5.4
Exercise 5.5
Exercise 5.6
Exercise 5.7
Exercise 5.8
Exercise 5.9
Exercise 5.10
Exercise 5.11
Exercise 5.12
Exercise 5.13
Exercise 5.14
Exercise 5.15
Exercise 5.16
Exercise 5.17
Exercise 5.18
6 Storage and Other I/O Topics
6.1 Introduction
6.2 Dependability, Reliability, and Availability
6.3 Disk Storage
6.4 Flash Storage
6.5 Connecting Processors, Memory, and I/O Devices
Connection Basics
The I/O Interconnects of the x86 Processors
6.6 Interfacing I/O Devices to the Processor, Memory, and Operating System
Giving Commands to I/O Devices
Communicating with the Processor
Interrupt Priority Levels
Transferring the Data between a Device and Memory
Direct Memory Access and the Memory System
6.7 I/O Performance Measures: Examples from Disk and File Systems
Transaction Processing I/O Benchmarks
File System and Web I/O Benchmarks
6.8 Designing an I/O System
6.9 Parallelism and I/O: Redundant Arrays of Inexpensive Disks
No Redundancy (RAID 0)
Mirroring (RAID 1)
Error Detecting and Correcting Code (RAID 2)
Bit-Interleaved Parity (RAID 3)
Block-Interleaved Parity (RAID 4)
Distributed Block-Interleaved Parity (RAID 5)
P + Q Redundancy (RAID 6)
RAID Summary
6.10 Real Stuff: Sun Fire x4150 Server
6.11 Advanced Topics: Networks
6.12 Fallacies and Pitfalls
6.13 Concluding Remarks
6.14 Historical Perspective and Further Reading
6.15 Exercises
Exercise 6.1
Exercise 6.2
Exercise 6.3
Exercise 6.4
Exercise 6.5
Exercise 6.6
Exercise 6.7
Exercise 6.8
Exercise 6.9
Exercise 6.10
Exercise 6.11
Exercise 6.12
Exercise 6.13
Exercise 6.14
Exercise 6.15
Exercise 6.16
Exercise 6.17
Exercise 6.18
Exercise 6.19
Exercise 6.20
7 Multicores, Multiprocessors, and Clusters
7.1 Introduction
7.2 The Difficulty of Creating Parallel Processing Programs
7.3 Shared Memory Multiprocessors
7.4 Clusters and Other Message-Passing Multiprocessors
7.5 Hardware Multithreading
7.6 SISD, MIMD, SIMD, SPMD, and Vector
SIMD in x86: Multimedia Extensions
Vector
Vector versus Scalar
Vector versus Multimedia Extensions
7.7 Introduction to Graphics Processing Units
An Introduction to the NVIDIA GPU Architecture
Putting GPUs into Perspective
7.8 Introduction to Multiprocessor Network Topologies
Implementing Network Topologies
7.9 Multiprocessor Benchmarks
7.10 Roofline: A Simple Performance Model
The Roofline Model
Comparing Two Generations of Opterons
7.11 Real Stuff: Benchmarking Four Multicores Using the Roofline Model
Four Multicore Systems
Sparse Matrix
Structured Grid
Productivity
7.12 Fallacies and Pitfalls
7.13 Concluding Remarks
7.14 Historical Perspective and Further Reading
7.15 Exercises
Exercise 7.1
Exercise 7.2
Exercise 7.3
Exercise 7.4
Exercise 7.5
Exercise 7.6
Exercise 7.7
Exercise 7.8
Exercise 7.9
Exercise 7.10
Exercise 7.11
Exercise 7.12
Exercise 7.13
Exercise 7.14
Exercise 7.15
Exercise 7.16
Exercise 7.17
Exercise 7.18
Exercise 7.19
Exercise 7.20
Exercise 7.21
Exercise 7.22
Exercise 7.23
Appendix A. Graphics and Computing GPUs
A.1 Introduction
A Brief History of GPU Evolution
GPU Graphics Trends
Heterogeneous System
GPU Evolves into Scalable Parallel Processor
Why CUDA and GPU Computing?
GPU Unifies Graphics and Computing
GPU Visual Computing Applications
A.2 GPU System Architectures
Heterogeneous CPU–GPU System Architecture
The Historical PC (circa 1990)
Game Consoles
GPU Interfaces and Drivers
Graphics Logical Pipeline
Mapping Graphics Pipeline to Unified GPU Processors
Basic Unified GPU Architecture
Processor Array
A.3 Programming GPUs
Programming Real-Time Graphics
Logical Graphics Pipeline
Graphics Shader Programs
Pixel Shader Example
Programming Parallel Computing Applications
Data Parallel Problem Decomposition
Scalable Parallel Programming with CUDA
The CUDA Paradigm
Restrictions
Implications for Architecture
A.4 Multithreaded Multiprocessor Architecture
Massive Multithreading
Multiprocessor Architecture
Single-Instruction Multiple-Thread (SIMT)
SIMT Warp Execution and Divergence
Managing Threads and Thread Blocks
Thread Instructions
Instruction Set Architecture (ISA)
Memory Access Instructions
Barrier Synchronization for Thread Communication
Streaming Processor (SP)
Special Function Unit (SFU)
Comparing with Other Multiprocessors
Multithreaded Multiprocessor Conclusion
A.5 Parallel Memory System
DRAM Considerations
Caches
MMU
Memory Spaces
Global memory
Shared memory
Local Memory
Constant Memory
Texture Memory
Surfaces
Load/Store Access
ROP
A.6 Floating-point Arithmetic
Supported Formats
Basic Arithmetic
Specialized Arithmetic
Texture Operations
Performance
Double precision
A.7 Real Stuff: The NVIDIA GeForce 8800
Streaming Processor Array (SPA)
Texture/Processor Cluster (TPC)
Streaming Multiprocessor (SM)
Instruction Set
Streaming Processor (SP)
Special Function Unit (SFU)
Rasterization
Raster Operations Processor (ROP) and Memory System
Scalability
Performance
Dense Linear Algebra Performance
FFT Performance
Sorting Performance
A.8 Real Stuff: Mapping Applications to GPUs
Sparse Matrices
Caching in Shared memory
Scan and Reduction
Radix Sort
N-Body Applications on a GPU
N-Body Mathematics
Optimization for GPU Execution
Using Shared memory
Using Multiple Threads per Body
Performance Comparison
Results
A.9 Fallacies and Pitfalls
A.10 Concluding Remarks
Acknowledgments
A.11 Historical Perspective and Further Reading
Appendix B. Assemblers, Linkers, and the SPIM Simulator
B.1 Introduction
When to Use Assembly Language
Drawbacks of Assembly Language
B.2 Assemblers
Object File Format
Additional Facilities
B.3 Linkers
B.4 Loading
B.5 Memory Usage
B.6 Procedure Call Convention
Procedure Calls
Procedure Call Example
Another Procedure Call Example
B.7 Exceptions and Interrupts
B.8 Input and Output
B.9 SPIM
Simulation of a Virtual Machine
Getting Started with SPIM
Surprising Features
Byte Order
System Calls
B.10 MIPS R2000 Assembly Language
Addressing Modes
Assembler Syntax
Encoding MIPS Instructions
Instruction Format
Addition (with overflow)
Multiply (without overflow)
Arithmetic and Logical Instructions
Absolute value
Addition (with overflow)
Addition (without overflow)
Addition immediate (with overflow)
Addition immediate (without overflow)
AND
AND immediate
Count leading ones
Count leading zeros
Divide (with overflow)
Divide (without overflow)
Divide (with overflow)
Divide (without overflow)
Multiply
Unsigned multiply
Multiply (without overflow)
Multiply (with overflow)
Unsigned multiply (with overflow)
Multiply add
Unsigned multiply add
Multiply subtract
Unsigned multiply subtract
Negate value (with overflow)
Negate value (without overflow)
NOR
NOT
OR
OR immediate
Remainder
Unsigned remainder
Shift left logical
Shift left logical variable
Shift right arithmetic
Shift right arithmetic variable
Shift right logical
Shift right logical variable
Rotate left
Rotate right
Subtract (with overflow)
Subtract (without overflow)
Exclusive OR
XOR immediate
Constant-Manipulating Instructions
Load upper immediate
Load immediate
Comparison Instructions
Set less than
Set less than unsigned
Set less than immediate
Set less than unsigned immediate
Set equal
Set greater than equal
Set greater than equal unsigned
Set greater than
Set greater than unsigned
Set less than equal
Set less than equal unsigned
Set not equal
Branch Instructions
Branch instruction
Branch coprocessor false
Branch coprocessor true
Branch on equal
Branch on greater than equal zero
Branch on greater than equal zero and link
Branch on greater than zero
Branch on less than equal zero
Branch on less than and link
Branch on less than zero
Branch on not equal
Branch on equal zero
Branch on greater than equal
Branch on greater than equal unsigned
Branch on greater than
Branch on greater than unsigned
Branch on less than equal
Branch on less than equal unsigned
Branch on less than
Branch on less than unsigned
Branch on not equal zero
Jump Instructions
Jump
Jump and link
Jump and link register
Jump register
Trap Instructions
Trap if equal
Trap if equal immediate
Trap if not equal
Trap if not equal immediate
Trap if greater equal
Unsigned trap if greater equal
Trap if greater equal immediate
Unsigned trap if greater equal immediate
Trap if less than
Unsigned trap if less than
Trap if less than immediate
Unsigned trap if less than immediate
Load Instructions
Load address
Load byte
Load unsigned byte
Load halfword
Load unsigned halfword
Load word
Load word coprocessor 1
Load word left
Load word right
Load doubleword
Unaligned load halfword
Unaligned load halfword unsigned
Unaligned load word
Load linked
Store Instructions
Store byte
Store halfword
Store word
Store word coprocessor 1
Store double coprocessor 1
Store word left
Store word right
Store doubleword
Unaligned store halfword
Unaligned store word
Store conditional
Data Movement Instructions
Move
Move from hi
Move from lo
Move to hi
Move to lo
Move from coprocessor 0
Move from coprocessor 1
Move double from coprocessor 1
Move to coprocessor 0
Move to coprocessor 1
Move conditional not zero
Move conditional zero
Move conditional on FP false
Floating-Point Instructions
Floating-point absolute value double
Floating-point absolute value single
Floating-point addition double
Floating-point addition single
Floating-point ceiling to word
Compare equal double
Compare equal single
Compare less than equal double
Compare less than equal single
Compare less than double
Compare less than single
Convert single to double
Convert integer to double
Convert double to single
Convert integer to single
Convert double to integer
Convert single to integer
Floating-point divide double
Floating-point divide single
Floating-point floor to word
Load floating-point double
Load floating-point single
Move floating-point double
Move floating-point single
Move conditional floating-point double false
Move conditional floating-point single false
Move conditional floating-point double true
Move conditional floating-point single true
Move conditional floating-point double not zero
Move conditional floating-point single not zero
Move conditional floating-point double zero
Move conditional floating-point single zero
Floating-point multiply double
Floating-point multiply single
Negate double
Negate single
Floating-point round to word
Square root double
Square root single
Store floating-point double
Store floating-point single
Floating-point subtract double
Floating-point subtract single
Floating-point truncate to word
Exception and Interrupt Instructions
Exception return
System call
Break
No operation
B.11 Concluding Remarks
Further Reading
B.12 Exercises
Index
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
MIPS Reference Data Card ("Green Card")
1. Pull along perforation to separate card. 2. Fold bottom side (columns 3 and 4) together.

CORE INSTRUCTION SET
Name                     Mnemonic  Fmt  Operation (in Verilog)                                   Opcode/Funct (hex)
Add                      add       R    R[rd] = R[rs] + R[rt]                           (1)      0 / 20
Add Immediate            addi      I    R[rt] = R[rs] + SignExtImm                      (1,2)    8
Add Imm. Unsigned        addiu     I    R[rt] = R[rs] + SignExtImm                      (2)      9
Add Unsigned             addu      R    R[rd] = R[rs] + R[rt]                                    0 / 21
And                      and       R    R[rd] = R[rs] & R[rt]                                    0 / 24
And Immediate            andi      I    R[rt] = R[rs] & ZeroExtImm                      (3)      c
Branch On Equal          beq       I    if(R[rs]==R[rt]) PC=PC+4+BranchAddr             (4)      4
Branch On Not Equal      bne       I    if(R[rs]!=R[rt]) PC=PC+4+BranchAddr             (4)      5
Jump                     j         J    PC=JumpAddr                                     (5)      2
Jump And Link            jal       J    R[31]=PC+8; PC=JumpAddr                         (5)      3
Jump Register            jr        R    PC=R[rs]                                                 0 / 08
Load Byte Unsigned       lbu       I    R[rt]={24'b0, M[R[rs]+SignExtImm](7:0)}         (2)      24
Load Halfword Unsigned   lhu       I    R[rt]={16'b0, M[R[rs]+SignExtImm](15:0)}        (2)      25
Load Linked              ll        I    R[rt] = M[R[rs]+SignExtImm]                     (2,7)    30
Load Upper Imm.          lui       I    R[rt] = {imm, 16'b0}                                     f
Load Word                lw        I    R[rt] = M[R[rs]+SignExtImm]                     (2)      23
Nor                      nor       R    R[rd] = ~(R[rs] | R[rt])                                 0 / 27
Or                       or        R    R[rd] = R[rs] | R[rt]                                    0 / 25
Or Immediate             ori       I    R[rt] = R[rs] | ZeroExtImm                      (3)      d
Set Less Than            slt       R    R[rd] = (R[rs] < R[rt]) ? 1 : 0                          0 / 2a
Set Less Than Imm.       slti      I    R[rt] = (R[rs] < SignExtImm) ? 1 : 0            (2)      a
Set Less Than Imm. Uns.  sltiu     I    R[rt] = (R[rs] < SignExtImm) ? 1 : 0            (2,6)    b
Set Less Than Unsig.     sltu      R    R[rd] = (R[rs] < R[rt]) ? 1 : 0                 (6)      0 / 2b
Shift Left Logical       sll       R    R[rd] = R[rt] << shamt                                   0 / 00
Shift Right Logical      srl       R    R[rd] = R[rt] >> shamt                                   0 / 02
Store Byte               sb        I    M[R[rs]+SignExtImm](7:0) = R[rt](7:0)           (2)      28
Store Conditional        sc        I    M[R[rs]+SignExtImm] = R[rt]; R[rt] = (atomic) ? 1 : 0  (2,7)  38
Store Halfword           sh        I    M[R[rs]+SignExtImm](15:0) = R[rt](15:0)         (2)      29
Store Word               sw        I    M[R[rs]+SignExtImm] = R[rt]                     (2)      2b
Subtract                 sub       R    R[rd] = R[rs] - R[rt]                           (1)      0 / 22
Subtract Unsigned        subu      R    R[rd] = R[rs] - R[rt]                                    0 / 23

(1) May cause overflow exception
(2) SignExtImm = { 16{immediate[15]}, immediate }
(3) ZeroExtImm = { 16{1b'0}, immediate }
(4) BranchAddr = { 14{immediate[15]}, immediate, 2'b0 }
(5) JumpAddr = { PC+4[31:28], address, 2'b0 }
(6) Operands considered unsigned numbers (vs. 2's comp.)
(7) Atomic test&set pair; R[rt] = 1 if pair atomic, 0 if not atomic

BASIC INSTRUCTION FORMATS
R: opcode(31:26) | rs(25:21) | rt(20:16) | rd(15:11) | shamt(10:6) | funct(5:0)
I: opcode(31:26) | rs(25:21) | rt(20:16) | immediate(15:0)
J: opcode(31:26) | address(25:0)

ARITHMETIC CORE INSTRUCTION SET
Name                 Mnemonic  Fmt  Operation                                                Opcode/Fmt/Ft/Funct (hex)
Branch On FP True    bc1t      FI   if(FPcond) PC=PC+4+BranchAddr                   (4)      11/8/1/--
Branch On FP False   bc1f      FI   if(!FPcond) PC=PC+4+BranchAddr                  (4)      11/8/0/--
Divide               div       R    Lo=R[rs]/R[rt]; Hi=R[rs]%R[rt]                           0/--/--/1a
Divide Unsigned      divu      R    Lo=R[rs]/R[rt]; Hi=R[rs]%R[rt]                  (6)      0/--/--/1b
FP Add Single        add.s     FR   F[fd] = F[fs] + F[ft]                                    11/10/--/0
FP Add Double        add.d     FR   {F[fd],F[fd+1]} = {F[fs],F[fs+1]} + {F[ft],F[ft+1]}      11/11/--/0
FP Compare Single    c.x.s*    FR   FPcond = (F[fs] op F[ft]) ? 1 : 0                        11/10/--/y
FP Compare Double    c.x.d*    FR   FPcond = ({F[fs],F[fs+1]} op {F[ft],F[ft+1]}) ? 1 : 0    11/11/--/y
  * (x is eq, lt, or le) (op is ==, <, or <=) (y is 32, 3c, or 3e)
FP Divide Single     div.s     FR   F[fd] = F[fs] / F[ft]                                    11/10/--/3
FP Divide Double     div.d     FR   {F[fd],F[fd+1]} = {F[fs],F[fs+1]} / {F[ft],F[ft+1]}      11/11/--/3
FP Multiply Single   mul.s     FR   F[fd] = F[fs] * F[ft]                                    11/10/--/2
FP Multiply Double   mul.d     FR   {F[fd],F[fd+1]} = {F[fs],F[fs+1]} * {F[ft],F[ft+1]}      11/11/--/2
FP Subtract Single   sub.s     FR   F[fd] = F[fs] - F[ft]                                    11/10/--/1
FP Subtract Double   sub.d     FR   {F[fd],F[fd+1]} = {F[fs],F[fs+1]} - {F[ft],F[ft+1]}      11/11/--/1
Load FP Single       lwc1      I    F[rt] = M[R[rs]+SignExtImm]                     (2)      31/--/--/--
Load FP Double       ldc1      I    F[rt] = M[R[rs]+SignExtImm]; F[rt+1] = M[R[rs]+SignExtImm+4]  (2)  35/--/--/--
Move From Hi         mfhi      R    R[rd] = Hi                                               0/--/--/10
Move From Lo         mflo      R    R[rd] = Lo                                               0/--/--/12
Move From Control    mfc0      R    R[rd] = CR[rs]                                           10/0/--/0
Multiply             mult      R    {Hi,Lo} = R[rs] * R[rt]                                  0/--/--/18
Multiply Unsigned    multu     R    {Hi,Lo} = R[rs] * R[rt]                         (6)      0/--/--/19
Shift Right Arith.   sra       R    R[rd] = R[rt] >> shamt                                   0/--/--/3
Store FP Single      swc1      I    M[R[rs]+SignExtImm] = F[rt]                     (2)      39/--/--/--
Store FP Double      sdc1      I    M[R[rs]+SignExtImm] = F[rt]; M[R[rs]+SignExtImm+4] = F[rt+1]  (2)  3d/--/--/--

FLOATING-POINT INSTRUCTION FORMATS
FR: opcode(31:26) | fmt(25:21) | ft(20:16) | fs(15:11) | fd(10:6) | funct(5:0)
FI: opcode(31:26) | fmt(25:21) | ft(20:16) | immediate(15:0)

PSEUDOINSTRUCTION SET
Branch Less Than              blt   if(R[rs] <  R[rt]) PC = Label
Branch Greater Than           bgt   if(R[rs] >  R[rt]) PC = Label
Branch Less Than or Equal     ble   if(R[rs] <= R[rt]) PC = Label
Branch Greater Than or Equal  bge   if(R[rs] >= R[rt]) PC = Label
Load Immediate                li    R[rd] = immediate
Move                          move  R[rd] = R[rs]

REGISTER NAME, NUMBER, USE, CALL CONVENTION
Name     Number  Use                                                    Preserved across a call?
$zero    0       The Constant Value 0                                   N.A.
$at      1       Assembler Temporary                                    No
$v0-$v1  2-3     Values for Function Results and Expression Evaluation  No
$a0-$a3  4-7     Arguments                                              No
$t0-$t7  8-15    Temporaries                                            No
$s0-$s7  16-23   Saved Temporaries                                      Yes
$t8-$t9  24-25   Temporaries                                            No
$k0-$k1  26-27   Reserved for OS Kernel                                 No
$gp      28      Global Pointer                                         Yes
$sp      29      Stack Pointer                                          Yes
$fp      30      Frame Pointer                                          Yes
$ra      31      Return Address                                         Yes

IEEE 754 FLOATING-POINT STANDARD
(-1)^S × (1 + Fraction) × 2^(Exponent - Bias),
where Single Precision Bias = 127, Double Precision Bias = 1023.
Single Precision: S(31) | Exponent(30:23) | Fraction(22:0)
Double Precision: S(63) | Exponent(62:52) | Fraction(51:0)
IEEE 754 symbols (S.P. MAX = 255, D.P. MAX = 2047):
  Exponent 0,          Fraction 0        -> ±0
  Exponent 0,          Fraction ≠ 0      -> ±Denorm
  Exponent 1 to MAX-1, Fraction anything -> ±Fl. Pt. Num.
  Exponent MAX,        Fraction 0        -> ±∞
  Exponent MAX,        Fraction ≠ 0      -> NaN

MEMORY ALLOCATION
$sp = 7fff fffc hex -> Stack (grows toward lower addresses)
                       Dynamic Data
$gp = 1000 8000 hex -> Static Data (from 1000 0000 hex)
pc  = 0040 0000 hex -> Text
0 hex               -> Reserved

EXCEPTION CONTROL REGISTERS: CAUSE AND STATUS
Cause:  BD(31), Pending Interrupt(15:8), Exception Code(6:2)
Status: Interrupt Mask(15:8), UM(4), EL(1), IE(0)
BD = Branch Delay, UM = User Mode, EL = Exception Level, IE = Interrupt Enable

EXCEPTION CODES
 0  Int   Interrupt (hardware)
 4  AdEL  Address Error Exception (load or instruction fetch)
 5  AdES  Address Error Exception (store)
 6  IBE   Bus Error on Instruction Fetch
 7  DBE   Bus Error on Load or Store
 8  Sys   Syscall Exception
 9  Bp    Breakpoint Exception
10  RI    Reserved Instruction Exception
11  CpU   Coprocessor Unimplemented
12  Ov    Arithmetic Overflow Exception
13  Tr    Trap
15  FPE   Floating Point Exception

SIZE PREFIXES (10^x for Disk, Communication; 2^x for Memory)
Kilo   10^3,  2^10     milli  10^-3
Mega   10^6,  2^20     micro  10^-6
Giga   10^9,  2^30     nano   10^-9
Tera   10^12, 2^40     pico   10^-12
Peta   10^15, 2^50     femto  10^-15
Exa    10^18, 2^60     atto   10^-18
Zetta  10^21, 2^70     zepto  10^-21
Yotta  10^24, 2^80     yocto  10^-24
The symbol for each prefix is just its first letter, except μ is used for micro.

[Opcode/base-conversion/ASCII table, data-alignment table, and stack-frame diagram: not recoverable from text extraction.]

Copyright 2009 by Elsevier, Inc., All rights reserved. From Patterson and Hennessy, Computer Organization and Design, 4th ed.
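The basic instruction formats on the reference card pack fields into fixed bit positions of a 32-bit word. As an illustration (not part of the card), here is a minimal Python sketch that splits a word into R-format fields; the example word 0x012A4020 encodes add $t0, $t1, $t2 (opcode 0, funct 0x20).

```python
def decode_r_format(word):
    """Split a 32-bit MIPS instruction word into R-format fields:
    opcode(31:26), rs(25:21), rt(20:16), rd(15:11), shamt(10:6), funct(5:0)."""
    return {
        "opcode": (word >> 26) & 0x3F,  # 6-bit opcode
        "rs":     (word >> 21) & 0x1F,  # 5-bit source register
        "rt":     (word >> 16) & 0x1F,  # 5-bit second source register
        "rd":     (word >> 11) & 0x1F,  # 5-bit destination register
        "shamt":  (word >> 6)  & 0x1F,  # 5-bit shift amount
        "funct":  word         & 0x3F,  # 6-bit function code
    }

# add $t0, $t1, $t2  ->  rd=8 ($t0), rs=9 ($t1), rt=10 ($t2), funct=0x20
fields = decode_r_format(0x012A4020)
```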
In Praise of Computer Organization and Design: The Hardware/Software Interface, Revised Fourth Edition

"Patterson and Hennessy not only improve the pedagogy of the traditional material on pipelined processors and memory hierarchies, but also greatly expand the multiprocessor coverage to include emerging multicore processors and GPUs. The fourth edition of Computer Organization and Design sets a new benchmark against which all other architecture books must be compared."
—David A. Wood, University of Wisconsin-Madison

"Patterson and Hennessy have greatly improved what was already the gold standard of textbooks. In the rapidly evolving field of computer architecture, they have woven an impressive number of recent case studies and contemporary issues into a framework of time-tested fundamentals."
—Fred Chong, University of California at Santa Barbara

"Since the publication of the first edition in 1994, Computer Organization and Design has introduced a generation of computer science and engineering students to computer architecture. Now, many of those students have become leaders in the field. In academia, the tradition continues as faculty use the latest edition of the book that inspired them to engage the next generation. With the fourth edition, readers are prepared for the next era of computing."
—David I. August, Princeton University

"The new coverage of multiprocessors and parallelism lives up to the standards of this well-written classic. It provides well-motivated, gentle introductions to the new topics, as well as many details and examples drawn from current hardware."
—John Greiner, Rice University

"As computer hardware architecture moves from uniprocessor to multicores, the parallel programming environments used to take advantage of these cores will be a defining challenge to the success of these new systems. In the multicore systems, the interface between the hardware and software is of particular importance. This new edition of Computer Organization and Design is mandatory for any student who wishes to understand multicore architecture including the interface between programming it and its architecture."
—Jesse Fang, Director of Programming System Lab at Intel

"The fourth edition of Computer Organization and Design continues to improve the high standards set by the previous editions. The new content, on trends that are reshaping computer systems including multicores, Flash memory, GPUs, etc., makes this edition a must read—even for all of those who grew up on previous editions of the book."
—Parthasarathy Ranganathan, Principal Research Scientist, HP Labs
REVISED FOURTH EDITION
Computer Organization and Design
THE HARDWARE/SOFTWARE INTERFACE
ACKNOWLEDGMENTS

Figures 1.7, 1.8 Courtesy of Other World Computing (www.macsales.com).
Figures 1.9, 1.19, 5.37 Courtesy of AMD.
Figure 1.10 Courtesy of Storage Technology Corp.
Figures 1.10.1, 1.10.2, 4.15.2 Courtesy of the Charles Babbage Institute, University of Minnesota Libraries, Minneapolis.
Figures 1.10.3, 4.15.1, 4.15.3, 5.12.3, 6.14.2 Courtesy of IBM.
Figure 1.10.4 Courtesy of Cray Inc.
Figure 1.10.5 Courtesy of Apple Computer, Inc.
Figure 1.10.6 Courtesy of the Computer History Museum.
Figures 5.12.1, 5.12.2 Courtesy of Museum of Science, Boston.
Figure 5.12.4 Courtesy of MIPS Technologies, Inc.
Figure 6.4 © Peg Skorpinski.
Figure 6.14.1 Courtesy of the Computer Museum of America.
Figure 6.14.3 Courtesy of the Commercial Computing Museum.
Figures 6.15, 6.16, 6.17 Courtesy of Sun Microsystems, Inc.
Figure 7.13.1 Courtesy of NASA Ames Research Center.
REVISED FOURTH EDITION
Computer Organization and Design
THE HARDWARE/SOFTWARE INTERFACE

David A. Patterson, University of California, Berkeley
John L. Hennessy, Stanford University

With contributions by:
Perry Alexander, The University of Kansas
Peter J. Ashenden, Ashenden Designs Pty Ltd
Javier Bruguera, Universidade de Santiago de Compostela
Jichuan Chang, Hewlett-Packard
Matthew Farrens, University of California, Davis
David Kaeli, Northeastern University
Nicole Kaiyan, University of Adelaide
David Kirk, NVIDIA
James R. Larus, Microsoft Research
Jacob Leverich, Hewlett-Packard
Kevin Lim, Hewlett-Packard
John Nickolls, NVIDIA
John Oliver, Cal Poly, San Luis Obispo
Milos Prvulovic, Georgia Tech
Partha Ranganathan, Hewlett-Packard

AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO
Morgan Kaufmann is an imprint of Elsevier