Computer Organization and Design: The Hardware/Software Interface, Fifth Edition
Front Cover
Computer Organization and Design
Copyright Page
Acknowledgments
Contents
Preface
About This Book
About the Other Book
Changes for the Fifth Edition
Concluding Remarks
Acknowledgments for the Fifth Edition
1 Computer Abstractions and Technology
1.1 Introduction
Classes of Computing Applications and Their Characteristics
Welcome to the PostPC Era
What You Can Learn in This Book
1.2 Eight Great Ideas in Computer Architecture
Design for Moore’s Law
Use Abstraction to Simplify Design
Make the Common Case Fast
Performance via Parallelism
Performance via Pipelining
Performance via Prediction
Hierarchy of Memories
Dependability via Redundancy
1.3 Below Your Program
From a High-Level Language to the Language of Hardware
1.4 Under the Covers
Through the Looking Glass
Touchscreen
Opening the Box
A Safe Place for Data
Communicating with Other Computers
1.5 Technologies for Building Processors and Memory
1.6 Performance
Defining Performance
Measuring Performance
CPU Performance and Its Factors
Instruction Performance
The Classic CPU Performance Equation
1.7 The Power Wall
1.8 The Sea Change: The Switch from Uniprocessors to Multiprocessors
1.9 Real Stuff: Benchmarking the Intel Core i7
SPEC CPU Benchmark
SPEC Power Benchmark
1.10 Fallacies and Pitfalls
1.11 Concluding Remarks
Road Map for This Book
1.12 Historical Perspective and Further Reading
1.13 Exercises
2 Instructions: Language of the Computer
2.1 Introduction
2.2 Operations of the Computer Hardware
2.3 Operands of the Computer Hardware
Memory Operands
Constant or Immediate Operands
2.4 Signed and Unsigned Numbers
Summary
2.5 Representing Instructions in the Computer
MIPS Fields
2.6 Logical Operations
2.7 Instructions for Making Decisions
Loops
Case/Switch Statement
2.8 Supporting Procedures in Computer Hardware
Using More Registers
Nested Procedures
Allocating Space for New Data on the Stack
Allocating Space for New Data on the Heap
2.9 Communicating with People
Characters and Strings in Java
2.10 MIPS Addressing for 32-bit Immediates and Addresses
32-Bit Immediate Operands
Addressing in Branches and Jumps
MIPS Addressing Mode Summary
Decoding Machine Language
2.11 Parallelism and Instructions: Synchronization
2.12 Translating and Starting a Program
Compiler
Assembler
Linker
Loader
Dynamically Linked Libraries
Starting a Java Program
2.13 A C Sort Example to Put It All Together
The Procedure swap
Register Allocation for swap
Code for the Body of the Procedure swap
The Full swap Procedure
The Procedure sort
Register Allocation for sort
Code for the Body of the Procedure sort
The Procedure Call in sort
Passing Parameters in sort
Preserving Registers in sort
The Full Procedure sort
2.14 Arrays versus Pointers
Array Version of Clear
Pointer Version of Clear
Comparing the Two Versions of Clear
2.15 Advanced Material: Compiling C and Interpreting Java
2.16 Real Stuff: ARMv7 (32-bit) Instructions
Addressing Modes
Compare and Conditional Branch
Unique Features of ARM
2.17 Real Stuff: x86 Instructions
Evolution of the Intel x86
x86 Registers and Data Addressing Modes
x86 Integer Operations
x86 Instruction Encoding
x86 Conclusion
2.18 Real Stuff: ARMv8 (64-bit) Instructions
2.19 Fallacies and Pitfalls
2.20 Concluding Remarks
2.21 Historical Perspective and Further Reading
2.22 Exercises
3 Arithmetic for Computers
3.1 Introduction
3.2 Addition and Subtraction
Summary
3.3 Multiplication
Sequential Version of the Multiplication Algorithm and Hardware
Signed Multiplication
Faster Multiplication
Multiply in MIPS
Summary
3.4 Division
A Division Algorithm and Hardware
Signed Division
Faster Division
Divide in MIPS
Summary
3.5 Floating Point
Floating-Point Representation
Floating-Point Addition
Floating-Point Multiplication
Floating-Point Instructions in MIPS
Accurate Arithmetic
Summary
3.6 Parallelism and Computer Arithmetic: Subword Parallelism
3.7 Real Stuff: Streaming SIMD Extensions and Advanced Vector Extensions in x86
3.8 Going Faster: Subword Parallelism and Matrix Multiply
3.9 Fallacies and Pitfalls
3.10 Concluding Remarks
3.11 Historical Perspective and Further Reading
3.12 Exercises
4 The Processor
4.1 Introduction
A Basic MIPS Implementation
An Overview of the Implementation
Clocking Methodology
4.2 Logic Design Conventions
4.3 Building a Datapath
Creating a Single Datapath
4.4 A Simple Implementation Scheme
The ALU Control
Designing the Main Control Unit
Operation of the Datapath
Finalizing Control
Why a Single-Cycle Implementation Is Not Used Today
4.5 An Overview of Pipelining
Designing Instruction Sets for Pipelining
Pipeline Hazards
Structural Hazards
Data Hazards
Control Hazards
Pipeline Overview Summary
4.6 Pipelined Datapath and Control
Graphically Representing Pipelines
Pipelined Control
4.7 Data Hazards: Forwarding versus Stalling
Data Hazards and Stalls
4.8 Control Hazards
Assume Branch Not Taken
Reducing the Delay of Branches
Dynamic Branch Prediction
Pipeline Summary
4.9 Exceptions
How Exceptions Are Handled in the MIPS Architecture
Exceptions in a Pipelined Implementation
4.10 Parallelism via Instructions
The Concept of Speculation
Static Multiple Issue
An Example: Static Multiple Issue with the MIPS ISA
Dynamic Multiple-Issue Processors
Dynamic Pipeline Scheduling
Energy Efficiency and Advanced Pipelining
4.11 Real Stuff: The ARM Cortex-A8 and Intel Core i7 Pipelines
The ARM Cortex-A8
The Intel Core i7 920
Performance of the Intel Core i7 920
4.12 Going Faster: Instruction-Level Parallelism and Matrix Multiply
4.13 Advanced Topic: An Introduction to Digital Design Using a Hardware Design Language to Describe and Model a Pipeline and More Pipelining Illustrations
4.14 Fallacies and Pitfalls
4.15 Concluding Remarks
4.16 Historical Perspective and Further Reading
4.17 Exercises
5 Large and Fast: Exploiting Memory Hierarchy
5.1 Introduction
5.2 Memory Technologies
SRAM Technology
DRAM Technology
Flash Memory
Disk Memory
5.3 The Basics of Caches
Accessing a Cache
Handling Cache Misses
Handling Writes
An Example Cache: The Intrinsity FastMATH Processor
Summary
5.4 Measuring and Improving Cache Performance
Reducing Cache Misses by More Flexible Placement of Blocks
Locating a Block in the Cache
Choosing Which Block to Replace
Reducing the Miss Penalty Using Multilevel Caches
Software Optimization via Blocking
Summary
5.5 Dependable Memory Hierarchy
Defining Failure
The Hamming Single Error Correcting, Double Error Detecting Code (SEC/DED)
5.6 Virtual Machines
Requirements of a Virtual Machine Monitor
(Lack of) Instruction Set Architecture Support for Virtual Machines
Protection and Instruction Set Architecture
5.7 Virtual Memory
Placing a Page and Finding It Again
Page Faults
What about Writes?
Making Address Translation Fast: The TLB
The Intrinsity FastMATH TLB
Integrating Virtual Memory, TLBs, and Caches
Implementing Protection with Virtual Memory
Handling TLB Misses and Page Faults
Summary
5.8 A Common Framework for Memory Hierarchy
Question 1: Where Can a Block Be Placed?
Question 2: How Is a Block Found?
Question 3: Which Block Should Be Replaced on a Cache Miss?
Question 4: What Happens on a Write?
The Three Cs: An Intuitive Model for Understanding the Behavior of Memory Hierarchies
5.9 Using a Finite-State Machine to Control a Simple Cache
A Simple Cache
Finite-State Machines
FSM for a Simple Cache Controller
5.10 Parallelism and Memory Hierarchy: Cache Coherence
Basic Schemes for Enforcing Coherence
Snooping Protocols
5.11 Parallelism and Memory Hierarchy: Redundant Arrays of Inexpensive Disks
5.12 Advanced Material: Implementing Cache Controllers
5.13 Real Stuff: The ARM Cortex-A8 and Intel Core i7 Memory Hierarchies
Performance of the A8 and Core i7 Memory Hierarchies
5.14 Going Faster: Cache Blocking and Matrix Multiply
5.15 Fallacies and Pitfalls
5.16 Concluding Remarks
5.17 Historical Perspective and Further Reading
5.18 Exercises
6 Parallel Processors from Client to Cloud
6.1 Introduction
6.2 The Difficulty of Creating Parallel Processing Programs
6.3 SISD, MIMD, SIMD, SPMD, and Vector
SIMD in x86: Multimedia Extensions
Vector
Vector versus Scalar
Vector versus Multimedia Extensions
6.4 Hardware Multithreading
6.5 Multicore and Other Shared Memory Multiprocessors
6.6 Introduction to Graphics Processing Units
An Introduction to the NVIDIA GPU Architecture
NVIDIA GPU Memory Structures
Putting GPUs into Perspective
6.7 Clusters, Warehouse Scale Computers, and Other Message-Passing Multiprocessors
Warehouse-Scale Computers
6.8 Introduction to Multiprocessor Network Topologies
Implementing Network Topologies
6.9 Communicating to the Outside World: Cluster Networking
6.10 Multiprocessor Benchmarks and Performance Models
Performance Models
The Roofline Model
Comparing Two Generations of Opterons
6.11 Real Stuff: Benchmarking and Rooflines of the Intel Core i7 960 and the NVIDIA Tesla GPU
6.12 Going Faster: Multiple Processors and Matrix Multiply
6.13 Fallacies and Pitfalls
6.14 Concluding Remarks
6.15 Historical Perspective and Further Reading
6.16 Exercises
Appendix A: Assemblers, Linkers, and the SPIM Simulator
A.1 Introduction
A.2 Assemblers
A.3 Linkers
A.4 Loading
A.5 Memory Usage
A.6 Procedure Call Convention
A.7 Exceptions and Interrupts
A.8 Input and Output
A.9 SPIM
A.10 MIPS R2000 Assembly Language
A.11 Concluding Remarks
A.12 Exercises
Appendix B: The Basics of Logic Design
B.1 Introduction
B.2 Gates, Truth Tables, and Logic Equations
B.3 Combinational Logic
B.4 Using a Hardware Description Language
B.5 Constructing a Basic Arithmetic Logic Unit
B.6 Faster Addition: Carry Lookahead
B.7 Clocks
B.8 Memory Elements: Flip-Flops, Latches, and Registers
B.9 Memory Elements: SRAMs and DRAMs
B.10 Finite-State Machines
B.11 Timing Methodologies
B.12 Field Programmable Devices
B.13 Concluding Remarks
B.14 Exercises
Index
In Praise of Computer Organization and Design: The Hardware/Software Interface, Fifth Edition

"Textbook selection is often a frustrating act of compromise—pedagogy, content coverage, quality of exposition, level of rigor, cost. Computer Organization and Design is the rare book that hits all the right notes across the board, without compromise. It is not only the premier computer organization textbook, it is a shining example of what all computer science textbooks could and should be."
—Michael Goldweber, Xavier University

"I have been using Computer Organization and Design for years, from the very first edition. The new Fifth Edition is yet another outstanding improvement on an already classic text. The evolution from desktop computing to mobile computing to Big Data brings new coverage of embedded processors such as the ARM, new material on how software and hardware interact to increase performance, and cloud computing. All this without sacrificing the fundamentals."
—Ed Harcourt, St. Lawrence University

"To Millennials: Computer Organization and Design is the computer architecture book you should keep on your (virtual) bookshelf. The book is both old and new, because it develops venerable principles—Moore's Law, abstraction, common case fast, redundancy, memory hierarchies, parallelism, and pipelining—but illustrates them with contemporary designs, e.g., ARM Cortex A8 and Intel Core i7."
—Mark D. Hill, University of Wisconsin-Madison

"The new edition of Computer Organization and Design keeps pace with advances in emerging embedded and many-core (GPU) systems, where tablets and smartphones are quickly becoming our new desktops. This text acknowledges these changes, but continues to provide a rich foundation of the fundamentals in computer organization and design which will be needed for the designers of hardware and software that power this new class of devices and systems."
—Dave Kaeli, Northeastern University

"The Fifth Edition of Computer Organization and Design provides more than an introduction to computer architecture. It prepares the reader for the changes necessary to meet the ever-increasing performance needs of mobile systems and big data processing at a time that difficulties in semiconductor scaling are making all systems power constrained. In this new era for computing, hardware and software must be co-designed and system-level architecture is as critical as component-level optimizations."
—Christos Kozyrakis, Stanford University

"Patterson and Hennessy brilliantly address the issues in ever-changing computer hardware architectures, emphasizing on interactions among hardware and software components at various abstraction levels. By interspersing I/O and parallelism concepts with a variety of mechanisms in hardware and software throughout the book, the new edition achieves an excellent holistic presentation of computer architecture for the PostPC era. This book is an essential guide to hardware and software professionals facing energy efficiency and parallelization challenges in Tablet PC to cloud computing."
—Jae C. Oh, Syracuse University
FIFTH EDITION
Computer Organization and Design
THE HARDWARE/SOFTWARE INTERFACE
David A. Patterson has been teaching computer architecture at the University of California, Berkeley, since joining the faculty in 1977, where he holds the Pardee Chair of Computer Science. His teaching has been honored by the Distinguished Teaching Award from the University of California, the Karlstrom Award from ACM, and the Mulligan Education Medal and Undergraduate Teaching Award from IEEE. Patterson received the IEEE Technical Achievement Award and the ACM Eckert-Mauchly Award for contributions to RISC, and he shared the IEEE Johnson Information Storage Award for contributions to RAID. He also shared the IEEE John von Neumann Medal and the C & C Prize with John Hennessy. Like his co-author, Patterson is a Fellow of the American Academy of Arts and Sciences, the Computer History Museum, ACM, and IEEE, and he was elected to the National Academy of Engineering, the National Academy of Sciences, and the Silicon Valley Engineering Hall of Fame. He served on the Information Technology Advisory Committee to the U.S. President, as chair of the CS division in the Berkeley EECS department, as chair of the Computing Research Association, and as President of ACM. This record led to Distinguished Service Awards from ACM and CRA.

At Berkeley, Patterson led the design and implementation of RISC I, likely the first VLSI reduced instruction set computer, and the foundation of the commercial SPARC architecture. He was a leader of the Redundant Arrays of Inexpensive Disks (RAID) project, which led to dependable storage systems from many companies. He was also involved in the Network of Workstations (NOW) project, which led to cluster technology used by Internet companies and later to cloud computing. These projects earned three dissertation awards from ACM. His current research projects are Algorithm-Machine-People and Algorithms and Specializers for Provably Optimal Implementations with Resilience and Efficiency. The AMP Lab is developing scalable machine learning algorithms, warehouse-scale-computer-friendly programming models, and crowd-sourcing tools to gain valuable insights quickly from big data in the cloud. The ASPIRE Lab uses deep hardware and software co-tuning to achieve the highest possible performance and energy efficiency for mobile and rack computing systems.

John L. Hennessy is the tenth president of Stanford University, where he has been a member of the faculty since 1977 in the departments of electrical engineering and computer science. Hennessy is a Fellow of the IEEE and ACM; a member of the National Academy of Engineering, the National Academy of Science, and the American Philosophical Society; and a Fellow of the American Academy of Arts and Sciences. Among his many awards are the 2001 Eckert-Mauchly Award for his contributions to RISC technology, the 2001 Seymour Cray Computer Engineering Award, and the 2000 John von Neumann Award, which he shared with David Patterson. He has also received seven honorary doctorates.

In 1981, he started the MIPS project at Stanford with a handful of graduate students. After completing the project in 1984, he took a leave from the university to cofound MIPS Computer Systems (now MIPS Technologies), which developed one of the first commercial RISC microprocessors. As of 2006, over 2 billion MIPS microprocessors have been shipped in devices ranging from video games and palmtop computers to laser printers and network switches.
Hennessy subsequently led the DASH (Directory Architecture for Shared Memory) project, which prototyped the first scalable cache coherent multiprocessor; many of the key ideas have been adopted in modern multiprocessors. In addition to his technical activities and university responsibilities, he has continued to work with numerous start-ups both as an early-stage advisor and an investor.
FIFTH EDITION
Computer Organization and Design
THE HARDWARE/SOFTWARE INTERFACE

David A. Patterson, University of California, Berkeley
John L. Hennessy, Stanford University

With contributions by:
Perry Alexander, The University of Kansas
Peter J. Ashenden, Ashenden Designs Pty Ltd
Jason D. Bakos, University of South Carolina
Javier Bruguera, Universidade de Santiago de Compostela
Jichuan Chang, Hewlett-Packard
Matthew Farrens, University of California, Davis
David Kaeli, Northeastern University
Nicole Kaiyan, University of Adelaide
David Kirk, NVIDIA
James R. Larus, School of Computer and Communications Science at EPFL
Jacob Leverich, Hewlett-Packard
Kevin Lim, Hewlett-Packard
John Nickolls, NVIDIA
John Oliver, Cal Poly, San Luis Obispo
Milos Prvulovic, Georgia Tech
Partha Ranganathan, Hewlett-Packard

AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO
Morgan Kaufmann is an imprint of Elsevier
Acquiring Editor: Todd Green
Development Editor: Nate McFadden
Project Manager: Lisa Jones
Designer: Russell Purdy

Morgan Kaufmann is an imprint of Elsevier
The Boulevard, Langford Lane, Kidlington, Oxford, OX5 1GB
225 Wyman Street, Waltham, MA 02451, USA

Copyright © 2014 Elsevier Inc. All rights reserved

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher's permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods or professional practices may become necessary. Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information or methods described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data
Patterson, David A.
Computer organization and design: the hardware/software interface / David A. Patterson, John L. Hennessy. — 5th ed.
p. cm. — (The Morgan Kaufmann series in computer architecture and design)
Rev. ed. of: Computer organization and design / John L. Hennessy, David A. Patterson. 1998.
Summary: "Presents the fundamentals of hardware technologies, assembly language, computer arithmetic, pipelining, memory hierarchies and I/O" — Provided by publisher.
ISBN 978-0-12-407726-3 (pbk.)
1. Computer organization. 2. Computer engineering. 3. Computer interfaces. I. Hennessy, John L. II. Hennessy, John L. Computer organization and design. III. Title.

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

ISBN: 978-0-12-407726-3

For information on all MK publications visit our website at www.mkp.com

Printed and bound in the United States of America
13 14 15 16    10 9 8 7 6 5 4 3 2 1
To Linda, who has been, is, and always will be the love of my life