Cover
Parallel Computer Organization and Design
Title
Copyright
Contents
Preface
Book outline
Acknowledgments
Michel Dubois
Murali Annavaram
Per Stenström
1 Introduction
1.1 WHAT IS COMPUTER ARCHITECTURE?
1.2 COMPONENTS OF A PARALLEL ARCHITECTURE
1.2.1 Processors
1.2.2 Memory
1.2.3 Interconnects
1.3 PARALLELISM IN ARCHITECTURES
1.3.1 Instruction-level parallelism (ILP)
1.3.2 Thread-level parallelism (TLP)
1.3.3 Vector and array processors
1.4 PERFORMANCE
1.4.1 Benchmarking
1.4.2 Amdahl's law
1.5 TECHNOLOGICAL CHALLENGES
1.5.1 Power and energy
1.5.2 Reliability
1.5.3 Wire delays
1.5.4 Design complexity
1.5.5 Limits of miniaturization and the CMOS endpoint
2 Impact of technology
2.1 CHAPTER OVERVIEW
2.2 BASIC LAWS OF ELECTRICITY
2.2.1 Ohm's law
2.2.2 Resistors
2.2.3 Capacitors
2.3 THE MOSFET TRANSISTOR AND CMOS INVERTER
2.4 TECHNOLOGY SCALING
2.5 POWER AND ENERGY
2.5.1 Dynamic power
2.5.2 Static power
2.5.3 Power and energy metrics
2.6 RELIABILITY
2.6.1 Faults versus errors
2.6.2 Reliability metrics
2.6.3 Failure rate and burn-in
2.6.4 Transient faults
2.6.5 Intermittent faults
2.6.6 Permanent faults
2.6.7 Process variations and their impact on faults
3 Processor architecture
3.1 CHAPTER OVERVIEW
3.2 INSTRUCTION SET ARCHITECTURE
3.2.1 Instruction types and opcodes
3.2.2 Instruction mixes
3.2.3 Instruction operands
3.2.4 Exceptions, traps, and interrupts
3.2.5 Memory-consistency model
3.2.6 Core ISA used in this book
3.2.7 CISC vs. RISC
3.3 STATICALLY SCHEDULED PIPELINES
3.3.1 The classic 5-stage pipeline
3.3.2 Out-of-order instruction completion
3.3.3 Superpipelined and superscalar CPUs
3.3.4 Branch prediction
3.3.5 Static instruction scheduling
3.3.6 Strengths and weaknesses of static pipelines
3.4 DYNAMICALLY SCHEDULED PIPELINES
3.4.1 Enforcing data dependencies: Tomasulo algorithm
3.4.2 Speculative execution: execution beyond unresolved branches
3.4.3 Dynamic branch prediction
3.4.4 Adding speculation to the Tomasulo algorithm
3.4.5 Dynamic memory disambiguation
3.4.6 Explicit register renaming
3.4.7 Register fetch after instruction issue
3.4.8 Speculative instruction scheduling
3.4.9 Beating the data-flow limit: value prediction
3.4.10 Multiple instructions per clock
3.4.11 Dealing with complex ISAs
3.5 VLIW MICROARCHITECTURES
3.5.1 Duality of dynamic and static techniques
3.5.2 VLIW architecture
3.5.3 Loop unrolling
3.5.4 Software pipelining
3.5.5 Non-cyclic VLIW scheduling
3.5.6 Predicated instructions
3.5.7 Speculative memory disambiguation
3.5.8 Exceptions
3.6 EPIC MICROARCHITECTURES
3.7 VECTOR MICROARCHITECTURES
3.7.1 Arithmetic/logic vector instructions
3.7.2 Memory vector instructions
3.7.3 Vector strip mining and chaining
3.7.4 Conditional statements
3.7.5 Scatter and gather
4 Memory hierarchies
4.1 CHAPTER OVERVIEW
4.2 THE PYRAMID OF MEMORY LEVELS
4.2.1 Memory-access locality
4.2.2 Memory hierarchy coherence
4.2.3 Memory inclusion
4.3 CACHE HIERARCHY
4.3.1 Cache mapping and organization
4.3.2 Replacement policies
4.3.3 Write policies
4.3.4 Cache hierarchy performance
4.3.5 Classification of cache misses
4.3.6 Non-blocking (lockup-free) caches
4.3.7 Cache prefetching and preloading
4.4 VIRTUAL MEMORY
4.4.1 Motivations for virtual memory
4.4.2 Operating system's view of virtual memory
4.4.3 Virtual address translation
4.4.4 Memory-access control
4.4.5 Hierarchical page tables
4.4.6 Inverted page table
4.4.7 Translation lookaside buffer
4.4.8 Virtual-address caches with physical tags
4.4.9 Virtual-address caches with virtual tags
5 Multiprocessor systems
5.1 CHAPTER OVERVIEW
5.2 PARALLEL-PROGRAMMING MODEL ABSTRACTIONS
5.2.1 Shared-memory systems
5.2.2 Message-passing systems
5.3 MESSAGE-PASSING MULTIPROCESSOR SYSTEMS
5.3.1 Message-passing primitives
5.3.2 Message-passing protocols
5.3.3 Hardware support for message-passing protocols
5.4 BUS-BASED SHARED-MEMORY SYSTEMS
5.4.1 Multiprocessor cache organizations
5.4.2 A simple snoopy cache protocol
5.4.3 Design space of snoopy cache protocols
5.4.4 Protocol variations
5.4.5 Design issues for multi-phase snoopy cache protocols
5.4.6 Classification of communication events
5.4.7 Translation lookaside buffer (TLB) consistency
5.5 SCALABLE SHARED-MEMORY SYSTEMS
5.5.1 Directory protocols: concepts and terminology
5.5.2 Implementation of a directory protocol
5.5.3 Scalability of directory protocols
5.5.4 Hierarchical systems
5.5.5 Page migration and replication
5.6 CACHE-ONLY SHARED-MEMORY SYSTEMS
5.6.1 Basic concepts, hardware structures, and protocols
5.6.2 Flat COMA
6 Interconnection networks
6.1 CHAPTER OVERVIEW
6.2 DESIGN SPACE OF INTERCONNECTION NETWORKS
6.2.1 Overview of design concepts
6.2.2 Latency and bandwidth models
6.3 SWITCHING STRATEGIES
6.4 TOPOLOGIES
6.4.1 Indirect networks
6.4.2 Direct networks
6.5 ROUTING TECHNIQUES
6.5.1 Routing algorithms
6.5.2 Deadlock avoidance and deterministic routing
6.5.3 Relaxing routing restrictions: virtual channels and the turn model
6.5.4 Relaxing routing further: adaptive routing
6.6 SWITCH ARCHITECTURE
7 Coherence, synchronization, and memory consistency
7.1 CHAPTER OVERVIEW
7.2 BACKGROUND
7.2.1 Shared-memory communication model
7.2.2 Hardware components
7.3 COHERENCE AND STORE ATOMICITY
7.3.1 Why is coherence in multiprocessors so hard?
7.3.2 Cache protocols
7.3.3 Store atomicity
7.3.4 Plain coherence
7.3.5 Store atomicity and memory interleaving
7.4 SEQUENTIAL CONSISTENCY
7.4.1 Formal model for sequential consistency
7.4.2 Access ordering rules for sequential consistency
7.4.3 Inbound message management
7.4.4 Store synchronization
7.5 SYNCHRONIZATION
7.5.1 Basic synchronization primitives
7.5.2 Hardware-based synchronization
7.5.3 Software-based synchronization
7.6 RELAXED MEMORY-CONSISTENCY MODELS
7.6.1 Relaxed models not relying on synchronization
7.6.2 Relaxed models relying on synchronization
7.7 SPECULATIVE VIOLATIONS OF MEMORY ORDERS
7.7.1 Conservative memory model enforcement in OoO processors
7.7.2 Speculative violations of memory orders
8 Chip multiprocessors
8.1 CHAPTER OVERVIEW
8.2 RATIONALE BEHIND CMPS
8.2.1 Technological trends
8.2.2 Opportunities
8.3 CORE MULTI-THREADING
8.3.1 Software-supported multi-threading
8.3.2 Hardware-supported multi-threading
8.3.3 Block (coarse-grain) multi-threading
8.3.4 Interleaved (fine-grain) multi-threading
8.3.5 Simultaneous multi-threading in OoO processors
8.4 CHIP MULTIPROCESSOR ARCHITECTURES
8.4.1 Homogeneous CMP architectures
8.4.2 CMPs with heterogeneous cores
8.4.3 Conjoined cores
8.5 PROGRAMMING MODELS
8.5.1 Independent processes
8.5.2 Explicit thread parallelization
8.5.3 Transactional memory
8.5.4 Thread-level speculation
8.5.5 Helper threads
8.5.6 Redundant execution to improve reliability
9 Quantitative evaluations
9.1 CHAPTER OVERVIEW
9.2 TAXONOMY OF SIMULATORS
9.2.1 User-level versus full-system simulators
9.2.2 Functional versus cycle-accurate simulators
9.2.3 Trace-driven, execution-driven, and direct-execution simulators
9.3 INTEGRATING SIMULATORS
9.3.1 Functional-first simulator integration
9.3.2 Timing-first simulator integration
9.4 MULTIPROCESSOR SIMULATORS
9.4.1 Sequential multiprocessor simulators
9.4.2 Parallel multiprocessor simulators
9.5 POWER AND THERMAL SIMULATIONS
9.6 WORKLOAD SAMPLING
9.6.1 Sampling microarchitecture simulation
9.6.2 SimPoint
9.7 WORKLOAD CHARACTERIZATION
9.7.1 Understanding performance bottlenecks
9.7.2 Synthetic benchmarks
9.7.3 Projecting workload behavior
Index
#
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
Z
Parallel Computer Organization and Design

Teaching fundamental design concepts and the challenges of emerging technology, this textbook prepares students for a career designing the computer systems of the future. In-depth coverage of complexity, power, reliability, and performance, coupled with treatment of parallelism at all levels, including ILP and TLP, provides the state-of-the-art training that students need. The whole gamut of parallel architecture design options is explained, from core microarchitecture to chip multiprocessors to large-scale multiprocessor systems. All the chapters are self-contained, yet concise enough that the material can be taught in a couple of semesters, making it perfect for use in senior undergraduate and graduate computer architecture courses. The book is also teeming with practical examples to aid the learning process, showing concrete applications of definitions. With simple models and codes used throughout, all material is accessible to a broad range of computer engineering/science students with only a basic knowledge of hardware and software.

Michel Dubois is a Professor in the Ming Hsieh Department of Electrical Engineering at the University of Southern California (USC) and part of the Computer Engineering Directorate. Before joining USC in 1984, he was a research engineer at the Central Research Laboratory of Thomson-CSF in Orsay, France. He has published more than 150 technical papers on computer architecture and has edited two books. He is a Fellow of the IEEE and of the ACM.

Murali Annavaram is an Associate Professor and Robert G. and Mary G. Lane Early Career Chair in the Ming Hsieh Department of Electrical Engineering at USC and part of the Computer Engineering Directorate, where he has developed and taught advanced computer architecture courses. Prior to USC, he spent six years at Intel researching various aspects of future CMP designs.

Per Stenström is a Professor of Computer Engineering at Chalmers University of Technology, Sweden. He has published two textbooks and over 100 technical papers. He has been a visiting scientist at Carnegie Mellon, Stanford, and USC, and was also engaged in research at Sun Microsystems on its chip multi-threading technology. He is a Fellow of the IEEE and of the ACM, and is a member of the Royal Swedish Academy of Engineering Sciences and the Academia Europaea.
“Parallel computers and multicore architectures are rapidly gaining importance because the performance of a single core is not improving at the same historical level. Professors Dubois, Annavaram, and Stenström have created an easily readable book on the intricacies of parallel architecture design that academicians and practitioners alike will find extremely useful.” Shubu Mukherjee, Cavium, Inc.

“The book can help readers understand the principles of parallel systems with crystal clarity. A necessary book to read for the designers of parallel systems.” Yunji Chen, Institute of Computing Technology, Chinese Academy of Sciences

“All future electronic systems will include a built-in microprocessor; consequently, the importance of computer architecture will surge. This book provides an excellent tutorial of computer architecture fundamentals, from the basic technology via processor and memory architecture to chip multiprocessors. I found the book very educational and readable, with excellent flow – an instructive book well worth using.” Uri Weiser, Technion

“This book really fulfils the need to understand the basic technological on-chip features and constraints in connection with their impact on computer architecture design choices. All computing systems students and developers should first master these single- and multi-core foundations in a platform-independent way, as this comprehensive text does.” Mateo Valero, BSC

“After the drastic shift towards multi-cores that processor architecture has experienced in the past few years, the domain was in dire need of a comprehensive and up-to-date book on the topic. Michel, Murali, and Per have crafted an excellent textbook which can serve both as an introduction to multi-core and parallel architectures, as well as a reference for engineers and researchers.” Olivier Temam, INRIA, France

“Parallel Computer Organization and Design fills an urgent need for a comprehensive and authoritative yet approachable tutorial and reference text for advanced computer architecture topics. All of the key principles and concepts covered in Wisconsin’s three-course computer architecture sequence are addressed in a well-organized, thoughtful, and pedagogically appealing manner, without overwhelming the reader with distracting trivia or an excess of quantitative data. In particular, the coverage of chip multiprocessors in Chapter 8 is fully up to date with the state of the art in industry practice, while the final chapter on quantitative evaluation – a true gem! – is a unique and valuable asset that will clearly set this book apart from its competition.” Mikko Lipasti, University of Wisconsin-Madison

“The book contains in-depth coverage of all aspects of computer systems. It is comprehensive, systematic, and in sync with the latest developments in the field. The skillfully organized book uses self-contained chapters to allow readers to get a complete understanding of a topic without wandering through the whole book. Its content is rich, coherent, and clear. Its questions are crafted to stimulate creative thinking. I recommend the book as a must-read to all graduate students and young researchers and engineers designing computers.” Lixin Zhang, Institute of Computing Technology, Chinese Academy of Sciences

“. . . parallel architectures are the key for high-performance and high-efficiency computing systems. This book tells the story of parallel architecture at all levels – from the single transistor to the full-blown CMP – an unforgettable journey!” Ronny Ronen, Intel

“Multicore chips have made parallel architectures ubiquitous and their understanding a necessity. This text provides a comprehensive treatment of parallel system architecture and the fundamentals of cache coherence and memory consistency in the most compact form to date. This is a perfect text for a one-semester graduate course.” Lawrence Rauchwerger, Texas A&M University

“It is the best of today’s books on the subject, and I plan to use it in my class. It is an up-to-date picture of parallel computing that is written in a style that is clear and accessible.” Trevor Mudge, Bredt Family Professor of Computer Engineering, University of Michigan

“Parallelism, at multiple levels and in many different forms, is now a necessity for all future computer systems, and the new generation of computer scientists and engineers has to master it. To understand the complex interactions among the hundreds of existing ideas, options, and choices, one has to categorize them, put them in order, and then synthesize them. That is precisely what Dubois, Annavaram, and Stenström do, in a magnificent way, in this extremely contemporary and timely book. I want to particularly stress the uniquely clear way in which the authors explain the hardest among these topics: coherence, synchronization, and memory consistency.” Manolis Katevenis, Professor of Computer Science, University of Crete

“This book is a truly comprehensive treatment of parallel computers, from some of the top experts in the field. Well grounded in technology yet remaining very accessible, it also includes important but often overlooked topics such as reliability, power, and simulation.” Norm Jouppi, HP

“This text takes a fresh cut at traditional computer architecture topics and considers basic principles from the perspective of multi-core and parallel systems. The need for such a high-quality textbook written from this perspective is overdue, and the authors of this text have done a good job in organizing and revamping topics to provide the next generation of computer architects with the basic principles they will need to design multi-core and many-core systems.” David Kaeli, Director of the NU Computer Architecture Research Laboratory, NEU

“An excellent book in an area that has long cried out for tutorial material – it will be an indispensable resource to students and educators in parallel computer architecture.” Josep Torrellas, University of Illinois
Parallel Computer Organization and Design

MICHEL DUBOIS, University of Southern California, USA
MURALI ANNAVARAM, University of Southern California, USA
PER STENSTRÖM, Chalmers University of Technology, Sweden
Cambridge University Press
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo, Delhi, Mexico City

Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK

Published in the United States of America by Cambridge University Press, New York

www.cambridge.org
Information on this title: www.cambridge.org/9780521886758

© Cambridge University Press 2012

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2012

Printed in the United Kingdom at the University Press, Cambridge

A catalog record for this publication is available from the British Library

Library of Congress Cataloging in Publication data
Dubois, Michel, 1953–
Parallel computer organization and design / Michel Dubois, Murali Annavaram, Per Stenström.
pages cm
Includes index.
ISBN 978-0-521-88675-8
1. Parallel computers. 2. Computer organization. I. Annavaram, Murali. II. Stenström, Per. III. Title.
QA76.5.D754 2012
005.2'75 – dc23
2012010634

ISBN 978-0-521-88675-8 Hardback

Additional resources for this publication at www.cambridge.org/9780521886758

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.