Parallel Computer Architecture: A Hardware/Software Approach

Title
Preface
Contents
CHAPTER 1 Introduction
  1.1 Introduction
  1.2 Why Parallel Architecture
    Figure 1-1 Performance trends over time of micro, ...
    1.2.1 Application Trends
      Equation
      Equation
      Scientific and Engineering Computing
        Figure 1-2 Grand Challenge Application Requirement...
        Figure 1-3 Speedup on Three Versions of a Parallel...
      Commercial Computing
        Figure 1-4 TPC-C throughput versus number of proce...
        Example 1-1 The tpmC for the Tandem Himalaya and I...
    1.2.2 Technology Trends
      Figure 1-5 Improvement in logic density and clock ...
    1.2.3 Architectural Trends
      Figure 1-6 Number of transistors per processor chi...
      Figure 1-7 Distribution of potential instruction-l...
      Figure 1-8 Number of processors in fully configure...
      Figure 1-9 Bandwidth of the shared memory bus in c...
    1.2.4 Supercomputers
      Figure 1-10 Uniprocessor performance of supercompu...
      Figure 1-11 Performance of supercomputers and MPPs...
      Figure 1-12 Type of systems used in 500 fastest co...
    1.2.5 Summary
  1.3 Convergence of Parallel Architectures
    1.3.1 Communication Architecture
      Figure 1-13 Layers of Abstraction in Parallel Comp...
    1.3.2 Shared Memory
      Figure 1-14 Typical memory model for shared-memory...
      Figure 1-15 Extending a system into a shared-memor...
      Figure 1-16 Typical Shared-Memory Multiprocessor I...
      Figure 1-17 Physical and logical organization of t...
      Figure 1-18 Physical and logical organization of t...
      Figure 1-19 Non-uniform Memory Access (NUMA) Scala...
      Figure 1-20 Cray T3E Scalable Shared Address Space...
    1.3.3 Message-Passing
      Figure 1-21 User level send/receive message passin...
      Figure 1-22 Typical structure of an early message ...
      Figure 1-23 IBM SP-2 Message Passing Machine
      Figure 1-24 Intel Paragon
    1.3.4 Convergence
    1.3.5 Data Parallel Processing
      Figure 1-25 Typical organization of a data paralle...
    1.3.6 Other Parallel Architectures
      Dataflow Architecture
        Figure 1-26 Dataflow graph and basic execution pip...
      Systolic Architectures
        Figure 1-27 Systolic Array computation of an inner...
    1.3.7 A Generic Parallel Architecture
      Figure 1-28 Generic scalable multiprocessor organi...
  1.4 Fundamental Design Issues
    1.4.1 Communication Abstraction
    1.4.2 Programming Model Requirements
    1.4.3 Naming
      Operations
    1.4.4 Ordering
    1.4.5 Communication and Replication
    1.4.6 Performance
      Example 1-2 Suppose a component can perform a spec...
      Data Transfer Time
        Equation
        Equation
      Overhead and Occupancy
        Equation
      Communication Cost
        Equation
      Summary
  1.5 Concluding Remarks
  1.6 References
  1.7 Exercises
    Exercise 1.1 Compute the annual growth rate in number of tr...
    Exercise 1.2 Compute the annual performance growth rates fo...
      Table 1-1 Performance of leading workstations
    Exercise 1.3 Suppose you are given a program which does a f...
    Exercise 1.4 Given a histogram of available parallelism suc...
    Exercise 1.5 Locate the current TPC performance data on the...
    Exercise 1.6 In message passing models each process is prov...
    Exercise 1.7 To move a message along links in an unloaded ...
    Exercise 1.8 Consider a simple 2D finite difference scheme ...
      All the values are 64-bit floating point numbers. ...
    Exercise 1.9 Consider the simple pipelined component descri...
    Exercise 1.10 Show that Equation 1.4 follows from Equation...
    Exercise 1.11 What is the x-intercept of the line in Equati...
    Exercise 1.12 Suppose we have a machine with a 64-bit wide ...
    Exercise 1.13 Suppose this 32-byte line is transferred to a...
    Exercise 1.14 Suppose we have a machine with a message star...
    Exercise 1.15 Derive a general formula for the “half-power ...
    Exercise 1.16 Assuming that before transmitting a message t...
    Exercise 1.17 Consider a machine running at 100 MIPS on som...
    Exercise 1.18 Assume the instruction mix and miss rate as i...
    Exercise 1.19 Of course, no matter how many processors you ...
CHAPTER 2 Parallel Programs
Parallel Computer Architecture: A Hardware/Software Approach

David Culler, University of California, Berkeley
Jaswinder Pal Singh, Princeton University
with Anoop Gupta, Stanford University

Morgan Kaufmann is pleased to present material from a preliminary draft of Parallel Computer Architecture; the material is (c) Copyright 1997 Morgan Kaufmann Publishers. This material may not be used or distributed for any commercial purpose without the express written consent of Morgan Kaufmann Publishers. Please note that this material is a draft of a forthcoming publication, and as such neither Morgan Kaufmann nor the authors can be held liable for changes or alterations in the final edition.
Preface

Motivation for the Book

Parallel computing is a critical component of the computing technology of the 1990s, and it is likely to have as much impact over the next twenty years as microprocessors have had over the past twenty. Indeed, the two technologies are closely linked, as the evolution of highly integrated microprocessors and memory chips is making multiprocessor systems increasingly attractive. Multiprocessors already represent the high-performance end of almost every segment of the computing market, from the fastest supercomputers, to departmental compute servers, to the individual desktop. In the past, computer vendors employed a range of technologies to provide increasing performance across their product lines. Today, the same state-of-the-art microprocessor is used throughout. To obtain a significant range of performance, the simplest approach is to increase the number of processors, and economies of scale make this extremely attractive. Very soon, several processors will fit on a single chip.

Although parallel computing has a long and rich academic history, the close coupling with commodity technology has fundamentally changed the discipline. The emphasis on radical architectures and exotic technology has given way to quantitative analysis and careful engineering trade-offs. Our goal in writing this book is to equip designers of the emerging class of multiprocessor systems, from modestly parallel personal computers to massively parallel supercomputers, with an understanding of the fundamental architectural issues and the available techniques for addressing design trade-offs. At the same time, we hope to provide designers of software systems for these machines with an understanding of the likely directions of architectural evolution and the forces that will determine the specific path that hardware designs will follow.

The most exciting recent development in parallel computer architecture is the convergence of traditionally disparate approaches, namely shared memory, message passing, SIMD, and dataflow, on a common machine structure. This is driven partly by common technological and economic forces, and partly by a better understanding of parallel software. This convergence allows us to focus on the overriding architectural issues and to develop a common framework in which to understand and evaluate architectural trade-offs. Moreover, parallel software has matured to the point where the popular parallel programming models are available on a wide range of machines and meaningful benchmarks exist. This maturing of the field makes it possible to undertake a quantitative, as well as qualitative, study of hardware/software interactions; in fact, it demands such an approach. The book follows a set of issues that are critical to all parallel architectures – communication latency, communication bandwidth, and coordination of cooperative work – across the full range of modern designs.
It describes the set of techniques available in hardware and in software to address each issue and explores how the various techniques interact. Case studies provide a concrete illustration of the general principles and demonstrate specific interactions between mechanisms.

Our final motivation comes from the current lack of an adequate textbook for our own courses at Stanford, Berkeley, and Princeton. Many existing textbooks cover the material in a cursory fashion, summarizing various architectures and research results, but not analyzing them in depth. Others focus on specific projects, but fail to recognize the principles that carry over to alternative approaches. The research reports in the area provide a sizable body of empirical data, but it has not yet been distilled into a coherent picture. By focusing on the salient issues in the context of the technological convergence, rather than the rich and varied history that brought us to this point, we hope to provide a deeper and more coherent understanding of the field.

Intended Audience

We believe the subject matter of this book is core material and should be relevant to graduate students and practicing engineers in the fields of computer architecture, systems software, and applications. The relevance for computer architects is obvious, given the growing importance of multiprocessors. Chip designers must understand what constitutes a viable building block for multiprocessor systems, while computer system designers must understand how best to utilize modern microprocessor and memory technology in building multiprocessors.

Systems software, including operating systems, compilers, programming languages, run-time systems, and performance debugging tools, will need to address new issues and will provide new opportunities in parallel computers. Thus, an understanding of the evolution and the forces guiding that evolution is critical. Researchers in compilers and programming languages have
addressed aspects of parallel computing for some time. However, the new convergence with commodity technology suggests that these aspects may need to be reexamined and perhaps addressed in very general terms. The traditional boundaries between hardware, operating system, and user program are also shifting in the context of parallel computing, where communication, scheduling, sharing, and resource management are intrinsic to the program.

Applications areas, such as computer graphics and multimedia, scientific computing, computer-aided design, decision support, and transaction processing, are all likely to see a tremendous transformation as a result of the vast computing power available at low cost through parallel computing. However, developing parallel applications that are robust and provide good speedup across current and future multiprocessors is a challenging task, and it requires a deep understanding of the forces driving parallel computers. The book seeks to provide this understanding, but also to stimulate the exchange between the applications fields and computer architecture, so that better architectures can be designed: architectures that make the programming task easier and performance more robust.

Organization of the Book

The book is organized into twelve chapters. Chapter 1 begins with the motivation for why parallel architectures are inevitable based on technology, architecture, and application trends. It then briefly introduces the diverse multiprocessor architectures we find today (shared memory, message passing, data parallel, dataflow, and systolic), and it shows how the technology and architectural trends tell a strong story of convergence in the field. The convergence does not mean the end of innovation; on the contrary, it implies that we will now see a time of rapid progress in the field, as designers start talking to each other rather than past each other. Given this convergence, the last portion of the chapter introduces the fundamental design issues for multiprocessors: naming, synchronization, latency, and bandwidth. These four issues form an underlying theme throughout the rest of this book. The chapter ends with a historical perspective, providing a glimpse into the diverse and rich history of the field.

Chapter 2 provides a brief introduction to the process of parallel programming and the basic components of popular programming models. It is intended to ensure that the reader has a clear understanding of hardware/software trade-offs, as well as what aspects of performance can be addressed through architectural means and what aspects must be addressed either by the compiler or the programmer in providing to the hardware a well-designed parallel program. The analogy in sequential computing is that architecture cannot transform an O(n²) algorithm into an O(n log n) algorithm, but it can improve the average access time for common memory reference patterns. The brief discussion of programming issues is not likely to turn you into an expert parallel programmer, if you are not already, but it will familiarize you with the issues and what programs look like in various programming models. Chapter 3 outlines the basic techniques used in programming for performance and presents a collection of application case studies that serve as a basis for quantitative evaluation of design trade-offs throughout the book.

Chapter 4 takes up the challenging task of performing a solid empirical evaluation of design trade-offs.
Architectural evaluation is difficult even for modern uniprocessors, where we typically look at variations in pipeline design or memory system design against a fixed set of programs. In parallel architecture we have many more degrees of freedom to explore, the interactions between aspects of the design are more profound, and the interactions between hardware and software are more significant and of a wider scope. In general, we are looking at performance as the machine and the program scale; there is no way to scale one without the other. Chapter 4 discusses how scaling interacts with various architectural parameters and presents a set of benchmarks that are used throughout the later chapters.

Chapters 5 and 6 provide a complete understanding of bus-based multiprocessors, the SMPs that form the bread and butter of modern commercial machines beyond the desktop, and even to some extent on the desktop. Chapter 5 presents the logical design of “snooping” bus protocols, which ensure that automatically replicated data is coherent across multiple caches. This chapter provides an important discussion of memory consistency models, which allows us to come to terms with what shared memory really means to algorithm designers. It discusses the spectrum of design options and how machines are optimized against typical reference patterns occurring in user programs and in the operating system. Given this conceptual understanding of SMPs, it reflects on the implications for parallel programming.

Chapter 6 examines the physical design of bus-based multiprocessors. It digs down into the engineering issues that arise in supporting modern microprocessors with multilevel caches on modern busses, which are highly pipelined. Although some of this material is contained in more casual treatments of multiprocessor architecture, the presentation here provides a very complete understanding of the design issues in this regime. It is especially important because these small-scale designs form a building block for large-scale designs and because many of the concepts will reappear later in the book on a larger scale with a broader set of concerns.

Chapter 7 presents the hardware organization and architecture of a range of machines that are scalable to large or very large configurations. The key organizational concept is that of a network transaction, analogous to the bus transaction, which is the fundamental primitive for the designs in Chapters 5 and 6. However, in large-scale machines the global information and global arbitration of small-scale designs are lost, and a large number of transactions can be outstanding. We show how conventional programming models are realized in terms of network transactions and then study a spectrum of important design points, organized according to the level of direct hardware interpretation of the network transaction, including detailed case studies of important commercial machines.

Chapter 8 puts the results of the previous chapters together to demonstrate how to realize a global shared physical address space with automatic replication on a large scale. It provides a complete treatment of directory-based cache coherence protocols and hardware design alternatives.

Chapter 9 examines a spectrum of alternatives that push the boundaries of possible hardware/software trade-offs to obtain higher performance, to reduce hardware complexity, or both. It looks at relaxed memory consistency models, cache-only memory architectures, and software-based cache coherence, material that at the time of writing was in transition from academic research to commercial product.
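Several of the chapters just described (Chapters 5, 8, and 9) revolve around cache coherence, so a minimal sketch may help fix ideas before the detailed treatment. The C fragment below shows the state transitions of a basic 3-state (MSI) invalidation protocol, the kind introduced in Chapter 5, as one cache controller might apply them to a single line. The event names, and the reduction of a controller to pure state-transition functions, are our simplification for illustration; a real design must also handle data movement, write-backs, bus arbitration, and protocol races.

```c
/* A minimal sketch of a 3-state (MSI) snooping invalidation protocol.
 * Illustrative only: a real controller must also manage data movement,
 * write-backs, arbitration, and races on a split-transaction bus. */

typedef enum { INVALID, SHARED, MODIFIED } line_state;

/* Processor-side events for one cache line. */
line_state on_processor_read(line_state s) {
    if (s == INVALID)
        return SHARED;    /* miss: issue a bus read (BusRd), load the line */
    return s;             /* SHARED or MODIFIED: hit, no bus traffic */
}

line_state on_processor_write(line_state s) {
    if (s != MODIFIED)    /* issue a read-exclusive (BusRdX) so that any */
        return MODIFIED;  /* other copies are invalidated before writing */
    return MODIFIED;      /* already exclusive: the write hits locally */
}

/* Snoop-side events: every cache watches transactions by other caches. */
line_state on_bus_read(line_state s) {
    if (s == MODIFIED)
        return SHARED;    /* supply the dirty data, keep a shared copy */
    return s;
}

line_state on_bus_read_exclusive(line_state s) {
    return INVALID;       /* another cache will write: flush if dirty,
                             then drop our copy */
}
```

Written this way, the coherence invariant is easy to see: at most one cache can hold a line in MODIFIED, while any number may hold it in SHARED. Chapter 8 revisits the same invariant for directory-based protocols, where the bus broadcast is replaced by explicit messages.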
Chapter 10 addresses the design of scalable high-performance communication networks, which underlie all the large-scale machines discussed in the previous chapters but whose treatment was deferred in order to complete our understanding of the processor, memory system, and network interface designs that drive these networks. The chapter builds a general framework for understanding where hardware costs, transfer delays, and bandwidth restrictions arise in networks. We then look at a variety of trade-offs in routing techniques, switch design, and interconnection topology with respect to these cost-performance metrics. These trade-offs are made concrete through case studies of recent designs.

Given the foundation established by the first ten chapters, Chapter 11 examines a set of cross-cutting issues involved in tolerating significant communication delays without impeding performance. The techniques essentially exploit two basic capabilities: overlapping communication with useful computation, and pipelining the communication of a volume of data. The simplest of these are bulk transfers, which pipeline the movement of a large regular sequence of data items and can often be off-loaded from the processor. The other techniques attempt to hide the latency incurred in collections of individual loads and stores. Write latencies are hidden by exploiting weak consistency models, which recognize that ordering is conveyed by only a small set of the accesses to shared memory in a program. Read latencies are hidden by implicit or explicit prefetching of data, or by look-ahead techniques in modern dynamically scheduled processors. The chapter provides a thorough examination of these alternatives, their impact on compilation techniques, and a quantitative evaluation of their effectiveness. Finally, Chapter 12 examines the trends in technology, architecture, software systems, and applications that are likely to shape the evolution of the field.
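The overlap of communication with useful computation that Chapter 11 studies can be previewed with a small sketch. The example below is ours rather than the book's: it uses MPI's nonblocking primitives (MPI_Isend, MPI_Irecv, MPI_Waitall) purely as a convenient notation, and the strip size and neighbor ranks are assumed for illustration. The pattern is the standard one: start the boundary exchange, update the interior points that need no remote data while the messages are in flight, and block only when the remote values are actually required.

```c
/* Sketch: hiding message latency by overlapping communication with
 * computation. MPI is used only as notation; N and the neighbor ranks
 * are illustrative. Each process holds a strip u[0..N-1] and trades
 * one boundary value with each neighbor per relaxation step. */
#include <mpi.h>

#define N 1024

void relax_step(double u[N], int left, int right, MPI_Comm comm) {
    double from_left, from_right;
    MPI_Request reqs[4];

    /* 1. Start the halo exchange; these calls return immediately. */
    MPI_Irecv(&from_left,  1, MPI_DOUBLE, left,  0, comm, &reqs[0]);
    MPI_Irecv(&from_right, 1, MPI_DOUBLE, right, 1, comm, &reqs[1]);
    MPI_Isend(&u[0],       1, MPI_DOUBLE, left,  1, comm, &reqs[2]);
    MPI_Isend(&u[N - 1],   1, MPI_DOUBLE, right, 0, comm, &reqs[3]);

    /* 2. Update interior points, which depend only on local data;
     *    this computation proceeds while the messages are in flight.
     *    (The sent elements u[0] and u[N-1] are left untouched.) */
    for (int i = 2; i < N - 2; i++)
        u[i] = 0.5 * (u[i - 1] + u[i + 1]);

    /* 3. Block only now, when the remote values are actually needed,
     *    then finish the two boundary points. */
    MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);
    u[0]     = 0.5 * (from_left + u[1]);
    u[N - 1] = 0.5 * (u[N - 2] + from_right);
}
```

The same split-phase discipline, issue the long-latency operation early, find independent work, and synchronize as late as possible, underlies the bulk transfers, weak consistency models, and prefetching examined in the chapter.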
We believe parallel computer architecture is an exciting core field whose importance is growing; it has reached a point of maturity at which a textbook makes sense. From a rich diversity of ideas and approaches, there is now a dramatic convergence happening in the field. It is time to go beyond surveying the machine landscape to an understanding of fundamental design principles. We have intimately participated in the convergence of the field; this text arises from that experience, and we hope it conveys some of the excitement that we feel for this dynamic and growing area.
Contents

CHAPTER 1  Introduction
  1.1 Introduction
  1.2 Why Parallel Architecture
    1.2.1 Application Trends
    1.2.2 Technology Trends
    1.2.3 Architectural Trends
    1.2.4 Supercomputers
    1.2.5 Summary
  1.3 Convergence of Parallel Architectures
    1.3.1 Communication Architecture
    1.3.2 Shared Memory
    1.3.3 Message-Passing
    1.3.4 Convergence
    1.3.5 Data Parallel Processing
    1.3.6 Other Parallel Architectures
    1.3.7 A Generic Parallel Architecture
  1.4 Fundamental Design Issues
    1.4.1 Communication Abstraction
    1.4.2 Programming Model Requirements
    1.4.3 Naming
    1.4.4 Ordering
    1.4.5 Communication and Replication
    1.4.6 Performance
  1.5 Concluding Remarks
  1.6 References
  1.7 Exercises

CHAPTER 2  Parallel Programs
  2.1 Introduction
  2.2 Parallel Application Case Studies
    2.2.1 Simulating Ocean Currents
    2.2.2 Simulating the Evolution of Galaxies
    2.2.3 Visualizing Complex Scenes using Ray Tracing
    2.2.4 Mining Data for Associations
  2.3 The Parallelization Process
    2.3.1 Steps in the Process
    2.3.2 Parallelizing Computation versus Data
    2.3.3 Goals of the Parallelization Process
  2.4 Parallelization of an Example Program
    2.4.1 A Simple Example: The Equation Solver Kernel
    2.4.2 Decomposition
    2.4.3 Assignment
    2.4.4 Orchestration under the Data Parallel Model
    2.4.5 Orchestration under the Shared Address Space Model
    2.4.6 Orchestration under the Message Passing Model
  2.5 Concluding Remarks
  2.6 References
  2.7 Exercises

CHAPTER 3  Programming for Performance
  3.1 Introduction
  3.2 Partitioning for Performance
    3.2.1 Load Balance and Synchronization Wait Time
    3.2.2 Reducing Inherent Communication
    3.2.3 Reducing the Extra Work
    3.2.4 Summary
  3.3 Data Access and Communication in a Multi-Memory System
    3.3.1 A Multiprocessor as an Extended Memory Hierarchy
    3.3.2 Artifactual Communication in the Extended Memory Hierarchy
  3.4 Orchestration for Performance
    3.4.1 Reducing Artifactual Communication
    3.4.2 Structuring Communication to Reduce Cost
  3.5 Performance Factors from the Processors’ Perspective
  3.6 The Parallel Application Case Studies: An In-Depth Look
    3.6.1 Ocean
    3.6.2 Barnes-Hut
    3.6.3 Raytrace
    3.6.4 Data Mining
  3.7 Implications for Programming Models
  3.8 Concluding Remarks
  3.9 References
  3.10 Exercises

CHAPTER 4  Workload-Driven Evaluation
  4.1 Introduction
  4.2 Scaling Workloads and Machines
    4.2.1 Why Worry about Scaling?
    4.2.2 Key Issues in Scaling
    4.2.3 Scaling Models
    4.2.4 Scaling Workload Parameters
  4.3 Evaluating a Real Machine
    4.3.1 Performance Isolation using Microbenchmarks
    4.3.2 Choosing Workloads
    4.3.3 Evaluating a Fixed-Size Machine
    4.3.4 Varying Machine Size
    4.3.5 Choosing Performance Metrics
    4.3.6 Presenting Results
  4.4 Evaluating an Architectural Idea or Tradeoff
    4.4.1 Multiprocessor Simulation
    4.4.2 Scaling Down Problem and Machine Parameters for Simulation
    4.4.3 Dealing with the Parameter Space: An Example Evaluation
    4.4.4 Summary
  4.5 Illustrating Workload Characterization
    4.5.1 Workload Case Studies
    4.5.2 Workload Characteristics
  4.6 Concluding Remarks
  4.7 References
  4.8 Exercises

CHAPTER 5  Shared Memory Multiprocessors
  5.1 Introduction
  5.2 Cache Coherence
    5.2.1 The Cache Coherence Problem
    5.2.2 Cache Coherence Through Bus Snooping
  5.3 Memory Consistency
    5.3.1 Sequential Consistency
    5.3.2 Sufficient Conditions for Preserving Sequential Consistency
  5.4 Design Space for Snooping Protocols
    5.4.1 A 3-state (MSI) Write-back Invalidation Protocol
    5.4.2 A 4-state (MESI) Write-Back Invalidation Protocol
    5.4.3 A 4-state (Dragon) Write-back Update Protocol
  5.5 Assessing Protocol Design Tradeoffs
    5.5.1 Workloads
    5.5.2 Impact of Protocol Optimizations
    5.5.3 Tradeoffs in Cache Block Size
    5.5.4 Update-based vs. Invalidation-based Protocols
  5.6 Synchronization
    5.6.1 Components of a Synchronization Event
    5.6.2 Role of User, System Software and Hardware
    5.6.3 Mutual Exclusion
    5.6.4 Point-to-point Event Synchronization
    5.6.5 Global (Barrier) Event Synchronization
    5.6.6 Synchronization Summary
  5.7 Implications for Software
  5.8 Concluding Remarks
  5.9 References
  5.10 Exercises