
Computer Organization and Design: The Hardware/Software Interface

Computer Architecture Design
1. Fundamentals of Computer Design
1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer Design
1.7 Putting It All Together: The Concept of Memory Hierarchy
1.8 Fallacies and Pitfalls
1.9 Concluding Remarks
1.10 Historical Perspective and References
- References
- Exercises
2. Instruction Set Principles and Examples
2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Operations in the Instruction Set
2.5 Type and Size of Operands
2.6 Encoding an Instruction Set
2.7 Crosscutting Issues: The Role of Compilers
2.8 Putting It All Together: The DLX Architecture
2.9 Fallacies and Pitfalls
2.10 Concluding Remarks
2.11 Historical Perspective and References
- References
- Exercises
3. Pipelining
3.1 What Is Pipelining?
3.2 The Basic Pipeline for DLX
3.3 The Major Hurdle of Pipelining: Pipeline Hazards
3.4 Data Hazards
3.5 Control Hazards
3.6 What Makes Pipelining Hard to Implement?
3.7 Extending the DLX Pipeline to Handle Multicycle Operations
3.8 Crosscutting Issues: Instruction Set Design and Pipelining
3.9 Putting It All Together: The MIPS R4000 Pipeline
3.10 Fallacies and Pitfalls
3.11 Concluding Remarks
3.12 Historical Perspective and References
- References
- Exercises
4. Advanced Pipelining and Instruction-Level Parallelism
4.1 Instruction-Level Parallelism: Concepts and Challenges
4.2 Overcoming Data Hazards with Dynamic Scheduling
4.3 Reducing Branch Penalties with Dynamic Hardware Prediction
4.4 Taking Advantage of More ILP with Multiple Issue
4.5 Compiler Support for Exploiting ILP
4.6 Hardware Support for Extracting More Parallelism
4.7 Studies of ILP
4.8 Putting It All Together: The PowerPC 620
4.9 Fallacies and Pitfalls
4.10 Concluding Remarks
4.11 Historical Perspective and References
- References
- Exercises
5. Memory-Hierarchy Design
5.1 Introduction
5.2 The ABCs of Caches
5.3 Reducing Cache Misses
5.4 Reducing Cache Miss Penalty
5.5 Reducing Hit Time
5.6 Main Memory
5.7 Virtual Memory
5.8 Protection and Examples of Virtual Memory
5.9 Crosscutting Issues in the Design of Memory Hierarchies
5.10 Putting It All Together: The Alpha AXP 21064 Memory Hierarchy
5.11 Fallacies and Pitfalls
5.12 Concluding Remarks
5.13 Historical Perspective and References
- References
- Exercises
6. Storage Systems
6.1 Introduction
6.2 Types of Storage Devices
6.3 Buses: Connecting I/O Devices to CPU/Memory
6.4 I/O Performance Measures
6.5 Reliability, Availability, and RAID
6.6 Crosscutting Issues: Interfacing to an Operating System
6.7 Designing an I/O System
6.8 Putting It All Together: UNIX File System Performance
6.9 Fallacies and Pitfalls
6.10 Concluding Remarks
6.11 Historical Perspective and References
- References
- Exercises
7. Interconnection Networks
7.1 Introduction
7.2 A Simple Network
7.3 Connecting the Interconnection Network to the Computer
7.4 Interconnection Network Media
7.5 Connecting More Than Two Computers
7.6 Practical Issues for Commercial Interconnection Networks
7.7 Examples of Interconnection Networks
7.8 Crosscutting Issues for Interconnection Networks
7.9 Internetworking
7.10 Putting It All Together: An ATM Network of Workstations
7.11 Fallacies and Pitfalls
7.12 Concluding Remarks
7.13 Historical Perspective and References
- References
- Exercises
8. Multiprocessors
8.1 Introduction
8.2 Characteristics of Application Domains
8.3 Centralized Shared-Memory Architectures
8.4 Distributed Shared-Memory Architectures
8.5 Synchronization
8.6 Models of Memory Consistency
8.7 Crosscutting Issues
8.8 Putting It All Together: The SGI Challenge Multiprocessor
8.9 Fallacies and Pitfalls
8.10 Concluding Remarks
8.11 Historical Perspective and References
- References
- Exercises
Appendix
A. Computer Arithmetic
A.1 Introduction
A.2 Basic Techniques of Integer Arithmetic
A.3 Floating Point
A.4 Floating-Point Multiplication
A.5 Floating-Point Addition
A.6 Division and Remainder
A.7 More on Floating-Point Arithmetic
A.8 Speeding Up Integer Addition
A.9 Speeding Up Integer Multiplication and Division
A.10 Putting It All Together
A.11 Fallacies and Pitfalls
A.12 Historical Perspective and References
- References
- Exercises
B. Vector Processors
B.1 Why Vector Processors?
B.2 Basic Vector Architecture
B.3 Two Real-World Issues: Vector Length and Stride
B.4 Effectiveness of Compiler Vectorization
B.5 Enhancing Vector Performance
B.6 Putting It All Together: Performance of Vector Processors
B.7 Fallacies and Pitfalls
B.8 Concluding Remarks
B.9 Historical Perspective and References
- References
- Exercises
C. Survey of RISC Architectures
C.1 Introduction
C.2 Addressing Modes and Instruction Formats
C.3 Instructions: The DLX Subset
C.4 Instructions: Common Extensions to DLX
C.5 Instructions Unique to MIPS
C.6 Instructions Unique to SPARC
C.7 Instructions Unique to PowerPC
C.8 Instructions Unique to PA-RISC
C.9 Concluding Remarks
C.10 References
1 Fundamentals of Computer Design

"And now for something completely different."
(Monty Python's Flying Circus)
1.1 Introduction

Computer technology has made incredible progress in the past half century. In 1945, there were no stored-program computers. Today, a few thousand dollars will purchase a personal computer that has more performance, more main memory, and more disk storage than a computer bought in 1965 for $1 million. This rapid rate of improvement has come both from advances in the technology used to build computers and from innovation in computer design. While technological improvements have been fairly steady, progress arising from better computer architectures has been much less consistent. During the first 25 years of electronic computers, both forces made a major contribution; but beginning in about 1970, computer designers became largely dependent upon integrated circuit technology. During the 1970s, performance continued to improve at about 25% to 30% per year for the mainframes and minicomputers that dominated the industry. The late 1970s saw the emergence of the microprocessor. The ability of the microprocessor to ride the improvements in integrated circuit technology more closely than the less integrated mainframes and minicomputers led to a higher rate of improvement, roughly 35% growth per year in performance.
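To make these annual rates concrete, here is a minimal sketch (in Python, using the approximate growth rates quoted above; the rates are the text's round figures, not measured data) of how they compound over a decade:

```python
# Compound growth at the approximate annual rates quoted in the text:
# 25-30% per year for 1970s mainframes and minicomputers, and 35% per
# year for early microprocessors.

def cumulative_growth(annual_rate: float, years: int) -> float:
    """Total performance multiple after `years` of compound growth."""
    return (1.0 + annual_rate) ** years

for rate in (0.25, 0.30, 0.35):
    print(f"{rate:.0%}/year for 10 years -> {cumulative_growth(rate, 10):.1f}x")

# Output:
# 25%/year for 10 years -> 9.3x
# 30%/year for 10 years -> 13.8x
# 35%/year for 10 years -> 20.1x
```

Even a few percentage points of difference in annual growth rate compound into a large gap over a product generation, which is why the shift described next mattered so much.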
This growth rate, combined with the cost advantages of a mass-produced microprocessor, led to an increasing fraction of the computer business being based on microprocessors. In addition, two significant changes in the computer marketplace made it easier than ever before to be commercially successful with a new architecture. First, the virtual elimination of assembly language programming reduced the need for object-code compatibility. Second, the creation of standardized, vendor-independent operating systems, such as UNIX, lowered the cost and risk of bringing out a new architecture. These changes made it possible to successfully develop a new set of architectures, called RISC architectures, in the early 1980s. Since the RISC-based microprocessors reached the market in the mid 1980s, these machines have grown in performance at an annual rate of over 50%. Figure 1.1 shows this difference in performance growth rates.

[FIGURE 1.1 Growth in microprocessor performance since the mid 1980s has been substantially higher than in earlier years. The chart plots SPECint ratings from 1984 to 1995 for machines including the SUN4, MIPS R2000, MIPS R3000, IBM Power1, IBM Power2, HP 9000, and DEC Alpha, with a 1.35x-per-year trend line before the mid 1980s and a 1.58x-per-year trend line after. Prior to the mid 1980s, microprocessor performance growth was largely technology driven and averaged about 35% per year. The increase in growth since then is attributable to more advanced architectural ideas. By 1995 this growth leads to more than a factor of five difference in performance. Performance for floating-point-oriented calculations has increased even faster.]
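The "factor of five" in the caption follows directly from the two trend lines. A quick sanity check (a sketch using the 1.35x and 1.58x annual multipliers from Figure 1.1; taking 1984, the start of the figure's x-axis, as the divergence point is an assumption):

```python
# Sanity-check the "more than a factor of five by 1995" claim,
# assuming the two trend lines in Figure 1.1 diverge starting in 1984.

technology_trend = 1.35   # technology-driven growth per year (pre-RISC trend)
risc_trend = 1.58         # post-RISC growth per year (from Figure 1.1)
years = 1995 - 1984       # assumed span of divergence

ratio = (risc_trend / technology_trend) ** years
print(f"Performance gap after {years} years: {ratio:.1f}x")
# Performance gap after 11 years: 5.6x  (i.e., more than a factor of five)
```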
The effect of this dramatic growth rate has been twofold. First, it has significantly enhanced the capability available to computer users. As a simple example, consider the highest-performance workstation announced in 1993, an IBM Power-2 machine. Compared with a CRAY Y-MP supercomputer introduced in 1988 (probably the fastest machine in the world at that point), the workstation offers comparable performance on many floating-point programs (the performance for the SPEC floating-point benchmarks is similar) and better performance on integer programs for a price that is less than one-tenth of the supercomputer!

Second, this dramatic rate of improvement has led to the dominance of microprocessor-based computers across the entire range of the computer design. Workstations and PCs have emerged as major products in the computer industry. Minicomputers, which were traditionally made from off-the-shelf logic or from gate arrays, have been replaced by servers made using microprocessors. Mainframes are slowly being replaced with multiprocessors consisting of small numbers of off-the-shelf microprocessors. Even high-end supercomputers are being built with collections of microprocessors.

Freedom from compatibility with old designs and the use of microprocessor technology led to a renaissance in computer design, which emphasized both architectural innovation and efficient use of technology improvements. This renaissance is responsible for the higher performance growth shown in Figure 1.1, a rate that is unprecedented in the computer industry. This rate of growth has compounded so that by 1995, the difference between the highest-performance microprocessors and what would have been obtained by relying solely on technology is more than a factor of five. This text is about the architectural ideas and accompanying compiler improvements that have made this incredible growth rate possible.

At the center of this dramatic revolution has been the development of a quantitative approach to computer design and analysis that uses empirical observations of programs, experimentation, and simulation as its tools. It is this style and approach to computer design that is reflected in this text.

Sustaining the recent improvements in cost and performance will require continuing innovations in computer design, and the authors believe such innovations will be founded on this quantitative approach to computer design. Hence, this book has been written not only to document this design style, but also to stimulate you to contribute to this progress.

1.2 The Task of a Computer Designer

The task the computer designer faces is a complex one: Determine what attributes are important for a new machine, then design a machine to maximize performance while staying within cost constraints. This task has many aspects, including instruction set design, functional organization, logic design, and implementation. The implementation may encompass integrated circuit design,
packaging, power, and cooling. Optimizing the design requires familiarity with a very wide range of technologies, from compilers and operating systems to logic design and packaging.

In the past, the term computer architecture often referred only to instruction set design. Other aspects of computer design were called implementation, often insinuating that implementation is uninteresting or less challenging. The authors believe this view is not only incorrect, but is even responsible for mistakes in the design of new instruction sets. The architect's or designer's job is much more than instruction set design, and the technical hurdles in the other aspects of the project are certainly as challenging as those encountered in doing instruction set design. This is particularly true at the present when the differences among instruction sets are small (see Appendix C).

In this book the term instruction set architecture refers to the actual programmer-visible instruction set. The instruction set architecture serves as the boundary between the software and hardware, and that topic is the focus of Chapter 2. The implementation of a machine has two components: organization and hardware. The term organization includes the high-level aspects of a computer's design, such as the memory system, the bus structure, and the internal CPU (central processing unit, where arithmetic, logic, branching, and data transfer are implemented) design. For example, two machines with the same instruction set architecture but different organizations are the SPARCstation-2 and SPARCstation-20. Hardware is used to refer to the specifics of a machine. This would include the detailed logic design and the packaging technology of the machine. Often a line of machines contains machines with identical instruction set architectures and nearly identical organizations, but they differ in the detailed hardware implementation. For example, two versions of the Silicon Graphics Indy differ in clock rate and in detailed cache structure. In this book the word architecture is intended to cover all three aspects of computer design: instruction set architecture, organization, and hardware.

Computer architects must design a computer to meet functional requirements as well as price and performance goals. Often, they also have to determine what the functional requirements are, and this can be a major task. The requirements may be specific features, inspired by the market. Application software often drives the choice of certain functional requirements by determining how the machine will be used. If a large body of software exists for a certain instruction set architecture, the architect may decide that a new machine should implement an existing instruction set. The presence of a large market for a particular class of applications might encourage the designers to incorporate requirements that would make the machine competitive in that market. Figure 1.2 summarizes some requirements that need to be considered in designing a new machine. Many of these requirements and features will be examined in depth in later chapters.

Once a set of functional requirements has been established, the architect must try to optimize the design. Which design choices are optimal depends, of course, on the choice of metrics. The most common metrics involve cost and performance.
| Functional requirements | Typical features required or supported |
|---|---|
| Application area | Target of computer |
| - General purpose | Balanced performance for a range of tasks (Ch 2,3,4,5) |
| - Scientific | High-performance floating point (App A,B) |
| - Commercial | Support for COBOL (decimal arithmetic); support for databases and transaction processing (Ch 2,7) |
| Level of software compatibility | Determines amount of existing software for machine |
| - At programming language | Most flexible for designer; need new compiler (Ch 2,8) |
| - Object code or binary compatible | Instruction set architecture is completely defined (little flexibility), but no investment needed in software or porting programs |
| Operating system requirements | Necessary features to support chosen OS (Ch 5,7) |
| - Size of address space | Very important feature (Ch 5); may limit applications |
| - Memory management | Required for modern OS; may be paged or segmented (Ch 5) |
| - Protection | Different OS and application needs: page vs. segment protection (Ch 5) |
| Standards | Certain standards may be required by marketplace |
| - Floating point | Format and arithmetic: IEEE, DEC, IBM (App A) |
| - I/O bus | For I/O devices: VME, SCSI, Fiberchannel (Ch 7) |
| - Operating systems | UNIX, DOS, or vendor proprietary |
| - Networks | Support required for different networks: Ethernet, ATM (Ch 6) |
| - Programming languages | Languages (ANSI C, Fortran 77, ANSI COBOL) affect instruction set (Ch 2) |

FIGURE 1.2 Summary of some of the most important functional requirements an architect faces. The left-hand column describes the class of requirement, while the right-hand column gives examples of specific features that might be needed. The right-hand column also contains references to chapters and appendices that deal with the specific issues.

Given some application domain, the architect can try to quantify the performance of the machine by a set of programs that are chosen to represent that application domain. Other measurable requirements may be important in some markets; reliability and fault tolerance are often crucial in transaction processing environments. Throughout this text we will focus on optimizing machine cost/performance.

In choosing between two designs, one factor that an architect must consider is design complexity. Complex designs take longer to complete, prolonging time to market. This means a design that takes longer will need to have higher performance to be competitive. The architect must be constantly aware of the impact of his design choices on the design time for both hardware and software.

In addition to performance, cost is the other key parameter in optimizing cost/performance. In addition to cost, designers must be aware of important trends in both the implementation technology and the use of computers. Such trends not only impact future cost, but also determine the longevity of an architecture. The next two sections discuss technology and cost trends.
1.3 Technology and Computer Usage Trends

If an instruction set architecture is to be successful, it must be designed to survive changes in hardware technology, software technology, and application characteristics. The designer must be especially aware of trends in computer usage and in computer technology. After all, a successful new instruction set architecture may last decades; the core of the IBM mainframe has been in use since 1964. An architect must plan for technology changes that can increase the lifetime of a successful machine.

Trends in Computer Usage

The design of a computer is fundamentally affected both by how it will be used and by the characteristics of the underlying implementation technology. Changes in usage or in implementation technology affect the computer design in different ways, from motivating changes in the instruction set to shifting the payoff from important techniques such as pipelining or caching.

Trends in software technology and how programs will use the machine have a long-term impact on the instruction set architecture. One of the most important software trends is the increasing amount of memory used by programs and their data. The amount of memory needed by the average program has grown by a factor of 1.5 to 2 per year! This translates to a consumption of address bits at a rate of approximately 1/2 bit to 1 bit per year. This rapid rate of growth is driven both by the needs of programs as well as by the improvements in DRAM technology that continually improve the cost per bit. Underestimating address-space growth is often the major reason why an instruction set architecture must be abandoned. (For further discussion, see Chapter 5 on memory hierarchy.)

Another important software trend in the past 20 years has been the replacement of assembly language by high-level languages. This trend has resulted in a larger role for compilers, forcing compiler writers and architects to work together closely to build a competitive machine. Compilers have become the primary interface between user and machine.

In addition to this interface role, compiler technology has steadily improved, taking on newer functions and increasing the efficiency with which a program can be run on a machine. This improvement in compiler technology has included traditional optimizations, which we discuss in Chapter 2, as well as transformations aimed at improving pipeline behavior (Chapters 3 and 4) and memory system behavior (Chapter 5). How to balance the responsibility for efficient execution in modern processors between the compiler and the hardware continues to be one of the hottest architecture debates of the 1990s. Improvements in compiler technology played a major role in making vector machines (Appendix B) successful. The development of compiler technology for parallel machines is likely to have a large impact in the future.
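The link between memory growth and address-bit consumption is a base-2 logarithm: growing by a factor k per year consumes log2(k) address bits per year. A small sketch, using the 1.5x to 2x growth factors quoted above:

```python
import math

# Memory growth per year translates into address bits consumed per year:
# a program whose memory footprint grows by a factor k needs log2(k)
# additional address bits. Growth factors below are the text's figures.

for growth_factor in (1.5, 2.0):
    bits_per_year = math.log2(growth_factor)
    print(f"{growth_factor}x/year -> {bits_per_year:.2f} address bits/year")

# Output:
# 1.5x/year -> 0.58 address bits/year   (roughly 1/2 bit per year)
# 2.0x/year -> 1.00 address bits/year   (1 bit per year)
```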
Trends in Implementation Technology

To plan for the evolution of a machine, the designer must be especially aware of rapidly occurring changes in implementation technology. Three implementation technologies, which change at a dramatic pace, are critical to modern implementations:

- Integrated circuit logic technology: Transistor density increases by about 50% per year, quadrupling in just over three years. Increases in die size are less predictable, ranging from 10% to 25% per year. The combined effect is a growth rate in transistor count on a chip of between 60% and 80% per year. Device speed increases nearly as fast; however, metal technology used for wiring does not improve, causing cycle times to improve at a slower rate. We discuss this further in the next section.

- Semiconductor DRAM: Density increases by just under 60% per year, quadrupling in three years. Cycle time has improved very slowly, decreasing by about one-third in 10 years. Bandwidth per chip increases as the latency decreases. In addition, changes to the DRAM interface have also improved the bandwidth; these are discussed in Chapter 5. In the past, DRAM (dynamic random-access memory) technology has improved faster than logic technology. This difference has occurred because of reductions in the number of transistors per DRAM cell and the creation of specialized technology for DRAMs. As the improvement from these sources diminishes, the density growth in logic technology and memory technology should become comparable.

- Magnetic disk technology: Recently, disk density has been improving by about 50% per year, almost quadrupling in three years. Prior to 1990, density increased by about 25% per year, doubling in three years. It appears that disk technology will continue the faster density growth rate for some time to come. Access time has improved by one-third in 10 years. This technology is central to Chapter 6.

These rapidly changing technologies impact the design of a microprocessor that may, with speed and technology enhancements, have a lifetime of five or more years. Even within the span of a single product cycle (two years of design and two years of production), key technologies, such as DRAM, change sufficiently that the designer must plan for these changes. Indeed, designers often design for the next technology, knowing that when a product begins shipping in volume that next technology may be the most cost-effective or may have performance advantages. Traditionally, cost has decreased very closely to the rate at which density increases.

These technology changes are not continuous but often occur in discrete steps. For example, DRAM sizes are always increased by factors of four because of the basic design structure. Thus, rather than doubling every 18 months, DRAM technology quadruples every three years. This stepwise change in technology leads to
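The doubling and quadrupling times in the list above are compound growth restated; a quick check (a sketch in Python, using the annual rates quoted in the list):

```python
import math

# Time to double or quadruple in density, given the annual growth
# rates quoted in the list above. Label strings are for readability only.

rates = {
    "IC logic (50%/year)": 0.50,
    "DRAM (just under 60%/year)": 0.60,
    "Disk, post-1990 (50%/year)": 0.50,
    "Disk, pre-1990 (25%/year)": 0.25,
}

for name, rate in rates.items():
    years_to_2x = math.log(2) / math.log(1 + rate)
    years_to_4x = math.log(4) / math.log(1 + rate)
    print(f"{name}: 2x in {years_to_2x:.1f} years, 4x in {years_to_4x:.1f} years")

# IC logic (50%/year): 2x in 1.7 years, 4x in 3.4 years
# DRAM (just under 60%/year): 2x in 1.5 years, 4x in 3.0 years
# Disk, post-1990 (50%/year): 2x in 1.7 years, 4x in 3.4 years
# Disk, pre-1990 (25%/year): 2x in 3.1 years, 4x in 6.2 years
```

The numbers line up with the prose: 50% per year quadruples in just over three years, just under 60% per year quadruples in about three, and 25% per year doubles in about three.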