Efficient C++ Performance Programming Techniques.pdf

发布时间：2022-05-30 发布人：admin 分类：说明书资料大小：1.67M 资料格式：pdf 举报版权申诉

wysheng75-360144-4744300845154180861.pdf-第1页.png

第1页 / 共205页

wysheng75-360144-4744300845154180861.pdf-第2页.png

第2页 / 共205页

wysheng75-360144-4744300845154180861.pdf-第3页.png

第3页 / 共205页

wysheng75-360144-4744300845154180861.pdf-第4页.png

第4页 / 共205页

wysheng75-360144-4744300845154180861.pdf-第5页.png

第5页 / 共205页

wysheng75-360144-4744300845154180861.pdf-第6页.png

第6页 / 共205页

wysheng75-360144-4744300845154180861.pdf-第7页.png

第7页 / 共205页

wysheng75-360144-4744300845154180861.pdf-第8页.png

第8页 / 共205页

sample.pdf

sterling.com

Welcome to Sterling Software

Efficient C++ Performance Programming Techniques.pdf

Table of Content

Dedication

Preface

Introduction

Roots of Software Inefficiency

Figure 1. High-level classification of software performance.

Figure 2. Refinement of the design performance view.

Figure 3. Refinement of the coding performance view.

Our Goal

Software Efficiency: Does It Matter?

Terminology

Organization of This Book

Chapter 1. The Tracing War Story

Our Initial Trace Implementation

What Went Wrong

Figure 1.1. The performance cost of the Trace object.

The Recovery Plan

Figure 1.2. Impact of eliminating one string object.

Figure 1.3. Impact of conditional creation of the string member.

Key Points

Chapter 2. Constructors and Destructors

Inheritance

Figure 2.1. The cost of inheritance in this example.

Composition

Lazy Construction

Redundant Construction

Figure 2.2. Overhead of a silent initialization is negligible in this particular scenario.

Figure 2.3. More significant impact of silent initialization.

Key Points

Chapter 3. Virtual Functions

Virtual Function Mechanics

Templates and Inheritance

Hard Coding

Inheritance

Templates

Key Points

Chapter 4. The Return Value Optimization

The Mechanics of Return-by-Value

The Return Value Optimization

Figure 4.1. The speed-up of RVO.

Computational Constructors

Key Points

Chapter 5. Temporaries

Object Definition

Type Mismatch

Pass by Value

Return by Value

Eliminate Temporaries with op=()

Key Points

Chapter 6. Single-Threaded Memory Pooling

Version 0: The Global new() and delete()

Version 1: Specialized Rational Memory Manager

Figure 6.1. A free list of Rational objects.

Figure 6.2. Global new() and delete() compared to a Rational memory pool.

Version 2: Fixed-Size Object Memory Pool

Figure 6.3. Adding a template memory pool for generic objects.

Version 3: Single-Threaded Variable-Size Memory Manager

Figure 6.4. Variable-size memory free list.

Figure 6.5. A variable-size memory pool is naturally slower than fixed-size.

Key Points

Chapter 7. Multithreaded Memory Pooling

Version 4: Implementation

Figure 7.1. Comparing multithreaded to single-threaded memory pooling.

Version 5: Faster Locking

Figure 7.2. Multithreaded memory pool using faster locks.

Figure 7.3. Comparing the various flavors of memory pooling.

Key Points

Chapter 8. Inlining Basics

What Is Inlining?

Method Invocation Costs

Figure 8.1. Call frame register mapping.

Why Inline?

Inlining Details

Inlining Virtual Methods

Performance Gains from Inlining

Key Points

Chapter 9. Inlining—Performance Considerations

Cross-Call Optimization

Why Not Inline?

Development and Compile-Time Inlining Considerations

Profile-Based Inlining

Table 9.1. The Inlining Decision Matrix

Inlining Rules

Singletons

Trivials

Key Points

Chapter 10. Inlining Tricks

Conditional Inlining

Selective Inlining

Recursive Inlining

Inlining with Static Local Variables

Architectural Caveat: Multiple Register Sets

Key Points

Chapter 11. Standard Template Library

Asymptotic Complexity

Insertion

Figure 11.1. Speed of insertion.

Figure 11.2. Object insertion speed.

Figure 11.3. Comparing object to pointer insertion.

Figure 11.4. Comparing list to vector insertion.

Figure 11.5. Vector insertion with and without capacity reservation.

Figure 11.6. Inserting at the front.

Deletion

Figure 11.7. Comparing list to vector deletion.

Figure 11.8. Deleting elements at the front.

Traversal

Figure 11.9. Container traversal speed.

Find

Figure 11.10. Container search speed.

Figure 11.11. Comparing generic find() to member find().

Function Objects

Figure 11.12. Comparing function objects to function pointers.

Better than STL?

Figure 11.13. Comparing STL speed to home-grown code.

Key Points

Chapter 12. Reference Counting

Figure 12.1. Duplicating resources.

Implementation Details

Figure 12.2. A simple design for a reference-counted Widget class.

Figure 12.3. Adding inheritance and smart pointer to the reference-counting design.

Figure 12.4. BigInt assignment speed.

Figure 12.5. BigInt creation speed.

Preexisting Classes

Figure 12.6. Reference-counting a pre-existing BigInt.

Figure 12.7. BigInt creation speed.

Concurrent Reference Counting

Figure 12.8. Multithreaded BigInt assignment speed.

Figure 12.9. Multithreaded BigInt creation speed.

Key Points

Chapter 13. Coding Optimizations

Caching

Precompute

Reduce Flexibility

80-20 Rule: Speed Up the Common Path

Lazy Evaluation

Useless Computations

System Architecture

Memory Management

Library and System Calls

Compiler Optimization

Key Points

Chapter 14. Design Optimizations

Design Flexibility

Caching

Web Server Timestamps

Data Expansion

The Common Code Trap

Efficient Data Structures

Lazy Evaluation

getpeername()

Useless Computations

Obsolete Code

Key Points

Chapter 15. Scalability

Figure 15.1. Single processor architecture.

Figure 15.2. Threads are the scheduling entities.

The SMP Architecture

Figure 15.3. The SMP Architecture.

Amdahl's Law

Figure 15.4. Potential speedup is limited.

Figure 15.5. A specific design of one Web server.

Multithreaded and Synchronization Terminology

Break Up a Task into Multiple Subtasks

Cache Shared Data

Share Nothing

Partial Sharing

Figure 15.6. A single shared resource.

Figure 15.7. Breaking up a single shared resource.

Lock Granularity

False Sharing

Thundering Herd

Reader/Writer Locks

Key Points

Chapter 16. System Architecture Dependencies

Memory Hierarchies

Table 16.1. Memory Access Speed

Registers: Kings of Memory

Disk and Memory Structures

Cache Effects

Cache Thrash

Avoid Branching

Prefer Simple Calculations to Small Branches

Threading Effects

Context Switching

Kernel Crossing

Threading Choices

Key Points

Bibliography

Efficient C++ Performance Programming Techniques By Dov Bulka, David Mayhew Table of Contents Publisher : Addison Wesley Pub Date : November 03, 1999 ISBN : 0-201-37950-3 Pages : 336 Far too many programmers and software designers consider efficient C++ to be an oxymoron. They regard C++ as inherently slow and inappropriate for performance- critical applications. Consequently, C++ has had little success penetrating domains such as networking, operating system kernels, device drivers, and others. Efficient C++ explodes that myth. Written by two authors with first-hand experience wringing the last ounce of performance from commercial C++ applications, this book demonstrates the potential of C++ to produce highly efficient programs. The book reveals practical, everyday object-oriented design principles and C++ coding techniques that can yield large performance improvements. It points out common pitfalls in both design and code that generate hidden operating costs. This book focuses on combining C++'s power and flexibility with high performance and scalability, resulting in the best of both worlds. Specific topics include temporary objects, memory management, templates, inheritance, virtual functions, inlining, reference- counting, STL, and much more. With this book, you will have a valuable compendium of the best performance techniques at your fingertips. TEAMFLY Team-Fly®

Table of Content Table of Content .................................................................................................................. i Copyright.............................................................................................................................. v Dedication ...................................................................................................................... vi Preface................................................................................................................................ vi Introduction.......................................................................................................................viii Roots of Software Inefficiency...................................................................................viii Our Goal......................................................................................................................... xi Software Efficiency: Does It Matter?.......................................................................... xi Terminology .................................................................................................................. xii Organization of This Book .........................................................................................xiii Chapter 1. The Tracing War Story................................................................................... 1 Our Initial Trace Implementation.................................................................................. 2 Key Points ....................................................................................................................... 7 Chapter 2. Constructors and Destructors....................................................................... 9 Inheritance....................................................................................................................... 9 Composition .................................................................................................................. 18 Lazy Construction ........................................................................................................ 19 Redundant Construction ............................................................................................. 21 Key Points ..................................................................................................................... 25 Chapter 3. Virtual Functions ........................................................................................... 26 Virtual Function Mechanics ........................................................................................ 26 Templates and Inheritance......................................................................................... 28 Key Points ..................................................................................................................... 31 Chapter 4. The Return Value Optimization .................................................................. 32 The Mechanics of Return-by-Value........................................................................... 32 The Return Value Optimization.................................................................................. 33 Computational Constructors....................................................................................... 35 Key Points ..................................................................................................................... 36 Chapter 5. Temporaries................................................................................................... 37 Object Definition........................................................................................................... 37 Type Mismatch ............................................................................................................. 38 Pass by Value............................................................................................................... 40 Return by Value............................................................................................................ 40 Eliminate Temporaries with op=() ........................................................................... 42 Key Points ..................................................................................................................... 43 Chapter 6. Single-Threaded Memory Pooling.............................................................. 44 Version 0: The Global new() and delete()......................................................... 44 Version 1: Specialized Rational Memory Manager................................................. 45 Version 2: Fixed-Size Object Memory Pool ............................................................. 49 Version 3: Single-Threaded Variable-Size Memory Manager............................... 52 Key Points ..................................................................................................................... 58 Chapter 7. Multithreaded Memory Pooling................................................................... 59 Version 4: Implementation.......................................................................................... 59 Version 5: Faster Locking ........................................................................................... 61 Key Points ..................................................................................................................... 64 Chapter 8. Inlining Basics ............................................................................................... 66 What Is Inlining?........................................................................................................... 66 Method Invocation Costs ............................................................................................ 69 Why Inline? ................................................................................................................... 72 Inlining Details .............................................................................................................. 73 Inlining Virtual Methods............................................................................................... 73 Performance Gains from Inlining ............................................................................... 74 ii

Key Points ..................................................................................................................... 75 Chapter 9. Inlining—Performance Considerations...................................................... 76 Cross-Call Optimization .............................................................................................. 76 Why Not Inline? ............................................................................................................ 80 Development and Compile-Time Inlining Considerations...................................... 82 Profile-Based Inlining................................................................................................... 82 Inlining Rules ................................................................................................................ 85 Key Points ..................................................................................................................... 86 Chapter 10. Inlining Tricks .............................................................................................. 87 Conditional Inlining....................................................................................................... 87 Selective Inlining .......................................................................................................... 88 Recursive Inlining......................................................................................................... 89 Inlining with Static Local Variables............................................................................ 92 Architectural Caveat: Multiple Register Sets ........................................................... 94 Key Points ..................................................................................................................... 94 Chapter 11. Standard Template Library ....................................................................... 96 Asymptotic Complexity................................................................................................ 96 Insertion......................................................................................................................... 96 Deletion........................................................................................................................ 103 Traversal...................................................................................................................... 105 Find............................................................................................................................... 106 Function Objects ........................................................................................................ 108 Better than STL? ........................................................................................................ 110 Key Points ................................................................................................................... 112 Chapter 12. Reference Counting ................................................................................. 113 Implementation Details.............................................................................................. 114 Preexisting Classes ................................................................................................... 123 Concurrent Reference Counting.............................................................................. 126 Key Points ................................................................................................................... 129 Chapter 13. Coding Optimizations............................................................................... 131 Caching........................................................................................................................ 132 Precompute................................................................................................................. 133 Reduce Flexibility....................................................................................................... 134 80-20 Rule: Speed Up the Common Path.............................................................. 134 Lazy Evaluation .......................................................................................................... 137 Useless Computations .............................................................................................. 139 System Architecture................................................................................................... 140 Memory Management ............................................................................................... 140 Library and System Calls.......................................................................................... 142 Compiler Optimization............................................................................................... 143 Key Points ................................................................................................................... 144 Chapter 14. Design Optimizations............................................................................... 145 Design Flexibility ........................................................................................................ 145 Caching........................................................................................................................ 148 Efficient Data Structures ........................................................................................... 150 Lazy Evaluation .......................................................................................................... 151 Useless Computations .............................................................................................. 153 Obsolete Code............................................................................................................ 154 Key Points ................................................................................................................... 155 Chapter 15. Scalability................................................................................................... 156 The SMP Architecture ............................................................................................... 158 Amdahl's Law.............................................................................................................. 160 Multithreaded and Synchronization Terminology.................................................. 161 Break Up a Task into Multiple Subtasks................................................................. 162 iii

Cache Shared Data ................................................................................................... 163 Share Nothing............................................................................................................. 164 Partial Sharing ............................................................................................................ 166 Lock Granularity ......................................................................................................... 167 False Sharing.............................................................................................................. 169 Thundering Herd ........................................................................................................ 170 Reader/Writer Locks.................................................................................................. 171 Key Points ................................................................................................................... 172 Chapter 16. System Architecture Dependencies ...................................................... 173 Memory Hierarchies................................................................................................... 173 Registers: Kings of Memory ..................................................................................... 174 Disk and Memory Structures.................................................................................... 177 Cache Effects ............................................................................................................. 179 Cache Thrash ............................................................................................................. 180 Avoid Branching ......................................................................................................... 181 Prefer Simple Calculations to Small Branches...................................................... 182 Threading Effects....................................................................................................... 183 Context Switching ...................................................................................................... 184 Kernel Crossing.......................................................................................................... 186 Threading Choices..................................................................................................... 187 Key Points ................................................................................................................... 189 Bibliography..................................................................................................................... 190 iv

Copyright Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book and Addison-Wesley was aware of a trademark claim, the designations have been printed in initial caps or all caps. The authors and publishers have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The publisher offers discounts on this book when ordered in quantity for special sales. For more information, please contact: Corporate Government and Special Sales Addison Wesley Longman, Inc. One Jacob Way Reading, Massachusetts 01867 Library of Congress Cataloging-in-Publication Data Bulka, Dov. Efficient C++ : performance programming techniques / Dov Bulka, David Mayhew. p. m. Includes bibliographical references (p. ). ISBN 0-201-37950-3 1. C++ (Computer program language) I. Mayhew, David. II. Title. QA76.73.C153B85 1999 005.13 ‘ 3—dc21 99-39175 CIP Copyright © 2000 by Addison Wesley Longman, Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form, or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior consent of the publisher. Printed in the United States of America. Published simultaneously in Canada. Text printed on recycled and acid-free paper. v

1 2 3 4 5 6 7 8 9 10 —CRS—03 02 01 00 99 First printing, October 1999 Dedication To my mother, Rivka Bulka and to the memory of my father Yacov Bulka, survivor of the Auschwitz concentration camp. They could not take away his kindness, compassion and optimism, which was his ultimate triumph. He passed away during the writing of this book. D.B To Ruth, the love of my life, who made time for me to write this. To the boys, Austin, Alex, and Steve, who missed their dad for a while. To my parents, Mom and Dad, who have always loved and supported me D.M. Preface If you conducted an informal survey of software developers on the issue of C++ performance, you would undoubtedly find that the vast majority of them view performance issues as the Achilles’ heel of an otherwise fine language. We have heard it repeatedly ever since C++ burst on the corporate scene: C++ is a poor choice for implementing performance-critical applications. In the mind of developers, this particular application domain was ruled by plain C and, occasionally, even assembly language. As part of that software community we had the opportunity to watch that myth develop and gather steam. Years ago, we participated in the wave that embraced C++ with enthusiasm. All around us, many development projects plunged in headfirst. Some time later, software solutions implemented in C++ began rolling out. Their performance was typically less than optimal, to put it gently. Enthusiasm over C++ in performance-critical domains has cooled. We were in the business of supplying networking software whose execution speed was not up for negotiation—speed was top priority. Since networking software is pretty low on the software food-chain, its performance is crucial. Large numbers of applications were going to sit on top of it and depend on it. Poor performance in the low levels ripples all the way up to higher level applications. Our experience was not unique. All around, early adopters of C++ had difficulties with the resulting performance of their C++ code. Instead of attributing the difficulties to the steep learning curve of the new object-oriented software development paradigm, we blamed it on C++, the dominant language for the expression of the paradigm. Even though C++ compilers were still essentially in their infancy, the language was branded as inherently slow. This belief spread quickly and is now widely accepted as fact. Software organizations that passed on C++ frequently pointed to performance as their key concern. That concern was rooted in the perception that C++ cannot match the performance delivered by its C counterpart. Consequently, C++ has had little success penetrating software domains that view performance as top priority: operating system kernels, device drivers, networking systems (routers, gateways, protocol stacks), and more. We have spent years dissecting large systems of C and C++ code trying to squeeze every ounce of performance out of them. It is through our experience of slugging it out in the trenches that we have come to appreciate the potential of C++ to produce highly efficient programs. We’ve seen it done in practice. This book is our attempt to share that experience and document the many lessons we have learned in our own pursuit of C++ efficiency. Writing efficient C++ is not trivial, nor is it rocket science. It takes the vi

understanding of some performance principles, as well as information on C++ performance traps and pitfalls. The 80-20 rule is an important principle in the world of software construction. We adopt it in the writing of this book as well: 20% of all performance bugs will show up 80% of the time. We therefore chose to concentrate our efforts where it counts the most. We are interested in those performance issues that arise frequently in industrial code and have significant impact. This book is not an exhaustive discussion of the set of all possible performance bugs and their solutions; hence, we will not cover what we consider esoteric and rare performance pitfalls. Our point of view is undoubtedly biased by our practical experience as programmers of server-side, performance-critical communications software. This bias impacts the book in several ways: • The profile of performance issues that we encounter in practice may be slightly different in nature than those found in scientific computing, database applications, and other domains. That’s not a problem. Generic performance principles transcend distinct domains, and apply equally well in domains other than networking software. • At times, we invented contrived examples to drive a point home, although we tried to minimize this. We have made enough coding mistakes in the past to have a sizable collection of samples taken from real production-level code that we have worked on. Our expertise was earned the hard way—by learning from our own mistakes as well as those of our colleagues. As much as possible, we illustrated our points with real code samples. • We do not delve into the asymptotic complexity of algorithms, data structures, and the latest and greatest techniques for accessing, sorting, searching, and compressing data. These are important topics, but they have been extensively covered elsewhere [Knu73, BR95, KP74]. Instead, we focus on simple, practical, everyday coding and design principles that yield large performance improvements. We point out common design and coding practices that lead to poor performance, whether it be through the unwitting use of language features that carry high hidden costs or through violating any number of subtle (and not so subtle) performance principles. So how do we separate myth from reality? Is C++ performance truly inferior to that of C? It is our contention that the common perception of inferior C++ performance is invalid. We concede that in general, when comparing a C program to a C++ version of what appears to be the same thing, the C program is generally faster. However, we also claim that the apparent similarity of the two programs typically is based on their data handling functionality, not their correctness, robustness, or ease of maintenance. Our contention is that when C programs are brought up to the level of C++ programs in these regards, the speed differences disappear, or the C++ versions are faster. Thus C++ is inherently neither slower nor faster. It could be either, depending on how it is used and what is required from it. It’s the way it is used that matters: If used properly, C++ can yield software systems exhibiting not just acceptable performance, but yield superior software performance. We would like to thank the many people who contributed to this work. The toughest part was getting started and it was our editor, Marina Lang, who was instrumental in getting this project off the ground. Julia Sime made a significant contribution to the early draft and Yomtov Meged contributed many valuable suggestions as well. He also was the one who pointed out to us the subtle difference between our opinions and the absolute truth. Although those two notions may coincide at times, they are still distinct. Many thanks to the reviewers hired by Addison-Wesley; their feedback was extremely valuable. Thanks also to our friends and colleagues who reviewed portions of the manuscript. They are, in no particular order, Cyndy Ross, Art Francis, Scott Snyder, Tricia York, Michael Fraenkel, Carol Jones, Heather Kreger, Kathryn Britton, Ruth Willenborg, David Wisler, Bala Rajaraman, Don “Spike” Washburn, and Nils Brubaker. Last but not least, we would like to thank our wives, Cynthia Powers Bulka and Ruth Washington Mayhew. vii

Introduction In the days of assembler language programming, experienced programmers estimated the execution speed of their source code by counting the number of assembly language instructions. On some architectures, such as RISC, most assembler instructions executed in one clock cycle each. Other architectures featured wide variations in instruction to instruction execution speed, but experienced programmers were able to develop a good feel for average instruction latency. If you knew how many instructions your code fragment contained, you could estimate with accuracy the number of clock cycles their execution would consume. The mapping from source code to assembler was trivially one-to-one. The assembler code was the source code. On the ladder of programming languages, C is one step higher than assembler language. C source code is not identical to the corresponding compiler-generated assembler code. It is the compiler’s task to bridge the gap from source code to assembler. The mapping of source-to-assembler code is no longer the one-to- one identity mapping. It remains, however, a linear relationship: Each source level statement in C corresponds to a small number of assembler instructions. If you estimate that each C statement translates into five to eight assembler instructions, chances are you will be in the ballpark. C++ has shattered this nice linear relationship between the number of source level statements and compiler-generated assembly statement count. Whereas the cost of C statements is largely uniform, the cost of C++ statements fluctuates wildly. One C++ statement can generate three assembler instructions, whereas another can generate 300. Implementing high-performance C++ code has placed a new and unexpected demand on programmers: the need to navigate through a performance minefield, trying to stay on a safe three-instruction-per-statement path and to avoid usage of routes that contain 300-instruction land mines. Programmers must identify language constructs likely to generate large overhead and know how to code or design around them. These are considerations that C and assembler language programmers have never had to worry about. The only exception may be the use of macros in C, but those are hardly as frequent as the invocations of constructors and destructors in C++ code. The C++ compiler might also insert code into the execution flow of your program “behind your back.” This is news to the unsuspecting C programmer migrating to C++ (which is where many of us are coming from). The task of writing efficient C++ programs requires C++ developers to acquire new performance skills that are specific to C++ and that transcend the generic software performance principles. In C programming, you are not likely to be blindsided by hidden overhead, so it is possible to stumble upon good performance in a C program. In contrast, this is unlikely to happen in C++: You are not going to achieve good performance accidentally, without knowing the pitfalls lurking about. To be fair, we have seen many examples of poor performance that were rooted in inefficient object- oriented (OO) design. The ideas of software flexibility and reuse have been promoted aggressively ever since OO moved into the mainstream. However, flexibility and reuse seldom go hand-in-hand with performance and efficiency. In mathematics, it would be painful to reduce every theorem back to basic principles. Mathematicians try to reuse results that have already been proven. Outside mathematics, however, it often makes sense to leverage special circumstances and to take shortcuts. In software design, it is acceptable under some circumstances to place higher priority on performance than reuse. When you implement the read() or write() function of a device driver, the known performance requirements are generally much more important to your software’s success than the possibility that at some point in the future it might be reused. Some performance problems in OO design are due to putting the emphasis on the wrong place at the wrong time. Programmers should focus on solving the problem they have, not on making their current solution amenable to some unidentified set of possible future requirements. Roots of Software Inefficiency Silent C++ overhead is not the root of all performance evil. Even eliminating compiler-generated overhead would not always be sufficient. If that were the case, then every C program would enjoy automatic awesome performance due to the lack of silent overhead. Additional factors affect software performance in viii

分享到：

赞收藏

资料库

Efficient C++ Performance Programming Techniques.pdf

相关推荐

开发技术

热门标签

最新资料