Intel Threading Building Blocks
Table of Contents
Foreword
Note from the Lead Developer of Intel Threading Building Blocks
Preface
Assumptions This Book Makes
Contents of This Book
Conventions Used in This Book
Informal Class Declarations
Using Code Examples
How to Contact Us
Acknowledgments
Why Threading Building Blocks?
Overview
Benefits
Comparison with Raw Threads and MPI
Comparison with OpenMP
Recursive Splitting, Task Stealing, and Algorithms
Thinking Parallel
Elements of Thinking Parallel
Decomposition
Data Parallelism
Task Parallelism
Pipelining (Task and Data Parallelism Together)
Mixed Solutions
Achieving Parallelism
Scaling and Speedup
How Much Parallelism Is There in an Application?
Amdahl’s Law
Gustafson’s observations regarding Amdahl’s Law
What did they really say?
Serial versus parallel algorithms
What Is a Thread?
Programming Threads
Safety in the Presence of Concurrency
Mutual Exclusion and Locks
Correctness
Abstraction
Patterns
Intuition
Basic Algorithms
Initializing and Terminating the Library
Loop Parallelization
parallel_for
Grain size
Automatic grain size
Notes on automatic grain size
parallel_for with partitioner
parallel_reduce
Advanced example
parallel_reduce with partitioner
Advanced Topic: Other Kinds of Iteration Spaces
Notes on blocked_range2d
parallel_scan
parallel_scan with partitioner
Recursive Range Specifications
Splittable Concept
Model Types: Splittable Ranges
split Class
Range Concept
Model Types
blocked_range Template Class
blocked_range2d Template Class
Partitioner Concept
Model Types: Partitioners
simple_partitioner Class
auto_partitioner Class
parallel_for Template Function
parallel_reduce Template Function
parallel_scan Template Function
pre_scan_tag and final_scan_tag Classes
Summary of Loops
Advanced Algorithms
Parallel Algorithms for Streams
Cook Until Done: parallel_while
Notes on parallel_while scaling
parallel_while Template Class
Working on the Assembly Line: Pipeline
Throughput of pipeline
Nonlinear pipelines
pipeline Class
filter Class
parallel_sort
parallel_sort Template Function
Containers
concurrent_queue
Iterating over a concurrent_queue for Debugging
When Not to Use Queues
concurrent_queue Template Class
concurrent_vector
concurrent_vector Template Class
Whole Vector Operations
Concurrent Operations
Parallel Iteration
Capacity
Iterators
concurrent_hash_map
More on HashCompare
concurrent_hash_map Template Class
Whole-Table Operations
Concurrent Access
const_accessor
accessor Class
Concurrent Operations: find, insert, erase
Parallel Iteration
Capacity
Iterators
Scalable Memory Allocation
Limitations
Problems in Memory Allocation
Memory Allocators
Which Library to Link into Your Application
Using the Allocator Argument to C++ STL Template Classes
Replacing malloc, new, and delete
Replace malloc, free, realloc, and calloc
Replace new and delete
Allocator Concept
Model Types
scalable_allocator Template Class
cache_aligned_allocator Template Class
aligned_space Template Class
Mutual Exclusion
When to Use Mutual Exclusion
Mutexes
Mutex Flavors
Reader-Writer Mutexes
Upgrade/Downgrade
Lock Pathologies
Deadlock
Convoying and priority inversion
Mutexes
Mutex Concept
mutex Class
spin_mutex Class
queuing_mutex Class
ReaderWriterMutex Concept
Model Types
spin_rw_mutex Class
queuing_rw_mutex Class
Atomic Operations
Why atomic Has No Constructors
Memory Consistency and Fences
atomic Template Class
Timing
tick_count Class
tick_count::interval_t Class
Task Scheduler
When Task-Based Programming Is Inappropriate
Much Better Than Raw Native Threads
Oversubscription
Fair Scheduling
High Coding Overhead
Load Imbalance
Portability
Initializing the Library Is Your Job
Example Program for Fibonacci Numbers
Task Scheduling Overview
How Task Scheduling Works
Recommended Task Recurrence Patterns
Blocking Style with Children
Continuation-Passing Style with Children
Recycling the parent as the continuation
Recycling the parent as a child
Making Best Use of the Scheduler
Recursive Chain Reaction
Continuation Passing
Scheduler bypass
Recycling
Empty tasks
Lazy copying
Task Scheduler Interfaces
task_scheduler_init Class
task Class
Task derivation
Processing of execute()
Task allocation
Explicit task destruction
Recycling tasks
Task depth
Synchronization
Task context
Task debugging
empty_task Class
task_list Class
Task Scheduler Summary
Keys to Success
Key Steps to Success
Relaxed Sequential Execution
Safe Concurrency for Methods and Libraries
Debug Versus Release
For Efficiency’s Sake
Enabling Debugging Features
The TBB_DO_ASSERT Macro
Do Not Ship Your Program Built with TBB_DO_ASSERT
The TBB_DO_THREADING_TOOLS Macro
Debug Versus Release Libraries
Mixing with Other Threading Packages
Naming Conventions
The tbb Namespace
The tbb::internal Namespace
The __TBB Prefix
Examples
The Aha! Factor
A Few Other Key Points
parallel_for Examples
ParallelAverage
Seismic
Matrix Multiply
ParallelMerge
SubstringFinder
The Game of Life
Implementation
Automaton
Automata: Implementation
Extending the Application
Further Reading
parallel_reduce Examples
ParallelSum
ParallelSum without Having to Specify a Grain Size
ParallelPrime
CountStrings: Using concurrent_hash_map
Switching from an STL map
Quicksort: Visualizing Task Stealing
A Better Matrix Multiply (Strassen)
Advanced Task Programming
Start a Large Task in Parallel with the Main Program
Two Mouths: Feeding Two from the Same Task in a Pipeline
Packet Processing Pipeline
Parallel Programming for an Internet Device
Local Network Router Example
Pipelined Components for the Local Network Router
Network address translation (NAT)
Application Layer Gateway (ALG)
Packet forwarding
Our example
Implementation
The Threading Building Blocks pipeline
Synchronization with the pipeline item and concurrent hash maps
Filter Classes
Class get_next_packet : public tbb::filter
Class output_packet : public tbb::filter
Class translator : public tbb::filter
Class gateway : public tbb::filter
Class forwarding : public tbb::filter
Additional reading
Memory Allocation
Replacing new and delete
Replacing malloc, calloc, realloc, and free
Game Threading Example
Threading Architecture: Physics + Rendering
Overview of Keys to Scalability
A Frame Loop
Domain Decomposition Data Structure Needs
Think Tasks, Not Threads
Load Balancing Versus Task Stealing
Synchronization Between Physics Threads
Integrating the Example into a Real Game
How to Measure Performance
Physics Interaction and Update Code
Open Dynamics Engine
Look for Hotspots
Improving on the First Solution
The Code
History and Related Projects
Libraries
Languages
Pragmas
Generic Programming
Concepts in Generic Programming
Pseudosignatures in Generic Programming
Models in Generic Programming
Caches
Costs of Time Slicing
Quick Introduction to Lambda Functions
Further Reading
Index
Praise for Intel Threading Building Blocks

“The Age of Serial Computing is over. With the advent of multi-core processors, parallel-computing technology that was once relegated to universities and research labs is now emerging as mainstream. Intel Threading Building Blocks updates and greatly expands the ‘work-stealing’ technology pioneered by the MIT Cilk system of 15 years ago, providing a modern industrial-strength C++ library for concurrent programming.

“Not only does this book offer an excellent introduction to the library, it furnishes novices and experts alike with a clear and accessible discussion of the complexities of concurrency.”

— Charles E. Leiserson, MIT Computer Science and Artificial Intelligence Laboratory

“We used to say make it right, then make it fast. We can’t do that anymore. TBB lets us design for correctness and speed up front for Maya. This book shows you how to extract the most benefit from using TBB in your code.”

— Martin Watt, Senior Software Engineer, Autodesk

“TBB promises to change how parallel programming is done in C++. This book will be extremely useful to any C++ programmer. With this book, James achieves two important goals:

• Presents an excellent introduction to parallel programming, illustrating the most common parallel programming patterns and the forces governing their use.

• Documents the Threading Building Blocks C++ library—a library that provides generic algorithms for these patterns.

“TBB incorporates many of the best ideas that researchers in object-oriented parallel computing developed in the last two decades.”

— Marc Snir, Head of the Computer Science Department, University of Illinois at Urbana-Champaign

“This book was my first introduction to Intel Threading Building Blocks. Thanks to the easy-to-follow discussion of the features implemented and the reasons behind the choices made, the book makes clear that Intel’s Threading Building Blocks are an excellent synthesis of some of the best current parallel programming ideas. The judicious choice of a small but powerful set of patterns and strategies makes the system easy to learn and use. I found the numerous code segments and complete parallel applications presented in the book of great help to understand the main features of the library and illustrate the different ways it can be used in the development of efficient parallel programs.”

— David Padua, University of Illinois
“The arrival of the multi-core chip architecture has brought great challenges in parallel programming and there is a tremendous need to have good books that help and guide the users to cope with such challenges.

“This book on Intel Threading Building Blocks provides an excellent solution in this direction and is likely to be an important text to teach its readers on parallel programming for multi-cores.

“The book illustrates a unique path for readers to follow in using a C++-based parallel programming paradigm—a powerful and practical approach for parallel programming. It is carefully designed and written, and can be used both as a textbook for classroom training, or a cookbook for field engineers.”

— Professor Guang R. Gao, University of Delaware

“I enjoyed reading this book. It addresses the need for new ways for software developers to create the new generation of parallel programs. In the presence of one of the ‘largest disruptions that information technology has seen’ (referring to the introduction of multi-core architectures), this was desperately needed.

“This book also fills an important need for instructional material, educating software engineers of the new opportunities and challenges.

“The library-based approach, taken by the Threading Building Blocks, could be a significant new step, as it complements approaches that rely on advanced compiler technology.”

— Rudolf Eigenmann, Purdue University, Professor of ECE and Interim Director of Computing Research Institute

“Multi-core systems have arrived. Parallel programming models are needed to enable the creation of programs that exploit them. A good deal of education is needed to help sequential programmers adapt to the requirements of this new technology. This book represents progress on both of these fronts.

“Threading Building Blocks (TBB) is a flexible, library-based approach to constructing parallel programs that can interoperate with other programming solutions.

“This book introduces TBB in a manner that makes it fun and easy to read. Moreover, it is packed full of information that will help beginners as well as experienced parallel programmers to apply TBB to their own programming problems.”

— Barbara Chapman, CEO of cOMPunity, Professor of Computer Science at the University of Houston
“Future generations of chips will provide dozens or even hundreds of cores. Writing applications that benefit from the massive computational power offered by these chips is not going to be an easy task for mainstream programmers who are used to sequential algorithms rather than parallel ones.

“Intel’s TBB is providing a big step forward into this long path, and what is better, all in the C++ framework.”

— Eduard Ayguade, Barcelona Supercomputer Center, Technical University of Catalunya

“Intel’s TBB is to parallel programming what STL was to plain C++. Generic programming with STL dramatically improved C++ programming productivity. TBB offers a generic parallel programming model that hides the complexity of concurrency control. It lowers the barrier to parallel code development, enabling efficient use of ‘killer’ multi-cores.”

— Lawrence Rauchwerger, Texas A&M University, Inventor of STAPL

“For the last eighteen years the denizens of the thinly populated world of supercomputers have been looking for a way to write really pretty and practical parallel programs in C++. We knew templates and generic programming had to be part of the answer, but it took the arrival of multi-core (and soon many-core) processors to create a fundamental change in the computing landscape. Parallelism is now going to be everyday stuff.

“Every C++ programmer is going to need to think about concurrency and parallelism and Threading Building Blocks provides the right abstractions for them to do it correctly.

“This book is not just a discussion of a C++ template library. It provides a lovely and in-depth overview of much of what we have learned about parallel computing in the last 25 years. It could be a great textbook for a course on parallel programming.”

— Dennis Gannon, Science Director, Pervasive Technology Labs at Indiana University, former head of DARPA’s High Performance Computing (HPC++) project, and steering committee member of the Global Grid Forum

“TBB hits the application developer’s sweet spot with such advantages as uniprocessor performance, parallel scalability, C++ programming well beyond OpenMP, compatibility with OpenMP and hand threads, Intel Threading Tools support for performance and confidence, and openness to the software development community. TBB avoids several constraints surrounding the sweet spot: language extension risks, specific compiler dependences, and hand-threading complexities.

“This book should make developers productive without a steep training curve, and the applications they produce should be of high quality and performance.”

— David Kuck, Intel Fellow, founder of KAI and former director of the Center for Supercomputing Research and Development
Intel Threading Building Blocks
Outfitting C++ for Multi-Core Processor Parallelism
Other resources from O’Reilly

Related titles: C++ Cookbook™; Practical C++ Programming; C++ in a Nutshell; Pthreads Programming: A POSIX Standard for Better Multiprocessing; Secure Programming Cookbook for C and C++; High Performance Linux Clusters with OSCAR, Rocks, OpenMosix, and MPI

oreilly.com is more than a complete catalog of O’Reilly books. You’ll also find links to news, events, articles, weblogs, sample chapters, and code examples. oreillynet.com is the essential portal for developers interested in open and emerging technologies, including new platforms, programming languages, and operating systems.

Conferences: O’Reilly brings diverse innovators together to nurture the ideas that spark revolutionary industries. We specialize in documenting the latest tools and systems, translating the innovator’s knowledge into useful skills for those in the trenches. Visit conferences.oreilly.com for our upcoming events.

Safari Bookshelf (safari.oreilly.com) is the premier online reference library for programmers and IT professionals. Conduct searches across more than 1,000 books. Subscribers can zero in on answers to time-critical questions in a matter of seconds. Read the books on your Bookshelf from cover to cover or simply flip to the page you need. Try it today for free.
Intel Threading Building Blocks
Outfitting C++ for Multi-Core Processor Parallelism
James Reinders
Beijing • Cambridge • Farnham • Köln • Paris • Sebastopol • Taipei • Tokyo