Hardware Acceleration of EDA Algorithms
Foreword
Preface
Acknowledgments
Contents
List of Tables
List of Figures
Part I Alternative Hardware Platforms
1 Introduction
1.1 Hardware Platforms Considered in This Research Monograph
1.2 EDA Algorithms Studied in This Research Monograph
1.2.1 Control-Dominated Applications
1.2.2 Control Plus Data Parallel Applications
1.3 Automated Approach for GPU-Based Software Acceleration
1.4 Chapter Summary
References
2 Hardware Platforms
2.1 Chapter Overview
2.2 Introduction
2.3 Hardware Platforms Studied in This Research Monograph
2.3.1 Custom ICs
2.3.2 FPGAs
2.3.3 Graphics Processors
2.4 General Overview and Architecture
2.5 Programming Model and Environment
2.6 Scalability
2.7 Design Turn-Around Time
2.8 Performance
2.9 Cost of Hardware
2.10 Floating Point Operations
2.11 Security and Real-Time Applications
2.12 Applications
2.13 Chapter Summary
References
3 GPU Architecture and the CUDA Programming Model
3.1 Chapter Overview
3.2 Introduction
3.3 Hardware Model
3.4 Memory Model
3.5 Programming Model
3.6 Chapter Summary
References
Part II Control-Dominated Category
4 Accelerating Boolean Satisfiability on a Custom IC
4.1 Chapter Overview
4.2 Introduction
4.3 Previous Work
4.4 Hardware Architecture
4.4.1 Abstract Overview
4.4.2 Hardware Overview
4.4.3 Hardware Details
4.4.3.1 Decision Engine
4.4.3.2 Clause Cell
4.4.3.3 Base Cell
4.4.3.4 Partitioning the Hardware
4.4.3.5 Inter-bank Communication
4.5 An Example of Conflict Clause Generation
4.6 Partitioning the CNF Instance
4.7 Extraction of the Unsatisfiable Core
4.8 Experimental Results
4.9 Chapter Summary
References
5 Accelerating Boolean Satisfiability on an FPGA
5.1 Chapter Overview
5.2 Introduction
5.3 Previous Work
5.4 Hardware Architecture
5.4.1 Architecture Overview
5.5 Solving a CNF Instance Which Is Partitioned into Several Bins
5.6 Partitioning the CNF Instance
5.7 Hardware Details
5.8 Experimental Results
5.8.1 Current Implementation
5.8.2 Performance Model
5.8.2.1 FPGA Resources
5.8.2.2 Clauses/Variable Ratio
5.8.2.3 Cycles Versus Bin Size
5.8.2.4 Bins Touched Versus Bin Size
5.8.2.5 Bin Size
5.8.3 Projections
5.9 Chapter Summary
References
6 Accelerating Boolean Satisfiability on a Graphics Processing Unit
6.1 Chapter Overview
6.2 Introduction
6.3 Related Previous Work
6.4 Our Approach
6.4.1 SurveySAT and the GPU
6.4.1.1 SurveySAT
6.4.1.2 SurveySAT on the GPU
6.4.1.3 SurveySAT Results on the GPU
6.4.2 MiniSAT Enhanced with Survey Propagation (MESP)
6.5 Experimental Results
6.6 Chapter Summary
References
Part III Control Plus Data Parallel Applications
7 Accelerating Statistical Static Timing Analysis Using Graphics Processors
7.1 Chapter Overview
7.2 Introduction
7.3 Previous Work
7.4 Our Approach
7.4.1 Static Timing Analysis (STA) at a Gate
7.4.2 Statistical Static Timing Analysis (SSTA) at a Gate
7.5 Experimental Results
7.6 Chapter Summary
References
8 Accelerating Fault Simulation Using Graphics Processors
8.1 Chapter Overview
8.2 Introduction
8.3 Previous Work
8.4 Our Approach
8.4.1 Logic Simulation at a Gate
8.4.2 Fault Injection at a Gate
8.4.3 Fault Detection at a Gate
8.4.4 Fault Simulation of a Circuit
8.5 Experimental Results
8.6 Chapter Summary
References
9 Fault Table Generation Using Graphics Processors
9.1 Chapter Overview
9.2 Introduction
9.3 Previous Work
9.4 Our Approach
9.4.1 Definitions
9.4.2 Algorithms: FSIM* and GFTABLE
9.4.2.1 Generating Vectors (Line 9)
9.4.2.2 Fault-Free Simulation (Line 10)
9.4.2.3 Computing Detectabilities and Cumulative Detectabilities (Lines 13, 14)
9.4.2.4 Fault Simulation of SR(s) (Lines 15, 16)
9.4.2.5 Generating the Fault Table (Lines 22--31)
9.5 Experimental Results
9.6 Chapter Summary
References
10 Accelerating Circuit Simulation Using Graphics Processors
10.1 Chapter Overview
10.2 Introduction
10.3 Previous Work
10.4 Our Approach
10.4.1 Parallelizing BSIM3 Model Computations on a GPU
10.4.1.1 Inlining if--then--else Code
10.4.1.2 Partitioning the BSIM3 Code into Kernels
10.4.1.3 Efficient Use of GPU Memory Model
10.4.1.4 Thread Scheduler and Code Statistics
10.5 Experiments
10.6 Chapter Summary
References
Part IV Automated Generation of GPU Code
11 Automated Approach for Graphics Processor Based Software Acceleration
11.1 Chapter Overview
11.2 Introduction
11.3 Our Approach
11.3.1 Problem Definition
11.3.2 GPU Constraints on the Kernel Generation Engine
11.3.3 Automatic Kernel Generation Engine
11.3.3.1 Node Duplication
11.3.3.2 Cost of a Partitioning Solution
11.4 Experimental Results
11.4.1 Evaluation Methodology
11.5 Chapter Summary
References
12 Conclusions
References
Index
Kanupriya Gulati · Sunil P. Khatri

Hardware Acceleration of EDA Algorithms
Custom ICs, FPGAs and GPUs

Springer
Kanupriya Gulati
109 Branchwood Trl
Coppell, TX 75019, USA
kgulati@tamu.edu

Sunil P. Khatri
Department of Electrical & Computer Engineering
Texas A&M University
214 Zachry Engineering Center
College Station, TX 77843-3128, USA
sunilkhatri@tamu.edu

ISBN 978-1-4419-0943-5
e-ISBN 978-1-4419-0944-2
DOI 10.1007/978-1-4419-0944-2
Springer New York Dordrecht Heidelberg London
Library of Congress Control Number: 2010920238

© Springer Science+Business Media, LLC 2010

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)
To our parents and our teachers
Foreword

Single-threaded software applications have ceased to see significant gains in performance on a general-purpose CPU, even with further scaling in very large scale integration (VLSI) technology. This is a significant problem for electronic design automation (EDA) applications, since the design complexity of VLSI integrated circuits (ICs) is continuously growing. In this research monograph, we evaluate custom ICs, field-programmable gate arrays (FPGAs), and graphics processors as platforms for accelerating EDA algorithms, instead of the general-purpose single-threaded CPU. We study applications which are used in key time-consuming steps of the VLSI design flow. Further, these applications also have different degrees of inherent parallelism in them. We study both control-dominated EDA applications and control plus data parallel EDA applications. We accelerate these applications on these different hardware platforms. We also present an automated approach for accelerating certain uniprocessor applications on a graphics processor.

This monograph compares custom ICs, FPGAs, and graphics processing units (GPUs) as potential platforms to accelerate EDA algorithms. It also provides details of the programming model used for interfacing with the GPUs. As an example of a control-dominated EDA problem, Boolean satisfiability (SAT) is accelerated using the following hardware implementations: (i) a custom IC-based hardware approach in which the traversal of the implication graph and conflict clause generation are performed in hardware, in parallel, (ii) an FPGA-based hardware approach to accelerate SAT in which the entire SAT search algorithm is implemented in the FPGA, and (iii) a complete SAT approach which employs a new GPU-enhanced variable ordering heuristic.

In this monograph, several EDA problems with varying degrees of control and data parallelism are accelerated using a general-purpose graphics processor. In particular, we accelerate Monte Carlo based statistical static timing analysis, device model evaluation (for accelerating circuit simulation), fault simulation, and fault table generation on a graphics processor, with speedups of up to 800×. Additionally, an automated approach is presented that accelerates (on a graphics processor) uniprocessor code that is executed multiple times on independent data sets in an application. The key idea here is to partition the software into kernels in an automated fashion, such that multiple independent instances of these kernels, when