Hardware Acceleration of EDA Algorithms
Foreword
Preface
Acknowledgments
Contents
List of Tables
List of Figures
Part I Alternative Hardware Platforms
1 Introduction
1.1 Hardware Platforms Considered in This Research Monograph
1.2 EDA Algorithms Studied in This Research Monograph
1.2.1 Control-Dominated Applications
1.2.2 Control Plus Data Parallel Applications
1.3 Automated Approach for GPU-Based Software Acceleration
1.4 Chapter Summary
References
2 Hardware Platforms
2.1 Chapter Overview
2.2 Introduction
2.3 Hardware Platforms Studied in This Research Monograph
2.3.1 Custom ICs
2.3.2 FPGAs
2.3.3 Graphics Processors
2.4 General Overview and Architecture
2.5 Programming Model and Environment
2.6 Scalability
2.7 Design Turn-Around Time
2.8 Performance
2.9 Cost of Hardware
2.10 Floating Point Operations
2.11 Security and Real-Time Applications
2.12 Applications
2.13 Chapter Summary
References
3 GPU Architecture and the CUDA Programming Model
3.1 Chapter Overview
3.2 Introduction
3.3 Hardware Model
3.4 Memory Model
3.5 Programming Model
3.6 Chapter Summary
References
Part II Control-Dominated Category
4 Accelerating Boolean Satisfiability on a Custom IC
4.1 Chapter Overview
4.2 Introduction
4.3 Previous Work
4.4 Hardware Architecture
4.4.1 Abstract Overview
4.4.2 Hardware Overview
4.4.3 Hardware Details
4.4.3.1 Decision Engine
4.4.3.2 Clause Cell
4.4.3.3 Base Cell
4.4.3.4 Partitioning the Hardware
4.4.3.5 Inter-bank Communication
4.5 An Example of Conflict Clause Generation
4.6 Partitioning the CNF Instance
4.7 Extraction of the Unsatisfiable Core
4.8 Experimental Results
4.9 Chapter Summary
References
5 Accelerating Boolean Satisfiability on an FPGA
5.1 Chapter Overview
5.2 Introduction
5.3 Previous Work
5.4 Hardware Architecture
5.4.1 Architecture Overview
5.5 Solving a CNF Instance Which Is Partitioned into Several Bins
5.6 Partitioning the CNF Instance
5.7 Hardware Details
5.8 Experimental Results
5.8.1 Current Implementation
5.8.2 Performance Model
5.8.2.1 FPGA Resources
5.8.2.2 Clauses/Variable Ratio
5.8.2.3 Cycles Versus Bin Size
5.8.2.4 Bins Touched Versus Bin Size
5.8.2.5 Bin Size
5.8.3 Projections
5.9 Chapter Summary
References
6 Accelerating Boolean Satisfiability on a Graphics Processing Unit
6.1 Chapter Overview
6.2 Introduction
6.3 Related Previous Work
6.4 Our Approach
6.4.1 SurveySAT and the GPU
6.4.1.1 SurveySAT
6.4.1.2 SurveySAT on the GPU
6.4.1.3 SurveySAT Results on the GPU
6.4.2 MiniSAT Enhanced with Survey Propagation (MESP)
6.5 Experimental Results
6.6 Chapter Summary
References
Part III Control Plus Data Parallel Applications
7 Accelerating Statistical Static Timing Analysis Using Graphics Processors
7.1 Chapter Overview
7.2 Introduction
7.3 Previous Work
7.4 Our Approach
7.4.1 Static Timing Analysis (STA) at a Gate
7.4.2 Statistical Static Timing Analysis (SSTA) at a Gate
7.5 Experimental Results
7.6 Chapter Summary
References
8 Accelerating Fault Simulation Using Graphics Processors
8.1 Chapter Overview
8.2 Introduction
8.3 Previous Work
8.4 Our Approach
8.4.1 Logic Simulation at a Gate
8.4.2 Fault Injection at a Gate
8.4.3 Fault Detection at a Gate
8.4.4 Fault Simulation of a Circuit
8.5 Experimental Results
8.6 Chapter Summary
References
9 Fault Table Generation Using Graphics Processors
9.1 Chapter Overview
9.2 Introduction
9.3 Previous Work
9.4 Our Approach
9.4.1 Definitions
9.4.2 Algorithms: FSIM* and GFTABLE
9.4.2.1 Generating Vectors (Line 9)
9.4.2.2 Fault-Free Simulation (Line 10)
9.4.2.3 Computing Detectabilities and Cumulative Detectabilities (Lines 13, 14)
9.4.2.4 Fault Simulation of SR(s) (Lines 15, 16)
9.4.2.5 Generating the Fault Table (Lines 22--31)
9.5 Experimental Results
9.6 Chapter Summary
References
10 Accelerating Circuit Simulation Using Graphics Processors
10.1 Chapter Overview
10.2 Introduction
10.3 Previous Work
10.4 Our Approach
10.4.1 Parallelizing BSIM3 Model Computations on a GPU
10.4.1.1 Inlining if--then--else Code
10.4.1.2 Partitioning the BSIM3 Code into Kernels
10.4.1.3 Efficient Use of GPU Memory Model
10.4.1.4 Thread Scheduler and Code Statistics
10.5 Experiments
10.6 Chapter Summary
References
Part IV Automated Generation of GPU Code
11 Automated Approach for Graphics Processor Based Software Acceleration
11.1 Chapter Overview
11.2 Introduction
11.3 Our Approach
11.3.1 Problem Definition
11.3.2 GPU Constraints on the Kernel Generation Engine
11.3.3 Automatic Kernel Generation Engine
11.3.3.1 Node Duplication
11.3.3.2 Cost of a Partitioning Solution
11.4 Experimental Results
11.4.1 Evaluation Methodology
11.5 Chapter Summary
References
12 Conclusions
References
Index
Kanupriya Gulati · Sunil P. Khatri

Hardware Acceleration of EDA Algorithms
Custom ICs, FPGAs and GPUs

Springer
Kanupriya Gulati
109 Branchwood Trl
Coppell, TX 75019, USA
kgulati@tamu.edu

Sunil P. Khatri
Department of Electrical & Computer Engineering
Texas A&M University
214 Zachry Engineering Center
College Station, TX 77843-3128, USA
sunilkhatri@tamu.edu

ISBN 978-1-4419-0943-5
e-ISBN 978-1-4419-0944-2
DOI 10.1007/978-1-4419-0944-2
Springer New York Dordrecht Heidelberg London
Library of Congress Control Number: 2010920238

© Springer Science+Business Media, LLC 2010

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)
To our parents and our teachers
Foreword

Single-threaded software applications have ceased to see significant gains in performance on a general-purpose CPU, even with further scaling in very large scale integration (VLSI) technology. This is a significant problem for electronic design automation (EDA) applications, since the design complexity of VLSI integrated circuits (ICs) is continuously growing. In this research monograph, we evaluate custom ICs, field-programmable gate arrays (FPGAs), and graphics processors as platforms for accelerating EDA algorithms, instead of the general-purpose single-threaded CPU. We study applications which are used in key time-consuming steps of the VLSI design flow. Further, these applications also have different degrees of inherent parallelism in them. We study both control-dominated EDA applications and control plus data parallel EDA applications. We accelerate these applications on these different hardware platforms. We also present an automated approach for accelerating certain uniprocessor applications on a graphics processor.

This monograph compares custom ICs, FPGAs, and graphics processing units (GPUs) as potential platforms to accelerate EDA algorithms. It also provides details of the programming model used for interfacing with the GPUs. As an example of a control-dominated EDA problem, Boolean satisfiability (SAT) is accelerated using the following hardware implementations: (i) a custom IC-based hardware approach in which the traversal of the implication graph and conflict clause generation are performed in hardware, in parallel, (ii) an FPGA-based hardware approach to accelerate SAT in which the entire SAT search algorithm is implemented in the FPGA, and (iii) a complete SAT approach which employs a new GPU-enhanced variable ordering heuristic.

In this monograph, several EDA problems with varying degrees of control and data parallelism are accelerated using a general-purpose graphics processor. In particular, we accelerate Monte Carlo based statistical static timing analysis, device model evaluation (for accelerating circuit simulation), fault simulation, and fault table generation on a graphics processor, with speedups of up to 800×. Additionally, an automated approach is presented that accelerates (on a graphics processor) uniprocessor code that is executed multiple times on independent data sets in an application. The key idea here is to partition the software into kernels in an automated fashion, such that multiple independent instances of these kernels, when