hls_bluebook.pdf

256 pages in total; download the file to view the rest.
Chapter 1: Making the Case for High-Level Synthesis
A broken design flow
Keeping up with the pace
Benefits of High-Level Synthesis
Reducing design and verification efforts
More effective reuse
Investing R&D resources where it really matters
Seizing the opportunity
Chapter 2: General C++ Style
2.1. File Organization
2.2. Building an Executable Using Makefiles
2.2.1. Makefile Naming
2.2.2. Comments
2.2.3. Macros
2.2.4. Targets
2.2.5. Phony Targets
2.2.6. Simple Makefile Example
2.3. Header/Include Files
2.4. Test Benches
2.5. Creating a Golden Reference Design
2.5.1. Make Sure You're Fully Testing the DUT
2.5.2. Uninitialized Variables
Chapter 3: Bit Accurate Data Types
3.1. Compilation, Debug, and Simulation Speed
3.1.1. Header Files and Typedefs
3.2. Integer Data Types
3.2.1. Unsigned integer
3.2.2. Signed Integer
3.3. Fixed Point Data Types
3.3.1. Unsigned Fixed Point
3.3.2. Signed Fixed Point
3.3.3. Quantization and Overflow
3.3.4. Truncation and Rounding
3.3.5. Saturation and Overflow
3.4. Operators
3.4.1. Bitwise Arithmetic Operators: *, +, -, /, &, |, ^, %
3.4.2. Bit Select Operator: []
3.4.3. Shift Operators: <<, >>
3.4.4. Shift Right Operator: >>
Unsigned Shift Right
Signed Shift Right
Shift Left Operator: <<
Unsigned Shift Left
Signed Shift Left
Unexpected Loss of Precision
3.5. Methods
3.5.1. Slice Read: slc
3.5.2. Slice Write: set_slc
3.5.3. Explicit Conversion Functions
3.5.4. Implicit Conversion Functions
3.6. Helper/Utility Functions
3.6.1. Array Uninitialization: ac::init_array
3.6.2. ceil, floor, and nbits
3.7. Complex Data Types
Chapter 4: Fundamentals of High Level Synthesis
4.1. The Top-level Design Module
4.1.1. Registered Outputs
4.1.2. Control Ports
4.1.3. Port Width
4.1.4. Port Direction
Input ports
Output ports
Inout Ports
4.2. High-level C++ Synthesis
4.2.1. Data Flow Graph Analysis
4.2.2. Resource Allocation
4.2.3. Scheduling
4.2.4. Classic RISC Pipelining
4.2.5. Loop Pipelining
4.3. for/while/do Loops
4.3.1. What's in a Loop?
"for" Loop
"while" Loop
"do" Loop
4.3.2. Rolled Loops
4.3.3. Loop Unrolling
Partial Loop Unrolling
Fully Unrolled Loops
Dependencies Between Loop Iterations
Loops with Constant Bounds
4.3.4. Loops with Conditional Bounds
4.3.5. Optimizing the Loop Counter
4.3.6. Optimizing the Loop Control
4.3.7. Nested Loops
Unconstrained Nested Loops
Pipelined Nested Loops
Unrolling Nested Loops
4.3.8. Sequential Loops
Simple Independent Sequential Loops
Effects of Unmerged Sequential Loops
Manual merging of sequential loops
4.4. Pipeline Feedback
4.4.1. Data Feedback
4.4.2. Control Feedback
4.5. Conditions
4.5.1. Sharing
if-else statement
switch statement
Keep it Simple
4.5.2. Functions and Multiple Conditional Returns
Replacing Conditional Returns with Flags
4.5.3. References
Chapter 5: Scheduling of IO and Memories
5.1. Unconditional IO
5.1.1. Pass by Reference
5.1.2. Pass by Value
5.2. Conditional IO
5.2.1. Pass by Reference
5.2.2. Pass by Value
5.2.3. Ready/acknowledge Behavior (wait)
5.2.4. Stalling the Pipeline
5.2.5. Manually Flushing the Pipeline
5.2.6. Writing IO for Throughput
Making IO Mergeable
5.3. Memories
5.3.1. Automatic Mapping of Arrays to Memories
5.3.2. Automatic Memory Merging
5.3.3. Designing for Throughput When Using Memories
Non-Mutually Exclusive Memory Accesses
Making Memory Accesses Mutually Exclusive
Manually Merging Non-Mutually Exclusive Memory Accesses
Chapter 6: Sequential and Combinational Hardware
6.1. Shift Registers
6.1.1. Basic Shift Register
6.1.2. Shift Register with Enable
6.1.3. Shift Register with Synchronous Clear
6.1.4. Shift Register with Load
6.1.5. Shift Register Template Function
6.1.6. Class Based Shift Register
6.1.7. Helper Classes for Design Reuse
Log2Ceil
NextPow2
6.2. Multiplexors
6.2.1. Binary MUX
6.2.2. Automatic Binary to Onehot MUX Optimizations
6.2.3. Manual Optimization of Binary Selection MUXes
6.2.4. One Hot MUX
6.2.5. Priority Search Hardware
6.2.6. Finding Leading 1s in a Bit-vector
6.2.7. Improved Performance and Area Using the Brute Force Approach
6.2.8. Log2(N) Based Search
6.2.9. Recursive Template Search
6.2.10. Finding the Maximum Value in an Array
6.2.11. Algorithmic Coding Style
6.2.12. Recursive Template Search
6.3. Absolute Value (abs)
6.4. Linear Feedback Shift Register (LFSR)
6.5. Accumulator
6.6. Shifters
6.6.1. Barrel shifter
Logical
Arithmetic
Bi-directional
Rotating
6.6.2. Constant Shifts
Transforming Barrel Shifters into Constant Shifts
6.6.3. Transforming Dynamic Bit Masking
6.7. Adder Trees
6.7.1. Preventing Automatic Tree Balancing
6.7.2. Coding to Facilitate Automatic Tree Balancing
6.8. Lookup Tables (LUT)
Chapter 7: Memory Architectures
7.1. Memory-based Shift Register
7.1.1. Classic Shift Register Description mapped to Memories
7.1.2. Circular Buffer
7.1.3. Initialization loops
7.2. Memory Organization
7.2.1. Interleaving Memories
7.2.2. Automatic Interleaving
7.2.3. Manual Interleaving with Random Access
7.2.4. Manual Interleaving with Sequential Access
7.3. Widening the Word Width of Memories
7.3.1. Manually Increasing Word Width with Sequential Access
7.4. Caching
7.4.1. "Windowing" of 1-D Data Streams
7.4.2. Pure Algorithmic Description with Poor Memory Architecture
7.4.3. Analyzing Array Access Patterns
7.4.4. Shift Register Sliding Window Implementation
7.4.5. Boundary Conditions
7.4.6. 2-D Windowing
7.4.7. Pure Algorithmic Description with Poor Memory Architecture
7.4.8. Analyzing Array Access Patterns
7.4.9. Circular Line Buffer Sliding Window Implementation
Chapter 8: Hierarchical Design
8.1. Arrays Shared Between Blocks
8.1.1. Out-of-order Array Access
8.1.2. Algorithmic C Channel Class
8.1.3. Using Explicit Channels
8.1.4. Using Channels at the Top-level Interface and Testbench
8.1.5. Arrays Inside of Channels
8.1.6. Arrays Mapped to Registers
8.1.7. Arrays Mapped to Memories
8.2. Blocks with Common Interface Control Variables
8.2.1. Passing Control Variables Between Blocks
8.2.2. Connecting Interface Control Variables to Multiple Blocks
8.2.8. Duplicating Control IO
8.3. Reconvergence: Balancing the Latency Between Blocks
8.3.1. Deadlock
8.3.2. Automatic Pipeline Flushing
8.3.3. Manually Setting FIFO Depths
Chapter 9: Advanced Hierarchical Design
9.1. ac_channel Methods
9.1.1. Channel size: int size()
9.1.2. Non-blocking Read: bool nb_read(T &val)
9.2. Recommended Coding Style
9.3. Feedback
9.3.1. C++ Assertion
9.3.2. Preloading the Channels/FIFOs
9.3.3. Deadlock
9.3.4. Variable Rate or Data Dependent Feedback
Chapter 10: Digital Filters
10.1. FIR Filters
10.2. Register Based Filters
10.2.1. External Coefficients
10.2.2. Constant Coefficients
10.2.3. Loadable Coefficients
10.2.4. Symmetric Coefficients
10.2.5. Even Symmetric
10.2.6. Odd Symmetric
10.2.7. Transposed
10.2.8. Systolic
10.3. Multi-rate Filtering
10.4. Using Decimation in Filters
10.4.1. Algorithmic Decimation
10.4.2. Manual Decimation
10.5. Using Interpolation in Filters
10.5.1. Algorithmic Interpolation
10.5.2. Manual Interpolation
10.6. Multi-stage Decimation
10.6.1. Multi-block
10.6.2. Single-block
Chapter 11: FFT Transform
11.1. Radix-2 FFT
11.2. Floating Point Radix-2 In-place FFT
11.3. Some Final Thoughts
11.3.1. References
HLS Bluebook, Software Version v10.4a, September 2019

This document contains information that is proprietary to Mentor Graphics Corporation. The original recipient of this document may duplicate this document in whole or in part for internal business purposes only, provided that this entire notice appears in all copies. In duplicating any part of this document, the recipient agrees to make every reasonable effort to prevent the unauthorized use and distribution of the proprietary information.

This document is for information and instruction purposes. Mentor Graphics reserves the right to make changes in specifications and other information contained in this publication without prior notice, and the reader should, in all cases, consult Mentor Graphics to determine whether any changes have been made. The terms and conditions governing the sale and licensing of Mentor Graphics products are set forth in written agreements between Mentor Graphics and its customers. No representation or other affirmation of fact contained in this publication shall be deemed to be a warranty or give rise to any liability of Mentor Graphics whatsoever.

MENTOR GRAPHICS MAKES NO WARRANTY OF ANY KIND WITH REGARD TO THIS MATERIAL INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. MENTOR GRAPHICS SHALL NOT BE LIABLE FOR ANY INCIDENTAL, INDIRECT, SPECIAL, OR CONSEQUENTIAL DAMAGES WHATSOEVER (INCLUDING BUT NOT LIMITED TO LOST PROFITS) ARISING OUT OF OR RELATED TO THIS PUBLICATION OR THE INFORMATION CONTAINED IN IT, EVEN IF MENTOR GRAPHICS HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

U.S. GOVERNMENT LICENSE RIGHTS: The software and documentation were developed entirely at private expense and are commercial computer software and commercial computer software documentation within the meaning of the applicable acquisition regulations. Accordingly, pursuant to FAR 48 CFR 12.212 and DFARS 48 CFR 227.7202, use, duplication and disclosure by or for the U.S. Government or a U.S. Government subcontractor is subject solely to the terms and conditions set forth in the license agreement provided with the software, except for provisions which are contrary to applicable mandatory federal laws.

TRADEMARKS: The trademarks, logos and service marks ("Marks") used herein are the property of Mentor Graphics Corporation or other parties. No one is permitted to use these Marks without the prior written consent of Mentor Graphics or the owner of the Mark, as applicable. The use herein of a third-party Mark is not an attempt to indicate Mentor Graphics as a source of a product, but is intended to indicate a product from, or associated with, a particular third party. A current list of Mentor Graphics' trademarks may be viewed at: www.mentor.com/trademarks.

The registered trademark Linux® is used pursuant to a sublicense from LMI, the exclusive licensee of Linus Torvalds, owner of the mark on a world-wide basis.

End-User License Agreement: You can print a copy of the End-User License Agreement from: www.mentor.com/eula.

Mentor Graphics Corporation
8005 S.W. Boeckman Road, Wilsonville, Oregon 97070-7777
Telephone: 503.685.7000
Toll-Free Telephone: 800.592.2210
Website: www.mentor.com
SupportNet: supportnet.mentor.com/