Chapter 1: Making the Case for High-Level Synthesis
A broken design flow
Keeping up with the pace
Benefits of High-Level Synthesis
Reducing design and verification efforts
More effective reuse
Investing R&D resources where it really matters
Seizing the opportunity
Chapter 2: General C++ Style
2.1. File Organization
2.2. Building an Executable Using Makefiles
2.2.1. Makefile Naming
2.2.2. Comments
2.2.3. Macros
2.2.4. Targets
2.2.5. Phony Targets
2.2.6. Simple Makefile Example
2.3. Header/Include Files
2.4. Test Benches
2.5. Creating a Golden Reference Design
2.5.1. Make Sure You're Fully Testing the DUT
2.5.2. Uninitialized Variables
Chapter 3: Bit Accurate Data Types
3.1. Compilation, Debug, and Simulation Speed
3.1.1. Header Files and Typedefs
3.2. Integer Data Types
3.2.1. Unsigned Integer
3.2.2. Signed Integer
3.3. Fixed Point Data Types
3.3.1. Unsigned Fixed Point
3.3.2. Signed Fixed Point
3.3.3. Quantization and Overflow
3.3.4. Truncation and Rounding
3.3.5. Saturation and Overflow
3.4. Operators
3.4.1. Arithmetic and Bitwise Operators: *, +, -, /, &, |, ^, %
3.4.2. Bit Select Operator: []
3.4.3. Shift Operators: <<, >>
3.4.4. Shift Right Operator: >>
Unsigned Shift Right
Signed Shift Right
Shift Left Operator: <<
Unsigned Shift Left
Signed Shift Left
Unexpected Loss of Precision
3.5. Methods
3.5.1. Slice Read: slc
3.5.2. Slice Write: set_slc
3.5.3. Explicit Conversion Functions
3.5.4. Implicit Conversion Functions
3.6. Helper/Utility Functions
3.6.1. Array Uninitialization: ac::init_array
3.6.2. ceil, floor, and nbits
3.7. Complex Data Types
Chapter 4: Fundamentals of High-Level Synthesis
4.1. The Top-level Design Module
4.1.1. Registered Outputs
4.1.2. Control Ports
4.1.3. Port Width
4.1.4. Port Direction
Input ports
Output ports
Inout Ports
4.2. High-level C++ Synthesis
4.2.1. Data Flow Graph Analysis
4.2.2. Resource Allocation
4.2.3. Scheduling
4.2.4. Classic RISC Pipelining
4.2.5. Loop Pipelining
4.3. for/while/do Loops
4.3.1. What's in a Loop?
"for" Loop
"while" Loop
"do" Loop
4.3.2. Rolled Loops
4.3.3. Loop Unrolling
Partial Loop Unrolling
Fully Unrolled Loops
Dependencies Between Loop Iterations
Loops with Constant Bounds
4.3.4. Loops with Conditional Bounds
4.3.5. Optimizing the Loop Counter
4.3.6. Optimizing the Loop Control
4.3.7. Nested Loops
Unconstrained Nested Loops
Pipelined Nested Loops
Unrolling Nested Loops
4.3.8. Sequential Loops
Simple Independent Sequential Loops
Effects of Unmerged Sequential Loops
Manual Merging of Sequential Loops
4.4. Pipeline Feedback
4.4.1. Data Feedback
4.4.2. Control Feedback
4.5. Conditions
4.5.1. Sharing
if-else Statement
switch Statement
Keep It Simple
4.5.2. Functions and Multiple Conditional Returns
Replacing Conditional Returns with Flags
4.5.3. References
Chapter 5: Scheduling of IO and Memories
5.1. Unconditional IO
5.1.1. Pass by Reference
5.1.2. Pass by Value
5.2. Conditional IO
5.2.1. Pass by Reference
5.2.2. Pass by Value
5.2.3. Ready/Acknowledge Behavior (wait)
5.2.4. Stalling the Pipeline
5.2.5. Manually Flushing the Pipeline
5.2.6. Writing IO for Throughput
Making IO Mergeable
5.3. Memories
5.3.1. Automatic Mapping of Arrays to Memories
5.3.2. Automatic Memory Merging
5.3.3. Designing for Throughput When Using Memories
Non-Mutually Exclusive Memory Accesses
Making Memory Accesses Mutually Exclusive
Manually Merging Non-Mutually Exclusive Memory Accesses
Chapter 6: Sequential and Combinational Hardware
6.1. Shift Registers
6.1.1. Basic Shift Register
6.1.2. Shift Register with Enable
6.1.3. Shift Register with Synchronous Clear
6.1.4. Shift Register with Load
6.1.5. Shift Register Template Function
6.1.6. Class Based Shift Register
6.1.7. Helper Classes for Design Reuse
Log2Ceil
NextPow2
6.2. Multiplexors
6.2.1. Binary MUX
6.2.2. Automatic Binary to One-Hot MUX Optimizations
6.2.3. Manual Optimization of Binary Selection MUXes
6.2.4. One-Hot MUX
6.2.5. Priority Search Hardware
6.2.6. Finding Leading 1s in a Bit-vector
6.2.7. Improved Performance and Area Using the Brute Force Approach
6.2.8. Log2(N)-Based Search
6.2.9. Recursive Template Search
6.2.10. Finding the Maximum Value in an Array
6.2.11. Algorithmic Coding Style
6.2.12. Recursive Template Search
6.3. Absolute Value (abs)
6.4. Linear Feedback Shift Register (LFSR)
6.5. Accumulator
6.6. Shifters
6.6.1. Barrel Shifter
Logical
Arithmetic
Bi-directional
Rotating
6.6.2. Constant Shifts
Transforming Barrel Shifters into Constant Shifts
6.6.3. Transforming Dynamic Bit Masking
6.7. Adder Trees
6.7.1. Preventing Automatic Tree Balancing
6.7.2. Coding to Facilitate Automatic Tree Balancing
6.8. Lookup Tables (LUT)
Chapter 7: Memory Architectures
7.1. Memory-based Shift Register
7.1.1. Classic Shift Register Description Mapped to Memories
7.1.2. Circular Buffer
7.1.3. Initialization Loops
7.2. Memory Organization
7.2.1. Interleaving Memories
7.2.2. Automatic Interleaving
7.2.3. Manual Interleaving with Random Access
7.2.4. Manual Interleaving with Sequential Access
7.3. Widening the Word Width of Memories
7.3.1. Manually Increasing Word Width with Sequential Access
7.4. Caching
7.4.1. "Windowing" of 1-D Data Streams
7.4.2. Pure Algorithmic Description with Poor Memory Architecture
7.4.3. Analyzing Array Access Patterns
7.4.4. Shift Register Sliding Window Implementation
7.4.5. Boundary Conditions
7.4.6. 2-D Windowing
7.4.7. Pure Algorithmic Description with Poor Memory Architecture
7.4.8. Analyzing Array Access Patterns
7.4.9. Circular Line Buffer Sliding Window Implementation
Chapter 8: Hierarchical Design
8.1. Arrays Shared Between Blocks
8.1.1. Out-of-order Array Access
8.1.2. Algorithmic C Channel Class
8.1.3. Using Explicit Channels
8.1.4. Using Channels at the Top-level Interface and Testbench
8.1.5. Arrays Inside of Channels
8.1.6. Arrays Mapped to Registers
8.1.7. Arrays Mapped to Memories
8.2. Blocks with Common Interface Control Variables
8.2.1. Passing Control Variables Between Blocks
8.2.2. Connecting Interface Control Variables to Multiple Blocks
8.2.3. Duplicating Control IO
8.3. Reconvergence: Balancing the Latency Between Blocks
8.3.1. Deadlock
8.3.2. Automatic Pipeline Flushing
8.3.3. Manually Setting FIFO Depths
Chapter 9: Advanced Hierarchical Design
9.1. ac_channel Methods
9.1.1. Channel Size: int size()
9.1.2. Non-blocking Read: bool nb_read(T &val)
9.2. Recommended Coding Style
9.3. Feedback
9.3.1. C++ Assertion
9.3.2. Preloading the Channels/FIFOs
9.3.3. Deadlock
9.3.4. Variable Rate or Data Dependent Feedback
Chapter 10: Digital Filters
10.1. FIR Filters
10.2. Register Based Filters
10.2.1. External Coefficients
10.2.2. Constant Coefficients
10.2.3. Loadable Coefficients
10.2.4. Symmetric Coefficients
10.2.5. Even Symmetric
10.2.6. Odd Symmetric
10.2.7. Transposed
10.2.8. Systolic
10.3. Multi-rate Filtering
10.4. Using Decimation in Filters
10.4.1. Algorithmic Decimation
10.4.2. Manual Decimation
10.5. Using Interpolation in Filters
10.5.1. Algorithmic Interpolation
10.5.2. Manual Interpolation
10.6. Multi-stage Decimation
10.6.1. Multi-block
10.6.2. Single-block
Chapter 11: FFT Transform
11.1. Radix-2 FFT
11.2. Floating Point Radix-2 In-place FFT
11.3. Some Final Thoughts
11.3.1. References