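Advanced FPGA System Design, Ver. 5.0.0908
Acknowledgments
Course Contents
Course Contents
Topic
Topic 1. FPGA Design Flow and Timing Closure
Laboratory Equipment Overview
Laboratory Equipment Overview
Laboratory Equipment Overview
Laboratory Equipment Overview
Laboratory Equipment Overview
Ongoing Project: High-Speed Nondestructive Testing Machine
Ongoing Project: Radar Signal Processor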
Outline
A Few Quick Questions
Processing Units
Asynchronous vs. Synchronous Circuits?
Software-Defined Radio
IF Software-Defined Radio
Outline
FPGA Architecture
Overview
Slices and CLBs
Simplified Slice Structure
Look-Up Tables
Implementing a Shift Register with a LUT (SRL16CE)
Shift Register LUT (SRL) Example
Flexible Sequential Elements
Distributed SelectRAM Resources
Block SelectRAM Resources
Dedicated Multiplier Blocks
Global Clock Routing Resources
Global Clock Routing Resources
Global Clock Routing Resources
Digital Clock Manager (DCM)
IOB Element
SelectIO Standards
Digitally Controlled Impedance (DCI)
Outline
The Basic Purpose of Applying Constraints
Paths
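Period Constraints
Period Constraints
Period Constraints: Accurate Timing Information
Input Clock Jitter Is One Source of Clock Uncertainty
Calculating Period Constraints
Offset Constraints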
OFFSET_IN_BEFORE and OFFSET_IN_AFTER
Calculating OFFSET_IN
OFFSET_OUT_BEFORE and OFFSET_OUT_AFTER
Calculating OFFSET_OUT
Offset Constraints Automatically Account for Clock Delay and Jitter
Offset Constraints Account for Clock Delay
Constraints Editor
Entering Constraints
A Quick, Easy Question
Path-Specific Timing Constraints
More About Path-Specific Timing Constraints
Advanced Tab of the Constraints Editor
Creating Groups of End Points
Pin-Specific OFFSET Constraints
Creating Groups of Pads
Creating Group OFFSET Constraints
OFFSET Constraints with Two-Phase Clocks
Constraining Between Related Clock Domains
Constraining Between Unrelated Clock Domains
Multicycle Path Constraints
False Paths
Creating Multicycle Path and False Path Constraints
Timing Constraint Priority
Timing Reports
Viewing Timing Reports with the Timing Analyzer
Timing Report Structure
Using the Timing Analyzer
Timing Analyzer GUI
Cross-Probing
Report Example: Tilo
Estimating Design Performance
60/40 Rule
Analyzing Post-Place & Route Timing
Case 1
Poor Placement: Solutions
Case 2
High Fanout: Solutions
Case 3
Too Many Logic Levels: Solutions
I/O Location Constraints
General I/O Layout Guidelines
General I/O Layout Guidelines
I/O Layout Guidelines for Large Devices
Outline
Xilinx Design Flow
Outline
System Generator for DSP Platform Design
SysGen Design Flow
Creating a System Generator Design
System Generator Design
Xilinx Processor Family
PowerPC-Based Embedded Design
MicroBlaze-Based Embedded Design
Xilinx Platform Studio (XPS)
Embedded Development Tool Flow Overview
Timing Closure
Topic
Topic 2. Advanced FPGA Architecture and Resources
Outline
Fourth-Generation Virtex Family
Increased Functionality with Dramatic Power Reduction
Virtex-4 FPGA CLB
I/O Banking
IOB Tile
Data Input DDR_CLOCK_EDGE = Opposite Edge
Data Input DDR_CLOCK_EDGE = Same Edge
Data Input DDR_CLOCK_EDGE = Same Edge Pipelined
ISERDES
Use Examples
ILOGIC Block Diagram
Input DDR VHDL Code Example
Input DDR Verilog Code Example
IDDR VHDL Instantiate primitives Example
IDDR VHDL Instantiate primitives Example
OLOGIC Block Diagram
Output DDR in OPPOSITE_EDGE Mode
Output DDR in SAME_EDGE Mode
OSERDES
ODDR VHDL Instantiate primitives Example
ChipSync Wizard Memory Applications: General and Data Setup
Memory Interface Generator Memory Corner: www.xilinx.com Technology Solutions Memory
Virtex-4 FPGA Block RAM
New Block RAM
FIFO16
Block RAM FIFO Feature Summary
FIFO Operation Mode
FIFO16 Use
Virtex-4 FPGA XtremeDSP Technology Slice
XtremeDSP Technology Slice Advantages
Virtex-II Pro FPGA Design Example
Virtex-II Pro FPGA Design Example
Virtex-4 FPGA Design Example
DSP Tile
The Most Advanced Serial I/O
Integrated PowerPC 405 Processor Core: World’s Most Popular Embedded Processor Architecture
New Tri-Mode Ethernet MAC
Knowledge Check & Answers
Knowledge Check & Answers
Knowledge Check & Answers
Knowledge Check & Answers
Summary
Where Can I Learn More?
Outline
Objectives
Virtex-4 FPGA Clocking Floorplan
Virtex-4 FPGA Clocking
Global Clock Pads
Clock Nets Can Drive Nonclock Pins
BUFGCTRL
Virtex-4 FPGA Clocking: BUFGCTRL Drivers
Virtex-4 FPGA Clocking: BUFGCTRL – Providing all the Hooks
BUFGCTRL Example
BUFGCTRL Attributes
BUFGCTRL Truth Table
BUFG
BUFG Multiplexer
BUFGMUX_VIRTEX4
BUFGCTRL 2:1 Multiplexer
BUFGCE
BUFGCE Design Example
BUFGMUX+CE Example
Regional Clocking
Clock-Capable I/O
BUFIO
BUFIO
Regional Clock Buffer (BUFR)
Regional Clock Buffer Example
Multiple Clocks?
Use
Global Clock Buffers Setup: IP (COREGen & Architecture Wizard)
BUFR and BUFIO Instantiation
Knowledge Check
Answers
Summary
Outline
Objectives
Virtex Family Product and Process Evolution
65-nm Xilinx Virtex-5 Platform FPGA New Benchmark in FPGA Price Performance
Virtex-5 FPGA Platform Feature Overview
Virtex-5 and Virtex-4 FPGA Comparison
Virtex-5 and Virtex-4 FPGA Comparison
Virtex-5 and Virtex-4 FPGA Comparison
Higher Performance 1.25-Gbps SelectIO Technology
Second-Generation ChipSync Calibration Circuitry for Every I/O
Measurements Show 35 Percent Lower Dynamic Power
New and Enhanced Hardened IP Drives System Integration
Continuing the Drive for Innovation
Optimized Performance and Area: 6-Input LUT Yields Best Results at 65 nm
Real 6-Input LUT: Reduce Logic Levels and Improve Performance
Virtex-5 FPGA Clock Management Tile
Virtex-5 FPGA Block RAM and FIFO
DSP48E Slice
Summary
Where Can I Learn More?
Outline
Objectives
Virtex-5 FPGA Floorplan
Revamped CLB
SLICEL and SLICEM
Real 6-Input LUT: Reduce Logic Levels and Improve Performance
6-Input LUT with Dual Output
6-Input LUTs are Better
Quad-Port Memory in One SLICEM
32-Bit Shift Registers in One LUT
Knowledge Check & Answers
Knowledge Check & Answers
Knowledge Check & Answers
Knowledge Check & Answers
Application Example MicroBlaze Processor
CLB Summary
Knowledge Check & Answer
Summary
Outline
Objectives
Virtex-5 FPGA Memory Options: The Right Memory for the Application
Distributed Memory
Block RAM and FIFO Block
Interfacing to External Memories
Virtex-5 FPGA Control Logic Scheme: Dynamic Power Reduction
Many Block RAM Configurations
BRAM/FIFO Memory Capacity
Main Changes from Virtex-4 Block RAM
Virtex-5 FPGA Block RAM and FIFO Enhancements
Independent 18-kb Block RAM and FIFO
Simple Dual-Port or Single-Port Block RAM
Single-Port Mode
Simple Dual-Port Mode
True Dual-Port Mode
Block RAM is Cascadable
Output Register Set/Reset
Block RAM Use
RAMB Primitives
WE During Write First Mode
IP (COREGen & Architecture Wizard)
Knowledge Check & Answer
FIFO18/36 Top-Level View
FIFO18/36
Two Modes
Virtex-5 FPGA FIFOs are Cascadable
Cascading FIFO to Increase Depth
Cascading FIFO in Width
Cascading FIFO in Width
FIFO Functions: Reset
FIFO Functions: FIFO Empty
FIFO Functions: RDERR
FIFO Functions: FIFO Full
FIFO Functions: WRERR
FIFO Functions: Almost Empty and Almost Full
FIFO Functions: WRCOUNT and RDCOUNT
Integrated Error Correction
FIFO18/36 Use
Knowledge Check & Answers
Summary
Where Can I Learn More?
Outline
Objectives
DSP48E: Extending the Capabilities of the DSP48 Slice
DSP48E Slice Power Estimate
Virtex-4 FPGA DSP48 Slice
Virtex-5 FPGA (25x18) Multiplier
Independent C Input
SIMD and Logic Unit
Basic Logic Functions
A Input Cascade
Pattern Detector
Attributes
Attributes
Attributes
Attributes
Port Descriptions
Port Descriptions
Virtex-5 FPGA CARRYIN Multiplexer and Register
CARRYINSEL Multiplexer
A/B Registers
Valid Register Configurations
Overflow/Underflow
Counter Auto Reset
Complex Multiply (25 X 18)
MACC Extension
Key Applications that will Benefit from Wider Inputs
Key Applications that will Benefit from Expanded Second Stage
Key Applications that will Benefit from Expanded Cascade
Key Applications that will Benefit from Independent C Input
Symmetric Rounding
16 x 16 Multiplier
MADD
2-Input Adder
2-Input Adder
Loadable MAC
Dynamically Reconfigurable DSP OPMODEs
Dynamically Reconfigurable DSP OPMODEs
Multiply (35 X 25)
Implement or Accelerate DSP Functions
Adder Tree versus Cascaded Adder
Adder Tree versus Cascaded Adder
Problem: The FIR Filter
Chaining Up The Trees
Answer: The FIR Filter
IP Support: IP (COREGen & Architecture Wizard)
Knowledge Check & Answer
Summary
Where Can I Learn More?
Where Can I Learn More?
Outline
Objectives
Virtex-5 FPGA Delivers Powerful Clock Management
Three Types of Clock Resources
Virtex-5 FPGA Clock Management Summary
Product Comparison
Virtex-5 FPGA Clock Management Tile
Standard CMT Configurations
CMT General Use Model
DCM Features
PLL Features
PLL Primitives
PLL Basics
PLL Equations
PLL Counter Attributes
PLL Attributes
PLL Basic Equations
Phase-Locked Loop Attributes
Phase-Locked Loop Attributes
Phase-Locked Loop Attributes
Phase-Locked Loop Attributes
Phase-Locked Loop Attributes
Phase-Locked Loop Attributes
Counter Output Examples
Phase-Locked Loop Attributes
CLKFBOUT_PHASE Effect on Waveforms
Phase-Locked Loop Attributes
PLL and Jitter
PLL Jitter Filter
What is the PLL Output Jitter?
Knowledge Check
Knowledge Check & Answer
PLL Use Example: Frequency Synthesizer and Jitter Filter
PLL Use Example: Clock Network De-Skew
PLL Use Example: Zero Delay Buffer
PLL Use Example: DCM2PLL
PLL Use Example: PLL2DCM
Virtex-5 FPGA Clock Regions and I/O Banks
Virtex-5 FPGA Global Clocking
Virtex-5 FPGA I/O Clocking
Virtex-5 FPGA Regional Clocking
Use
Knowledge Check
Answer
Knowledge Check
Answer
Summary
Where Can I Learn More?
Topic
Topic 3. High-Speed FPGA Interface Design
Outline
Objectives
What is Identical to the Virtex-II Pro FPGA?
What’s New or Changed?
I/O Banking
IOB Tiles
Output DDR Changes
Data Output DDR_CLOCK_EDGE = Opposite Edge
Data Output DDR_CLOCK_EDGE = Same Edge
ODDR Instantiation
Input DDR Changes
Data Input DDR_CLOCK_EDGE = Opposite Edge
Data Input DDR_CLOCK_EDGE = Same Edge
Data Input DDR_CLOCK_EDGE = Same Edge Pipelined
IDDR Instantiation
Source-Synchronous System
Source-Synchronous Problems and Solutions
Source-Synchronous Problems and Solutions (continued)
Virtex-4 Source-Synchronous Clocking Resources
OSERDES
OSERDES Width Expansion
OSERDES Attributes
Source-Synchronous Input Solutions
Source-Synchronous Solution
ISERDES
ISERDES Width Expansion
ISERDES Attributes
Dynamic Phase Alignment
Source-Synchronous Clocking (Ideal)
Source-Synchronous Clocking
Delay Chain
64-Tap Absolute Delay Line
Delay Chain
ISERDES Attributes for IDELAY
Delay-Chain Operation
IDELAYCTRL
Word Alignment
Bitslip
Bitslip Use Rules
Word Alignment (Bitslip)
Use Examples
ChipSync Wizard Memory Applications: General and Data Setup
Memory Interface Generator
Knowledge Check & Answers
Knowledge Check & Answers
Summary
Where Can I Learn More?
System Interface Platforms
ML461 Memory Interface Boards
Outline
Objectives
I/O Banking Architecture
Enhancements: Input and Output Buffers
Enhancements: ChipSync Technology Enhancements
Easy Interface to Source-Synchronous Memory
Virtex-5 FPGA SelectIO Interface Simplifies Design with Built-In Critical Circuits
Easy Frequency Division
Easy Word Alignment
Up to 39 Channels Can Be Clock Aligned in a Region
Memory Reference Design Kits
Delay Chain
64-Tap Absolute Delay Line
Delay Chain
Delay-Chain Operation
IDELAYCTRL
Bitslip
Bitslip Use Rules
OSERDES Attributes
ISERDES Attributes
ISERDES Attributes for IDELAY
Edge-Aligned DDR Inputs: Opposite Edge
Edge-Aligned DDR Inputs: Same Edge
Edge-Aligned DDR Inputs: Same-Edge Pipelined
Drive DDR Output Data with One Clock
ISERDES Manages Incoming Data
OSERDES Simplifies Frequency Multiplication
Data Output Alignment
Bit Alignment Centers Clock within Data Valid Window
Use Examples
ChipSync Wizard Memory Applications: General and Data Setup
Memory Interface Generator
Knowledge Check
Knowledge Check & Answer
Summary
Where Can I Learn More?
Outline
Virtex-4 I/O Tile
Virtex-4 IOB Switching Characteristics
Virtex-4 IOB Switching Characteristics
Virtex-4 ILOGIC Switching Characteristics
Virtex-4 ILOGIC Switching Characteristics
Virtex-4 ILOGIC Switching Characteristics
Virtex-4 OLOGIC Switching Characteristics
Virtex-4 OLOGIC Switching Characteristics
Virtex-4 OLOGIC Switching Characteristics
Virtex-4 OLOGIC Switching Characteristics
Virtex-4 ISERDES Switching Characteristics
Virtex-4 ISERDES Switching Characteristics
Virtex-4 ISERDES Switching Characteristics
Virtex-4 OSERDES Switching Characteristics
Virtex-4 OSERDES Switching Characteristics
Virtex-4 OSERDES Switching Characteristics
Virtex-4 CLB Switching Characteristics
Virtex-4 CLB Switching Characteristics
Virtex-4 CLB Switching Characteristics
Virtex-4 CLB Switching Characteristics
Virtex-4 Block RAM Switching Characteristics
Virtex-4 Block RAM Switching Characteristics
Virtex-4 Block RAM Switching Characteristics
Virtex-4 FIFO Switching Characteristics
Virtex-4 FIFO Switching Characteristics
Virtex-4 FIFO Switching Characteristics
Virtex-4 FIFO Switching Characteristics
Virtex-4 FIFO Switching Characteristics
Clock Switching Characteristics
Clock Switching Characteristics
Introduction
Circuit Description
Working Principles
Circuit Description
Working Principles
Circuit Description
Circuit Description
Example 1: SDR Transceiver
TX_CLK_AND_DAT Module Block Diagram
RX_CLK_AND_DAT Module Block Diagram
Bus Alignment: Clock Training
Clock to Data Centering Circuit
Simplified Virtex-4 DDR Transceiver Block Diagram
TX_CLOCKS Module Using PMCD
TX_CLK_AND_DAT Module
RX_CLK_AND_DAT Module
ISERDES_ALIGNMENT_MACHINE Module
Example 3: High-Speed Memory Data Capture
Memory Read Data Capture
533 Mbps (267 MHz) DDR2 Interface (V4)
Input Setup Timing Requirements
Input Hold Timing Requirements
Input Requirement Reporting
Setup Time and Relative Mins
Setup Time and Clock Uncertainty
Hold Time and Relative Mins
Hold Time and Uncertainty
OFFSET Analysis Templates
OFFSET IN BEFORE Constraint
System Synchronous Interface
System Synchronous Inputs
System Synchronous Inputs
System Synchronous Data Sheet
Knowledge Check
Answers
Knowledge Check
Answers
OFFSET IN Analysis Header
Source Synchronous DDR I/F
Source Synchronous Examples
DDR: Group Registers for Offset In Constraints based on RISING Keyword
DDR: Group Registers for Offset In Constraints based on FALLING Keyword
DDR: Rising Edge Example
DDR: Rising Edge
DDR: Falling Edge
DDR: Falling Edge
Data Sheet OFFSET Reporting
Knowledge Check
Answer
Answer
OFFSET OUT Constraint
OFFSET OUT Reporting Header
OFFSET OUT Reporting Details
Source Synchronous Example
Lab Design: System Synchronous Interface (SDR)
Lab Overview: System Synchronous Interface (SDR)
Lab Design: Source Synchronous Interface (DDR)
Lab Overview: Source Synchronous Interface (DDR)
Topic
Topic 4. FPGA-DSP System Implementation
Agenda
On the Same Wavelength
Why DSP?
Sample Rates and Bit Widths
Bits and Signal-to-Noise Ratio
Bits and Dynamic Range
Sample Rates and Nyquist
Sub-Sampling
FIR Filter
Digital Mixing
Digital Mixing
Fast Fourier Transform (FFT)
Where are Xilinx Devices Used?
MAC Engine
“MAC Farm”
Full Parallel FIR Filter
Basic Building Blocks of DSP
Delays and Data Storage
Delays and Data Storage
Addition and Summation
Accumulation
Multipliers
Tuning the Receiver
Objectives
Xilinx FPGA Device Architecture
Flip-Flops for Delay
Serial to Parallel Conversion
Look-Up Tables
Multiplexer LUT
Parallel to Serial Conversion
Dedicated Multiplexers
DSP48 Tile
Dedicated Carry Logic
Xilinx Full Adder
Addition without the Lowest Level Detail
Twos Complement
Subtract and Add/Subtract
Serial Addition
Simplified DSP48 Slice
Accumulation
Accumulation in the DSP48 Slice
Multiplication
Embedded Multipliers
Building Wider Multipliers
DSP48 Slice Multiplier
35x35 Multiplier Using a Single DSP48 Slice
35x35 Multiplier Using a Single DSP48 Slice
35x35 Multiplier Using Multiple DSP48 Slices
Slice-Based Multipliers
Addition Trees and the Multiplier
Addition Trees and the Multiplier
Partial Products and the Multiplier Logic
Partial Products and the Multiplier Logic
Multiplier Size
Pipelining and Performance
“Shift and Add” Multiplier
“Shift and Add” Multiplier Logic
Constants Remove ROM Multiplier Inputs
Constants Remove ROM Multiplier Inputs
Small KCM Multipliers
Hexadecimal KCM Multipliers
Memories are Made of This
Objectives
Distributed RAM
Distributed RAM
Larger Distributed RAM
Distributed RAM Functions
Distributed RAM Functions
Dual Port Distributed RAM
SRL16E
SRL16E
Compact Delay
Compact Delay
Shift and Scan
Expanding the Shift Register
Using the SRL16E as a FIFO
Embedded Block RAM
Block ROM and Mixed-Mode RAM
Block ROM and Mixed-Mode RAM
Virtex-4 FPGA Block RAM
Virtex-4 FPGA Block RAM Optional Output Register
Virtex-4 FPGA Cascadable Block RAM
FIFO16 in the Virtex-4 FPGA
Synchronous FIFO
Working Together
Selective Filters
Objectives
Outline
Inside the FIR Filter
Inside the FIR Filter
Gain and Saturation
Gain and Saturation
Outline
MAC Engine FIR Filter
MAC Engine FIR Filter
Selecting Block RAM and Clock Cycles
Selecting Block RAM and Clock Cycles
Selecting Distributed RAM and Clock Cycles
Selecting Distributed RAM and Clock Cycles
Selecting Distributed RAM and Clock Cycles
Exploiting Symmetry versus Bandwidth
Exploiting Symmetry versus Bandwidth
Selecting Distributed RAM for Symmetry
Selecting Distributed RAM for Symmetry
Outline
Full Parallel FIR Filter
Adder Tree to Adder Chain
Systolic FIR Filter
Symmetry Saves Multipliers and Logic
Symmetric Systolic FIR Filter
Transpose FIR Structure
Transpose FIR Filter
Transpose FIR Structure
Spectrum Coverage
Outline
Going Slow: Memory Deficiency
Going Slow: Improving Processing Efficiency
Going Faster: Processing Deficiency
Going Faster: Processing Deficiency
Going Faster: Processing Deficiency
Slowing Down the Parallel FIR
Smaller Serial Multipliers
Reducing the Addition Process
Serial Delay Registers
Accumulate and Add
Add and Accumulate
Serial Distributed Arithmetic (SDA)
SDA Symmetrical Filter
Filter Dynamics and Trends: 31-Tap Filter Example (31 Taps, 12-Bit Samples, Symmetrical)
Filter Dynamics and Trends
Filter Dynamics and Trends
Spectrum Coverage
Outline
Filling the Gap
Splitting the Sample
Splitting the Sample
Splitting the Filter Sample
Splitting the Filter Sample
DRAM-Based Systolic Semi-Parallel (SP) FIR
Block RAM-Based SP FIR
Transpose Block RAM SP FIR
Spectrum Coverage
Outline
Zero is Worth Nothing!
Zero is Worth Nothing!
Missing Zero Logic
Missing Zero Logic
Outline
FIR Filter Exercise
FIR Filter Exercise
FIR Filter Answer
Summary