NVIDIA TURING GPU ARCHITECTURE
Graphics Reinvented
WP-09183-001_v01

TABLE OF CONTENTS
Introduction to the NVIDIA Turing Architecture
  NVIDIA Turing Key Features
    New Streaming Multiprocessor (SM)
    Turing Tensor Cores
    Real-Time Ray Tracing Acceleration
    New Shading Advancements
      Mesh Shading
      Variable Rate Shading (VRS)
      Texture-Space Shading
      Multi-View Rendering (MVR)
    Deep Learning Features for Graphics
    Deep Learning Features for Inference
    GDDR6 High-Performance Memory Subsystem
    Second-Generation NVIDIA NVLink
    USB-C and VirtualLink
Turing GPU Architecture In-Depth
  Turing TU102 GPU
  Turing Streaming Multiprocessor (SM) Architecture
    Turing Tensor Cores
  Turing Optimized for Datacenter Applications
  Turing Memory Architecture and Display Features
    GDDR6 Memory Subsystem
    L2 Cache and ROPs
    Turing Memory Compression
    Video and Display Engine
  USB-C and VirtualLink
  NVLink Improves SLI
Turing Ray Tracing Technology
  Turing RT Cores
NVIDIA NGX Technology
  NGX Software Architecture
  Deep Learning Super-Sampling (DLSS)
  InPainting
    AI Slow-Mo
    AI Super Rez
Turing Advanced Shading Technologies
  Mesh Shading
  Variable Rate Shading
    Content Adaptive Shading
    Motion Adaptive Shading
    Foveated Rendering
  Texture Space Shading
    The Mechanics of TSS
  Multi-View Rendering
    Multi-View Rendering Use Cases
  Resource Management and Binding Model
Turing Features Enhance Virtual Reality
Conclusion
Appendix A Turing TU104 GPU
Appendix B Turing TU106 GPU
Appendix C RTX-OPS Description
  The Hybrid Rendering Model
  RTX-OPS Workload-based Metric Explained
Appendix D Ray Tracing Overview
  Basic Ray Tracing Mechanics
    Bounding Volume Hierarchy
  Denoising Filtering
  Ray-Traced Shadows, Ambient Occlusion, and Reflections

LIST OF FIGURES
Figure 1. Turing Reinvents Graphics
Figure 2. Turing TU102 Full GPU with 72 SM Units
Figure 3. NVIDIA Turing TU102 GPU
Figure 4. Turing TU102/TU104/TU106 Streaming Multiprocessor (SM)
Figure 5. Concurrent Execution of Floating Point and Integer Instructions in the Turing SM
Figure 6. New Shared Memory Architecture
Figure 7. Turing Shading Performance Speedup versus Pascal on Many Different Workloads
Figure 8. New Turing Tensor Cores Provide Multi-Precision for AI Inference
Figure 9. Tesla T4 Delivers up to 40X Higher Inference Performance
Figure 10. Tesla T4 Delivers More than 50X the Energy Efficiency of CPU-based Inferencing
Figure 11. Turing GDDR6
Figure 12. 50% Higher Effective Bandwidth
Figure 13. Video Feature Enhancements
Figure 14. NVLink Enables New SLI Display Topologies
Figure 15. SOL MAN from NVIDIA SOL Ray Tracing Demo (See Demo)
Figure 16. Hybrid Rendering Pipeline
Figure 17. Details of Ray Tracing and Rasterization Pipeline Stages
Figure 18. From Reflections Demo
Figure 19. Ray Tracing Pre Turing
Figure 20. Turing Ray Tracing with RT Cores
Figure 21. Turing Ray Tracing Performance
Figure 22. Turing with 4K DLSS is Twice the Performance of Pascal with 4K TAA
Figure 23. DLSS 2X versus 64xSS Image Almost Indistinguishable
Figure 24. DLSS 2X Provides Significantly Better Temporal Stability and Image Clarity Than TAA
Figure 25. NGX InPainting Examples, Missing Image Data Is Intelligently Replaced with Meaningful Image Information
Figure 26. AI Super Rez Provides Improved Image Clarity Over Other Filtering Methods
Figure 27. Mesh Shading, Visually Rich Images
Figure 28. Current Graphics Pipeline versus a Graphics Pipeline with Task and Mesh Shaders
Figure 29. Screenshot from the Asteroid Field Demo
Figure 30. An Asteroid at Low and High Levels of Detail (LOD)
Figure 31. Dynamically Computed, Spherical Cutaway of a Koenigsegg Model, Viewed in NVIDIA Holodeck™
Figure 32. Turing VRS Supported Shading Rates and Example Application to a Game Frame
Figure 33. Example of Content Adaptive Shading
Figure 34. Perceived Blur Due to Object Motion Combined with Retinal and Display Persistence
Figure 35. Traditional Rasterization and Shading Process
Figure 36. Texture Space Shading Process
Figure 37. Texture Space Shading for Stereo
Figure 38. 200° FOV HMD Where Two Canted Panels are Used and Benefit from MVR
Figure 39. MVR Single Pass Cascaded Shadow Map Rendering
Figure 40. Turing Features for VR
Figure 41. Turing TU104 Full Chip Diagram
Figure 42. Turing TU106 Full Chip Diagram
Figure 43. Workload Distribution Over One Turing Frame Time
Figure 44. Peak Operations of Each Type Base for RTX 2080 Ti
Figure 45. Basic Ray Tracing Process
Figure 46. Abstraction of Tree Traversal and a Ray Intersecting Different Levels of Bounding Boxes
Figure 47. Shadow Map Percentage Closer Filtering (PCF) versus Ray Tracing with Denoising
Figure 48. Shadow Mapping Compared to Ray Traced Shadows that use 1 Sample Per Pixel and Denoising
Figure 49. Screen-Space Ambient Occlusion Compared to Ray-Traced Ambient Occlusion
Figure 50. RTX Ray Tracing
Figure 51. Scene from Battlefield V with RTX On and Off
Figure 52. Scene #2 from Battlefield V with RTX On and Off
Figure 53. Shadow of the Tomb Raider with RTX ON

LIST OF TABLES
Table 1. Comparison of NVIDIA Pascal GP102 and Turing TU102
Table 2. Enhanced Video Engine, Tesla P4 versus Tesla T4
Table 3. DisplayPort Support in Turing GPUs
Table 4. Comparison of NVIDIA Pascal GP104 and Turing TU104 GPUs
Table 5. Comparison of the Pascal Tesla P4 and the Turing Tesla T4
Table 6. Comparison of NVIDIA Pascal GP104 to Turing TU106 GPUs

INTRODUCTION TO THE NVIDIA TURING ARCHITECTURE

Fueled by the ongoing growth of the gaming market and its insatiable demand for better 3D graphics, NVIDIA® has evolved the GPU into the world’s leading parallel processing engine for many computationally intensive applications. In addition to rendering highly realistic and immersive 3D games, NVIDIA GPUs also accelerate content creation workflows, high-performance computing (HPC) and datacenter applications, and numerous artificial intelligence systems and applications.

Turing represents the biggest architectural leap forward in over a decade, providing a new core GPU architecture that enables major advances in efficiency and performance for PC gaming, professional graphics applications, and deep learning inferencing.

Using new hardware-based accelerators and a Hybrid Rendering approach, Turing fuses rasterization, real-time ray tracing, AI, and simulation to enable incredible realism in PC games, amazing new effects powered by neural networks, cinematic-quality interactive experiences, and fluid interactivity when creating or navigating complex 3D models.

Within the core architecture, the key enablers for Turing’s significant boost in graphics performance are a new GPU processor architecture, the streaming multiprocessor (SM), with improved shader execution efficiency, and a new memory system architecture that includes support for the latest GDDR6 memory technology.

Image processing applications such as the ImageNet Challenge were among the first success stories for deep learning, so it is no surprise that AI has the potential to solve many important problems in graphics. Turing’s Tensor Cores power a suite of new deep learning-based Neural Services that offer stunning graphics effects for games and professional graphics, in addition to providing fast AI inferencing for cloud-based systems.

The long-sought-after holy grail of computer graphics rendering, real-time ray tracing, is now reality in single-GPU systems with the NVIDIA Turing GPU architecture. Turing GPUs introduce new RT Cores, accelerator units dedicated to performing ray tracing operations with extraordinary efficiency, eliminating the expensive software emulation-based ray tracing approaches of the past. These new units, combined with NVIDIA RTX™ software technology and sophisticated filtering algorithms, enable Turing to deliver real-time ray-traced rendering, including photorealistic objects and environments with physically accurate shadows, reflections, and refractions.

In parallel with Turing’s development, Microsoft announced both the DirectML for AI and DirectX Raytracing (DXR) APIs in early 2018. With the combination of the Turing GPU architecture and the new AI and ray tracing APIs from Microsoft, game developers can rapidly deploy real-time AI and ray tracing in their games.

In addition to its groundbreaking AI and ray tracing features, Turing also includes many new advanced shading features that improve performance, enhance image quality, and deliver new levels of geometric complexity.

Turing GPUs also inherit all the enhancements to the NVIDIA CUDA™ platform introduced in the Volta architecture that improve the capability, flexibility, productivity, and portability of compute applications. Features such as independent thread scheduling, hardware-accelerated Multi-Process Service (MPS) with address space isolation for multiple applications, and Cooperative Groups are all part of the Turing GPU architecture.

Several of the new NVIDIA GeForce® and NVIDIA Quadro™ GPU products will be powered by Turing GPUs. In this paper we focus on the architecture and capabilities of NVIDIA’s flagship Turing GPU, which is codenamed TU102 and will be shipping in the GeForce RTX 2080 Ti and Quadro RTX 6000. Technical details, including product specifications for the TU104 and TU106 Turing GPUs, are located in the appendices.

Figure 1 shows how Turing reinvents graphics with an entirely new architecture that includes enhanced Tensor Cores, new RT Cores, and many new advanced shading features. Turing combines programmable shading, real-time ray tracing, and AI algorithms to deliver incredibly realistic and physically accurate graphics for games and professional applications.

Figure 1. Turing Reinvents Graphics