NVIDIA TURING GPU ARCHITECTURE
Graphics Reinvented
WP-09183-001_v01
TABLE OF CONTENTS
Introduction to the NVIDIA Turing Architecture
  NVIDIA Turing Key Features
  New Streaming Multiprocessor (SM)
  Turing Tensor Cores
  Real-Time Ray Tracing Acceleration
  New Shading Advancements
  Mesh Shading
  Variable Rate Shading (VRS)
  Texture-Space Shading
  Multi-View Rendering (MVR)
  Deep Learning Features for Graphics
  Deep Learning Features for Inference
  GDDR6 High-Performance Memory Subsystem
  Second-Generation NVIDIA NVLink
  USB-C and VirtualLink
Turing GPU Architecture In-Depth
  Turing TU102 GPU
  Turing Streaming Multiprocessor (SM) Architecture
  Turing Tensor Cores
  Turing Optimized for Datacenter Applications
  Turing Memory Architecture and Display Features
  GDDR6 Memory Subsystem
  L2 Cache and ROPs
  Turing Memory Compression
  Video and Display Engine
  USB-C and VirtualLink
  NVLink Improves SLI
Turing Ray Tracing Technology
  Turing RT Cores
NVIDIA NGX Technology
  NGX Software Architecture
  Deep Learning Super-Sampling (DLSS)
  InPainting
  AI Slow-Mo
  AI Super Rez
NVIDIA Turing GPU Architecture
WP-09183-001_v01 | ii
Turing Advanced Shading Technologies
  Mesh Shading
  Variable Rate Shading
  Content Adaptive Shading
  Motion Adaptive Shading
  Foveated Rendering
  Texture Space Shading
  The Mechanics of TSS
  Multi-View Rendering
  Multi-View Rendering Use Cases
  Resource Management and Binding Model
Turing Features Enhance Virtual Reality
Conclusion
Appendix A Turing TU104 GPU
Appendix B Turing TU106 GPU
Appendix C RTX-OPS Description
  The Hybrid Rendering Model
  RTX-OPS Workload-based Metric Explained
Appendix D Ray Tracing Overview
  Basic Ray Tracing Mechanics
  Bounding Volume Hierarchy
  Denoising Filtering
  Ray-Traced Shadows, Ambient Occlusion, and Reflections
LIST OF FIGURES
Figure 1. Turing Reinvents Graphics
Figure 2. Turing TU102 Full GPU with 72 SM Units
Figure 3. NVIDIA Turing TU102 GPU
Figure 4. Turing TU102/TU104/TU106 Streaming Multiprocessor (SM)
Figure 5. Concurrent Execution of Floating Point and Integer Instructions in the Turing SM
Figure 6. New Shared Memory Architecture
Figure 7. Turing Shading Performance Speedup versus Pascal on Many Different Workloads
Figure 8. New Turing Tensor Cores Provide Multi-Precision for AI Inference
Figure 9. Tesla T4 Delivers up to 40X Higher Inference Performance
Figure 10. Tesla T4 Delivers More than 50X the Energy Efficiency of CPU-based Inferencing
Figure 11. Turing GDDR6
Figure 12. 50% Higher Effective Bandwidth
Figure 13. Video Feature Enhancements
Figure 14. NVLink Enables New SLI Display Topologies
Figure 15. SOL MAN from NVIDIA SOL Ray Tracing Demo
Figure 16. Hybrid Rendering Pipeline
Figure 17. Details of Ray Tracing and Rasterization Pipeline Stages
Figure 18. From Reflections Demo
Figure 19. Ray Tracing Pre-Turing
Figure 20. Turing Ray Tracing with RT Cores
Figure 21. Turing Ray Tracing Performance
Figure 22. Turing with 4K DLSS is Twice the Performance of Pascal with 4K TAA
Figure 23. DLSS 2X versus 64xSS: Images Almost Indistinguishable
Figure 24. DLSS 2X Provides Significantly Better Temporal Stability and Image Clarity Than TAA
Figure 25. NGX InPainting Examples: Missing Image Data Is Intelligently Replaced with Meaningful Image Information
Figure 26. AI Super Rez Provides Improved Image Clarity Over Other Filtering Methods
Figure 27. Mesh Shading, Visually Rich Images
Figure 28. Current Graphics Pipeline versus a Graphics Pipeline with Task and Mesh Shaders
Figure 29. Screenshot from the Asteroid Field Demo
Figure 30. An Asteroid at Low and High Levels of Detail (LOD)
Figure 31. Dynamically Computed, Spherical Cutaway of a Koenigsegg Model, Viewed in NVIDIA Holodeck™
Figure 32. Turing VRS Supported Shading Rates and Example Application to a Game Frame
Figure 33. Example of Content Adaptive Shading
Figure 34. Perceived Blur Due to Object Motion Combined with Retinal and Display Persistence
Figure 35. Traditional Rasterization and Shading Process
Figure 36. Texture Space Shading Process
Figure 37. Texture Space Shading for Stereo
Figure 38. 200° FOV HMD Where Two Canted Panels are Used and Benefit from MVR
Figure 39. MVR Single Pass Cascaded Shadow Map Rendering
Figure 40. Turing Features for VR
Figure 41. Turing TU104 Full Chip Diagram
Figure 42. Turing TU106 Full Chip Diagram
Figure 43. Workload Distribution Over One Turing Frame Time
Figure 44. Peak Operations of Each Type for the GeForce RTX 2080 Ti
Figure 45. Basic Ray Tracing Process
Figure 46. Abstraction of Tree Traversal and a Ray Intersecting Different Levels of Bounding Boxes
Figure 47. Shadow Map Percentage Closer Filtering (PCF) versus Ray Tracing with Denoising
Figure 48. Shadow Mapping Compared to Ray-Traced Shadows That Use 1 Sample per Pixel and Denoising
Figure 49. Screen-Space Ambient Occlusion Compared to Ray-Traced Ambient Occlusion
Figure 50. RTX Ray Tracing
Figure 51. Scene from Battlefield V with RTX On and Off
Figure 52. Scene #2 from Battlefield V with RTX On and Off
Figure 53. Shadow of the Tomb Raider with RTX On
LIST OF TABLES
Table 1. Comparison of NVIDIA Pascal GP102 and Turing TU102
Table 2. Enhanced Video Engine, Tesla P4 versus Tesla T4
Table 3. DisplayPort Support in Turing GPUs
Table 4. Comparison of NVIDIA Pascal GP104 and Turing TU104 GPUs
Table 5. Comparison of the Pascal Tesla P4 and the Turing Tesla T4
Table 6. Comparison of NVIDIA Pascal GP104 and Turing TU106 GPUs
INTRODUCTION TO THE NVIDIA TURING ARCHITECTURE
Fueled by the ongoing growth of the gaming market and its insatiable demand for better 3D
graphics, NVIDIA® has evolved the GPU into the world’s leading parallel processing engine for
many computationally intensive applications. In addition to rendering highly realistic and
immersive 3D games, NVIDIA GPUs also accelerate content creation workflows, high performance
computing (HPC) and datacenter applications, and numerous artificial intelligence systems and
applications.
Turing represents the biggest architectural leap forward in over a decade, providing a new core
GPU architecture that enables major advances in efficiency and performance for PC gaming,
professional graphics applications, and deep learning inferencing.
Using new hardware-based accelerators and a Hybrid Rendering approach, Turing fuses
rasterization, real-time ray tracing, AI, and simulation to enable incredible realism in PC games,
amazing new effects powered by neural networks, cinematic-quality interactive experiences, and
fluid interactivity when creating or navigating complex 3D models.
Within the core architecture, the key enablers for Turing’s significant boost in graphics
performance are a new streaming multiprocessor (SM) architecture with improved shader
execution efficiency, and a new memory system architecture that includes support for the
latest GDDR6 memory technology.
Image classification, as benchmarked by the ImageNet Challenge, was among the first success
stories for deep learning, so it is no surprise that AI has the potential to solve many important
problems in graphics. Turing’s Tensor Cores power a suite of new deep learning-based Neural
Services that offer stunning graphics effects for games and professional graphics, in addition to
providing fast AI inferencing for cloud-based systems.
Real-time ray tracing, the long-sought holy grail of computer graphics rendering, is now a
reality in single-GPU systems with the NVIDIA Turing GPU architecture. Turing GPUs introduce
new RT Cores, accelerator units dedicated to performing ray tracing operations with
extraordinary efficiency, eliminating the expensive software emulation-based ray tracing
approaches of the past. These new units, combined with NVIDIA RTX™ software technology and
sophisticated filtering algorithms, enable Turing to deliver real-time ray-traced rendering,
including photorealistic objects and environments with physically accurate shadows,
reflections, and refractions.
In parallel with Turing’s development, Microsoft announced both the DirectML API for AI and the
DirectX Raytracing (DXR) API in early 2018. With the combination of the Turing GPU architecture
and the new AI and ray tracing APIs from Microsoft, game developers can rapidly deploy real-time
AI and ray tracing in their games.
In addition to its groundbreaking AI and ray tracing features, Turing also includes many new
advanced shading features that improve performance, enhance image quality, and deliver new
levels of geometric complexity.
Turing GPUs also inherit all the enhancements to the NVIDIA CUDA™ platform introduced in the
Volta architecture that improve the capability, flexibility, productivity, and portability of compute
applications. Features such as independent thread scheduling, hardware-accelerated Multi-Process
Service (MPS) with address space isolation for multiple applications, and Cooperative
Groups are all part of the Turing GPU architecture.
Several of the new NVIDIA GeForce® and NVIDIA Quadro™ GPU products will be powered by
Turing GPUs. In this paper we focus on the architecture and capabilities of NVIDIA’s flagship
Turing GPU, which is codenamed TU102 and will be shipping in the GeForce RTX 2080 Ti and
Quadro RTX 6000. Technical details, including product specifications for TU104 and TU106 Turing
GPUs, are located in the appendices.
Figure 1 shows how Turing reinvents graphics with an entirely new architecture that includes
enhanced Tensor Cores, new RT Cores, and many new advanced shading features. Turing
combines programmable shading, real-time ray tracing, and AI algorithms to deliver incredibly
realistic and physically accurate graphics for games and professional applications.
Figure 1. Turing Reinvents Graphics