CST STUDIO SUITE® 2019
GPU Computing Guide
3DS.COM/SIMULIA © Dassault Systèmes | GPU Computing Guide 2019
Copyright 1998-2019 Dassault Systèmes Deutschland GmbH.
CST Studio Suite is a Dassault Systèmes product.
All rights reserved.
Contents
1 Nomenclature
2 Supported Solvers and Features
2.1 Unsupported Features
3 Operating System Support
4 Supported Hardware
5 Unsupported Hardware
6 NVIDIA Drivers Download and Installation
6.1 GPU Driver Installation
6.2 Verifying Correct Installation of GPU Hardware and Drivers
6.3 Uninstalling NVIDIA Drivers
7 Switch On GPU Computing
7.1 Interactive Simulations
7.2 Simulations in Batch Mode
8 Usage Guidelines
8.1 The Error Correction Code (ECC) Feature
8.2 Tesla Compute Cluster (TCC) Mode
8.3 Disable the Exclusive Mode
8.4 Display Link
8.5 Combined MPI Computing and GPU Computing
8.6 Service User
8.7 GPU Computing using Windows Remote Desktop (RDP)
8.8 Running Multiple Simulations at the Same Time
8.9 Video Card Drivers
8.10 Operating Conditions
8.11 Latest CST Service Pack
8.12 GPU Monitoring/Utilization
8.13 Select Subset of Available GPU Cards
9 NVIDIA GPU Boost
10 Licensing
11 Troubleshooting Tips
12 History of Changes
1 Nomenclature
The following section explains the nomenclature used in this document.
command   Commands you have to enter on a command prompt (cmd on MS Windows
          or your favorite shell on Linux) are typeset in a typewriter font.
<...>     Within commands, the sections you should replace according to your
          environment are enclosed in "<...>". For example, the placeholder
          for the installation directory should be replaced by the directory
          where you have installed CST STUDIO SUITE
          (e.g. "c:\Program Files\CST STUDIO SUITE").
2 Supported Solvers and Features
• Transient Solver (T-solver/TLM-solver)
• Integral Equation Solver (direct solver and MLFMM only)
• Multilayer solver (M-solver)
• Particle-In-Cell (PIC-solver)
• Asymptotic Solver (A-solver; for this solver TCC mode is required on Windows!)
• Conjugate Heat Transfer Solver (CHT-solver)
Co-simulation with CST CABLE STUDIO is also supported.
2.1 Unsupported Features
The following features are currently not supported by GPU Computing. This list is subject
to change in future releases or service packs of CST STUDIO SUITE.
Solver                     Unsupported Features on GPU
Transient Solver           • Subgridding
Particle In Cell Solver    • Modulation of External Fields
                           • Open Boundaries
3 Operating System Support
CST STUDIO SUITE is continuously tested on different operating systems. For a list of
supported operating systems please refer to
https://updates.cst.com/downloads/CST-OS-Support.pdf
In general, GPU computing can be used on any of the supported operating systems.
4 Supported Hardware
CST STUDIO SUITE currently supports up to 16 GPU devices in a single host system,
meaning that any number of GPU devices between 1 and 16 is supported. [1]
The following tables contain some basic information about the GPU hardware currently
supported by the GPU Computing feature of CST STUDIO SUITE, as well as the
requirements for the host system equipped with the hardware. To ensure compatibility
of GPU hardware and host system please check
https://www.nvidia.com/object/tesla-qualified-servers.html
Please note that a 64 bit computer architecture is required for GPU Computing.
A general hardware recommendation can be found here:
https://www.cst.com/products/csts2/hardwarerecommendation
[1] It is strongly recommended to contact CST before purchasing a system with more than four GPU cards
to ensure that the hardware is working properly and is configured correctly for CST STUDIO SUITE.
CST STUDIO SUITE officially supports the NVIDIA Tesla and Quadro cards listed in the
table below. These GPUs are well tested and validated with CST software,
and you can contact CST support in case you run into any problems.
List of supported GPU hardware for CST STUDIO SUITE 2019 [2] [3]

Card Name                      Series    Platform          Min. CST Version
Quadro RTX 8000 [4]            Turing    Workstations      2019 SP6
Quadro RTX 6000 [4]            Turing    Workstations      2019 SP6
Quadro RTX 5000 [4]            Turing    Workstations      2019 SP6
Quadro RTX 4000 [4]            Turing    Workstations      2019 SP6
Quadro GV100                   Volta     Workstations      2018 SP6
Tesla V100-SXM2-32GB (Chip)    Volta     Servers           2018 SP6
Tesla V100-PCIE-32GB           Volta     Servers           2018 SP6
Tesla V100-SXM2-16GB (Chip)    Volta     Servers           2018 SP1
Tesla V100-PCIE-16GB           Volta     Servers           2018 SP1
Tesla P100-SXM2 (Chip)         Pascal    Servers           2017 release
Tesla P100-PCIE-16GB           Pascal    Servers           2017 release
Tesla P100 16GB                Pascal    Servers           2017 release
Tesla P100-PCIE-12GB           Pascal    Servers           2017 SP2
Quadro P6000 [4]               Pascal    Workstations      2017 SP2
Quadro GP100                   Pascal    Workstations      2017 SP2
Tesla P40 [4]                  Pascal    Servers           2018 release
Tesla P4 [4]                   Pascal    Servers           2018 release
Tesla M60 [4]                  Maxwell   Servers/Workst.   2016 SP4
Tesla M40 [4]                  Maxwell   Servers           2016 SP4
Quadro M6000 24GB [4]          Maxwell   Workstations      2016 SP4
Quadro M6000 [4]               Maxwell   Workstations      2015 SP4
Tesla K80                      Kepler    Servers           2014 SP6
Tesla K40 m/c/s/st/d/t         Kepler    Servers/Workst.   2013 SP5
Quadro K6000                   Kepler    Workstations      2013 SP4
Tesla K20X                     Kepler    Servers           2013 release
Tesla K20m/K20c/K20s           Kepler    Servers/Workst.   2013 release
Tesla K10 [4]                  Kepler    Servers           2013 release

[2] Please note that cards of different series (e.g. "Maxwell" and "Pascal") can't be combined in a single
host system for GPU Computing.
[3] Platform = Servers: These GPUs are only available with a passive cooling system, which provides
sufficient cooling only if it is used in combination with additional fans. These fans are usually available
for server chassis only!
Platform = Workstations: These GPUs provide active cooling, so they are suitable for workstation
computer chassis as well.
[4] Important: The double precision performance of this GPU device is poor; it is therefore recommended
for T-solver simulations only.
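The two configuration rules stated in this section (between 1 and 16 GPU devices per host, and no mixing of cards from different series) can be checked mechanically before ordering hardware. The following is a minimal sketch, not part of CST STUDIO SUITE: the helper name check_gpu_set and the lookup table CARD_SERIES are illustrative assumptions, and the table only contains a few of the cards listed above.

```python
# Illustrative sketch only: check_gpu_set and CARD_SERIES are hypothetical
# names, not part of any CST tool. Series values are copied from the list of
# supported GPU hardware; only a subset of the cards is included here.
CARD_SERIES = {
    "Quadro RTX 6000": "Turing",
    "Quadro GV100": "Volta",
    "Tesla V100-PCIE-16GB": "Volta",
    "Tesla P100-PCIE-16GB": "Pascal",
    "Quadro P6000": "Pascal",
    "Tesla M40": "Maxwell",
    "Quadro K6000": "Kepler",
}

MAX_GPUS_PER_HOST = 16  # limit stated in section 4


def check_gpu_set(cards):
    """Return a list of problems for the GPU cards planned for one host."""
    problems = []
    # Rule 1: any number of GPU devices between 1 and 16 is supported.
    if not 1 <= len(cards) <= MAX_GPUS_PER_HOST:
        problems.append(
            f"{len(cards)} GPUs requested; supported range is 1 to {MAX_GPUS_PER_HOST}"
        )
    # Cards missing from the lookup table cannot be checked by this sketch.
    unknown = sorted({c for c in cards if c not in CARD_SERIES})
    if unknown:
        problems.append("not in lookup table: " + ", ".join(unknown))
    # Rule 2: cards of different series must not be combined in one host.
    series = sorted({CARD_SERIES[c] for c in cards if c in CARD_SERIES})
    if len(series) > 1:
        problems.append("mixed series: " + ", ".join(series))
    return problems


if __name__ == "__main__":
    print(check_gpu_set(["Tesla V100-PCIE-16GB"] * 4))
    print(check_gpu_set(["Tesla M40", "Quadro P6000"]))
```

An empty list means that neither rule is violated. It does not replace the recommendation to contact CST before purchasing a system with more than four GPU cards.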
Hardware Type                          NVIDIA Tesla P100 Chip           NVIDIA Tesla P100 PCIe [1]
                                       (for Servers)                    (for Servers)
Min. CST version required              2017 release                     2017 release
Number of GPUs                         1                                1
Max. Problem Size (Transient Solver)   approx. 160 million mesh cells   approx. 160 / 120 million mesh cells
Form Factor                            Chip,                            Dual-Slot PCI-Express,
                                       Passive Cooling                  Passive Cooling
Memory                                 16 GB CoWoS HBM2                 16 / 12 GB CoWoS HBM2
Bandwidth                              732 GB/s                         732 GB/s / 549 GB/s
Single Precision Performance [2]       10.6 TFlops                      9.3 TFlops
Double Precision Performance [2]       5.3 TFlops                       4.7 TFlops
Power Consumption                      300 W (max.)                     250 W (max.)
System interface                       NVIDIA NVLink                    1x PCIe Gen 3 (x16 electrically)
Power Supply of Host System            min. 750 W                       min. 750 W
Min. RAM of Host System [3]            64 GB                            64 GB

[1] The 12 GB version has about 25 percent less performance compared to the 16 GB
version.
[2] Measured with BOOST enabled.
[3] The host system requires approximately 4 times as much memory as is available on the GPU cards.
Although it is technically possible to use less memory than this recommendation, the simulation performance
of larger models will suffer.
CST assumes no liability for any problems caused by this information.
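The host memory recommendation in the footnote above is simple arithmetic: roughly four times the memory available on the GPU cards. The sketch below only restates that rule. The function name recommended_host_ram_gb is a hypothetical helper, and summing the memory over several cards is an assumption based on the wording "available on the GPU cards".

```python
# Illustrative sketch of the "approximately 4 times" rule from the footnote.
# recommended_host_ram_gb is a hypothetical helper, not a CST function.
HOST_TO_GPU_MEMORY_RATIO = 4


def recommended_host_ram_gb(gpu_memory_gb, num_gpus=1):
    """Approximate host RAM (GB) recommended for the given GPU memory.

    gpu_memory_gb: memory of a single GPU card in GB
    num_gpus:      number of identical cards in the host (assumed to add up)
    """
    return HOST_TO_GPU_MEMORY_RATIO * gpu_memory_gb * num_gpus


if __name__ == "__main__":
    # One 16 GB Tesla P100: matches the 64 GB entry in the table.
    print(recommended_host_ram_gb(16))
    # Two such cards in one host, under the summing assumption.
    print(recommended_host_ram_gb(16, num_gpus=2))
```

The result is a recommendation rather than a hard limit. As the footnote says, a host with less memory still works, but the simulation performance of larger models suffers.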
Hardware Type                          NVIDIA Quadro GP100                NVIDIA Quadro P6000 [1]
                                       (for Workstations)                 (for Workstations)
Min. CST version required              2017 SP2                           2017 SP2
Number of GPUs                         1                                  1
Max. Problem Size (Transient Solver)   approx. 160 million mesh cells     approx. 240 million mesh cells
Form Factor                            Dual-Slot PCI-Express              Dual-Slot PCI-Express
Memory                                 16 GB HBM2                         24 GB GDDR5X
Bandwidth                              720 GB/s                           432 GB/s
Single Precision Performance [2]       10.3 TFlops                        12.0 TFlops
Double Precision Performance [2]       5.2 TFlops                         0.2 TFlops
Power Consumption                      300 W (max.)                       300 W (max.)
System interface                       1x PCIe Gen 3 (x16 electrically)   1x PCIe Gen 3 (x16 electrically)
Power Supply of Host System            min. 750 W                         min. 750 W
Min. RAM of Host System [3]            64 GB                              96 GB

[1] The double precision performance of this GPU device is poor; it is therefore recommended
for T-solver simulations only.
[2] Measured with BOOST enabled.
[3] The host system requires approximately 4 times as much memory as is available on the GPU cards.
Although it is technically possible to use less memory than this recommendation, the simulation performance
of larger models will suffer.
CST assumes no liability for any problems caused by this information.
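The maximum problem sizes quoted for the Transient Solver in the hardware tables (approx. 120, 160 and 240 million mesh cells for 12, 16 and 24 GB of GPU memory) all correspond to roughly 10 million mesh cells per GB. The sketch below uses that ratio for a rough sizing estimate. The linear scaling is an inference from these few table entries, not a rule stated by CST, and both function names are hypothetical. Actual limits depend on the model and solver settings.

```python
import math

# Ratio inferred from the tables: 160 million mesh cells on a 16 GB card.
# This is an assumption for rough sizing, not an official CST figure.
MESH_CELLS_PER_GB = 10_000_000


def approx_max_mesh_cells(gpu_memory_gb):
    """Rough Transient Solver problem size that fits on one GPU card."""
    return MESH_CELLS_PER_GB * gpu_memory_gb


def min_gpu_memory_gb(mesh_cells):
    """Rough GPU memory (GB, rounded up) needed for a given mesh size."""
    return math.ceil(mesh_cells / MESH_CELLS_PER_GB)


if __name__ == "__main__":
    print(approx_max_mesh_cells(24))       # Quadro P6000 with 24 GB
    print(min_gpu_memory_gb(200_000_000))  # a 200 million cell model
```

Under this estimate a 200 million cell model would not fit on a single 16 GB card but would fit on a 24 GB card, which matches the order of magnitude given in the tables.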