Intel® C++ Compiler 19.0 Developer Guide and Reference
Legal Information
Contents
Introducing the Intel® C++ Compiler
Feature Requirements
Getting Help and Support
Related Information
Notational Conventions
Compiler Setup
Using the Command Line
Specifying the Location of Compiler Components with compilervars
Invoking the Intel® Compiler
Using the Command Line on Windows*
Understanding File Extensions
Using Makefiles to Compile Your Application
Using Compiler Options
Specifying Include Files
Specifying Object Files
Specifying Assembly Files
Converting Projects to Use a Selected Compiler from the Command Line
Using Eclipse* (Linux*)
Adding the Compiler to Eclipse*
Multi-Version Compiler Support
Using Cheat Sheets
Creating a Simple Project
Creating a New Project
Adding a C Source File
Setting Options for a Project or File
Excluding Source Files from a Build
Building a Project
Running a Project
Intel® C/C++ Error Parser
Make Files
Project Types and Makefiles
Exporting Makefiles
Using Intel® Performance Libraries with Eclipse*
Using Microsoft Visual Studio* (Windows*)
Creating a New Project
Using the Intel® C++ Compiler
Building Your Intel® C++ Project
Selecting the Compiler Version
Switching Back to the Visual C++* Compiler
Selecting a Configuration
Specifying a Target Platform
Specifying Directory Paths
Specifying a Base Platform Toolset with the Intel® C++ Compiler
Using Property Pages
Using Intel® Performance Libraries with Microsoft Visual Studio*
Changing the Selected Intel® Performance Libraries
Including MPI Support
Using Guided Auto Parallelism in Microsoft Visual Studio*
Using Code Coverage in Microsoft Visual Studio*
Using Profile Guided Optimization in Microsoft Visual Studio*
Performing Parallel Project Builds
Optimization Reports: Enabling in Microsoft Visual Studio*
Optimization Reports: Viewing
Dialog Box Help
Options: Compilers dialog box
Options: Intel® Performance Libraries dialog box
Use Intel® C++ dialog box
Options: Guided Auto Parallelism dialog box
Profile Guided Optimization dialog box
Options: Profile Guided Optimization (PGO) dialog box
Configure Analysis dialog box
Options: Converter dialog box
Code Coverage dialog box
Options: Code Coverage dialog box
Code Coverage Settings dialog box
Options: Optimization Reports dialog box
Using Xcode* (macOS*)
Creating an Xcode* Project
Selecting the Intel® Compiler
Building the Target
Setting Compiler Options
Running the Executable
Using Intel® Performance Libraries with Xcode*
Compiler Reference
C/C++ Calling Conventions
Compiler Options
New Options
Alphabetical List of Compiler Options
Deprecated and Removed Compiler Options
Ways to Display Certain Option Information
Displaying Options Passed to Offload Compilation
Displaying General Option Information From the Command Line
Compiler Option Details
General Rules for Compiler Options
What Appears in the Compiler Option Descriptions
Offload Options
qoffload, Qoffload
qoffload-arch, Qoffload-arch
qoffload-attribute-target, Qoffload-attribute-target
qoffload-option, Qoffload-option
Optimization Options
falias, Oa
fast
fbuiltin, Oi
fdefer-pop
ffnalias, Ow
ffunction-sections
foptimize-sibling-calls
fprotect-parens, Qprotect-parens
Gf
GF
nolib-inline
O
Od
Ofast
Os
Ot
Ox
Code Generation Options
arch
ax, Qax
EH
fasynchronous-unwind-tables
fexceptions
fomit-frame-pointer, Oy
Gd
Gr
GR
guard
Gv
Gz
hotpatch
m
m32, m64, Q32, Q64
m80387
march
masm
mconditional-branch, Qconditional-branch
minstruction, Qinstruction
momit-leaf-frame-pointer
mregparm
mregparm-version
mstringop-inline-threshold, Qstringop-inline-threshold
mstringop-strategy, Qstringop-strategy
mtune, tune
qcf-protection, Qcf-protection
Qcxx-features
Qpatchable-addresses
Qsafeseh
regcall, Qregcall
x, Qx
xHost, QxHost
Interprocedural Optimization (IPO) Options
ffat-lto-objects
ip, Qip
ip-no-inlining, Qip-no-inlining
ip-no-pinlining, Qip-no-pinlining
ipo, Qipo
ipo-c, Qipo-c
ipo-jobs, Qipo-jobs
ipo-S, Qipo-S
ipo-separate, Qipo-separate
Advanced Optimization Options
alias-const, Qalias-const
ansi-alias, Qansi-alias
ansi-alias-check, Qansi-alias-check
cilk-serialize, Qcilk-serialize
complex-limited-range, Qcomplex-limited-range
daal, Qdaal
fargument-alias, Qalias-args
fargument-noalias-global
ffreestanding, Qfreestanding
fjump-tables
ftls-model
funroll-all-loops
guide, Qguide
guide-data-trans, Qguide-data-trans
guide-file, Qguide-file
guide-file-append, Qguide-file-append
guide-opts, Qguide-opts
guide-par, Qguide-par
guide-profile, Qguide-profile
guide-vec, Qguide-vec
ipp, Qipp
ipp-link, Qipp-link
mkl, Qmkl
qopt-args-in-regs, Qopt-args-in-regs
qopt-assume-safe-padding, Qopt-assume-safe-padding
qopt-block-factor, Qopt-block-factor
qopt-calloc, Qopt-calloc
qopt-class-analysis, Qopt-class-analysis
qopt-dynamic-align, Qopt-dynamic-align
qopt-jump-tables, Qopt-jump-tables
qopt-malloc-options
qopt-matmul, Qopt-matmul
qopt-mem-layout-trans, Qopt-mem-layout-trans
qopt-multi-version-aggressive, Qopt-multi-version-aggressive
qopt-prefetch, Qopt-prefetch
qopt-prefetch-distance, Qopt-prefetch-distance
qopt-prefetch-issue-excl-hint, Qopt-prefetch-issue-excl-hint
qopt-ra-region-strategy, Qopt-ra-region-strategy
qopt-streaming-stores, Qopt-streaming-stores
qopt-subscript-in-range, Qopt-subscript-in-range
qopt-threads-per-core, Qopt-threads-per-core
qopt-zmm-usage, Qopt-zmm-usage
qoverride-limits, Qoverride-limits
Qvla
scalar-rep, Qscalar-rep
simd, Qsimd
simd-function-pointers, Qsimd-function-pointers
tbb, Qtbb
unroll, Qunroll
unroll-aggressive, Qunroll-aggressive
use-intel-optimized-headers, Quse-intel-optimized-headers
vec, Qvec
vec-guard-write, Qvec-guard-write
vec-threshold, Qvec-threshold
vecabi, Qvecabi
Profile Guided Optimization (PGO) Options
finstrument-functions, Qinstrument-functions
fnsplit, Qfnsplit
Gh
GH
p
prof-data-order, Qprof-data-order
prof-dir, Qprof-dir
prof-file, Qprof-file
prof-func-groups
prof-func-order, Qprof-func-order
prof-gen, Qprof-gen
prof-gen-sampling
prof-hotness-threshold, Qprof-hotness-threshold
prof-src-dir, Qprof-src-dir
prof-src-root, Qprof-src-root
prof-src-root-cwd, Qprof-src-root-cwd
prof-use, Qprof-use
prof-use-sampling
prof-value-profiling, Qprof-value-profiling
profile-functions, Qprofile-functions
profile-loops, Qprofile-loops
profile-loops-report, Qprofile-loops-report
Qcov-dir
Qcov-file
Qcov-gen
Optimization Report Options
qopt-report, Qopt-report
qopt-report-annotate, Qopt-report-annotate
qopt-report-annotate-position, Qopt-report-annotate-position
qopt-report-embed, Qopt-report-embed
qopt-report-file, Qopt-report-file
qopt-report-filter, Qopt-report-filter
qopt-report-format, Qopt-report-format
qopt-report-help, Qopt-report-help
qopt-report-per-object, Qopt-report-per-object
qopt-report-phase, Qopt-report-phase
qopt-report-routine, Qopt-report-routine
qopt-report-names, Qopt-report-names
tcollect, Qtcollect
tcollect-filter, Qtcollect-filter
OpenMP* Options and Parallel Processing Options
fmpc-privatize
par-affinity, Qpar-affinity
par-num-threads, Qpar-num-threads
par-runtime-control, Qpar-runtime-control
par-schedule, Qpar-schedule
par-threshold, Qpar-threshold
parallel, Qparallel
parallel-source-info, Qparallel-source-info
qopenmp, Qopenmp
qopenmp-lib, Qopenmp-lib
qopenmp-link, Qopenmp-link
qopenmp-offload, Qopenmp-offload
qopenmp-simd, Qopenmp-simd
qopenmp-stubs, Qopenmp-stubs
qopenmp-threadprivate, Qopenmp-threadprivate
Qpar-adjust-stack
Floating-Point Options
fast-transcendentals, Qfast-transcendentals
fimf-absolute-error, Qimf-absolute-error
fimf-accuracy-bits, Qimf-accuracy-bits
fimf-arch-consistency, Qimf-arch-consistency
fimf-domain-exclusion, Qimf-domain-exclusion
fimf-force-dynamic-target, Qimf-force-dynamic-target
fimf-max-error, Qimf-max-error
fimf-precision, Qimf-precision
fimf-use-svml, Qimf-use-svml
fma, Qfma
fp-model, fp
fp-port, Qfp-port
fp-speculation, Qfp-speculation
fp-stack-check, Qfp-stack-check
fp-trap, Qfp-trap
fp-trap-all, Qfp-trap-all
ftz, Qftz
Ge
mp1, Qprec
pc, Qpc
prec-div, Qprec-div
prec-sqrt, Qprec-sqrt
rcd, Qrcd
Inlining Options
fgnu89-inline
finline
finline-functions
finline-limit
inline-calloc, Qinline-calloc
inline-factor, Qinline-factor
inline-forceinline, Qinline-forceinline
inline-level, Ob
inline-max-per-compile, Qinline-max-per-compile
inline-max-per-routine, Qinline-max-per-routine
inline-max-size, Qinline-max-size
inline-max-total-size, Qinline-max-total-size
inline-min-caller-growth, Qinline-min-caller-growth
inline-min-size, Qinline-min-size
Qinline-dllimport
Output, Debug, and Precompiled Header (PCH) Options
c
debug (Linux* OS and macOS*)
debug (Windows* OS)
Fa
FA
fasm-blocks
FC
fcode-asm
Fd
FD
Fe
feliminate-unused-debug-types, Qeliminate-unused-debug-types
femit-class-debug-always
fmerge-constants
fmerge-debug-strings
Fo
Fp
FR
fsource-asm
ftrapuv, Qtrapuv
fverbose-asm
g
gdwarf
Gm
grecord-gcc-switches
gsplit-dwarf
map-opts, Qmap-opts
o
pch
pch-create
pch-dir
pch-use
pdbfile
print-multi-lib
Qpchi
Quse-msasm-symbols
RTC
S
use-asm, Quse-asm
use-msasm
V
Y-
Yc
Yd
Yu
Zi, Z7, ZI
Zo
Preprocessor Options
A, QA
B
C
D
dD, QdD
dM, QdM
dN, QdN
E
EP
FI
gcc, gcc-sys
gcc-include-dir
H, QH
I
I-
icc
idirafter
imacros
iprefix
iquote
isystem
iwithprefix
iwithprefixbefore
Kc++, TP
M, QM
MD, QMD
MF, QMF
MG, QMG
MM, QMM
MMD, QMMD
MP (Linux* OS)
MQ
MT, QMT
nostdinc++
P
pragma-optimization-level
u (Windows* OS)
U
undef
X
Component Control Options
Qinstall
Qlocation
Qoption
Language Options
ansi
check
early-template-check
fblocks
ffriend-injection
fno-gnu-keywords
fno-implicit-inline-templates
fno-implicit-templates
fno-operator-names
fno-rtti
fnon-lvalue-assign
fpermissive
fshort-enums
fsyntax-only
ftemplate-depth, Qtemplate-depth
funsigned-bitfields
funsigned-char
GZ
H (Windows* OS)
help-pragma, Qhelp-pragma
intel-extensions, Qintel-extensions
J
restrict, Qrestrict
std, Qstd
strict-ansi
vd
vmb
vmg
vmm
vms
x (type option)
Za
Zc
Ze
Zg
Zp
Zs
Data Options
align
auto-ilp32, Qauto-ilp32
auto-p32
check-pointers, Qcheck-pointers
check-pointers-dangling, Qcheck-pointers-dangling
check-pointers-mpx, Qcheck-pointers-mpx
check-pointers-narrowing, Qcheck-pointers-narrowing
check-pointers-undimensioned, Qcheck-pointers-undimensioned
falign-functions, Qfnalign
falign-loops, Qalign-loops
falign-stack
fcommon
fextend-arguments, Qextend-arguments
fkeep-static-consts, Qkeep-static-consts
fmath-errno
fminshared
fmudflap
fpack-struct
fpascal-strings
fpic
fpie
freg-struct-return
fstack-protector
fstack-security-check
fvisibility
fvisibility-inlines-hidden
fzero-initialized-in-bss, Qzero-initialized-in-bss
GA
Gs
GS
GT
homeparams
malign-double
malign-mac68k
malign-natural
malign-power
mcmodel
mdynamic-no-pic
mlong-double
no-bss-init, Qnobss-init
noBool
Qlong-double
Qsfalign
Compiler Diagnostic Options
diag, Qdiag
diag-dump, Qdiag-dump
diag-enable=power, Qdiag-enable:power
diag-error-limit, Qdiag-error-limit
diag-file, Qdiag-file
diag-file-append, Qdiag-file-append
diag-id-numbers, Qdiag-id-numbers
diag-once, Qdiag-once
fnon-call-exceptions
traceback
w
w0...w5, W0...W5
Wabi
Wall
Wbrief
Wcheck
Wcomment
Wcontext-limit, Qcontext-limit
wd, Qwd
Wdeprecated
we, Qwe
Weffc++, Qeffc++
Werror, WX
Werror-all
Wextra-tokens
Wformat
Wformat-security
Wic-pointer
Winline
WL
Wmain
Wmissing-declarations
Wmissing-prototypes
wn, Qwn
Wnon-virtual-dtor
wo, Qwo
Wp64
Wpch-messages
Wpointer-arith
Wport
wr, Qwr
Wremarks
Wreorder
Wreturn-type
Wshadow
Wsign-compare
Wstrict-aliasing
Wstrict-prototypes
Wtrigraphs
Wuninitialized
Wunknown-pragmas
Wunused-function
Wunused-variable
ww, Qww
Wwrite-strings
Compatibility Options
clang-name
clangxx-name
fabi-version
fms-dialect
gcc-name
gnu-prefix
gxx-name
Qgcc-dialect
Qms
Qvc
stdlib
vmv
Linking or Linker Options
Bdynamic
Bstatic
Bsymbolic
Bsymbolic-functions
cxxlib
dynamic-linker
dynamiclib
F (Windows* OS)
F (macOS*)
fixed
Fm
fuse-ld
l
L
LD
link
MD
MT
no-libgcc
nodefaultlibs
nostartfiles
nostdlib
pie
pthread
shared
shared-intel
shared-libgcc
static
static-intel
static-libgcc
static-libstdc++
staticlib
T
u (Linux* OS)
v
Wa
Wl
Wp
Xlinker
Zl
Miscellaneous Options
bigobj
dryrun
dumpmachine
dumpversion
global-hoist, Qglobal-hoist
Gy
help
intel-freestanding
intel-freestanding-target-os
MP-force
multibyte-chars, Qmultibyte-chars
multiple-processes, MP
nologo
print-sysroot
save-temps, Qsave-temps
showIncludes
sox
sysroot
Tc
TC
Tp
V, QV
version
watch
Alternate Compiler Options
Related Options
Portability Options
GCC-Compatible Warning Options
Floating-Point Operations
Understanding Floating-Point Operations
Programming Tradeoffs in Floating-point Applications
Floating-point Optimizations
Using the -fp-model (/fp) Option
Denormal Numbers
Floating-Point Environment
Setting the FTZ and DAZ Flags
Checking the Floating-point Stack State
Tuning Performance
Overview: Tuning Performance
Handling Floating-point Array Operations in a Loop Body
Reducing the Impact of Denormal Exceptions
Avoiding Mixed Data Type Arithmetic Expressions
Using Efficient Data Types
Understanding IEEE Floating-Point Operations
Floating-Point Formats
Special Values
Attributes
align
align_value
avoid_false_share
code_align
concurrency_safe
const
cpu_dispatch
cpu_specific
mpx
target
vector
vector_variant
Intrinsics
Details about Intrinsics
Naming and Usage Syntax
Links and Bibliography
Intrinsics for All Intel® Architectures
Overview: Intrinsics across Intel® Architectures
Integer Arithmetic Intrinsics
Floating-point Intrinsics
String and Block Copy Intrinsics
Miscellaneous Intrinsics
_may_i_use_cpu_feature
_allow_cpu_features
Data Alignment, Memory Allocation Intrinsics, and Inline Assembly
Overview
Alignment Support
Allocating and Freeing Aligned Memory Blocks
Inline Assembly
Intrinsics for Managing Extended Processor States and Registers
Overview
Intrinsics for Reading and Writing the Content of Extended Control Registers
_xgetbv()
_xsetbv()
Intrinsics for Saving and Restoring the Extended Processor States
_fxsave()
_fxsave64()
_fxrstor()
_fxrstor64()
_xsave()/_xsavec()/_xsaves()
_xsave64()/_xsavec64()/_xsaves64()
_xsaveopt()
_xsaveopt64()
_xrstor()/_xrstors()
_xrstor64()/_xrstors64()
Intrinsics for the Short Vector Random Number Generator Library
Data Types and Calling Conventions
Usage Model
Engine Initialization and Finalization
svrng_new_rand0_engine/svrng_new_rand0_ex
svrng_new_rand_engine/svrng_new_rand_ex
svrng_new_mcg31m1_engine/svrng_new_mcg31m1_ex
svrng_new_mcg59_engine/svrng_new_mcg59_ex
svrng_new_mt19937_engine/svrng_new_mt19937_ex
svrng_delete_engine
Distribution Initialization and Finalization
svrng_new_uniform_distribution_[int|float|double]/svrng_update_uniform_distribution_[int|float|double]
svrng_new_normal_distribution_[float|double]/svrng_update_normal_distribution_[float|double]
svrng_delete_distribution
Random Values Generation
svrng_generate[1|2|4|8|16|32]_[uint|ulong]
svrng_generate[1|2|4|8|16|32]_[int|float|double]
Service Routines
Parallel Computation Support
svrng_copy_engine
svrng_skipahead_engine
svrng_leapfrog_engine
Error Handling
svrng_set_status
svrng_get_status
Intrinsics for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) 4VNNIW Instructions
Intrinsics for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) 4FMAPS Instructions
Intrinsics for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) VPOPCNTDQ Instructions
Intrinsics for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Additional Instructions
Intrinsics for Arithmetic Operations
Intrinsics for Bit Manipulation Operations
Intrinsics for Comparison Operations
Intrinsics for Conversion Operations
Intrinsics for Load Operations
Intrinsics for Logical Operations
Intrinsics for Miscellaneous Operations
Intrinsics for Move Operations
Intrinsics for Set Operations
Intrinsics for Shift Operations
Intrinsics for Store Operations
Intrinsics for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Instructions
Overview: Intrinsics for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Instructions
Intrinsics for Arithmetic Operations
Intrinsics for Addition Operations
Intrinsics for FP Addition Operations
Intrinsics for Integer Addition Operations
Intrinsics for Determining Minimum and Maximum Values
Intrinsics for Determining Minimum and Maximum FP Values
Intrinsics for Determining Minimum and Maximum Integer Values
Intrinsics for FP Fused Multiply-Add (FMA) Operations
Intrinsics for Multiplication Operations
Intrinsics for FP Multiplication Operations
Intrinsics for Integer Multiplication Operations
Intrinsics for Subtraction Operations
Intrinsics for FP Subtraction Operations
Intrinsics for Integer Subtraction Operations
Intrinsics for Short Vector Math Library (SVML) Operations
Intrinsics for Division Operations
Intrinsics for Error Function Operations
Intrinsics for Exponential Operations
Intrinsics for Logarithmic Operations
Intrinsics for Reciprocal Operations
Intrinsics for Root Function Operations
Intrinsics for Rounding Operations
Intrinsics for Trigonometric Operations
Intrinsics for Other Mathematics Operations
Intrinsics for FP Division Operations
Intrinsics for Absolute Value Operations
Intrinsics for Scale Operations
Intrinsics for Blend Operations
Intrinsics for Bit Manipulation Operations
Intrinsics for Integer Bit Manipulation and Conflict Detection Operations
Intrinsics for Bitwise Logical Operations
Intrinsics for Integer Bit Rotation Operations
Intrinsics for Integer Bit Shift Operations
Intrinsics for Broadcast Operations
Intrinsics for FP Broadcast Operations
Intrinsics for Integer Broadcast Operations
Intrinsics for Comparison Operations
Intrinsics for FP Comparison Operations
Intrinsics for Integer Comparison Operations
Intrinsics for Compression Operations
Intrinsics for Conversion Operations
Intrinsics for FP Conversion Operations
Intrinsics for Integer Conversion Operations
Intrinsics for Expand and Load Operations
Intrinsics for FP Expand and Load Operations
Intrinsics for Integer Expand and Load Operations
Intrinsics for Gather and Scatter Operations
Intrinsics for FP Gather and Scatter Operations
Intrinsics for Integer Gather and Scatter Operations
Intrinsics for Insert and Extract Operations
Intrinsics for FP Insert and Extract Operations
Intrinsics for Integer Insert and Extract Operations
Intrinsics for Load and Store Operations
Intrinsics for FP Load and Store Operations
Intrinsics for Integer Load and Store Operations
Intrinsics for Miscellaneous Operations
Intrinsics for Miscellaneous FP Operations
Intrinsics for Miscellaneous Integer Operations
Intrinsics for Move Operations
Intrinsics for FP Move Operations
Intrinsics for Integer Move Operations
Intrinsics for Pack and Unpack Operations
Intrinsics for FP Pack and Unpack Operations
Intrinsics for Integer Pack and Unpack Operations
Intrinsics for Permutation Operations
Intrinsics for FP Permutation Operations
Intrinsics for Integer Permutation Operations
Intrinsics for Reduction Operations
Intrinsics for FP Reduction Operations
Intrinsics for Integer Reduction Operations
Intrinsics for Set Operations
Intrinsics for Shuffle Operations
Intrinsics for FP Shuffle Operations
Intrinsics for Integer Shuffle Operations
Intrinsics for Test Operations
Intrinsics for Typecast Operations
Intrinsics for Vector Mask Operations
Intrinsics for Later Generation Intel® Core™ Processor Instruction Extensions
Overview: Intrinsics for 3rd Generation Intel® Core™ Processor Instruction Extensions
Overview: Intrinsics for 4th Generation Intel® Core™ Processor Instruction Extensions
Intrinsics for Converting Half Floats that Map to 3rd Generation Intel® Core™ Processor Instructions
_mm_cvtph_ps()
_mm256_cvtph_ps()
_mm_cvtps_ph()
_mm256_cvtps_ph()
Intrinsics that Generate Random Numbers of 16/32/64 Bit Wide Random Integers
_rdrand16_step(), _rdrand32_step(), _rdrand64_step()
_rdseed16_step(), _rdseed32_step(), _rdseed64_step()
Intrinsics for Multi-Precision Arithmetic
_addcarry_u32(), _addcarry_u64()
_addcarryx_u32(), _addcarryx_u64()
_subborrow_u32(), _subborrow_u64()
Intrinsics that Allow Reading from and Writing to the FS Base and GS Base Registers
_readfsbase_u32(), _readfsbase_u64()
_readgsbase_u32(), _readgsbase_u64()
_writefsbase_u32(), _writefsbase_u64()
_writegsbase_u32(), _writegsbase_u64()
Intrinsics for Intel® Advanced Vector Extensions 2
Overview: Intrinsics for Intel® Advanced Vector Extensions 2 (Intel® AVX2) Instructions
Intrinsics for Arithmetic Operations
_mm256_abs_epi8/16/32
_mm256_add_epi8/16/32/64
_mm256_adds_epi8/16
_mm256_adds_epu8/16
_mm256_sub_epi8/16/32/64
_mm256_subs_epi8/16
_mm256_subs_epu8/16
_mm256_avg_epu8/16
_mm256_hadd_epi16/32
_mm256_hadds_epi16
_mm256_hsub_epi16/32
_mm256_hsubs_epi16
_mm256_madd_epi16
_mm256_maddubs_epi16
_mm256_mul_epi32
_mm256_mul_epu32
_mm256_mulhi_epi16
_mm256_mulhi_epu16
_mm256_mullo_epi16/32
_mm256_mulhrs_epi16
_mm256_sign_epi8/16/32
_mm256_mpsadbw_epu8
_mm256_sad_epu8
Intrinsics for Arithmetic Shift Operations
_mm256_sra_epi16/32
_mm256_srai_epi16/32
_mm256_srav_epi32
_mm_srav_epi32
Intrinsics for Blend Operations
_mm_blend_epi32, _mm256_blend_epi16/32
_mm256_blendv_epi8
Intrinsics for Bitwise Operations
_mm256_and_si256
_mm256_andnot_si256
_mm256_or_si256
_mm256_xor_si256
Intrinsics for Broadcast Operations
_mm_broadcastss_ps, _mm256_broadcastss_ps
_mm256_broadcastsd_pd
_mm_broadcastb_epi8, _mm256_broadcastb_epi8
_mm_broadcastw_epi16, _mm256_broadcastw_epi16
_mm_broadcastd_epi32, _mm256_broadcastd_epi32
_mm_broadcastq_epi64, _mm256_broadcastq_epi64
_mm256_broadcastsi128_si256
Intrinsics for Compare Operations
_mm256_cmpeq_epi8/16/32/64
_mm256_cmpgt_epi8/16/32/64
_mm256_max_epi8/16/32
_mm256_max_epu8/16/32
_mm256_min_epi8/16/32
_mm256_min_epu8/16/32
Intrinsics for Fused Multiply Add Operations
_mm_fmadd_pd, _mm256_fmadd_pd
_mm_fmadd_ps, _mm256_fmadd_ps
_mm_fmadd_sd
_mm_fmadd_ss
_mm_fmaddsub_pd, _mm256_fmaddsub_pd
_mm_fmaddsub_ps, _mm256_fmaddsub_ps
_mm_fmsub_pd, _mm256_fmsub_pd
_mm_fmsub_ps, _mm256_fmsub_ps
_mm_fmsub_sd
_mm_fmsub_ss
_mm_fmsubadd_pd, _mm256_fmsubadd_pd
_mm_fmsubadd_ps, _mm256_fmsubadd_ps
_mm_fnmadd_pd, _mm256_fnmadd_pd
_mm_fnmadd_ps, _mm256_fnmadd_ps
_mm_fnmadd_sd
_mm_fnmadd_ss
_mm_fnmsub_pd, _mm256_fnmsub_pd
_mm_fnmsub_ps, _mm256_fnmsub_ps
_mm_fnmsub_sd
_mm_fnmsub_ss
Intrinsics for GATHER Operations
_mm_mask_i32gather_pd, _mm256_mask_i32gather_pd
_mm_i32gather_pd, _mm256_i32gather_pd
_mm_mask_i64gather_pd, _mm256_mask_i64gather_pd
_mm_i64gather_pd, _mm256_i64gather_pd
_mm_mask_i32gather_ps, _mm256_mask_i32gather_ps
_mm_i32gather_ps, _mm256_i32gather_ps
_mm_mask_i64gather_ps, _mm256_mask_i64gather_ps
_mm_i64gather_ps, _mm256_i64gather_ps
_mm_mask_i32gather_epi32, _mm256_mask_i32gather_epi32
_mm_i32gather_epi32, _mm256_i32gather_epi32
_mm_mask_i32gather_epi64, _mm256_mask_i32gather_epi64
_mm_i32gather_epi64, _mm256_i32gather_epi64
_mm_mask_i64gather_epi32, _mm256_mask_i64gather_epi32
_mm_i64gather_epi32, _mm256_i64gather_epi32
_mm_mask_i64gather_epi64, _mm256_mask_i64gather_epi64
_mm_i64gather_epi64, _mm256_i64gather_epi64
Intrinsics for Logical Shift Operations
_mm256_sll_epi16/32/64
_mm256_slli_epi16/32/64
_mm256_sllv_epi32/64
_mm_sllv_epi32/64
_mm256_slli_si256
_mm256_srli_si256
_mm256_srl_epi16/32/64
_mm256_srli_epi16/32/64
_mm256_srlv_epi32/64
_mm_srlv_epi32/64
Intrinsics for Insert/Extract Operations
_mm256_inserti128_si256
_mm256_extracti128_si256
_mm256_insert_epi8/16/32/64
_mm256_extract_epi8/16/32/64
Intrinsics for Masked Load/Store Operations
_mm_maskload_epi32/64, _mm256_maskload_epi32/64
_mm_maskstore_epi32/64, _mm256_maskstore_epi32/64
Intrinsics for Miscellaneous Operations
_mm256_alignr_epi8
_mm256_movemask_epi8
_mm256_stream_load_si256
Intrinsics for Operations to Manipulate Integer Data at Bit-Granularity
_bextr_u32/64
_blsi_u32/64
_blsmsk_u32/64
_blsr_u32/64
_bzhi_u32/64
_pext_u32/64
_pdep_u32/64
_lzcnt_u32/64
_tzcnt_u32/64
Intrinsics for Pack/Unpack Operations
_mm256_packs_epi16/32
_mm256_packus_epi16/32
_mm256_unpackhi_epi8/16/32/64
_mm256_unpacklo_epi8/16/32/64
Intrinsics for Packed Move with Extend Operations
_mm256_cvtepi8_epi16/32/64
_mm256_cvtepi16_epi32/64
_mm256_cvtepi32_epi64
_mm256_cvtepu8_epi16/32/64
_mm256_cvtepu16_epi32/64
_mm256_cvtepu32_epi64
Intrinsics for Permute Operations
_mm256_permutevar8x32_epi32
_mm256_permutevar8x32_ps
_mm256_permute4x64_epi64
_mm256_permute4x64_pd
_mm256_permute2x128_si256
Intrinsics for Shuffle Operations
_mm256_shuffle_epi8
_mm256_shuffle_epi32
_mm256_shufflehi_epi16
_mm256_shufflelo_epi16
Intrinsics for Intel® Transactional Synchronization Extensions (Intel® TSX)
Intel® Transactional Synchronization Extensions (Intel® TSX) Overview
Intel® Transactional Synchronization Extensions (Intel® TSX) Programming Considerations
Restricted Transactional Memory Intrinsics
Restricted Transactional Memory Overview
_xtest
_xbegin
_xend
_xabort
Hardware Lock Elision Intrinsics (Windows*)
Hardware Lock Elision Overview
HLE Acquire _InterlockedCompareExchange Functions
HLE Acquire _InterlockedExchangeAdd Functions
HLE Release _InterlockedCompareExchange Functions
HLE Release _InterlockedExchangeAdd Functions
HLE Release _Store Functions
Function Prototype and Macro Definitions
Intrinsics for Intel® Advanced Vector Extensions
Overview
Details of Intel® AVX Intrinsics and FMA Intrinsics
Intrinsics for Arithmetic Operations
_mm256_add_pd
_mm256_add_ps
_mm256_addsub_pd
_mm256_addsub_ps
_mm256_hadd_pd
_mm256_hadd_ps
_mm256_sub_pd
_mm256_sub_ps
_mm256_hsub_pd
_mm256_hsub_ps
_mm256_mul_pd
_mm256_mul_ps
_mm256_div_pd
_mm256_div_ps
_mm256_dp_ps
_mm256_sqrt_pd
_mm256_sqrt_ps
_mm256_rsqrt_ps
_mm256_rcp_ps
Intrinsics for Bitwise Operations
_mm256_and_pd
_mm256_and_ps
_mm256_andnot_pd
_mm256_andnot_ps
_mm256_or_pd
_mm256_or_ps
_mm256_xor_pd
_mm256_xor_ps
Intrinsics for Blend and Conditional Merge Operations
_mm256_blend_pd
_mm256_blend_ps
_mm256_blendv_pd
_mm256_blendv_ps
Intrinsics for Compare Operations
_mm_cmp_pd, _mm256_cmp_pd
_mm_cmp_ps, _mm256_cmp_ps
_mm_cmp_sd
_mm_cmp_ss
Intrinsics for Conversion Operations
_mm256_cvtepi32_pd
_mm256_cvtepi32_ps
_mm256_cvtpd_epi32
_mm256_cvtps_epi32
_mm256_cvtpd_ps
_mm256_cvtps_pd
_mm256_cvttpd_epi32
_mm256_cvttps_epi32
_mm256_cvtsi256_si32
_mm256_cvtsd_f64
_mm256_cvtss_f32
Intrinsics to Determine Minimum and Maximum Values
_mm256_max_pd
_mm256_max_ps
_mm256_min_pd
_mm256_min_ps
Intrinsics for Load and Store Operations
_mm256_broadcast_pd
_mm256_broadcast_ps
_mm256_broadcast_sd
_mm256_broadcast_ss, _mm_broadcast_ss
_mm256_load_pd
_mm256_load_ps
_mm256_load_si256
_mm256_loadu_pd
_mm256_loadu_ps
_mm256_loadu_si256
_mm256_maskload_pd, _mm_maskload_pd
_mm256_maskload_ps, _mm_maskload_ps
_mm256_store_pd
_mm256_store_ps
_mm256_store_si256
_mm256_storeu_pd
_mm256_storeu_ps
_mm256_storeu_si256
_mm256_stream_pd
_mm256_stream_ps
_mm256_stream_si256
_mm256_maskstore_pd, _mm_maskstore_pd
_mm256_maskstore_ps, _mm_maskstore_ps
Intrinsics for Miscellaneous Operations
_mm256_extractf128_pd
_mm256_extractf128_ps
_mm256_extractf128_si256
_mm256_insertf128_pd
_mm256_insertf128_ps
_mm256_insertf128_si256
_mm256_lddqu_si256
_mm256_movedup_pd
_mm256_movehdup_ps
_mm256_moveldup_ps
_mm256_movemask_pd
_mm256_movemask_ps
_mm256_round_pd
_mm256_round_ps
_mm256_set_pd
_mm256_set_ps
_mm256_set_epi8/16/32/64x
_mm256_setr_pd
_mm256_setr_ps
_mm256_setr_epi32
_mm256_set1_pd
_mm256_set1_ps
_mm256_set1_epi32
_mm256_setzero_pd
_mm256_setzero_ps
_mm256_setzero_si256
_mm256_zeroall
_mm256_zeroupper
Intrinsics for Packed Test Operations
_mm256_testz_si256
_mm256_testc_si256
_mm256_testnzc_si256
_mm256_testz_pd, _mm_testz_pd
_mm256_testz_ps, _mm_testz_ps
_mm256_testc_pd, _mm_testc_pd
_mm256_testc_ps, _mm_testc_ps
_mm256_testnzc_pd, _mm_testnzc_pd
_mm256_testnzc_ps, _mm_testnzc_ps
Intrinsics for Permute Operations
_mm256_permute_pd, _mm_permute_pd
_mm256_permute_ps, _mm_permute_ps
_mm256_permutevar_pd, _mm_permutevar_pd
_mm_permutevar_ps, _mm256_permutevar_ps
_mm256_permute2f128_pd
_mm256_permute2f128_ps
_mm256_permute2f128_si256
Intrinsics for Shuffle Operations
_mm256_shuffle_pd
_mm256_shuffle_ps
Intrinsics for Unpack and Interleave Operations
_mm256_unpackhi_pd
_mm256_unpackhi_ps
_mm256_unpacklo_pd
_mm256_unpacklo_ps
Support Intrinsics for Vector Typecasting Operations
_mm256_castpd_ps
_mm256_castps_pd
_mm256_castpd_si256
_mm256_castps_si256
_mm256_castsi256_pd
_mm256_castsi256_ps
_mm256_castpd128_pd256
_mm256_castpd256_pd128
_mm256_castps128_ps256
_mm256_castps256_ps128
_mm256_castsi128_si256
_mm256_castsi256_si128
Intrinsics Generating Vectors of Undefined Values
_mm256_undefined_ps()
_mm256_undefined_pd()
_mm256_undefined_si256()
Intrinsics for Intel® Streaming SIMD Extensions 4 (Intel® SSE4)
Overview
Efficient Accelerated String and Text Processing
Overview
Packed Compare Intrinsics
Application Targeted Accelerators Intrinsics
Vectorizing Compiler and Media Accelerators
Overview: Vectorizing Compiler and Media Accelerators
Packed Blending Intrinsics
Floating Point Dot Product Intrinsics
Packed Format Conversion Intrinsics
Packed Integer Min/Max Intrinsics
Floating Point Rounding Intrinsics
DWORD Multiply Intrinsics
Register Insertion/Extraction Intrinsics
Test Intrinsics
Packed DWORD to Unsigned WORD Intrinsic
Packed Compare for Equal Intrinsic
Cacheability Support Intrinsic
Intrinsics for Intel® Supplemental Streaming SIMD Extensions 3 (SSSE3)
Overview
Addition Intrinsics
Subtraction Intrinsics
Multiplication Intrinsics
Absolute Value Intrinsics
Shuffle Intrinsics
Concatenate Intrinsics
Negation Intrinsics
Intrinsics for Intel® Streaming SIMD Extensions 3 (Intel® SSE3)
Overview
Integer Vector Intrinsic
Single-precision Floating-point Vector Intrinsics
Double-precision Floating-point Vector Intrinsics
Miscellaneous Intrinsics
Intrinsics for Intel® Streaming SIMD Extensions 2 (Intel® SSE2)
Overview
Macro Functions
Floating-point Intrinsics
Arithmetic Intrinsics
Logical Intrinsics
Compare Intrinsics
Conversion Intrinsics
Load Intrinsics
Set Intrinsics
Store Intrinsics
Integer Intrinsics
Arithmetic Intrinsics
Logical Intrinsics
Shift Intrinsics
Compare Intrinsics
Conversion Intrinsics
Move Intrinsics
Load Intrinsics
Set Intrinsics
Store Intrinsics
Miscellaneous Functions and Intrinsics
Cacheability Support Intrinsics
Miscellaneous Intrinsics
Casting Support Intrinsics
Pause Intrinsic
Macro Function for Shuffle
Intrinsics Returning Vectors of Undefined Values
Intrinsics for Intel® Streaming SIMD Extensions (Intel® SSE)
Overview
Details about Intel® Streaming SIMD Extensions Intrinsics
Writing Programs with Intel® Streaming SIMD Extensions (Intel® SSE) Intrinsics
Arithmetic Intrinsics
Logical Intrinsics
Compare Intrinsics
Conversion Intrinsics
Load Intrinsics
Set Intrinsics
Store Intrinsics
Cacheability Support Intrinsics
Integer Intrinsics
Intrinsics to Read and Write Registers
Miscellaneous Intrinsics
Macro Functions
Macro Function for Shuffle Operations
Macro Functions to Read and Write Control Registers
Macro Function for Matrix Transposition
Intrinsics for MMX™ Technology
Overview
Details about MMX™ Technology Intrinsics
The EMMS Instruction: Why You Need It
EMMS Usage Guidelines
General Support Intrinsics
Packed Arithmetic Intrinsics
Shift Intrinsics
Logical Intrinsics
Compare Intrinsics
Set Intrinsics
Intrinsics for Advanced Encryption Standard Implementation
Overview
Intrinsics for Carry-less Multiplication Instruction and Advanced Encryption Standard Instructions
Intrinsics for Converting Half Floats
Overview
Intrinsics for Converting Half Floats
Intrinsics for Short Vector Math Library Operations
Overview
Intrinsics for Division Operations
_mm_div_epi8/ _mm256_div_epi8
_mm_div_epi16/ _mm256_div_epi16
_mm_div_epi32/ _mm256_div_epi32
_mm_div_epi64/ _mm256_div_epi64
_mm_div_epu8/ _mm256_div_epu8
_mm_div_epu16/ _mm256_div_epu16
_mm_div_epu32/ _mm256_div_epu32
_mm_div_epu64/ _mm256_div_epu64
_mm_rem_epi8/ _mm256_rem_epi8
_mm_rem_epi16/ _mm256_rem_epi16
_mm_rem_epi32/ _mm256_rem_epi32
_mm_rem_epi64/ _mm256_rem_epi64
_mm_rem_epu8/ _mm256_rem_epu8
_mm_rem_epu16/ _mm256_rem_epu16
_mm_rem_epu32/ _mm256_rem_epu32
_mm_rem_epu64/ _mm256_rem_epu64
Intrinsics for Error Function Operations
_mm_cdfnorminv_pd, _mm256_cdfnorminv_pd
_mm_cdfnorminv_ps, _mm256_cdfnorminv_ps
_mm_erf_pd, _mm256_erf_pd
_mm_erf_ps, _mm256_erf_ps
_mm_erfc_pd, _mm256_erfc_pd
_mm_erfc_ps, _mm256_erfc_ps
_mm_erfinv_pd, _mm256_erfinv_pd
_mm_erfinv_ps, _mm256_erfinv_ps
Intrinsics for Exponential Operations
_mm_exp2_pd, _mm256_exp2_pd
_mm_exp2_ps, _mm256_exp2_ps
_mm_exp_pd, _mm256_exp_pd
_mm_exp_ps, _mm256_exp_ps
_mm_exp10_pd, _mm256_exp10_pd
_mm_exp10_ps, _mm256_exp10_ps
_mm_expm1_pd, _mm256_expm1_pd
_mm_expm1_ps, _mm256_expm1_ps
_mm_cexp_ps, _mm256_cexp_ps
_mm_pow_pd, _mm256_pow_pd
_mm_pow_ps, _mm256_pow_ps
_mm_hypot_pd, _mm256_hypot_pd
_mm_hypot_ps, _mm256_hypot_ps
Intrinsics for Logarithmic Operations
_mm_log2_pd, _mm256_log2_pd
_mm_log2_ps, _mm256_log2_ps
_mm_log10_pd, _mm256_log10_pd
_mm_log10_ps, _mm256_log10_ps
_mm_log_pd, _mm256_log_pd
_mm_log_ps, _mm256_log_ps
_mm_logb_pd, _mm256_logb_pd
_mm_logb_ps, _mm256_logb_ps
_mm_log1p_pd, _mm256_log1p_pd
_mm_log1p_ps, _mm256_log1p_ps
_mm_clog_ps, _mm256_clog_ps
Intrinsics for Square Root and Cube Root Operations
_mm_sqrt_pd, _mm256_sqrt_pd
_mm_sqrt_ps, _mm256_sqrt_ps
_mm_invsqrt_pd, _mm256_invsqrt_pd
_mm_invsqrt_ps, _mm256_invsqrt_ps
_mm_cbrt_pd, _mm256_cbrt_pd
_mm_cbrt_ps, _mm256_cbrt_ps
_mm_invcbrt_pd, _mm256_invcbrt_pd
_mm_invcbrt_ps, _mm256_invcbrt_ps
_mm_csqrt_ps, _mm256_csqrt_ps
Intrinsics for Trigonometric Operations
_mm_acos_pd, _mm256_acos_pd
_mm_acos_ps, _mm256_acos_ps
_mm_acosh_pd, _mm256_acosh_pd
_mm_acosh_ps, _mm256_acosh_ps
_mm_asin_pd, _mm256_asin_pd
_mm_asin_ps, _mm256_asin_ps
_mm_asinh_pd, _mm256_asinh_pd
_mm_asinh_ps, _mm256_asinh_ps
_mm_atan_pd, _mm256_atan_pd
_mm_atan_ps, _mm256_atan_ps
_mm_atan2_pd, _mm256_atan2_pd
_mm_atan2_ps, _mm256_atan2_ps
_mm_atanh_pd, _mm256_atanh_pd
_mm_atanh_ps, _mm256_atanh_ps
_mm_cos_pd, _mm256_cos_pd
_mm_cos_ps, _mm256_cos_ps
_mm_cosd_pd, _mm256_cosd_pd
_mm_cosd_ps, _mm256_cosd_ps
_mm_cosh_pd, _mm256_cosh_pd
_mm_cosh_ps, _mm256_cosh_ps
_mm_sin_pd, _mm256_sin_pd
_mm_sin_ps, _mm256_sin_ps
_mm_sind_pd, _mm256_sind_pd
_mm_sind_ps, _mm256_sind_ps
_mm_sinh_pd, _mm256_sinh_pd
_mm_sinh_ps, _mm256_sinh_ps
_mm_tan_pd, _mm256_tan_pd
_mm_tan_ps, _mm256_tan_ps
_mm_tand_pd, _mm256_tand_pd
_mm_tand_ps, _mm256_tand_ps
_mm_tanh_pd, _mm256_tanh_pd
_mm_tanh_ps, _mm256_tanh_ps
_mm_sincos_pd, _mm256_sincos_pd
_mm_sincos_ps, _mm256_sincos_ps
Libraries
Creating Libraries
Using Intel Shared Libraries
Using Shared Libraries on macOS*
Managing Libraries
Redistributing Libraries When Deploying Applications
Introduction to the SIMD Data Layout Templates
Usage Guidelines: Function Calls and Containers
Constructing an n_container
Bounds
User-Level Interface
SDLT Primitives (SDLT_PRIMITIVE)
soa1d_container
aos1d_container
access_by
n_container
Layouts
Shape
n_extent_generator
make_n_container template function
extent_d template function
Bounds
bounds_t
sdlt::bounds Template Function
n_bounds_t
n_bounds_generator
bounds_d Template Function
Accessors
soa1d_container::accessor and aos1d_container::accessor
soa1d_container::const_accessor and aos1d_container::const_accessor
Accessor Concept
Proxy Objects
Proxy
ConstProxy
Number Representation
aligned_offset
fixed_offset
Indexes
linear_index
n_index_t
n_index_generator
index_d template function
Convenience and Correctness
max_val
min_val
Examples
Example 1
Example 2
Example 3
Example 4
Example 5
Intel® C++ Class Libraries
C++ Classes and SIMD Operations
Capabilities of C++ SIMD Classes
Integer Vector Classes
Terms, Conventions, and Syntax Defined
Rules for Operators
Assignment Operator
Logical Operators
Addition and Subtraction Operators
Multiplication Operators
Shift Operators
Comparison Operators
Conditional Select Operators
Debug Operations
Unpack Operators
Pack Operators
Clear MMX™ State Operator
Integer Functions for Streaming SIMD Extensions
Conversions between Fvec and Ivec
Floating-point Vector Classes
Fvec Notation Conventions
Data Alignment
Conversions
Constructors and Initialization
Arithmetic Operators
Minimum and Maximum Operators
Logical Operators
Compare Operators
Conditional Select Operators for Fvec Classes
Cacheability Support Operators
Debug Operations
Load and Store Operators
Unpack Operators
Move Mask Operators
Classes Quick Reference
Programming Example
C++ Library Extensions
Intel's valarray Implementation
Using Intel's valarray Implementation
Intel's C++ Asynchronous I/O Extensions for Windows* Operating Systems
Intel's C++ Asynchronous I/O Library for Windows* Operating Systems
aio_read
aio_write
Example for aio_read and aio_write Functions
aio_suspend
Example for aio_suspend Function
aio_error
aio_return
Example for aio_error and aio_return Functions
aio_fsync
aio_cancel
Example for aio_cancel Function
lio_listio
Example for lio_listio Function
Handling Errors Caused by Asynchronous I/O Functions
Intel's C++ Asynchronous I/O Class for Windows* Operating Systems
Template Class async_class
get_last_operation_id
wait
get_status
get_last_error
get_error_operation_id
stop_queue
resume_queue
clear_queue
Example for Using async_class Template Class
IEEE 754-2008 Binary Floating-Point Conformance Library
Overview: Intel® IEEE 754-2008 Binary Floating-Point Conformance Library
Using the Intel® IEEE 754-2008 Binary Floating-Point Conformance Library
Function List
Homogeneous General-Computational Operations Functions
formatOf General-Computational Operations Functions
Quiet-Computational Operations Functions
Signaling-Computational Operations Functions
Non-Computational Operations Functions
Intel's Numeric String Conversion Library
Overview: Intel's Numeric String Conversion Library
Function List
Macros
ISO Standard Predefined Macros
Additional Predefined Macros
Pragmas
Intel-specific Pragma Reference
alloc_section
block_loop/noblock_loop
cilk grainsize
code_align
distribute_point
inline, noinline, forceinline
intel_omp_task
intel_omp_taskq
ivdep
loop_count
nofusion
novector
offload
offload_attribute
offload_transfer
offload_wait
optimize
optimization_level
optimization_parameter
parallel/noparallel
prefetch/noprefetch
simd
simdoff
unroll/nounroll
unroll_and_jam/nounroll_and_jam
unused
vector
Intel-supported Pragma Reference
Error Handling
Remarks, Warnings, and Errors
Compilation
Supported Environment Variables
Compilation Phases
Passing Options to the Linker
Linking Tools and Options
Specifying Alternate Tools and Paths
Using Configuration Files
Using Response Files
Global Symbols and Visibility Attributes (Linux* and macOS*)
Specifying Symbol Visibility Explicitly (Linux* and macOS*)
Saving Compiler Information in Your Executable
Linking Debug Information
Optimization and Programming Guide
OpenMP* Support
OpenMP* Source Compatibility and Interoperability with Other Compilers
Adding OpenMP* Support to your Application
Parallel Processing Model
Worksharing Using OpenMP*
Controlling Thread Allocation
OpenMP* Pragmas Summary
OpenMP* Library Support
OpenMP* Run-time Library Routines
Intel® Compiler Extension Routines to OpenMP*
OpenMP* Support Libraries
Using the OpenMP* Libraries
Thread Affinity Interface (Linux* and Windows*)
OpenMP* Advanced Issues
OpenMP* Implementation-Defined Behaviors
OpenMP* Examples
Intel® Cilk™ Plus
Summary of Intel® Cilk™ Plus Language Features
Debugging an Intel® Cilk™ Plus Program
Intel® Cilk™ Plus Keywords
cilk_spawn
cilk_sync
cilk_for
Intel® Cilk™ Plus Execution Model
Strands
Pedigrees
Work and Span
Mapping Strands to Workers
Exception Handling
Reducers
Using Reducers - A Simple Example
How Reducers Work
Safety, Correctness, and Performance
Reducer Library
Using Reducers - More Examples
Advanced Topic: How to Write a New Reducer
Holders
Using Holders - An Example
Holder Syntax
Operating System Considerations
Intel® Cilk™ Plus Run Time System API
__cilkrts_bump_loop_rank
__cilkrts_bump_worker_rank
__cilkrts_end_cilk
__cilkrts_get_nworkers
__cilkrts_get_pedigree
__cilkrts_get_total_workers
__cilkrts_get_worker_number
__cilkrts_init
__cilkrts_set_param
Understanding Race Conditions
Using Locks
Locks Cause Determinacy Races
Deadlocks
Lock Contention
Holding a Lock Across a Strand Boundary
Performance Considerations for Intel® Cilk™ Plus Programs
Granularity
Optimize the Serial Program
Timing Programs and Program Segments
Common Performance Pitfalls
Cache Efficiency and Bandwidth
False Sharing
Memory Allocation Bottlenecks
Glossary
Automatic Parallelization
Enabling Auto-parallelization
Programming with Auto-parallelization
Enabling Further Loop Parallelization for Multicore Platforms
Language Support for Auto-parallelization
Vectorization
Automatic Vectorization
Automatic Vectorization Overview
Programming Guidelines for Vectorization
Using Automatic Vectorization
Vectorization and Loops
Loop Constructs
Explicit Vector Programming
User-Mandated or SIMD Vectorization
SIMD-Enabled Functions
SIMD-Enabled Function Pointers
Vectorizing a Loop Using the _Simd Keyword
Function Annotations and the SIMD Directive for Vectorization
Array Notations
C/C++ Extensions for Array Notations Overview
C/C++ Extensions for Array Notations Programming Model
Guided Auto Parallelism
Using Guided Auto Parallelism
Guided Auto Parallelism Messages
GAP Message (Diagnostic ID 30506)
GAP Message (Diagnostic ID 30513)
GAP Message (Diagnostic ID 30515)
GAP Message (Diagnostic ID 30519)
GAP Message (Diagnostic ID 30521)
GAP Message (Diagnostic ID 30522)
GAP Message (Diagnostic ID 30523)
GAP Message (Diagnostic ID 30525)
GAP Message (Diagnostic ID 30526)
GAP Message (Diagnostic ID 30528)
GAP Message (Diagnostic ID 30531)
GAP Message (Diagnostic ID 30532)
GAP Message (Diagnostic ID 30533)
GAP Message (Diagnostic ID 30534)
GAP Message (Diagnostic ID 30535)
GAP Message (Diagnostic ID 30536)
GAP Message (Diagnostic ID 30537)
GAP Message (Diagnostic ID 30538)
GAP Message (Diagnostic ID 30753)
GAP Message (Diagnostic ID 30754)
GAP Message (Diagnostic ID 30755)
GAP Message (Diagnostic ID 30756)
GAP Message (Diagnostic ID 30757)
GAP Message (Diagnostic ID 30758)
GAP Message (Diagnostic ID 30759)
GAP Message (Diagnostic ID 30760)
Intel® Many Integrated Core Architecture (Intel® MIC Architecture)
Programming for Intel® MIC Architecture
Overview: Heterogeneous Programming
Dealing with Multiple Coprocessors in a System
Offload Using a Pragma
Initiating an Offload
Placing Variables and Functions on the Coprocessor
Managing Memory Allocation for Pointer Variables
Device-Only Memory Allocation
Writing Target-Specific Code Using a Pragma
Writing Code that Should Not Be Built for CPU-Only Execution
Allocating Memory for Parts of Arrays
Moving Data from One Variable to Another
Restrictions on Offloaded Code Using a Pragma
Offload Using Shared Virtual Memory
Using Shared Memory
_Cilk_offload
_Cilk_shared
Rules for Using _Cilk_shared and _Cilk_offload
Synchronization Between the CPU and the Target
Writing Target-Specific Code with _Cilk_offload
Restrictions on Offloaded Code Using Shared Virtual Memory
Restrictions When Programming on Windows* for Intel® MIC Architecture
About Asynchronous Computation (Intel® MIC Architecture)
About Asynchronous Data Transfer (Intel® MIC Architecture)
Offload Using Streams
Applying the target Attribute to Multiple Declarations
Controlling the Coprocessor's Execution Environment
Environment Variable for I/O Proxy Control for Offloaded Code
Calling Functions on the CPU to Modify the Coprocessor's Execution Environment
Using Libraries With Offloaded Code
Special Cases
OpenMP* Considerations
OpenMP* Defaults
OpenMP* Affinity Specifications
Balanced Affinity Type
Setting the Number of OpenMP* Threads on the Coprocessor
Setting Environment Variables on the CPU to Modify the Coprocessor's Execution Environment
Data Alignment for Intel® MIC Architecture
Generating an Offload Report
_Offload_report
Calling exit() From an Offload Region
Building for Intel® MIC Architecture
Setting Stack Size on Coprocessors
Appending Archiver Options for Creating Libraries
Appending Linker Options
Logging Stdout and Stderr from Offloaded Code
About Building Native Intel® MIC Architecture Applications
Profile-Guided Optimization (PGO)
Profile-Guided Optimization via HW counters
Profile an Application with Instrumentation
Profile Function or Loop Execution Time
Profile-Guided Optimization Report
PGO API Support
Resetting Profile Information
Dumping Profile Information
Interval Profile Dumping
Resetting the Dynamic Profile Counters
Dumping and Resetting Profile Information
Getting Coverage Summary Information on Demand
High-Level Optimization (HLO)
Interprocedural Optimization (IPO)
Using IPO
IPO-Related Performance Issues
IPO for Large Programs
Understanding Code Layout and Multi-Object IPO
Creating a Library from IPO Objects
Requesting Compiler Reports with the xi* Tools
Inline Expansion of Functions
Compiler Directed Inline Expansion of Functions
Developer Directed Inline Expansion of User Functions
Inlining Report
Processor Targeting
Methods to Optimize Code Size
Disable or Decrease the Amount of Inlining
Strip Symbols from Your Binaries
Dynamically Link Intel-Provided Libraries
Exclude Unused Code and Data from the Executable
Disable Recognition and Expansion of Intrinsic Functions
Optimize Exception Handling Data on Linux* and macOS* Systems
Disable Passing Arguments in Registers Instead of On the Stack
Disable Loop Unrolling
Disable Automatic Vectorization
Avoid References to Compiler-Specific Libraries
Avoid Unnecessary 16-Byte Alignment
Intel® Math Library
Overview: Intel® Math Library
Using the Intel® Math Library
Math Functions
Function List
Trigonometric Functions
Hyperbolic Functions
Exponential Functions
Special Functions
Nearest Integer Functions
Remainder Functions
Miscellaneous Functions
Complex Functions
C99 Macros
Automatically-Aligned Dynamic Allocation
Automatically-Aligned Dynamic Allocation
Pointer Checker
Pointer Checker Overview
Pointer Checker Feature Summary
Using the Pointer Checker
Checking Bounds
Checking for Dangling Pointers
Checking Arrays
Working with Enabled and Non-Enabled Modules
Storing Bounds Information
Passing and Returning Bounds
Checking Run-Time Library Functions
Writing a Wrapper
Checking Custom Memory Allocators
Checking Multi-Threaded Code
How the Compiler Defines Bounds Information for Pointers
Finding and Reporting Out-of-Bounds Errors
Tools
PGO Tools
PGO Tools Overview
Code Coverage Tool
Test Prioritization Tool
Profmerge and Proforder Tools
Using Function Order Lists, Function Grouping, Function Ordering, and Data Ordering Optimizations
Comparison of Function Order Lists and IPO Code Layout
Compiler Option Mapping Tool
Offload Extract Tool
Compatibility and Portability
Conformance to the C/C++ Standards
GCC* Compatibility and Interoperability
Microsoft Compatibility
Precompiled Header Support
Compilation and Execution Differences
Declaration in Scope of Function Defined in a Namespace
Enum Bit-Field Signedness
Portability
Porting from the Microsoft* Compiler to the Intel® C++ Compiler
Overview: Porting from the Microsoft* Compiler to the Intel® C++ Compiler
Modifying Your makefile
Other Considerations
Porting from GCC* to the Intel® C++ Compiler
Overview: Porting from GCC* to the Intel® C++ Compiler
Modifying Your makefile
Equivalent Macros
Other Considerations
Understanding the 64-bit Data Model used by macOS*
Index