Contents at a Glance
Contents
About the Author
About the Technical Reviewer
Acknowledgments
Introduction
Chapter 1: X86-32 Core Architecture
Historical Overview
Data Types
Fundamental Data Types
Numerical Data Types
Packed Data Types
Miscellaneous Data Types
Internal Architecture
Segment Registers
General-Purpose Registers
EFLAGS Register
Instruction Pointer
Instruction Operands
Memory Addressing Modes
Instruction Set Overview
Data Transfer
Binary Arithmetic
Data Comparison
Data Conversion
Logical
Rotate and Shift
Byte Set and Bit String
String
Flag Manipulation
Control Transfer
Miscellaneous
Summary
Chapter 2: X86-32 Core Programming
Getting Started
First Assembly Language Function
Integer Multiplication and Division
X86-32 Programming Fundamentals
Calling Convention
Memory Addressing Modes
Integer Addition
Condition Codes
Arrays
One-Dimensional Arrays
Two-Dimensional Arrays
Structures
Simple Structures
Dynamic Structure Creation
Strings
Counting Characters
String Concatenation
Comparing Arrays
Array Reversal
Summary
Chapter 3: X87 Floating-Point Unit
X87 FPU Core Architecture
Data Registers
X87 FPU Special-Purpose Registers
X87 FPU Operands and Encodings
X87 FPU Instruction Set
Data Transfer
Basic Arithmetic
Data Comparison
Transcendental
Constants
Control
Summary
Chapter 4: X87 FPU Programming
X87 FPU Programming Fundamentals
Simple Arithmetic
Floating-Point Compares
X87 FPU Advanced Programming
Floating-Point Arrays
Transcendental Instructions
Advanced Stack Usage
Summary
Chapter 5: MMX Technology
SIMD Processing Concepts
Wraparound vs. Saturated Arithmetic
MMX Execution Environment
MMX Instruction Set
Data Transfer
Arithmetic
Comparison
Conversion
Logical and Shift
Unpack and Shuffle
Insertion and Extraction
State and Cache Control
Summary
Chapter 6: MMX Technology Programming
MMX Programming Fundamentals
Packed Integer Addition
Packed Integer Shifts
Packed Integer Multiplication
MMX Advanced Programming
Integer Array Processing
Using MMX and the x87 FPU
Summary
Chapter 7: Streaming SIMD Extensions
X86-SSE Overview
X86-SSE Execution Environment
X86-SSE Register Set
X86-SSE Data Types
X86-SSE Control-Status Register
X86-SSE Processing Techniques
X86-SSE Instruction Set Overview
Scalar Floating-Point Data Transfer
Scalar Floating-Point Arithmetic
Scalar Floating-Point Comparison
Scalar Floating-Point Conversion
Packed Floating-Point Data Transfer
Packed Floating-Point Arithmetic
Packed Floating-Point Comparison
Packed Floating-Point Conversion
Packed Floating-Point Shuffle and Unpack
Packed Floating-Point Insertion and Extraction
Packed Floating-Point Blend
Packed Floating-Point Logical
Packed Integer Extensions
Packed Integer Data Transfer
Packed Integer Arithmetic
Packed Integer Comparison
Packed Integer Conversion
Packed Integer Shuffle and Unpack
Packed Integer Insertion and Extraction
Packed Integer Blend
Packed Integer Shift
Text String Processing
Non-Temporal Data Transfer and Cache Control
Miscellaneous
Summary
Chapter 8: X86-SSE Programming – Scalar Floating-Point
Scalar Floating-Point Fundamentals
Scalar Floating-Point Arithmetic
Scalar Floating-Point Compare
Scalar Floating-Point Conversions
Advanced Scalar Floating-Point Programming
Scalar Floating-Point Spheres
Scalar Floating-Point Parallelograms
Summary
Chapter 9: X86-SSE Programming – Packed Floating-Point
Packed Floating-Point Fundamentals
Packed Floating-Point Arithmetic
Packed Floating-Point Compare
Packed Floating-Point Conversions
Advanced Packed Floating-Point Programming
Packed Floating-Point Least Squares
Packed Floating-Point 4 × 4 Matrices
Summary
Chapter 10: X86-SSE Programming – Packed Integers
Packed Integer Fundamentals
Advanced Packed Integer Programming
Packed Integer Histogram
Packed Integer Threshold
Summary
Chapter 11: X86-SSE Programming – Text Strings
Text String Fundamentals
Text String Programming
Text String Calculate Length
Text String Replace Characters
Summary
Chapter 12: Advanced Vector Extensions (AVX)
X86-AVX Overview
X86-AVX Execution Environment
X86-AVX Register Set
X86-AVX Data Types
X86-AVX Instruction Syntax
X86-AVX Feature Extensions
X86-AVX Instruction Set Overview
Promoted x86-SSE Instructions
New Instructions
Broadcast
Blend
Permute
Extract and Insert
Masked Move
Variable Bit Shift
Gather
Feature Extension Instructions
Half-Precision Floating-Point
FMA
VFMADD Subgroup
VFMSUB Subgroup
VFMADDSUB Subgroup
VFMSUBADD Subgroup
VFNMADD Subgroup
VFNMSUB Subgroup
General-Purpose Register
Summary
Chapter 13: X86-AVX Programming – Scalar Floating-Point
Programming Fundamentals
Scalar Floating-Point Arithmetic
Scalar Floating-Point Compares
Advanced Programming
Roots of a Quadratic Equation
Spherical Coordinates
Summary
Chapter 14: X86-AVX Programming – Packed Floating-Point
Programming Fundamentals
Packed Floating-Point Arithmetic
Packed Floating-Point Compares
Advanced Programming
Correlation Coefficient
Matrix Column Means
Summary
Chapter 15: X86-AVX Programming – Packed Integers
Packed Integer Fundamentals
Packed Integer Arithmetic
Packed Integer Unpack Operations
Advanced Programming
Image Pixel Clipping
Image Threshold Part Deux
Summary
Chapter 16: X86-AVX Programming – New Instructions
Detecting Processor Features (CPUID)
Data-Manipulation Instructions
Data Broadcast
Data Blend
Data Permute
Data Gather
Fused-Multiply-Add Programming
General-Purpose Register Instructions
Flagless Multiplication and Bit Shifts
Enhanced Bit Manipulation
Summary
Chapter 17: X86-64 Core Architecture
Internal Architecture
General-Purpose Registers
RFLAGS Register
Instruction Pointer Register
Instruction Operands
Memory Addressing Modes
Differences Between X86-64 and X86-32
Instruction Set Overview
Basic Instruction Use
Invalid Instructions
New Instructions
Deprecated Resources
Summary
Chapter 18: X86-64 Core Programming
X86-64 Programming Fundamentals
Integer Arithmetic
Memory Addressing
Integer Operands
Floating-Point Arithmetic
X86-64 Calling Convention
Basic Stack Frames
Using Non-Volatile Registers
Using Non-Volatile XMM Registers
Macros for Prologs and Epilogs
X86-64 Arrays and Strings
Two-Dimensional Arrays
Strings
Summary
Chapter 19: X86-64 SIMD Architecture
X86-SSE-64 Execution Environment
X86-SSE-64 Register Set
X86-SSE-64 Data Types
X86-SSE-64 Instruction Set Overview
X86-AVX Execution Environment
X86-AVX-64 Register Set
X86-AVX-64 Data Types
X86-AVX-64 Instruction Set Overview
Summary
Chapter 20: X86-64 SIMD Programming
X86-SSE-64 Programming
Image Histogram
Image Conversion
Vector Arrays
X86-AVX-64 Programming
Ellipsoid Calculations
RGB Image Processing
Matrix Inverse
Miscellaneous Instructions
Summary
Chapter 21: Advanced Topics and Optimization Techniques
Processor Microarchitecture
Multi-Core Processor Overview
Microarchitecture Pipeline Functionality
Execution Engine
Optimizing Assembly Language Code
Basic Optimizations
Floating-Point Arithmetic
Program Branches
Data Alignment
SIMD Techniques
Summary
Chapter 22: Advanced Topics Programming
Non-Temporal Memory Stores
Data Prefetch
Summary
Index