
Modern.X86.Assembly.Language.Programming.32-bit.64-bit.SSE.and.A....pdf

Contents at a Glance
Contents
About the Author
About the Technical Reviewer
Acknowledgments
Introduction
Chapter 1: X86-32 Core Architecture
Historical Overview
Data Types
Fundamental Data Types
Numerical Data Types
Packed Data Types
Miscellaneous Data Types
Internal Architecture
Segment Registers
General-Purpose Registers
EFLAGS Register
Instruction Pointer
Instruction Operands
Memory Addressing Modes
Instruction Set Overview
Data Transfer
Binary Arithmetic
Data Comparison
Data Conversion
Logical
Rotate and Shift
Byte Set and Bit String
String
Flag Manipulation
Control Transfer
Miscellaneous
Summary
Chapter 2: X86-32 Core Programming
Getting Started
First Assembly Language Function
Integer Multiplication and Division
X86-32 Programming Fundamentals
Calling Convention
Memory Addressing Modes
Integer Addition
Condition Codes
Arrays
One-Dimensional Arrays
Two-Dimensional Arrays
Structures
Simple Structures
Dynamic Structure Creation
Strings
Counting Characters
String Concatenation
Comparing Arrays
Array Reversal
Summary
Chapter 3: X87 Floating-Point Unit
X87 FPU Core Architecture
Data Registers
X87 FPU Special-Purpose Registers
X87 FPU Operands and Encodings
X87 FPU Instruction Set
Data Transfer
Basic Arithmetic
Data Comparison
Transcendental
Constants
Control
Summary
Chapter 4: X87 FPU Programming
X87 FPU Programming Fundamentals
Simple Arithmetic
Floating-Point Compares
X87 FPU Advanced Programming
Floating-Point Arrays
Transcendental Instructions
Advanced Stack Usage
Summary
Chapter 5: MMX Technology
SIMD Processing Concepts
Wraparound vs. Saturated Arithmetic
MMX Execution Environment
MMX Instruction Set
Data Transfer
Arithmetic
Comparison
Conversion
Logical and Shift
Unpack and Shuffle
Insertion and Extraction
State and Cache Control
Summary
Chapter 6: MMX Technology Programming
MMX Programming Fundamentals
Packed Integer Addition
Packed Integer Shifts
Packed Integer Multiplication
MMX Advanced Programming
Integer Array Processing
Using MMX and the x87 FPU
Summary
Chapter 7: Streaming SIMD Extensions
X86-SSE Overview
X86-SSE Execution Environment
X86-SSE Register Set
X86-SSE Data Types
X86-SSE Control-Status Register
X86-SSE Processing Techniques
X86-SSE Instruction Set Overview
Scalar Floating-Point Data Transfer
Scalar Floating-Point Arithmetic
Scalar Floating-Point Comparison
Scalar Floating-Point Conversion
Packed Floating-Point Data Transfer
Packed Floating-Point Arithmetic
Packed Floating-Point Comparison
Packed Floating-Point Conversion
Packed Floating-Point Shuffle and Unpack
Packed Floating-Point Insertion and Extraction
Packed Floating-Point Blend
Packed Floating-Point Logical
Packed Integer Extensions
Packed Integer Data Transfer
Packed Integer Arithmetic
Packed Integer Comparison
Packed Integer Conversion
Packed Integer Shuffle and Unpack
Packed Integer Insertion and Extraction
Packed Integer Blend
Packed Integer Shift
Text String Processing
Non-Temporal Data Transfer and Cache Control
Miscellaneous
Summary
Chapter 8: X86-SSE Programming – Scalar Floating-Point
Scalar Floating-Point Fundamentals
Scalar Floating-Point Arithmetic
Scalar Floating-Point Compare
Scalar Floating-Point Conversions
Advanced Scalar Floating-Point Programming
Scalar Floating-Point Spheres
Scalar Floating-Point Parallelograms
Summary
Chapter 9: X86-SSE Programming – Packed Floating-Point
Packed Floating-Point Fundamentals
Packed Floating-Point Arithmetic
Packed Floating-Point Compare
Packed Floating-Point Conversions
Advanced Packed Floating-Point Programming
Packed Floating-Point Least Squares
Packed Floating-Point 4 × 4 Matrices
Summary
Chapter 10: X86-SSE Programming – Packed Integers
Packed Integer Fundamentals
Advanced Packed Integer Programming
Packed Integer Histogram
Packed Integer Threshold
Summary
Chapter 11: X86-SSE Programming – Text Strings
Text String Fundamentals
Text String Programming
Text String Calculate Length
Text String Replace Characters
Summary
Chapter 12: Advanced Vector Extensions (AVX)
X86-AVX Overview
X86-AVX Execution Environment
X86-AVX Register Set
X86-AVX Data Types
X86-AVX Instruction Syntax
X86-AVX Feature Extensions
X86-AVX Instruction Set Overview
Promoted x86-SSE Instructions
New Instructions
Broadcast
Blend
Permute
Extract and Insert
Masked Move
Variable Bit Shift
Gather
Feature Extension Instructions
Half-Precision Floating-Point
FMA
VFMADD Subgroup
VFMSUB Subgroup
VFMADDSUB Subgroup
VFMSUBADD Subgroup
VFNMADD Subgroup
VFNMSUB Subgroup
General-Purpose Register
Summary
Chapter 13: X86-AVX Programming - Scalar Floating-Point
Programming Fundamentals
Scalar Floating-Point Arithmetic
Scalar Floating-Point Compares
Advanced Programming
Roots of a Quadratic Equation
Spherical Coordinates
Summary
Chapter 14: X86-AVX Programming - Packed Floating-Point
Programming Fundamentals
Packed Floating-Point Arithmetic
Packed Floating-Point Compares
Advanced Programming
Correlation Coefficient
Matrix Column Means
Summary
Chapter 15: X86-AVX Programming - Packed Integers
Packed Integer Fundamentals
Packed Integer Arithmetic
Packed Integer Unpack Operations
Advanced Programming
Image Pixel Clipping
Image Threshold Part Deux
Summary
Chapter 16: X86-AVX Programming - New Instructions
Detecting Processor Features (CPUID)
Data-Manipulation Instructions
Data Broadcast
Data Blend
Data Permute
Data Gather
Fused-Multiply-Add Programming
General-Purpose Register Instructions
Flagless Multiplication and Bit Shifts
Enhanced Bit Manipulation
Summary
Chapter 17: X86-64 Core Architecture
Internal Architecture
General-Purpose Registers
RFLAGS Register
Instruction Pointer Register
Instruction Operands
Memory Addressing Modes
Differences Between X86-64 and X86-32
Instruction Set Overview
Basic Instruction Use
Invalid Instructions
New Instructions
Deprecated Resources
Summary
Chapter 18: X86-64 Core Programming
X86-64 Programming Fundamentals
Integer Arithmetic
Memory Addressing
Integer Operands
Floating-Point Arithmetic
X86-64 Calling Convention
Basic Stack Frames
Using Non-Volatile Registers
Using Non-Volatile XMM Registers
Macros for Prologs and Epilogs
X86-64 Arrays and Strings
Two-Dimensional Arrays
Strings
Summary
Chapter 19: X86-64 SIMD Architecture
X86-SSE-64 Execution Environment
X86-SSE-64 Register Set
X86-SSE-64 Data Types
X86-SSE-64 Instruction Set Overview
X86-AVX Execution Environment
X86-AVX-64 Register Set
X86-AVX-64 Data Types
X86-AVX-64 Instruction Set Overview
Summary
Chapter 20: X86-64 SIMD Programming
X86-SSE-64 Programming
Image Histogram
Image Conversion
Vector Arrays
X86-AVX-64 Programming
Ellipsoid Calculations
RGB Image Processing
Matrix Inverse
Miscellaneous Instructions
Summary
Chapter 21: Advanced Topics and Optimization Techniques
Processor Microarchitecture
Multi-Core Processor Overview
Microarchitecture Pipeline Functionality
Execution Engine
Optimizing Assembly Language Code
Basic Optimizations
Floating-Point Arithmetic
Program Branches
Data Alignment
SIMD Techniques
Summary
Chapter 22: Advanced Topics Programming
Non-Temporal Memory Stores
Data Prefetch
Summary
Index
For your convenience Apress has placed some of the front matter material after the index. Please use the Bookmarks and Contents at a Glance links to access them.
Contents at a Glance
About the Author
About the Technical Reviewer
Acknowledgments
Introduction
Chapter 1: X86-32 Core Architecture
Chapter 2: X86-32 Core Programming
Chapter 3: X87 Floating-Point Unit
Chapter 4: X87 FPU Programming
Chapter 5: MMX Technology
Chapter 6: MMX Technology Programming
Chapter 7: Streaming SIMD Extensions
Chapter 8: X86-SSE Programming – Scalar Floating-Point
Chapter 9: X86-SSE Programming – Packed Floating-Point
Chapter 10: X86-SSE Programming – Packed Integers
Chapter 11: X86-SSE Programming – Text Strings
Chapter 12: Advanced Vector Extensions (AVX)
Chapter 13: X86-AVX Programming - Scalar Floating-Point
Chapter 14: X86-AVX Programming - Packed Floating-Point
Chapter 15: X86-AVX Programming - Packed Integers
Chapter 16: X86-AVX Programming - New Instructions
Chapter 17: X86-64 Core Architecture
Chapter 18: X86-64 Core Programming
Chapter 19: X86-64 SIMD Architecture
Chapter 20: X86-64 SIMD Programming
Chapter 21: Advanced Topics and Optimization Techniques
Chapter 22: Advanced Topics Programming
Index
Introduction

Since the invention of the personal computer, software developers have used assembly language to create innovative solutions for a wide variety of algorithmic challenges. During the early days of the PC era, it was common practice to code large portions of a program or complete applications using x86 assembly language. Even as the use of high-level languages such as C, C++, and C# became more prevalent, many software developers continued to employ assembly language to code performance-critical sections of their programs. And while compilers have improved remarkably over the years in terms of generating machine code that is both spatially and temporally efficient, situations still exist where it makes sense for software developers to exploit the benefits of assembly language programming.

The inclusion of single-instruction multiple-data (SIMD) architectures in modern x86 processors provides another reason for the continued interest in assembly language programming. A SIMD-capable processor includes computational resources that facilitate concurrent calculations using multiple data values, which can significantly improve the performance of applications that must deliver real-time responsiveness. SIMD architectures are also well-suited for computationally-intense problem domains such as image processing, audio and video encoding, computer-aided design, computer graphics, and data mining. Unfortunately, many high-level languages and development tools are unable to fully (or even partially) exploit the SIMD capabilities of a modern x86 processor. Assembly language, on the other hand, enables the software developer to take full advantage of a processor’s entire computational resource suite.

Modern X86 Assembly Language Programming

Modern X86 Assembly Language Programming is an edifying text on the subject of x86 assembly language programming.
Its primary purpose is to teach you how to code x86 assembly language functions that can be invoked from a high-level language. The book includes informative material that explains the internal architecture of an x86 processor as viewed from the perspective of an application program. It also contains an abundance of sample code that is structured to help you quickly understand x86 assembly language programming and the computational resources of the x86 platform. Major topics of the book include the following:

• X86 32-bit core architecture, data types, internal registers, memory addressing modes, and the basic instruction set
• X87 core architecture, register stack, special purpose registers, floating-point encodings, and instruction set
• MMX technology and the fundamentals of packed integer arithmetic
• Streaming SIMD extensions (SSE) and Advanced Vector Extensions (AVX), including internal registers, packed integer and floating-point arithmetic, and associated instruction sets
• X86 64-bit core architecture, data types, internal registers, memory addressing modes, and the basic instruction set
• 64-bit extensions to SSE and AVX technologies
• X86 microarchitecture and assembly language optimization techniques

Before proceeding I should also explicitly mention some of the topics that are not covered. This book does not examine legacy aspects of x86 assembly language programming such as 16-bit real-mode applications or segmented memory models. Except for a few historical observations and comparisons, all of the discussions and sample code emphasize x86 protected-mode programming using a flat linear memory model. This book does not discuss x86 instructions or architectural features that are managed by operating systems or require elevated privileges. It also doesn’t explore how to use x86 assembly language to develop software that is intended for operating systems or device drivers. However, if your ultimate goal is to use x86 assembly language to create software for one of these environments, you will need to thoroughly understand the material presented in this book.

While it is still theoretically possible to write an entire application program using assembly language, the demanding requirements of contemporary software development make such an approach impractical and ill-advised. Instead, this book concentrates on creating x86 assembly language modules and functions that are callable from C++. All of the sample code and programming examples presented in this book use Microsoft Visual C++ and Microsoft Macro Assembler. Both of these tools are included with Microsoft’s Visual Studio development tool.
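The idea of an assembly language function that is callable from C++ can be sketched from the C++ side. The following is a minimal illustration, not code from the book: the function name CalcSum_, its signature, and the wrapper SumOfThree are hypothetical, and the routine's body is a C++ stand-in for what would normally be a separate module assembled with Microsoft Macro Assembler.

```cpp
#include <cassert>
#include <cstdint>

// Declaration as the C++ caller would see it. extern "C" turns off C++ name
// mangling so the linker can match the plain symbol exported by an assembly
// module. The name CalcSum_ and its signature are hypothetical.
extern "C" std::int32_t CalcSum_(std::int32_t a, std::int32_t b, std::int32_t c);

// Stand-in definition so this sketch links on its own. In a real project this
// body would live in a separate .asm file; it is written in C++ here only to
// keep the example self-contained.
extern "C" std::int32_t CalcSum_(std::int32_t a, std::int32_t b, std::int32_t c)
{
    return a + b + c;
}

// Ordinary C++ code calls the routine like any other function.
std::int32_t SumOfThree(std::int32_t a, std::int32_t b, std::int32_t c)
{
    return CalcSum_(a, b, c);
}
```

The point of the sketch is the declaration: once the symbol has C linkage, the caller does not need to know whether the definition came from a C++ translation unit or from an assembled object file, provided both sides agree on the calling convention.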
Target Audience

The target audience for this book is software developers, including:

• Software developers who are creating application programs for Windows-based platforms and want to learn how to write performance-enhancing algorithms and functions using x86 assembly language.
• Software developers who are creating application programs for non-Windows environments and want to learn x86 assembly language programming.
• Software developers who have a basic understanding of x86 assembly language programming and want to learn how to use the x86’s SSE and AVX instruction sets.
• Software developers and computer science students who want or need to gain a better understanding of the x86 platform, including its internal architecture and instruction sets.

The principal audience for Modern X86 Assembly Language Programming is Windows software developers since the sample code uses Visual C++ and Microsoft Macro Assembler. It is important to note, however, that this is not a book on how to use the Microsoft development tools. Software developers who are targeting non-Windows platforms can also learn from the book since most of the informative content is organized and communicated independent of any specific operating system. In order to understand the book’s subject material and sample code, a background that includes some programming experience using C or C++ will be helpful. Prior experience with Visual Studio or knowledge of a particular Windows API is not a prerequisite to benefit from the book.

Outline of Book

The primary objective of this book is to help you learn x86 assembly language programming. In order to achieve this goal, you must also thoroughly understand the internal architecture and execution environment of an x86 processor. The book’s chapters and content are organized with this in mind. The following paragraphs summarize the book’s major topics and each chapter’s content.

X86-32 Core Architecture: Chapter 1 covers the core architecture of the x86-32 platform. It includes a discussion of the platform’s fundamental data types, internal architecture, instruction operands, and memory addressing modes. This chapter also presents an overview of the core x86-32 instruction set. Chapter 2 explains the fundamentals of x86-32 assembly language programming using the core x86-32 instruction set and common programming constructs.
All of the sample code discussed in Chapter 2 (and subsequent chapters) is packaged as working programs, which means that you can run, modify, or otherwise experiment with the code in order to enhance your learning experience.

X87 Floating-Point Unit: Chapter 3 surveys the architecture of the x87 floating-point unit (FPU) and includes operational descriptions of the x87 FPU’s register stack, control word register, status word register, and instruction set. This chapter also delves into the binary encodings that are used to represent floating-point numbers and certain special values. Chapter 4 contains an assortment of sample code that demonstrates how to perform floating-point calculations using the x87 FPU instruction set. Readers who need to maintain an existing x87 FPU code base or are targeting processors that lack the scalar floating-point capabilities of x86-SSE and x86-AVX (e.g., Intel’s Quark) will benefit the most from this chapter.

MMX Technology: Chapter 5 describes the x86’s first SIMD extension, which is called MMX technology. It examines the architecture of MMX technology including its register set, operand types, and instruction set. This chapter also discusses a number of related topics, including SIMD processing concepts and the mechanics of packed-integer arithmetic. Chapter 6 includes sample code that illustrates basic MMX operations, including packed-integer arithmetic (both wraparound and saturated), integer array processing, and how to properly handle transitions between MMX and x87 FPU code.

Streaming SIMD Extensions: Chapter 7 focuses on the architecture of Streaming SIMD Extensions (SSE). X86-SSE adds a new set of 128-bit wide registers to the x86 platform and incorporates several instruction set additions that support computations using packed integers, packed floating-point (both single and double precision), and text strings. Chapter 7 also discusses the scalar floating-point capabilities of x86-SSE, which can be used to both simplify and improve the performance of algorithms that require scalar floating-point arithmetic. Chapters 8 through 11 contain an extensive collection of sample code that highlights use of the x86-SSE instruction set. Included in these chapters are several examples that demonstrate using the packed-integer capabilities of x86-SSE to perform common image-processing tasks, such as histogram construction and pixel thresholding. These chapters also include sample code that illustrates how to use the packed floating-point, scalar floating-point, and text string-processing instructions of x86-SSE.

Advanced Vector Extensions: Chapter 12 explores the x86’s most recent SIMD extension, which is called Advanced Vector Extensions (AVX). This chapter explains the x86-AVX execution environment, its data types and register sets, and the new three-operand instruction syntax. It also discusses the data broadcast, gather, and permute capabilities of x86-AVX along with several x86-AVX concomitant extensions, including fused-multiply-add (FMA), half-precision floating-point, and new general-purpose register instructions. Chapters 13 through 16 contain sample code that depicts use of the various x86-AVX computational resources.
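The fused-multiply-add operation named in the Advanced Vector Extensions summary computes a * b + c with a single rounding at the end rather than rounding the product first. The following is a minimal portable sketch of that difference using the C++ standard library's std::fma. It does not use the x86 FMA instructions themselves, and the helper names FusedMulAdd and SeparateMulAdd are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Fused multiply-add: a * b + c is evaluated as if to infinite precision and
// rounded once. std::fma is a standard library call, not the x86 instruction.
double FusedMulAdd(double a, double b, double c)
{
    return std::fma(a, b, c);
}

// Separate multiply and add: the product is rounded to double before the
// addition. The volatile temporary keeps the compiler from contracting the
// two steps into a single fused operation.
double SeparateMulAdd(double a, double b, double c)
{
    volatile double product = a * b;
    return product + c;
}
```

As a usage example, assuming IEEE 754 doubles, take a = 1 + 2^-27, b = 1 - 2^-27, and c = -1. The exact product is 1 - 2^-54, which rounds to 1.0 as a double, so SeparateMulAdd returns 0 while FusedMulAdd keeps the small term and returns -2^-54.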
Examples in Chapters 13 through 16 include using the x86-AVX instruction set with packed integers, packed floating-point, and scalar floating-point operands. These chapters also contain sample code that explicates use of the data broadcast, gather, permute, and FMA instructions.

X86-64 Core Architecture: Chapter 17 peruses the x86-64 platform and includes a discussion of the platform’s core architecture, supported data types, general purpose registers, and status flags. It also explains the enhancements made to the x86-32 platform in order to support 64-bit operands and memory addressing. The chapter concludes with a discussion of the x86-64 instruction set, including those instructions that have been deprecated or are no longer available. Chapter 18 explores the fundamentals of x86-64 assembly language programming using a variety of sample code. Examples include how to perform integer calculations using operands of various sizes, memory addressing modes, scalar floating-point arithmetic, and common programming constructs. Chapter 18 also explains the calling convention that must be observed in order to invoke an x86-64 assembly language function from C++.

X86-64 SSE and AVX: Chapter 19 describes the enhancements to x86-SSE and x86-AVX that are available on the x86-64 platform. This includes a discussion of the respective execution environments and extended data register sets. Chapter 20 contains sample code that highlights use of the x86-SSE and x86-AVX instruction sets with the x86-64 core architecture.

Advanced Topics: The last two chapters of this book consider advanced topics and optimization techniques related to x86 assembly language programming. Chapter 21 examines key elements of an x86 processor’s microarchitecture, including its front-end pipelines, out-of-order execution model, and internal execution units. It also includes a discussion of programming techniques that you can employ to write x86 assembly