Modern.X86.Assembly.Language.Programming.32-bit.64-bit.SSE.and.A....pdf

Published: 2022-05-29 | Uploaded by: admin | Category: Manuals | File size: 8.31M | Format: pdf

Contents at a Glance

Contents

About the Author

About the Technical Reviewer

Acknowledgments

Introduction

Chapter 1: X86-32 Core Architecture

Historical Overview

Data Types

Fundamental Data Types

Numerical Data Types

Packed Data Types

Miscellaneous Data Types

Internal Architecture

Segment Registers

General-Purpose Registers

EFLAGS Register

Instruction Pointer

Instruction Operands

Memory Addressing Modes

Instruction Set Overview

Data Transfer

Binary Arithmetic

Data Comparison

Data Conversion

Logical

Rotate and Shift

Byte Set and Bit String

String

Flag Manipulation

Control Transfer

Miscellaneous

Summary

Chapter 2: X86-32 Core Programming

Getting Started

First Assembly Language Function

Integer Multiplication and Division

X86-32 Programming Fundamentals

Calling Convention

Memory Addressing Modes

Integer Addition

Condition Codes

Arrays

One-Dimensional Arrays

Two-Dimensional Arrays

Structures

Simple Structures

Dynamic Structure Creation

Strings

Counting Characters

String Concatenation

Comparing Arrays

Array Reversal

Summary

Chapter 3: X87 Floating-Point Unit

X87 FPU Core Architecture

Data Registers

X87 FPU Special-Purpose Registers

X87 FPU Operands and Encodings

X87 FPU Instruction Set

Data Transfer

Basic Arithmetic

Data Comparison

Transcendental

Constants

Control

Summary

Chapter 4: X87 FPU Programming

X87 FPU Programming Fundamentals

Simple Arithmetic

Floating-Point Compares

X87 FPU Advanced Programming

Floating-Point Arrays

Transcendental Instructions

Advanced Stack Usage

Summary

Chapter 5: MMX Technology

SIMD Processing Concepts

Wraparound vs. Saturated Arithmetic

MMX Execution Environment

MMX Instruction Set

Data Transfer

Arithmetic

Comparison

Conversion

Logical and Shift

Unpack and Shuffle

Insertion and Extraction

State and Cache Control

Summary

Chapter 6: MMX Technology Programming

MMX Programming Fundamentals

Packed Integer Addition

Packed Integer Shifts

Packed Integer Multiplication

MMX Advanced Programming

Integer Array Processing

Using MMX and the x87 FPU

Summary

Chapter 7: Streaming SIMD Extensions

X86-SSE Overview

X86-SSE Execution Environment

X86-SSE Register Set

X86-SSE Data Types

X86-SSE Control-Status Register

X86-SSE Processing Techniques

X86-SSE Instruction Set Overview

Scalar Floating-Point Data Transfer

Scalar Floating-Point Arithmetic

Scalar Floating-Point Comparison

Scalar Floating-Point Conversion

Packed Floating-Point Data Transfer

Packed Floating-Point Arithmetic

Packed Floating-Point Comparison

Packed Floating-Point Conversion

Packed Floating-Point Shuffle and Unpack

Packed Floating-Point Insertion and Extraction

Packed Floating-Point Blend

Packed Floating-Point Logical

Packed Integer Extensions

Packed Integer Data Transfer

Packed Integer Arithmetic

Packed Integer Comparison

Packed Integer Conversion

Packed Integer Shuffle and Unpack

Packed Integer Insertion and Extraction

Packed Integer Blend

Packed Integer Shift

Text String Processing

Non-Temporal Data Transfer and Cache Control

Miscellaneous

Summary

Chapter 8: X86-SSE Programming – Scalar Floating-Point

Scalar Floating-Point Fundamentals

Scalar Floating-Point Arithmetic

Scalar Floating-Point Compare

Scalar Floating-Point Conversions

Advanced Scalar Floating-Point Programming

Scalar Floating-Point Spheres

Scalar Floating-Point Parallelograms

Summary

Chapter 9: X86-SSE Programming – Packed Floating-Point

Packed Floating-Point Fundamentals

Packed Floating-Point Arithmetic

Packed Floating-Point Compare

Packed Floating-Point Conversions

Advanced Packed Floating-Point Programming

Packed Floating-Point Least Squares

Packed Floating-Point 4 × 4 Matrices

Summary

Chapter 10: X86-SSE Programming – Packed Integers

Packed Integer Fundamentals

Advanced Packed Integer Programming

Packed Integer Histogram

Packed Integer Threshold

Summary

Chapter 11: X86-SSE Programming – Text Strings

Text String Fundamentals

Text String Programming

Text String Calculate Length

Text String Replace Characters

Summary

Chapter 12: Advanced Vector Extensions (AVX)

X86-AVX Overview

X86-AVX Execution Environment

X86-AVX Register Set

X86-AVX Data Types

X86-AVX Instruction Syntax

X86-AVX Feature Extensions

X86-AVX Instruction Set Overview

Promoted x86-SSE Instructions

New Instructions

Broadcast

Blend

Permute

Extract and Insert

Masked Move

Variable Bit Shift

Gather

Feature Extension Instructions

Half-Precision Floating-Point

FMA

VFMADD Subgroup

VFMSUB Subgroup

VFMADDSUB Subgroup

VFMSUBADD Subgroup

VFNMADD Subgroup

VFNMSUB Subgroup

General-Purpose Register

Summary

Chapter 13: X86-AVX Programming - Scalar Floating-Point

Programming Fundamentals

Scalar Floating-Point Arithmetic

Scalar Floating-Point Compares

Advanced Programming

Roots of a Quadratic Equation

Spherical Coordinates

Summary

Chapter 14: X86-AVX Programming - Packed Floating-Point

Programming Fundamentals

Packed Floating-Point Arithmetic

Packed Floating-Point Compares

Advanced Programming

Correlation Coefficient

Matrix Column Means

Summary

Chapter 15: X86-AVX Programming - Packed Integers

Packed Integer Fundamentals

Packed Integer Arithmetic

Packed Integer Unpack Operations

Advanced Programming

Image Pixel Clipping

Image Threshold Part Deux

Summary

Chapter 16: X86-AVX Programming - New Instructions

Detecting Processor Features (CPUID)

Data-Manipulation Instructions

Data Broadcast

Data Blend

Data Permute

Data Gather

Fused-Multiply-Add Programming

General-Purpose Register Instructions

Flagless Multiplication and Bit Shifts

Enhanced Bit Manipulation

Summary

Chapter 17: X86-64 Core Architecture

Internal Architecture

General-Purpose Registers

RFLAGS Register

Instruction Pointer Register

Instruction Operands

Memory Addressing Modes

Differences Between X86-64 and X86-32

Instruction Set Overview

Basic Instruction Use

Invalid Instructions

New Instructions

Deprecated Resources

Summary

Chapter 18: X86-64 Core Programming

X86-64 Programming Fundamentals

Integer Arithmetic

Memory Addressing

Integer Operands

Floating-Point Arithmetic

X86-64 Calling Convention

Basic Stack Frames

Using Non-Volatile Registers

Using Non-Volatile XMM Registers

Macros for Prologs and Epilogs

X86-64 Arrays and Strings

Two-Dimensional Arrays

Strings

Summary

Chapter 19: X86-64 SIMD Architecture

X86-SSE-64 Execution Environment

X86-SSE-64 Register Set

X86-SSE-64 Data Types

X86-SSE-64 Instruction Set Overview

X86-AVX Execution Environment

X86-AVX-64 Register Set

X86-AVX-64 Data Types

X86-AVX-64 Instruction Set Overview

Summary

Chapter 20: X86-64 SIMD Programming

X86-SSE-64 Programming

Image Histogram

Image Conversion

Vector Arrays

X86-AVX-64 Programming

Ellipsoid Calculations

RGB Image Processing

Matrix Inverse

Miscellaneous Instructions

Summary

Chapter 21: Advanced Topics and Optimization Techniques

Processor Microarchitecture

Multi-Core Processor Overview

Microarchitecture Pipeline Functionality

Execution Engine

Optimizing Assembly Language Code

Basic Optimizations

Floating-Point Arithmetic

Program Branches

Data Alignment

SIMD Techniques

Summary

Chapter 22: Advanced Topics Programming

Non-Temporal Memory Stores

Data Prefetch

Summary

Index

For your convenience Apress has placed some of the front matter material after the index. Please use the Bookmarks and Contents at a Glance links to access them.

Contents at a Glance

About the Author
About the Technical Reviewer
Acknowledgments
Introduction
Chapter 1: X86-32 Core Architecture
Chapter 2: X86-32 Core Programming
Chapter 3: X87 Floating-Point Unit
Chapter 4: X87 FPU Programming
Chapter 5: MMX Technology
Chapter 6: MMX Technology Programming
Chapter 7: Streaming SIMD Extensions
Chapter 8: X86-SSE Programming – Scalar Floating-Point
Chapter 9: X86-SSE Programming – Packed Floating-Point
Chapter 10: X86-SSE Programming – Packed Integers
Chapter 11: X86-SSE Programming – Text Strings
Chapter 12: Advanced Vector Extensions (AVX)
Chapter 13: X86-AVX Programming - Scalar Floating-Point
Chapter 14: X86-AVX Programming - Packed Floating-Point
Chapter 15: X86-AVX Programming - Packed Integers
Chapter 16: X86-AVX Programming - New Instructions
Chapter 17: X86-64 Core Architecture
Chapter 18: X86-64 Core Programming
Chapter 19: X86-64 SIMD Architecture
Chapter 20: X86-64 SIMD Programming
Chapter 21: Advanced Topics and Optimization Techniques
Chapter 22: Advanced Topics Programming
Index

Introduction

Since the invention of the personal computer, software developers have used assembly language to create innovative solutions for a wide variety of algorithmic challenges. During the early days of the PC era, it was common practice to code large portions of a program or complete applications using x86 assembly language. Even as the use of high-level languages such as C, C++, and C# became more prevalent, many software developers continued to employ assembly language to code performance-critical sections of their programs. And while compilers have improved remarkably over the years in terms of generating machine code that is both spatially and temporally efficient, situations still exist where it makes sense for software developers to exploit the benefits of assembly language programming.

The inclusion of single-instruction multiple-data (SIMD) architectures in modern x86 processors provides another reason for the continued interest in assembly language programming. A SIMD-capable processor includes computational resources that facilitate concurrent calculations using multiple data values, which can significantly improve the performance of applications that must deliver real-time responsiveness. SIMD architectures are also well-suited for computationally-intense problem domains such as image processing, audio and video encoding, computer-aided design, computer graphics, and data mining. Unfortunately, many high-level languages and development tools are unable to fully (or even partially) exploit the SIMD capabilities of a modern x86 processor. Assembly language, on the other hand, enables the software developer to take full advantage of a processor’s entire computational resource suite.

Modern X86 Assembly Language Programming

Modern X86 Assembly Language Programming is an edifying text on the subject of x86 assembly language programming. Its primary purpose is to teach you how to code functions using x86 assembly language that can be invoked from a high-level language. The book includes informative material that explains the internal architecture of an x86 processor as viewed from the perspective of an application program. It also contains an abundance of sample code that is structured to help you quickly understand x86 assembly language programming and the computational resources of the x86 platform. Major topics of the book include the following:

• X86 32-bit core architecture, data types, internal registers, memory addressing modes, and the basic instruction set
• X87 core architecture, register stack, special purpose registers, floating-point encodings, and instruction set
• MMX technology and the fundamentals of packed integer arithmetic
• Streaming SIMD Extensions (SSE) and Advanced Vector Extensions (AVX), including internal registers, packed integer and floating-point arithmetic, and associated instruction sets
• X86 64-bit core architecture, data types, internal registers, memory addressing modes, and the basic instruction set
• 64-bit extensions to SSE and AVX technologies
• X86 microarchitecture and assembly language optimization techniques

Before proceeding I should also explicitly mention some of the topics that are not covered. This book does not examine legacy aspects of x86 assembly language programming such as 16-bit real-mode applications or segmented memory models. Except for a few historical observations and comparisons, all of the discussions and sample code emphasize x86 protected-mode programming using a flat linear memory model. This book does not discuss x86 instructions or architectural features that are managed by operating systems or require elevated privileges. It also doesn’t explore how to use x86 assembly language to develop software that is intended for operating systems or device drivers. However, if your ultimate goal is to use x86 assembly language to create software for one of these environments, you will need to thoroughly understand the material presented in this book.

While it is still theoretically possible to write an entire application program using assembly language, the demanding requirements of contemporary software development make such an approach impractical and ill-advised. Instead, this book concentrates on creating x86 assembly language modules and functions that are callable from C++. All of the sample code and programming examples presented in this book use Microsoft Visual C++ and Microsoft Macro Assembler. Both of these tools are included with Microsoft’s Visual Studio development tool.
Target Audience

The target audience for this book is software developers, including:

• Software developers who are creating application programs for Windows-based platforms and want to learn how to write performance-enhancing algorithms and functions using x86 assembly language.
• Software developers who are creating application programs for non-Windows environments and want to learn x86 assembly language programming.
• Software developers who have a basic understanding of x86 assembly language programming and want to learn how to use the x86’s SSE and AVX instruction sets.
• Software developers and computer science students who want or need to gain a better understanding of the x86 platform, including its internal architecture and instruction sets.

The principal audience for Modern X86 Assembly Language Programming is Windows software developers since the sample code uses Visual C++ and Microsoft Macro Assembler. It is important to note, however, that this is not a book on how to use the Microsoft development tools. Software developers who are targeting non-Windows platforms also can learn from the book since most of the informative content is organized and communicated independent of any specific operating system. In order to understand the book’s subject material and sample code, a background that includes some programming experience using C or C++ will be helpful. Prior experience with Visual Studio or knowledge of a particular Windows API is not a prerequisite to benefit from the book.

Outline of Book

The primary objective of this book is to help you learn x86 assembly language programming. In order to achieve this goal, you must also thoroughly understand the internal architecture and execution environment of an x86 processor. The book’s chapters and content are organized with this in mind. The following paragraphs summarize the book’s major topics and each chapter’s content.

X86-32 Core Architecture: Chapter 1 covers the core architecture of the x86-32 platform. It includes a discussion of the platform’s fundamental data types, internal architecture, instruction operands, and memory addressing modes. This chapter also presents an overview of the core x86-32 instruction set. Chapter 2 explains the fundamentals of x86-32 assembly language programming using the core x86-32 instruction set and common programming constructs.
All of the sample code discussed in Chapter 2 (and subsequent chapters) is packaged as working programs, which means that you can run, modify, or otherwise experiment with the code in order to enhance your learning experience.

X87 Floating-Point Unit: Chapter 3 surveys the architecture of the x87 floating-point unit (FPU) and includes operational descriptions of the x87 FPU’s register stack, control word register, status word register, and instruction set. This chapter also delves into the binary encodings that are used to represent floating-point numbers and certain special values. Chapter 4 contains an assortment of sample code that demonstrates how to perform floating-point calculations using the x87 FPU instruction set. Readers who need to maintain an existing x87 FPU code base or are targeting processors that lack the scalar floating-point capabilities of x86-SSE and x86-AVX (e.g., Intel’s Quark) will benefit the most from this chapter.

MMX Technology: Chapter 5 describes the x86’s first SIMD extension, which is called MMX technology. It examines the architecture of MMX technology including its register set, operand types, and instruction set. This chapter also discusses a number of related topics, including SIMD processing concepts and the mechanics of packed-integer arithmetic. Chapter 6 includes sample code that illustrates basic MMX operations, including packed-integer arithmetic (both wraparound and saturated), integer array processing, and how to properly handle transitions between MMX and x87 FPU code.

Streaming SIMD Extensions: Chapter 7 focuses on the architecture of Streaming SIMD Extensions (SSE). X86-SSE adds a new set of 128-bit wide registers to the x86 platform and incorporates several instruction set additions that support computations using packed integers, packed floating-point (both single and double precision), and text strings. Chapter 7 also discusses the scalar floating-point capabilities of x86-SSE, which can be used to both simplify and improve the performance of algorithms that require scalar floating-point arithmetic. Chapters 8 - 11 contain an extensive collection of sample code that highlights use of the x86-SSE instruction set. Included in these chapters are several examples that demonstrate using the packed-integer capabilities of x86-SSE to perform common image-processing tasks, such as histogram construction and pixel thresholding. These chapters also include sample code that illustrates how to use the packed floating-point, scalar floating-point, and text string-processing instructions of x86-SSE.

Advanced Vector Extensions: Chapter 12 explores the x86’s most recent SIMD extension, which is called Advanced Vector Extensions (AVX). This chapter explains the x86-AVX execution environment, its data types and register sets, and the new three-operand instruction syntax. It also discusses the data broadcast, gather, and permute capabilities of x86-AVX along with several x86-AVX concomitant extensions, including fused-multiply-add (FMA), half-precision floating-point, and new general-purpose register instructions. Chapters 13 - 16 contain sample code that depicts use of the various x86-AVX computational resources.
Examples include using the x86-AVX instruction set with packed integers, packed floating-point, and scalar floating-point operands. These chapters also contain sample code that explicates use of the data broadcast, gather, permute, and FMA instructions.

X86-64 Core Architecture: Chapter 17 examines the x86-64 platform and includes a discussion of the platform’s core architecture, supported data types, general purpose registers, and status flags. It also explains the enhancements made to the x86-32 platform in order to support 64-bit operands and memory addressing. The chapter concludes with a discussion of the x86-64 instruction set, including those instructions that have been deprecated or are no longer available. Chapter 18 explores the fundamentals of x86-64 assembly language programming using a variety of sample code. Examples include how to perform integer calculations using operands of various sizes, memory addressing modes, scalar floating-point arithmetic, and common programming constructs. Chapter 18 also explains the calling convention that must be observed in order to invoke an x86-64 assembly language function from C++.

X86-64 SSE and AVX: Chapter 19 describes the enhancements to x86-SSE and x86-AVX that are available on the x86-64 platform. This includes a discussion of the respective execution environments and extended data register sets. Chapter 20 contains sample code that highlights use of the x86-SSE and x86-AVX instruction sets with the x86-64 core architecture.

Advanced Topics: The last two chapters of this book consider advanced topics and optimization techniques related to x86 assembly language programming. Chapter 21 examines key elements of an x86 processor’s microarchitecture, including its front-end pipelines, out-of-order execution model, and internal execution units. It also includes a discussion of programming techniques that you can employ to write x86 assembly
