NEON Programmer’s Guide
Contents
Preface
References
Typographical conventions
Feedback on this book
Glossary
1: Introduction
1.1 Data processing technologies
1.1.1 Single Instruction Single Data
1.1.2 Single Instruction Multiple Data (vector mode)
1.1.3 Single Instruction Multiple Data (packed data mode)
1.2 Comparison between ARM NEON technology and other implementations
1.2.1 Comparison between NEON technology and the ARMv6 SIMD instructions
1.2.2 Comparison between NEON technology and other SIMD solutions
1.2.3 Comparison of NEON technology and Digital Signal Processors
1.3 Architecture support for NEON technology
1.3.1 Instruction timings
1.3.2 Support for VFP-only systems
1.3.3 Support for the Half-precision extension
1.3.4 Support for the Fused Multiply-Add instructions
1.3.5 Security and virtualization
1.3.6 Undefined instructions
1.3.7 Support for ARMv6 SIMD instructions
1.4 Fundamentals of NEON technology
1.4.1 Registers, vectors, lanes and elements
1.4.2 NEON data type specifiers
1.4.3 VFP views of the NEON and floating-point register file
2: Compiling NEON Instructions
2.1 Vectorization
2.1.1 Enabling auto-vectorization in ARM Compiler toolchain
2.1.2 Enabling auto-vectorization in GCC compiler
2.1.3 C pointer aliasing
2.1.4 Natural types
2.1.5 Array grouping
2.1.6 Inside knowledge
2.1.7 Enabling the NEON unit in bare-metal applications
2.1.8 Enabling the NEON unit in a Linux stock kernel
2.1.9 Enabling the NEON unit in a Linux custom kernel
2.1.10 Optimizing for vectorization
2.2 Generating NEON code using the vectorizing compiler
2.2.1 Compiler command line options
2.3 Vectorizing examples
2.3.1 Vectorization example on unrolling addition function
2.3.2 Vectorizing example with vectorizing compilation
2.3.3 Vectorizing examples with different command line switches
2.4 NEON assembler and ABI restrictions
2.4.1 Passing arguments in NEON and floating-point registers
2.5 NEON libraries
2.6 Intrinsics
2.7 Detecting presence of a NEON unit
2.7.1 Build-time NEON unit detection
2.7.2 Run-time NEON unit detection
2.8 Writing code to imply SIMD
2.8.1 Writing loops to imply SIMD
2.8.2 Tell the compiler where to unroll inner loops
2.8.3 Write structures to imply SIMD
2.9 GCC command line options
2.9.1 Option to specify the CPU
2.9.2 Option to specify the FPU
2.9.3 Option to enable use of NEON and floating-point instructions
2.9.4 Vectorizing floating-point operations
2.9.5 Example GCC command line usage for NEON code optimization
2.9.6 GCC information dump
3: NEON Instruction Set Architecture
3.1 Introduction to the NEON instruction syntax
3.2 Instruction syntax
3.2.1 Instruction modifiers
3.2.2 Instruction shape
3.3 Specifying data types
3.4 Packing and unpacking data
3.5 Alignment
3.6 Saturation arithmetic
3.7 Floating-point operations
3.7.1 Floating-point exceptions
3.8 Flush-to-zero mode
3.8.1 Denormals
3.8.2 The effects of using flush-to-zero mode
3.8.3 Operations not affected by flush-to-zero mode
3.9 Shift operations
3.9.1 Shifting vectors
3.9.2 Shifting and inserting
3.9.3 Shifting and accumulating
3.9.4 Instruction modifiers
3.9.5 Table of shifts available
3.10 Polynomials
3.10.1 Polynomial arithmetic over {0,1}
3.10.2 NEON instructions that can perform polynomial arithmetic
3.10.3 Difference between polynomial multiply and conventional multiply
3.11 Instructions to permute vectors
3.11.1 Alternatives
3.11.2 Instructions
4: NEON Intrinsics
4.1 Introduction
4.2 Vector data types for NEON intrinsics
4.3 Prototype of NEON Intrinsics
4.4 Using NEON intrinsics
4.5 Variables and constants in NEON code
4.5.1 Declaring a variable
4.5.2 Using constants
4.5.3 Moving results back to normal C variables
4.5.4 Accessing D registers from a Q register
4.5.5 Casting NEON variables between different types
4.6 Accessing vector types from C
4.7 Loading data from memory into vectors
4.8 Constructing a vector from a literal bit pattern
4.9 Constructing multiple vectors from interleaved memory
4.10 Loading a single lane of a vector from memory
4.11 Programming using NEON intrinsics
4.12 Instructions without an equivalent intrinsic
5: Optimizing NEON Code
5.1 Optimizing NEON assembler code
5.1.1 NEON pipeline differences between Cortex-A processors
5.1.2 Memory access optimizations
5.2 Scheduling
5.2.1 NEON instruction scheduling
5.2.2 Mixed ARM and NEON instruction sequences
5.2.3 Passing data between ARM general-purpose registers and NEON registers
5.2.4 Dual issue for NEON instructions
5.2.5 Example of how to read NEON instruction tables
5.2.6 Optimizations by variable spreading
5.2.7 Optimizations when using lengthening instructions
6: NEON Code Examples with Intrinsics
6.1 Swapping color channels
6.1.1 How de-interleave and interleave work
6.1.2 Single or multiple elements
6.1.3 Addressing
6.1.4 Other loads and stores
6.2 Handling non-multiple array lengths
6.2.1 Leftovers
6.2.2 Example problem
6.2.3 Larger arrays
6.2.4 Overlapping
6.2.5 Single element processing
6.2.6 Alignment
6.2.7 Using ARM instructions
7: NEON Code Examples with Mixed Operations
7.1 Matrix multiplication
7.1.1 Algorithm
7.1.2 Code
7.2 Cross product
7.2.1 Definition
7.2.2 Single cross product
7.2.3 Four cross products
7.2.4 Arbitrary input length
8: NEON Code Examples with Optimization
8.1 Converting color depth
8.1.1 Converting from RGB565 to RGB888
8.1.2 Converting from RGB888 to RGB565
8.2 Median filter
8.2.1 Implementation
8.2.2 Basic principles and bitonic sorting
8.2.3 Bitonic merging
8.2.4 Partitioning
8.2.5 Color planes
8.2.6 Padding
8.2.7 Rolling window
8.2.8 First pass sorting (bitonic sort)
8.2.9 Transpose
8.2.10 Second pass sorting
8.2.11 Re-use
8.3 FIR filter
8.3.1 Using NEON intrinsics
8.3.2 Using the vectorizing compiler
8.3.3 Adding inside knowledge
A: NEON Microarchitecture
A.1 The Cortex-A5 processor
A.1.1 The Cortex-A5 Media Processing Engine
A.1.2 VFPv4 architecture hardware support
A.2 The Cortex-A7 processor
A.2.1 The Cortex-A7 NEON unit
A.3 The Cortex-A8 processor
A.3.1 The Cortex-A8 Media Processing Engine
A.3.2 Cortex-A8 Data memory access
A.3.3 Cortex-A8 specific pipeline hazards
A.4 The Cortex-A9 processor
A.4.1 The Cortex-A9 Media Processing Engine
A.5 The Cortex-A15 processor
A.5.1 The Cortex-A15 Media Processing Engine
B: Operating System Support
B.1 FPSCR, the floating-point status and control register
B.2 FPEXC, the floating-point exception register
B.3 FPSID, the floating-point system ID register
B.4 MVFR0/1 Media and VFP Feature Registers
C: NEON and VFP Instruction Summary
C.1 List of all NEON and VFP instructions
C.2 List of doubling instructions
C.3 List of halving instructions
C.4 List of widening or long instructions
C.5 List of narrowing instructions
C.6 List of rounding instructions
C.7 List of saturating instructions
C.8 NEON general data processing instructions
C.8.1 VCVT (fixed-point or integer to floating-point)
C.8.2 VCVT (between half-precision and single-precision floating-point)
C.8.3 VDUP
C.8.4 VEXT
C.8.5 VMOV (immediate)
C.8.6 VMVN
C.8.7 VMOVL, V{Q}MOVN, VQMOVUN
C.8.8 VREV
C.8.9 VSWP
C.8.10 VTBL
C.8.11 VTBX
C.8.12 VTRN
C.8.13 VUZP
C.8.14 VZIP
C.9 NEON shift instructions
C.9.1 VSHL, VQSHL, VQSHLU, and VSHLL (by immediate)
C.9.2 V{Q}{R}SHL
C.9.3 V{R}SHR{N}, V{R}SRA
C.9.4 VQ{R}SHR{U}N
C.9.5 VSLI
C.9.6 VSRI
C.10 NEON logical and compare operations
C.10.1 VACGE and VACGT
C.10.2 VAND
C.10.3 VBIC (immediate)
C.10.4 VBIC (register)
C.10.5 VBIF
C.10.6 VBIT
C.10.7 VBSL
C.10.8 VCEQ, VCGE, VCGT, VCLE, and VCLT
C.10.9 VEOR
C.10.10 VMOV
C.10.11 VMVN
C.10.12 VORN
C.10.13 VORR (immediate)
C.10.14 VORR (register)
C.10.15 VTST
C.11 NEON arithmetic instructions
C.11.1 VABA{L}
C.11.2 VABD{L}
C.11.3 V{Q}ABS
C.11.4 V{Q}ADD, VADDL, VADDW
C.11.5 V{R}ADDHN
C.11.6 VCLS
C.11.7 VCLZ
C.11.8 VCNT
C.11.9 V{R}HADD
C.11.10 VHSUB
C.11.11 VMAX and VMIN
C.11.12 V{Q}NEG
C.11.13 VPADD{L}, VPADAL
C.11.14 VPMAX and VPMIN
C.11.15 VRECPE
C.11.16 VRECPS
C.11.17 VRSQRTE
C.11.18 VRSQRTS
C.11.19 V{Q}SUB, VSUBL and VSUBW
C.11.20 V{R}SUBHN
C.12 NEON multiply instructions
C.12.1 VFMA, VFMS
C.12.2 VMUL{L}, VMLA{L}, and VMLS{L}
C.12.3 VMUL{L}, VMLA{L}, and VMLS{L} (by scalar)
C.12.4 VQ{R}DMULH (by vector or by scalar)
C.12.5 VQDMULL, VQDMLAL, and VQDMLSL (by vector or by scalar)
C.13 NEON load and store instructions
C.13.1 Interleaving
C.13.2 Alignment restrictions in load and store, element and structure instructions
C.13.3 VLDn and VSTn (single n-element structure to one lane)
C.13.4 VLDn (single n-element structure to all lanes)
C.13.5 VLDn and VSTn (multiple n-element structures)
C.13.6 VLDR and VSTR
C.13.7 VLDM, VSTM, VPOP, and VPUSH
C.13.8 VMOV (between two ARM registers and a NEON register)
C.13.9 VMOV (between an ARM register and a NEON scalar)
C.13.10 VMRS and VMSR (between an ARM register and a NEON or VFP system register)
C.14 VFP instructions
C.14.1 VABS
C.14.2 VADD
C.14.3 VCMP (Floating-point compare)
C.14.4 VCVT (between single-precision and double-precision)
C.14.5 VCVT (between floating-point and integer)
C.14.6 VCVT (between floating-point and fixed-point)
C.14.7 VCVTB, VCVTT (half-precision extension)
C.14.8 VDIV
C.14.9 VFMA, VFMS, VFNMA, VFNMS (Fused floating-point multiply accumulate and fused floating-point multiply subtract with optional negation)
C.14.10 VMOV
C.14.11 VMOV
C.14.12 VMUL, VMLA, VMLS, VNMUL, VNMLA, and VNMLS
C.14.13 VNEG
C.14.14 VSQRT
C.14.15 VSUB
C.15 NEON and VFP pseudo-instructions
C.15.1 VACLE and VACLT
C.15.2 VAND (immediate)
C.15.3 VCLE and VCLT
C.15.4 VLDR pseudo-instruction
C.15.5 VLDR and VSTR (post-increment and pre-decrement)
C.15.6 VMOV2
C.15.7 VORN (immediate)
D: NEON Intrinsics Reference
D.1 NEON intrinsics description
D.2 Intrinsics type conversion
D.2.1 VREINTERPRET
D.2.2 VCOMBINE
D.2.3 VGET_HIGH
D.2.4 VGET_LOW
D.3 Arithmetic
D.3.1 VADD
D.3.2 VADDL
D.3.3 VADDW
D.3.4 VHADD
D.3.5 VRHADD
D.3.6 VQADD
D.3.7 VADDHN
D.3.8 VRADDHN
D.3.9 VSUB
D.3.10 VSUBL
D.3.11 VSUBW
D.3.12 VHSUB
D.3.13 VRHSUB
D.3.14 VQSUB
D.3.15 VSUBHN
D.3.16 VRSUBHN
D.4 Multiply
D.4.1 VMUL
D.4.2 VMLA
D.4.3 VMLAL
D.4.4 VMLS
D.4.5 VMLSL
D.4.6 VQDMULH
D.4.7 VQRDMULH
D.4.8 VQDMLAL
D.4.9 VQDMLSL
D.4.10 VMULL
D.4.11 VQDMULL
D.4.12 VMLA_LANE
D.4.13 VMLAL_LANE
D.4.14 VQDMLAL_LANE
D.4.15 VMLS_LANE
D.4.16 VMLSL_LANE
D.4.17 VQDMLSL_LANE
D.4.18 VMUL_N
D.4.19 VMULL_N
D.4.20 VMULL_LANE
D.4.21 VQDMULL_N
D.4.22 VQDMULL_LANE
D.4.23 VQDMULH_N
D.4.24 VQDMULH_LANE
D.4.25 VQRDMULH_N
D.4.26 VQRDMULH_LANE
D.4.27 VMLA_LANE
D.4.28 VMLAL_N
D.4.29 VQDMLAL_N
D.4.30 VMLSL_N
D.4.31 VQDMLSL_N
D.5 Data processing
D.5.1 VPADD
D.5.2 VPADDL
D.5.3 VPADAL
D.5.4 VPMAX
D.5.5 VPMIN
D.5.6 VABD
D.5.7 VABDL
D.5.8 VABA
D.5.9 VABAL
D.5.10 VMAX
D.5.11 VMIN
D.5.12 VABS
D.5.13 VQABS
D.5.14 VNEG
D.5.15 VQNEG
D.5.16 VCLS
D.5.17 VCLZ
D.5.18 VCNT
D.5.19 VRECPE
D.5.20 VRECPS
D.5.21 VRSQRTE
D.5.22 VRSQRTS
D.5.23 VMOVN
D.5.24 VMOVL
D.5.25 VQMOVN
D.5.26 VQMOVUN
D.6 Logical and compare
D.6.1 VCEQ
D.6.2 VCGE
D.6.3 VCLE
D.6.4 VCGT
D.6.5 VCLT
D.6.6 VCAGE
D.6.7 VCALE
D.6.8 VCAGT
D.6.9 VCALT
D.6.10 VTST
D.6.11 VMVN
D.6.12 VAND
D.6.13 VORR
D.6.14 VEOR
D.6.15 VBIC
D.6.16 VORN
D.6.17 VBSL
D.7 Shift
D.7.1 VSHL
D.7.2 VQSHL
D.7.3 VRSHL
D.7.4 VQRSHL
D.7.5 VSHR_N
D.7.6 VSHL_N
D.7.7 VRSHR_N
D.7.8 VSRA_N
D.7.9 VRSRA_N
D.7.10 VQSHL_N
D.7.11 VQSHLU_N
D.7.12 VSHRN_N
D.7.13 VQSHRUN_N
D.7.14 VQRSHRUN_N
D.7.15 VQSHRN_N
D.7.16 VRSHRN_N
D.7.17 VQRSHRN_N
D.7.18 VSHLL_N
D.7.19 VSRI_N
D.7.20 VSLI_N
D.8 Floating-point
D.8.1 VCVT
D.8.2 VCVT_N
D.8.3 VCVT_F32
D.8.4 VCVT_N_F32
D.8.5 VCVT_F16_F32
D.8.6 VCVT_F32_F16
D.8.7 VFMA
D.8.8 VFMS
D.9 Load and store
D.9.1 VLD1
D.9.2 VLD1_LANE
D.9.3 VLD1_DUP
D.9.4 VLD2
D.9.5 VLD2_LANE
D.9.6 VLD2_DUP
D.9.7 VLD3
D.9.8 VLD3_LANE
D.9.9 VLD3_DUP
D.9.10 VLD4
D.9.11 VLD4_LANE
D.9.12 VLD4_DUP
D.9.13 VST1
D.9.14 VST1_LANE
D.9.15 VST2
D.9.16 VST2_LANE
D.9.17 VST3
D.9.18 VST3_LANE
D.9.19 VST4
D.9.20 VST4_LANE
D.9.21 VGET_LANE
D.9.22 VSET_LANE
D.10 Permutation
D.10.1 VEXT
D.10.2 VTBL1
D.10.3 VTBL2
D.10.4 VTBL3
D.10.5 VTBL4
D.10.6 VTBX1
D.10.7 VTBX2
D.10.8 VTBX3
D.10.9 VTBX4
D.10.10 VREV64
D.10.11 VREV32
D.10.12 VREV16
D.10.13 VTRN
D.10.14 VZIP
D.10.15 VUZP
D.11 Miscellaneous
D.11.1 VCREATE
D.11.2 VDUP_N
D.11.3 VMOV_N
D.11.4 VDUP_LANE
NEON™ Programmer’s Guide
Version: 1.0
Copyright © 2013 ARM. All rights reserved.
ARM DEN0018A (ID071613)
NEON Programmer’s Guide
Copyright © 2013 ARM. All rights reserved.

Release Information

The following changes have been made to this book.

Change history

Date          Issue  Confidentiality   Change
28 June 2013  A      Non-Confidential  First release

Proprietary Notice

This document is protected by copyright and other related rights and the practice or implementation of the information contained in this document may be protected by one or more patents or pending patent applications. No part of this document may be reproduced in any form by any means without the express prior written permission of ARM. No license, express or implied, by estoppel or otherwise to any intellectual property rights is granted by this document unless specifically stated.

Your access to the information in this document is conditional upon your acceptance that you will not use or permit others to use the information for the purposes of determining whether implementations infringe any third party patents.

THIS DOCUMENT IS PROVIDED “AS IS”. ARM PROVIDES NO REPRESENTATIONS AND NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY, SATISFACTORY QUALITY, NON-INFRINGEMENT OR FITNESS FOR A PARTICULAR PURPOSE WITH RESPECT TO THE DOCUMENT. For the avoidance of doubt, ARM makes no representation with respect to, and has undertaken no analysis to identify or understand the scope and content of, third party patents, copyrights, trade secrets, or other rights.

TO THE EXTENT NOT PROHIBITED BY LAW, IN NO EVENT WILL ARM BE LIABLE FOR ANY DAMAGES, INCLUDING WITHOUT LIMITATION ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, PUNITIVE, OR CONSEQUENTIAL DAMAGES, HOWEVER CAUSED AND REGARDLESS OF THE THEORY OF LIABILITY, ARISING OUT OF ANY USE OF THIS DOCUMENT, EVEN IF ARM HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

This document may include technical inaccuracies or typographical errors.

This document consists solely of commercial items. You shall be responsible for ensuring that any use, duplication or disclosure of this document complies fully with any relevant export laws and regulations to assure that this document or any portion thereof is not exported, directly or indirectly, in violation of such export laws. Use of the word “partner” is not intended to create or refer to any partnership relationship with any other company. ARM may make changes to this document at any time and without notice.

If any of the provisions contained in these terms conflict with any of the provisions of any signed written agreement covering this document with ARM, then the signed written agreement prevails over and supersedes the conflicting provisions of these terms.

Words and logos marked with ™ or ® are registered trademarks or trademarks of ARM Limited or its affiliates in the EU and/or elsewhere. All rights reserved. Other brands and names mentioned in this document may be the trademarks of their respective owners. Please follow ARM’s trademark usage guidelines at http://www.arm.com/about/trademark-usage-guidelines.php.

Copyright © 2013, ARM Limited or its affiliates. All rights reserved.

ARM Limited. Company 02557590 registered in England.
110 Fulbourn Road, Cambridge, England CB1 9NJ.

LES-PRE-20318 v0.1

Web Address
http://www.arm.com

ARM DEN0018A ID071613 Copyright © 2013 ARM. All rights reserved. Non-Confidential ii
Preface

This book is a guide for programmers who want to use NEON technology, the ARM Advanced SIMD architecture extension, effectively. It provides information that is useful to both assembly language and C programmers.

This is not an introductory level book:

• It assumes knowledge of the C and ARM assembler programming languages, but not any ARM-specific background.
• Some chapters suggest further reading (referring either to books or web sites) that can give a deeper level of background to the topic in hand, but this book focuses on the ARM-specific detail.
• No particular toolchain is assumed, and there are examples for both GNU and ARM tools.
• This book complements other ARM documentation for these processors, including the processor Technical Reference Manuals (TRMs), documentation for specific devices or boards, and the ARM Architecture Reference Manual, ARMv7-A and ARMv7-R edition (DDI0406).
• The Cortex™-A Series Programmer’s Guide covers the basic principles of NEON technology; this book provides more detailed information on using it.
References

Hohl, William. “ARM Assembly Language: Fundamentals and Techniques”, CRC Press, 2009. ISBN: 9781439806104.
Sloss, Andrew N.; Symes, Dominic; Wright, Chris. “ARM System Developer's Guide: Designing and Optimizing System Software”, Morgan Kaufmann, 2004. ISBN: 9781558608740.
ANSI/IEEE Std 754-1985, “IEEE Standard for Binary Floating-Point Arithmetic”.
ANSI/IEEE Std 754-2008, “IEEE Standard for Floating-Point Arithmetic”.
ANSI/IEEE Std 1003.1-1990, “Standard for Information Technology - Portable Operating System Interface (POSIX) Base Specifications, Issue 7”.
ARM® Architecture Reference Manual, ARMv7-A and ARMv7-R edition (ARM DDI 0406), the ARM ARM.

Note
In the event of a contradiction between this book and the ARM ARM, the ARM ARM is definitive and must take precedence.

ARM® Compiler Toolchain Assembler Reference (ARM DUI 0489).
Cortex™-A Series Programmer’s Guide (ARM DEN0013B).
Introducing NEON (ARM DHT 0002).
NEON™ Support in Compilation Tools (ARM DHT 0004).
ARM® Compiler Toolchain: Using the Assembler (ARM DUI 0473).
Cortex™-A5 Technical Reference Manual (ARM DDI 0433).
Cortex™-A5 NEON Media Processing Engine Technical Reference Manual (ARM DDI 0450).
Cortex™-A8 Technical Reference Manual (ARM DDI 0344).
Cortex™-A9 NEON Media Processing Engine Technical Reference Manual (ARM DDI 0409).
Cortex™-A9 Technical Reference Manual (ARM DDI 0308).
ARM® NEON™ support in the ARM compiler, White Paper, September 2008.
ARM® C Language Extensions (IHI0053).
Typographical conventions

This book uses the following typographical conventions:

italic: Highlights important notes, introduces special terminology, denotes internal cross-references, and citations.
bold: Used for terms in descriptive lists, where appropriate.
monospace: Denotes text that you can enter at the keyboard, such as commands, file and program names, instruction names, parameters and source code.
monospace italic: Denotes arguments to monospace text where the argument is to be replaced by a specific value.
< and >: Enclose replaceable terms for assembler syntax where they appear in code or code fragments. For example:
MRC p15, 0, , , ,