ARM Cortex-A Series Architecture Detailed Introduction.pdf

Cortex-A Series Programmer’s Guide
Contents
Preface
References
Typographical conventions
Feedback on this book
Glossary
1: Introduction
1.1 History
1.2 System-on-Chip (SoC)
1.3 Embedded systems
2: ARM Architecture and Processors
2.1 Architecture versions
2.2 Architecture history and extensions
2.2.1 DSP multiply-accumulate and saturated arithmetic instructions
2.2.2 Jazelle
2.2.3 Thumb Execution Environment (ThumbEE)
2.2.4 Thumb-2
2.2.5 Security Extensions (TrustZone)
2.2.6 VFP
2.2.7 Advanced SIMD (NEON)
2.2.8 Large Physical Address Extension (LPAE)
2.2.9 Virtualization
2.2.10 big.LITTLE
2.3 Key architectural points of ARM Cortex-A series processors
2.4 Processors and pipelines
2.5 The Cortex-A series processors
2.5.1 The Cortex-A5 processor
2.5.2 The Cortex-A7 processor
2.5.3 The Cortex-A8 processor
2.5.4 The Cortex-A9 processor
2.5.5 The Cortex-A15 processor
2.5.6 Qualcomm Scorpion
3: Tools, Operating Systems and Boards
3.1 Linux distributions
3.1.1 Linux for ARM systems
3.1.2 Linux terminology
3.1.3 Embedded Linux
3.1.4 Board Support Package
3.1.5 Linaro
3.2 Useful tools
3.2.1 QEMU
3.2.2 BusyBox
3.2.3 Scratchbox
3.2.4 U-Boot
3.2.5 UEFI and Tianocore
3.3 Software toolchains for ARM processors
3.3.1 GNU toolchain
3.3.2 ARM Compiler toolchain
3.4 ARM DS-5
3.5 Example platforms
3.5.1 BeagleBoard
3.5.2 Pandora
3.5.3 ST Ericsson Snowball
3.5.4 Gumstix
3.5.5 PandaBoard
4: ARM Registers, Modes and Instruction Sets
4.1 Instruction sets
4.2 Modes
4.3 Registers
4.3.1 Program Status Registers
4.4 Instruction pipelines
4.4.1 Multi-issue pipelines
4.4.2 Register renaming
4.5 Branch prediction
4.5.1 Return stack
4.5.2 Programmer’s view
5: Introduction to Assembly Language
5.1 Comparison with other assembly languages
5.2 Instruction sets
5.3 Introduction to the GNU Assembler
5.3.1 Invoking the GNU Assembler
5.3.2 GNU Assembler syntax
5.3.3 Sections
5.3.4 Assembler directives
5.3.5 Expressions
5.3.6 GNU tools naming conventions
5.4 ARM tools assembly language
5.4.1 ARM assembler syntax
5.4.2 Label
5.4.3 Directives
5.5 Interworking
5.6 Identifying assembly code
6: ARM/Thumb Unified Assembly Language Instructions
6.1 Instruction set basics
6.1.1 Constant values
6.1.2 Conditional execution
6.1.3 Status flags and condition codes
6.2 Data processing operations
6.2.1 Operand 2 and the barrel shifter
6.3 Multiplication operations
6.3.1 Additional multiplies
6.4 Memory instructions
6.4.1 Addressing modes
6.4.2 Multiple transfers
6.5 Branches
6.6 Integer SIMD instructions
6.6.1 Integer register SIMD instructions
6.6.2 Integer register SIMD multiplies
6.6.3 Sum of absolute differences
6.6.4 Data packing and unpacking
6.6.5 Byte selection
6.7 Saturating arithmetic
6.7.1 Saturated math instructions
6.8 Miscellaneous instructions
6.8.1 Coprocessor instructions
6.8.2 Coprocessor 15
6.8.3 SVC
6.8.4 PSR modification
6.8.5 Bit manipulation
6.8.6 Cache preload
6.8.7 Byte reversal
6.8.8 Other instructions
7: Floating-Point
7.1 Floating-point basics and the IEEE-754 standard
7.1.1 Rounding algorithms
7.1.2 ARM VFP
7.1.3 Instructions
7.1.4 Enabling VFP
7.2 VFP support in GCC
7.3 VFP support in the ARM Compiler
7.4 VFP support in Linux
7.4.1 Context switching
7.5 Floating-point optimization
8: Introducing NEON
8.1 SIMD
8.1.1 ARMv6 SIMD instructions
8.2 NEON architecture overview
8.2.1 Commonality with VFP
8.2.2 Data types
8.2.3 NEON registers
8.2.4 NEON instruction set
9: Caches
9.1 Why do caches help?
9.2 Cache drawbacks
9.3 Memory hierarchy
9.4 Cache architecture
9.4.1 Cache controller
9.4.2 Direct mapped caches
9.4.3 Set associative caches
9.4.4 Cache terminology
9.4.5 A real-life example
9.4.6 Virtual and physical tags and indexes
9.5 Cache policies
9.5.1 Allocation policy
9.5.2 Replacement policy
9.5.3 Write policy
9.6 Write and Fetch buffers
9.7 Cache performance and hit rate
9.8 Invalidating and cleaning cache memory
9.9 Point of coherency and unification
9.10 Level 2 cache controller
9.10.1 Level 2 cache maintenance
9.11 Parity and ECC in caches
10: Memory Management Unit
10.1 Virtual memory
10.2 Level 1 page tables
10.3 Level 2 page tables
10.4 The Translation Lookaside Buffer
10.5 TLB coherency
10.6 Choice of page sizes
10.7 Memory attributes
10.7.1 Memory Access Permissions
10.7.2 Memory types
10.7.3 Domains
10.8 Multi-tasking and OS usage of page tables
10.8.1 Address Space ID
10.8.2 Page Table Base Register 0 and 1
10.8.3 The Fast Context Switch Extension
10.9 Large Physical Address Extensions
11: Memory Ordering
11.1 ARM memory ordering model
11.1.1 Strongly-ordered and Device memory
11.1.2 Normal memory
11.2 Memory barriers
11.2.1 Memory barrier use example
11.2.2 Avoiding deadlocks with a barrier
11.2.3 WFE and WFI Interaction with barriers
11.2.4 Linux use of barriers
11.3 Cache coherency implications
11.3.1 Issues with copying code
11.3.2 Compiler re-ordering optimizations
12: Exception Handling
12.1 Types of exception
12.2 Exception mode summary
12.2.1 Exception priorities
12.3 Entering an exception handler
12.4 Exit from an exception handler
12.5 Vector table
12.6 Return instruction
13: Interrupt Handling
13.1 External interrupt requests
13.1.1 Assigning interrupts
13.1.2 Simplistic interrupt handling
13.1.3 Nested interrupt handling
13.2 Generic Interrupt Controller
13.2.1 Configuration
13.2.2 Interrupt handling
14: Other Exception Handlers
14.1 Abort handler
14.2 Undefined instruction handling
14.3 SVC exception handling
14.4 Linux exception program flow
14.4.1 Boot process
14.4.2 Interrupt dispatch
15: Boot Code
15.1 Booting a bare-metal system
15.2 Configuration
15.3 Booting Linux
15.3.1 Reset handler
15.3.2 Bootloader
15.3.3 Initialize memory system
15.3.4 Kernel images
15.3.5 Kernel parameters
15.3.6 Kernel entry
15.3.7 Platform-specific actions
15.3.8 Kernel start-up code
16: Porting
16.1 Endianness
16.2 Alignment
16.3 Miscellaneous C porting issues
16.3.1 unsigned char and signed char
16.3.2 Compiler packing of structures
16.3.3 Use of the stack
16.3.4 Other issues
16.4 Porting ARM assembly code to ARMv7
16.4.1 Memory access ordering and memory barriers
16.5 Porting ARM code to Thumb
16.5.1 Use of PC as an operand
16.5.2 Branches and interworking
16.5.3 Operand combinations
16.5.4 Other ARM/Thumb differences
17: Application Binary Interfaces
17.1 Procedure Call Standard
17.1.1 VFP and NEON register usage
17.1.2 Linkage
17.1.3 Stack and heap
17.1.4 Returning results
17.2 Mixing C and assembly code
18: Profiling
18.1 Profiler output
18.1.1 Gprof
18.1.2 OProfile
18.1.3 DS-5 Streamline
18.1.4 ARM performance monitor
18.1.5 Linux perf events
18.1.6 Ftrace
18.1.7 Valgrind and Cachegrind
19: Optimizing Code to Run on ARM Processors
19.1 Compiler optimizations
19.1.1 Function inlining
19.1.2 Eliminating common sub-expressions
19.1.3 Loop unrolling
19.1.4 GCC optimization options
19.1.5 armcc optimization options
19.2 ARM memory system optimization
19.2.1 Data cache optimization
19.2.2 Loop tiling
19.2.3 Loop interchange
19.2.4 Structure alignment
19.2.5 Associativity effects
19.2.6 Optimizing instruction cache usage
19.2.7 Optimizing L2 and outer cache usage
19.2.8 Optimizing TLB usage
19.2.9 Data abort optimization
19.2.10 Prefetching a memory block access
19.3 Source code modifications
19.3.1 Loop termination
19.3.2 Loop fusion
19.3.3 Reducing stack and heap usage
19.3.4 Variable selection
19.3.5 Pointer aliasing
19.3.6 Division and modulo
19.3.7 Extern data
19.3.8 Inline or embedded assembler
19.3.9 Complex addressing modes
19.3.10 Unaligned access
19.3.11 Linker optimizations
20: Writing NEON Code
20.1 NEON C Compiler and assembler
20.1.1 Vectorization
20.1.2 NEON libraries
20.1.3 Intrinsics
20.1.4 NEON types in C
20.1.5 Variables and constants
20.1.6 Generating NEON instructions from C/C++ code
20.1.7 NEON assembler and ABI restrictions
20.1.8 Detecting NEON
20.2 Optimizing NEON assembler code
20.2.1 Memory access optimizations
20.2.2 Alignment
20.2.3 Scheduling
20.3 NEON power saving
21: Introduction to Multi-processing
21.1 Multi-processing ARM systems
21.2 Symmetric multi-processing
21.3 Asymmetric multi-processing
22: SMP Architectural Considerations
22.1 Cache coherency
22.1.1 MESI protocol
22.1.2 MOESI protocol
22.1.3 Accelerator Coherency Port (ACP)
22.2 TLB and cache maintenance broadcast
22.3 Handling interrupts in an SMP system
22.4 Exclusive accesses
22.5 Booting SMP systems
22.5.1 Processor ID
22.5.2 SMP boot in Linux
22.6 Private memory region
22.6.1 Timers and watchdogs
23: Parallelizing Software
23.1 Decomposition methods
23.2 Threading models
23.3 Threading libraries
23.3.1 Inter-thread communications
23.3.2 Threaded performance
23.3.3 Thread affinity
23.4 Synchronization mechanisms in the Linux kernel
23.4.1 Completions
23.4.2 Spinlocks
23.4.3 Semaphores
23.4.4 Lock-free synchronization
24: Issues with Parallelizing Software
24.1 Thread safety and reentrancy
24.2 Performance issues
24.2.1 Bandwidth concerns
24.2.2 Thread dependencies
24.2.3 Cache thrashing
24.2.4 False sharing
24.2.5 Deadlock and livelock
24.3 Profiling in SMP systems
25: Power Management
25.1 Power and clocking
25.1.1 Standby mode
25.1.2 Dormant mode
25.1.3 Assembly language power instructions
25.1.4 Dynamic Voltage and Frequency Scaling
26: Security
26.1 TrustZone hardware architecture
26.1.1 Multi-processor systems with security extensions
26.1.2 Interaction of Normal and Secure worlds
27: Virtualization
27.1 ARMv7-A Virtualization Extensions
27.1.1 Privilege model in ARMv7-A Virtualization Extensions
27.1.2 Hypervisor mode
27.1.3 Memory translation
27.2 Hypervisor exception model
27.3 Relationship between virtualization and ARM Security Extensions
28: Introducing big.LITTLE
28.1 big.LITTLE configuration
28.2 Structure of a big.LITTLE system
28.3 Execution models in big.LITTLE
28.3.1 big.LITTLE migration models
28.3.2 Cluster migration
28.3.3 CPU migration
28.4 big.LITTLE MP operation
29: Debug
29.1 ARM debug hardware
29.1.1 Debug events
29.2 ARM trace hardware
29.2.1 CoreSight
29.3 Debug monitor
29.4 Debugging Linux applications
29.5 DS-5 debug and trace
29.5.1 Debugging Linux applications using DS-5
29.5.2 Debugging Linux kernel modules
29.5.3 Debugging Linux kernels using DS-5
29.5.4 Debugging multi-threaded applications using DS-5
29.5.5 Debugging shared libraries
29.5.6 Trace support in DS-5
A: Instruction Summary
A.1 Instruction Summary
A.1.1 ADC
A.1.2 ADD
A.1.3 ADR
A.1.4 ADRL
A.1.5 AND
A.1.6 ASR
A.1.7 B
A.1.8 BFC
A.1.9 BFI
A.1.10 BIC
A.1.11 BKPT
A.1.12 BL
A.1.13 BLX
A.1.14 BX
A.1.15 BXJ
A.1.16 CBNZ
A.1.17 CBZ
A.1.18 CDP
A.1.19 CDP2
A.1.20 CHKA
A.1.21 CLREX
A.1.22 CLZ
A.1.23 CMN
A.1.24 CMP
A.1.25 CPS
A.1.26 DBG
A.1.27 DMB
A.1.28 DSB
A.1.29 ENTERX
A.1.30 EOR
A.1.31 ERET
A.1.32 HB
A.1.33 ISB
A.1.34 IT
A.1.35 LDC
A.1.36 LDC2
A.1.37 LDM
A.1.38 LDR
A.1.39 LDR (pseudo-instruction)
A.1.40 LDRD
A.1.41 LDREX
A.1.42 LEAVEX
A.1.43 LSL
A.1.44 LSR
A.1.45 MCR
A.1.46 MCR2
A.1.47 MCRR
A.1.48 MCRR2
A.1.49 MLA
A.1.50 MLS
A.1.51 MOV
A.1.52 MOVT
A.1.53 MOV32
A.1.54 MRC
A.1.55 MRC2
A.1.56 MRRC
A.1.57 MRRC2
A.1.58 MRS
A.1.59 MSR
A.1.60 MUL
A.1.61 MVN
A.1.62 NOP
A.1.63 ORN
A.1.64 ORR
A.1.65 PKHBT
A.1.66 PKHTB
A.1.67 PLD
A.1.68 PLDW
A.1.69 PLI
A.1.70 POP
A.1.71 PUSH
A.1.72 QADD
A.1.73 QADD8
A.1.74 QADD16
A.1.75 QASX
A.1.76 QDADD
A.1.77 QDSUB
A.1.78 QSAX
A.1.79 QSUB
A.1.80 QSUB8
A.1.81 QSUB16
A.1.82 RBIT
A.1.83 REV
A.1.84 REV16
A.1.85 REVSH
A.1.86 RFE
A.1.87 ROR
A.1.88 RRX
A.1.89 RSB
A.1.90 RSC
A.1.91 SADD8
A.1.92 SADD16
A.1.93 SASX
A.1.94 SBC
A.1.95 SBFX
A.1.96 SDIV
A.1.97 SEL
A.1.98 SETEND
A.1.99 SEV
A.1.100 SHADD8
A.1.101 SHADD16
A.1.102 SHASX
A.1.103 SHSAX
A.1.104 SHSUB8
A.1.105 SHSUB16
A.1.106 SMC
A.1.107 SMLAxy
A.1.108 SMLAD
A.1.109 SMLAL
A.1.110 SMLALxy
A.1.111 SMLALD
A.1.112 SMLAWy
A.1.113 SMLSLD
A.1.114 SMMLA
A.1.115 SMMLS
A.1.116 SMMUL
A.1.117 SMUAD
A.1.118 SMUSD
A.1.119 SMULxy
A.1.120 SMULL
A.1.121 SMULWy
A.1.122 SRS
A.1.123 SSAT
A.1.124 SSAT16
A.1.125 SSAX
A.1.126 SSUB8
A.1.127 SSUB16
A.1.128 STC
A.1.129 STC2
A.1.130 STM
A.1.131 STR
A.1.132 STRD
A.1.133 STREX
A.1.134 SUB
A.1.135 SVC
A.1.136 SWP
A.1.137 SXT
A.1.138 SXTA
A.1.139 SYS
A.1.140 TBB
A.1.141 TBH
A.1.142 TEQ
A.1.143 TST
A.1.144 UADD8
A.1.145 UADD16
A.1.146 UASX
A.1.147 UBFX
A.1.148 UDIV
A.1.149 UHADD8
A.1.150 UHADD16
A.1.151 UHASX
A.1.152 UHSAX
A.1.153 UHSUB8
A.1.154 UHSUB16
A.1.155 UMAAL
A.1.156 UMLAL
A.1.157 UMULL
A.1.158 UQADD8
A.1.159 UQADD16
A.1.160 UQASX
A.1.161 UQSAX
A.1.162 UQSUB8
A.1.163 UQSUB16
A.1.164 USAD8
A.1.165 USADA8
A.1.166 USAT
A.1.167 USAT16
A.1.168 USAX
A.1.169 USUB8
A.1.170 USUB16
A.1.171 UXT
A.1.172 UXTA
A.1.173 WFE
A.1.174 WFI
A.1.175 YIELD
B: NEON and VFP Instruction Summary
B.1 NEON general data processing instructions
B.1.1 VCVT (fixed-point or integer to floating-point)
B.1.2 VCVT (between half-precision and single-precision floating-point)
B.1.3 VDUP
B.1.4 VEXT
B.1.5 VMOV
B.1.6 VMVN
B.1.7 VMOVL, V{Q}MOVN, VQMOVUN
B.1.8 VREV
B.1.9 VSWP
B.1.10 VTBL
B.1.11 VTBX
B.1.12 VTRN
B.1.13 VUZP
B.1.14 VZIP
B.2 NEON shift instructions
B.2.1 VSHL, VQSHL, VQSHLU, and VSHLL (by immediate)
B.2.2 V{Q}{R}SHL
B.2.3 V{R}SHR{N}, V{R}SRA
B.2.4 VQ{R}SHR{U}N
B.2.5 VSLI
B.2.6 VSRI
B.3 NEON logical and compare operations
B.3.1 VACGE and VACGT
B.3.2 VAND
B.3.3 VBIC (immediate)
B.3.4 VBIC (register)
B.3.5 VBIF
B.3.6 VBIT
B.3.7 VBSL
B.3.8 VCEQ, VCGE, VCGT, VCLE, and VCLT
B.3.9 VEOR
B.3.10 VMOV
B.3.11 VMVN
B.3.12 VORN
B.3.13 VORR (immediate)
B.3.14 VORR (register)
B.3.15 VTST
B.4 NEON arithmetic instructions
B.4.1 VABA{L}
B.4.2 VABD{L}
B.4.3 V{Q}ABS
B.4.4 V{Q}ADD, VADDL, VADDW
B.4.5 V{R}ADDHN
B.4.6 VCLS
B.4.7 VCLZ
B.4.8 VCNT
B.4.9 V{R}HADD
B.4.10 VHSUB
B.4.11 VMAX and VMIN
B.4.12 V{Q}NEG
B.4.13 VPADD{L}, VPADAL
B.4.14 VPMAX and VPMIN
B.4.15 VRECPE
B.4.16 VRECPS
B.4.17 VRSQRTE
B.4.18 VRSQRTS
B.4.19 V{Q}SUB, VSUBL and VSUBW
B.4.20 V{R}SUBHN
B.5 NEON multiply instructions
B.5.1 VFMA, VFMS
B.5.2 VMUL{L}, VMLA{L}, and VMLS{L}
B.5.3 VMUL{L}, VMLA{L}, and VMLS{L} (by scalar)
B.5.4 VQ{R}DMULH (by vector or by scalar)
B.5.5 VQDMULL, VQDMLAL, and VQDMLSL (by vector or by scalar)
B.6 NEON load and store element and structure instructions
B.6.1 VLDn and VSTn (single n-element structure to one lane)
B.6.2 VLDn (single n-element structure to all lanes)
B.6.3 VLDn and VSTn (multiple n-element structures)
B.6.4 VLDR and VSTR
B.6.5 VLDM, VSTM, VPOP, and VPUSH
B.6.6 VMOV (between two ARM registers and an extension register)
B.6.7 VMOV (between an ARM register and a NEON scalar)
B.6.8 VMRS and VMSR
B.7 VFP instructions
B.7.1 VABS
B.7.2 VADD
B.7.3 VCMP
B.7.4 VCVT (between single-precision and double-precision)
B.7.5 VCVT (between floating-point and integer)
B.7.6 VCVT (between floating-point and fixed-point)
B.7.7 VCVTB, VCVTT (half-precision extension)
B.7.8 VDIV
B.7.9 VFMA, VFNMA, VFMS, VFNMS
B.7.10 VMOV
B.7.11 VMOV
B.7.12 VMUL, VMLA, VMLS, VNMUL, VNMLA, and VNMLS
B.7.13 VNEG
B.7.14 VSQRT
B.7.15 VSUB
B.8 NEON and VFP pseudo-instructions
B.8.1 VACLE and VACLT
B.8.2 VAND (immediate)
B.8.3 VCLE and VCLT
B.8.4 VLDR pseudo-instruction
B.8.5 VLDR and VSTR (post-increment and pre-decrement)
B.8.6 VMOV2
B.8.7 VORN
C: Building Linux for ARM Systems
C.1 Building the Linux kernel
C.2 Creating the Linux filesystem
C.3 Putting it together
Index
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
Z
Cortex™-A Series Programmer’s Guide
Version: 3.0
Copyright © 2011, 2012 ARM. All rights reserved.
ARM DEN0013C (ID071612)
Cortex-A Series Programmer’s Guide
Copyright © 2011, 2012 ARM. All rights reserved.

Release Information

The following changes have been made to this book.

Change history

Date            Issue  Confidentiality   Change
25 March 2011   A      Non-Confidential  First release
10 August 2011  B      Non-Confidential  Second release. Virtualization chapter added. Updated to include Cortex-A15 processor, and LPAE. Corrected and revised throughout
25 June 2012    C      Non-Confidential  Updated for third release. Updated to include Cortex-A7 processor, and big.LITTLE. Index added. Corrected and revised throughout.

Proprietary Notice

This Cortex-A Series Programmer’s Guide is protected by copyright and the practice or implementation of the information herein may be protected by one or more patents or pending applications. No part of this Cortex-A Series Programmer’s Guide may be reproduced in any form by any means without the express prior written permission of ARM. No license, express or implied, by estoppel or otherwise to any intellectual property rights is granted by this Cortex-A Series Programmer’s Guide.

Your access to the information in this Cortex-A Series Programmer’s Guide is conditional upon your acceptance that you will not use or permit others to use the information for the purposes of determining whether implementations of the information herein infringe any third party patents.

This Cortex-A Series Programmer’s Guide is provided “as is”. ARM makes no representations or warranties, either express or implied, included but not limited to, warranties of merchantability, fitness for a particular purpose, or non-infringement, that the content of this Cortex-A Series Programmer’s Guide is suitable for any particular purpose or that any practice or implementation of the contents of the Cortex-A Series Programmer’s Guide will not infringe any third party patents, copyrights, trade secrets, or other rights.
This Cortex-A Series Programmer’s Guide may include technical inaccuracies or typographical errors.

To the extent not prohibited by law, in no event will ARM be liable for any damages, including without limitation any direct loss, lost revenue, lost profits or data, special, indirect, consequential, incidental or punitive damages, however caused and regardless of the theory of liability, arising out of or related to any furnishing, practicing, modifying or any use of this Programmer’s Guide, even if ARM has been advised of the possibility of such damages.

The information provided herein is subject to U.S. export control laws, including the U.S. Export Administration Act and its associated regulations, and may be subject to export or import regulations in other countries. You agree to comply fully with all laws and regulations of the United States and other countries (“Export Laws”) to assure that neither the information herein, nor any direct products thereof are; (i) exported, directly or indirectly, in violation of Export Laws, either to any countries that are subject to U.S. export restrictions or to any end user who has been prohibited from participating in the U.S. export transactions by any federal agency of the U.S. government; or (ii) intended to be used for any purpose prohibited by Export Laws, including, without limitation, nuclear, chemical, or biological weapons proliferation.

Words and logos marked with ® or ™ are registered trademarks or trademarks of ARM Limited, except as otherwise stated below in this proprietary notice. Other brands and names mentioned herein may be the trademarks of their respective owners.

Copyright © 2011, 2012 ARM Limited, 110 Fulbourn Road Cambridge, CB1 9NJ, England

This document is Non-Confidential but any disclosure by you is subject to you providing notice to and the acceptance by the recipient of, the conditions set out above.
In this document, where the term ARM is used to refer to the company it means “ARM or any of its subsidiaries as appropriate”.

Web Address
http://www.arm.com

ARM DEN0013C ID071612 Copyright © 2011, 2012 ARM. All rights reserved. Non-Confidential
Contents
Cortex-A Series Programmer’s Guide

Preface
References .... x
Typographical conventions .... xi
Feedback on this book .... xii
Glossary .... xiii

Chapter 1 Introduction
1.1 History .... 1-2
1.2 System-on-Chip (SoC) .... 1-3
1.3 Embedded systems .... 1-4

Chapter 2 ARM Architecture and Processors
2.1 Architecture versions .... 2-3
2.2 Architecture history and extensions .... 2-4
2.3 Key architectural points of ARM Cortex-A series processors .... 2-8
2.4 Processors and pipelines .... 2-9
2.5 The Cortex-A series processors .... 2-11

Chapter 3 Tools, Operating Systems and Boards
3.1 Linux distributions .... 3-2
3.2 Useful tools .... 3-6
3.3 Software toolchains for ARM processors .... 3-8
3.4 ARM DS-5 .... 3-11
3.5 Example platforms .... 3-13

Chapter 4 ARM Registers, Modes and Instruction Sets
4.1 Instruction sets .... 4-2
4.2 Modes .... 4-3
4.3 Registers .... 4-4
4.4 Instruction pipelines .... 4-7
4.5 Branch prediction .... 4-10

Chapter 5 Introduction to Assembly Language
5.1 Comparison with other assembly languages .... 5-2
5.2 Instruction sets .... 5-4
5.3 Introduction to the GNU Assembler .... 5-5
5.4 ARM tools assembly language .... 5-9
5.5 Interworking .... 5-11
5.6 Identifying assembly code .... 5-12

Chapter 6 ARM/Thumb Unified Assembly Language Instructions
6.1 Instruction set basics .... 6-2
6.2 Data processing operations .... 6-6
6.3 Multiplication operations .... 6-9
6.4 Memory instructions .... 6-10
6.5 Branches .... 6-13
6.6 Integer SIMD instructions .... 6-14
6.7 Saturating arithmetic .... 6-18
6.8 Miscellaneous instructions .... 6-19

Chapter 7 Floating-Point
7.1 Floating-point basics and the IEEE-754 standard .... 7-2
7.2 VFP support in GCC .... 7-8
7.3 VFP support in the ARM Compiler .... 7-9
7.4 VFP support in Linux .... 7-10
7.5 Floating-point optimization .... 7-11

Chapter 8 Introducing NEON
8.1 SIMD .... 8-2
8.2 NEON architecture overview .... 8-4

Chapter 9 Caches
9.1 Why do caches help? .... 9-3
9.2 Cache drawbacks .... 9-4
9.3 Memory hierarchy .... 9-5
9.4 Cache architecture .... 9-6
9.5 Cache policies .... 9-12
9.6 Write and Fetch buffers .... 9-14
9.7 Cache performance and hit rate .... 9-15
9.8 Invalidating and cleaning cache memory .... 9-16
9.9 Point of coherency and unification .... 9-17
9.10 Level 2 cache controller .... 9-18
9.11 Parity and ECC in caches .... 9-19

Chapter 10 Memory Management Unit
10.1 Virtual memory .... 10-3
10.2 Level 1 page tables .... 10-4
10.3 Level 2 page tables .... 10-7
10.4 The Translation Lookaside Buffer .... 10-10
10.5 TLB coherency .... 10-11
10.6 Choice of page sizes .... 10-12
10.7 Memory attributes .... 10-13
10.8 Multi-tasking and OS usage of page tables .... 10-16
10.9 Large Physical Address Extensions .... 10-19

Chapter 11 Memory Ordering
11.1 ARM memory ordering model .... 11-4
11.2 Memory barriers .... 11-7
11.3 Cache coherency implications .... 11-12

Chapter 12 Exception Handling
12.1 Types of exception .... 12-3
12.2 Exception mode summary .... 12-5
12.3 Entering an exception handler .... 12-7
12.4 Exit from an exception handler .... 12-8
12.5 Vector table .... 12-9
12.6 Return instruction .... 12-10

Chapter 13 Interrupt Handling
13.1 External interrupt requests .... 13-2
13.2 Generic Interrupt Controller .... 13-4

Chapter 14 Other Exception Handlers
14.1 Abort handler .... 14-2
14.2 Undefined instruction handling .... 14-3
14.3 SVC exception handling .... 14-4
14.4 Linux exception program flow .... 14-5

Chapter 15 Boot Code
15.1 Booting a bare-metal system .... 15-2
15.2 Configuration .... 15-6
15.3 Booting Linux .... 15-7

Chapter 16 Porting
16.1 Endianness .... 16-2
16.2 Alignment .... 16-5
16.3 Miscellaneous C porting issues .... 16-7
16.4 Porting ARM assembly code to ARMv7 .... 16-10
16.5 Porting ARM code to Thumb .... 16-11

Chapter 17 Application Binary Interfaces
17.1 Procedure Call Standard .... 17-2
17.2 Mixing C and assembly code .... 17-7

Chapter 18 Profiling
18.1 Profiler output .... 18-3

Chapter 19 Optimizing Code to Run on ARM Processors
19.1 Compiler optimizations .... 19-3
19.2 ARM memory system optimization .... 19-8
19.3 Source code modifications .... 19-13

Chapter 20 Writing NEON Code
20.1 NEON C Compiler and assembler .... 20-2
20.2 Optimizing NEON assembler code .... 20-7
20.3 NEON power saving .... 20-9

Chapter 21 Introduction to Multi-processing
21.1 Multi-processing ARM systems .... 21-3
21.2 Symmetric multi-processing .... 21-5
21.3 Asymmetric multi-processing .... 21-7

Chapter 22 SMP Architectural Considerations
22.1 Cache coherency .... 22-2
22.2 TLB and cache maintenance broadcast .... 22-5
22.3 Handling interrupts in an SMP system .... 22-6
22.4 Exclusive accesses .... 22-7
22.5 Booting SMP systems .... 22-10
22.6 Private memory region .... 22-12

Chapter 23 Parallelizing Software
23.1 Decomposition methods .... 23-2
23.2 Threading models .... 23-4
23.3 Threading libraries .... 23-5
23.4 Synchronization mechanisms in the Linux kernel .... 23-8

Chapter 24 Issues with Parallelizing Software
24.1 Thread safety and reentrancy .... 24-2
24.2 Performance issues .... 24-3
24.3 Profiling in SMP systems .... 24-5

Chapter 25 Power Management
25.1 Power and clocking .... 25-2

Chapter 26 Security
26.1 TrustZone hardware architecture .... 26-2

Chapter 27 Virtualization
27.1 ARMv7-A Virtualization Extensions .... 27-3
27.2 Hypervisor exception model .... 27-5
27.3 Relationship between virtualization and ARM Security Extensions .... 27-6

Chapter 28 Introducing big.LITTLE
28.1 big.LITTLE configuration .... 28-2
28.2 Structure of a big.LITTLE system .... 28-3
28.3 Execution models in big.LITTLE .... 28-4
28.4 big.LITTLE MP operation .... 28-9

Chapter 29 Debug
29.1 ARM debug hardware .... 29-2
29.2 ARM trace hardware .... 29-4
29.3 Debug monitor .... 29-7
29.4 Debugging Linux applications .... 29-8
29.5 DS-5 debug and trace .... 29-9

Appendix A Instruction Summary
A.1 Instruction Summary .... A-2

Appendix B NEON and VFP Instruction Summary
B.1 NEON general data processing instructions ....
B-6 NEON shift instructions ............................................................................................... B-13 NEON logical and compare operations ...................................................................... B-17 NEON arithmetic instructions ...................................................................................... B-23 NEON multiply instructions ......................................................................................... B-32 NEON load and store element and structure instructions ........................................... B-35 VFP instructions .......................................................................................................... B-41 NEON and VFP pseudo-instructions .......................................................................... B-47 ARM DEN0013C ID071612 Copyright © 2011, 2012 ARM. All rights reserved. Non-Confidential vi
Appendix C Building Linux for ARM Systems C.1 C.2 C.3 Building the Linux kernel ............................................................................................... C-2 Creating the Linux filesystem ........................................................................................ C-6 Putting it together .......................................................................................................... C-8 Contents ARM DEN0013C ID071612 Copyright © 2011, 2012 ARM. All rights reserved. Non-Confidential vii
Preface

This book provides an introduction to ARM technology for programmers using ARM Cortex-A series processors that conform to the ARM ARMv7-A architecture. The v7 refers to version 7 of the architecture, while the A indicates the architecture profile that describes Application processors. This includes the Cortex-A5, Cortex-A7, Cortex-A8, Cortex-A9 and Cortex-A15 processors.

The book complements rather than replaces other ARM documentation that is available for Cortex-A series processors, such as the ARM Technical Reference Manuals (TRMs) for the processors themselves, documentation for individual devices or boards and, most importantly, the ARM Architecture Reference Manual (or the “ARM ARM”).

The purpose of this book is to bring together information from a wide variety of sources to provide a single guide for programmers who want to develop applications for the latest Cortex-A series of processors. We will cover hardware concepts such as caches and Memory Management Units, but only where this is valuable to the application writer. The book is intended to provide information that will be useful to both assembly language and C programmers. We will look at how complex operating systems, such as Linux, make use of ARM features, and how to take full advantage of the many advanced capabilities of the ARM processor, in particular writing software for multi-processing and using the SIMD capabilities of the device.

Although much of the book is also applicable to other ARM processors, we do not explicitly cover processors that implement older versions of the Architecture. The Cortex-R series and M-series processors are mentioned but not described. Our intention is to provide an approachable introduction to the ARM architecture, covering the feature set in detail and providing practical advice on writing both C and assembly language programs to run efficiently on a Cortex-A series processor.

This is not an introductory level book. We assume knowledge of the C programming language and microprocessors, but not of any ARM-specific background. In the allotted space, we cannot hope to cover every topic in detail. In some chapters, we suggest further reading (referring either to books or websites) that can give a deeper level of background to the topic in hand, but in this book we will focus on the ARM-specific detail. We do not assume the use of any particular tool chain.