ARM Cortex-A Series Architecture: Detailed Introduction.pdf

Published: 2022-06-23; Publisher: admin; Category: Manuals; File size: 3.76M; Format: pdf

Preview: pages 1 to 8 of 451.

Cortex-A Series Programmer’s Guide

Contents

Preface

References

Typographical conventions

Feedback on this book

Glossary

1: Introduction

1.1 History

1.2 System-on-Chip (SoC)

1.3 Embedded systems

2: ARM Architecture and Processors

2.1 Architecture versions

2.2 Architecture history and extensions

2.2.1 DSP multiply-accumulate and saturated arithmetic instructions

2.2.2 Jazelle

2.2.3 Thumb Execution Environment (ThumbEE)

2.2.4 Thumb-2

2.2.5 Security Extensions (TrustZone)

2.2.6 VFP

2.2.7 Advanced SIMD (NEON)

2.2.8 Large Physical Address Extension (LPAE)

2.2.9 Virtualization

2.2.10 big.LITTLE

2.3 Key architectural points of ARM Cortex-A series processors

2.4 Processors and pipelines

2.5 The Cortex-A series processors

2.5.1 The Cortex-A5 processor

2.5.2 The Cortex-A7 processor

2.5.3 The Cortex-A8 processor

2.5.4 The Cortex-A9 processor

2.5.5 The Cortex-A15 processor

2.5.6 Qualcomm Scorpion

3: Tools, Operating Systems and Boards

3.1 Linux distributions

3.1.1 Linux for ARM systems

3.1.2 Linux terminology

3.1.3 Embedded Linux

3.1.4 Board Support Package

3.1.5 Linaro

3.2 Useful tools

3.2.1 QEMU

3.2.2 BusyBox

3.2.3 Scratchbox

3.2.4 U-Boot

3.2.5 UEFI and Tianocore

3.3 Software toolchains for ARM processors

3.3.1 GNU toolchain

3.3.2 ARM Compiler toolchain

3.4 ARM DS-5

3.5 Example platforms

3.5.1 BeagleBoard

3.5.2 Pandora

3.5.3 ST-Ericsson Snowball

3.5.4 Gumstix

3.5.5 PandaBoard

4: ARM Registers, Modes and Instruction Sets

4.1 Instruction sets

4.2 Modes

4.3 Registers

4.3.1 Program Status Registers

4.4 Instruction pipelines

4.4.1 Multi-issue pipelines

4.4.2 Register renaming

4.5 Branch prediction

4.5.1 Return stack

4.5.2 Programmer’s view

5: Introduction to Assembly Language

5.1 Comparison with other assembly languages

5.2 Instruction sets

5.3 Introduction to the GNU Assembler

5.3.1 Invoking the GNU Assembler

5.3.2 GNU Assembler syntax

5.3.3 Sections

5.3.4 Assembler directives

5.3.5 Expressions

5.3.6 GNU tools naming conventions

5.4 ARM tools assembly language

5.4.1 ARM assembler syntax

5.4.2 Label

5.4.3 Directives

5.5 Interworking

5.6 Identifying assembly code

6: ARM/Thumb Unified Assembly Language Instructions

6.1 Instruction set basics

6.1.1 Constant values

6.1.2 Conditional execution

6.1.3 Status flags and condition codes

6.2 Data processing operations

6.2.1 Operand 2 and the barrel shifter

6.3 Multiplication operations

6.3.1 Additional multiplies

6.4 Memory instructions

6.4.1 Addressing modes

6.4.2 Multiple transfers

6.5 Branches

6.6 Integer SIMD instructions

6.6.1 Integer register SIMD instructions

6.6.2 Integer register SIMD multiplies

6.6.3 Sum of absolute differences

6.6.4 Data packing and unpacking

6.6.5 Byte selection

6.7 Saturating arithmetic

6.7.1 Saturated math instructions

6.8 Miscellaneous instructions

6.8.1 Coprocessor instructions

6.8.2 Coprocessor 15

6.8.3 SVC

6.8.4 PSR modification

6.8.5 Bit manipulation

6.8.6 Cache preload

6.8.7 Byte reversal

6.8.8 Other instructions

7: Floating-Point

7.1 Floating-point basics and the IEEE-754 standard

7.1.1 Rounding algorithms

7.1.2 ARM VFP

7.1.3 Instructions

7.1.4 Enabling VFP

7.2 VFP support in GCC

7.3 VFP support in the ARM Compiler

7.4 VFP support in Linux

7.4.1 Context switching

7.5 Floating-point optimization

8: Introducing NEON

8.1 SIMD

8.1.1 ARMv6 SIMD instructions

8.2 NEON architecture overview

8.2.1 Commonality with VFP

8.2.2 Data types

8.2.3 NEON registers

8.2.4 NEON instruction set

9: Caches

9.1 Why do caches help?

9.2 Cache drawbacks

9.3 Memory hierarchy

9.4 Cache architecture

9.4.1 Cache controller

9.4.2 Direct mapped caches

9.4.3 Set associative caches

9.4.4 Cache terminology

9.4.5 A real-life example

9.4.6 Virtual and physical tags and indexes

9.5 Cache policies

9.5.1 Allocation policy

9.5.2 Replacement policy

9.5.3 Write policy

9.6 Write and Fetch buffers

9.7 Cache performance and hit rate

9.8 Invalidating and cleaning cache memory

9.9 Point of coherency and unification

9.10 Level 2 cache controller

9.10.1 Level 2 cache maintenance

9.11 Parity and ECC in caches

10: Memory Management Unit

10.1 Virtual memory

10.2 Level 1 page tables

10.3 Level 2 page tables

10.4 The Translation Lookaside Buffer

10.5 TLB coherency

10.6 Choice of page sizes

10.7 Memory attributes

10.7.1 Memory Access Permissions

10.7.2 Memory types

10.7.3 Domains

10.8 Multi-tasking and OS usage of page tables

10.8.1 Address Space ID

10.8.2 Page Table Base Register 0 and 1

10.8.3 The Fast Context Switch Extension

10.9 Large Physical Address Extensions

11: Memory Ordering

11.1 ARM memory ordering model

11.1.1 Strongly-ordered and Device memory

11.1.2 Normal memory

11.2 Memory barriers

11.2.1 Memory barrier use example

11.2.2 Avoiding deadlocks with a barrier

11.2.3 WFE and WFI Interaction with barriers

11.2.4 Linux use of barriers

11.3 Cache coherency implications

11.3.1 Issues with copying code

11.3.2 Compiler re-ordering optimizations

12: Exception Handling

12.1 Types of exception

12.2 Exception mode summary

12.2.1 Exception priorities

12.3 Entering an exception handler

12.4 Exit from an exception handler

12.5 Vector table

12.6 Return instruction

13: Interrupt Handling

13.1 External interrupt requests

13.1.1 Assigning interrupts

13.1.2 Simplistic interrupt handling

13.1.3 Nested interrupt handling

13.2 Generic Interrupt Controller

13.2.1 Configuration

13.2.2 Interrupt handling

14: Other Exception Handlers

14.1 Abort handler

14.2 Undefined instruction handling

14.3 SVC exception handling

14.4 Linux exception program flow

14.4.1 Boot process

14.4.2 Interrupt dispatch

15: Boot Code

15.1 Booting a bare-metal system

15.2 Configuration

15.3 Booting Linux

15.3.1 Reset handler

15.3.2 Bootloader

15.3.3 Initialize memory system

15.3.4 Kernel images

15.3.5 Kernel parameters

15.3.6 Kernel entry

15.3.7 Platform-specific actions

15.3.8 Kernel start-up code

16: Porting

16.1 Endianness

16.2 Alignment

16.3 Miscellaneous C porting issues

16.3.1 unsigned char and signed char

16.3.2 Compiler packing of structures

16.3.3 Use of the stack

16.3.4 Other issues

16.4 Porting ARM assembly code to ARMv7

16.4.1 Memory access ordering and memory barriers

16.5 Porting ARM code to Thumb

16.5.1 Use of PC as an operand

16.5.2 Branches and interworking

16.5.3 Operand combinations

16.5.4 Other ARM/Thumb differences

17: Application Binary Interfaces

17.1 Procedure Call Standard

17.1.1 VFP and NEON register usage

17.1.2 Linkage

17.1.3 Stack and heap

17.1.4 Returning results

17.2 Mixing C and assembly code

18: Profiling

18.1 Profiler output

18.1.1 Gprof

18.1.2 OProfile

18.1.3 DS-5 Streamline

18.1.4 ARM performance monitor

18.1.5 Linux perf events

18.1.6 Ftrace

18.1.7 Valgrind and Cachegrind

19: Optimizing Code to Run on ARM Processors

19.1 Compiler optimizations

19.1.1 Function inlining

19.1.2 Eliminating common sub-expressions

19.1.3 Loop unrolling

19.1.4 GCC optimization options

19.1.5 armcc optimization options

19.2 ARM memory system optimization

19.2.1 Data cache optimization

19.2.2 Loop tiling

19.2.3 Loop interchange

19.2.4 Structure alignment

19.2.5 Associativity effects

19.2.6 Optimizing instruction cache usage

19.2.7 Optimizing L2 and outer cache usage

19.2.8 Optimizing TLB usage

19.2.9 Data abort optimization

19.2.10 Prefetching a memory block access

19.3 Source code modifications

19.3.1 Loop termination

19.3.2 Loop fusion

19.3.3 Reducing stack and heap usage

19.3.4 Variable selection

19.3.5 Pointer aliasing

19.3.6 Division and modulo

19.3.7 Extern data

19.3.8 Inline or embedded assembler

19.3.9 Complex addressing modes

19.3.10 Unaligned access

19.3.11 Linker optimizations

20: Writing NEON Code

20.1 NEON C Compiler and assembler

20.1.1 Vectorization

20.1.2 NEON libraries

20.1.3 Intrinsics

20.1.4 NEON types in C

20.1.5 Variables and constants

20.1.6 Generating NEON instructions from C/C++ code

20.1.7 NEON assembler and ABI restrictions

20.1.8 Detecting NEON

20.2 Optimizing NEON assembler code

20.2.1 Memory access optimizations

20.2.2 Alignment

20.2.3 Scheduling

20.3 NEON power saving

21: Introduction to Multi-processing

21.1 Multi-processing ARM systems

21.2 Symmetric multi-processing

21.3 Asymmetric multi-processing

22: SMP Architectural Considerations

22.1 Cache coherency

22.1.1 MESI protocol

22.1.2 MOESI protocol

22.1.3 Accelerator Coherency Port (ACP)

22.2 TLB and cache maintenance broadcast

22.3 Handling interrupts in an SMP system

22.4 Exclusive accesses

22.5 Booting SMP systems

22.5.1 Processor ID

22.5.2 SMP boot in Linux

22.6 Private memory region

22.6.1 Timers and watchdogs

23: Parallelizing Software

23.1 Decomposition methods

23.2 Threading models

23.3 Threading libraries

23.3.1 Inter-thread communications

23.3.2 Threaded performance

23.3.3 Thread affinity

23.4 Synchronization mechanisms in the Linux kernel

23.4.1 Completions

23.4.2 Spinlocks

23.4.3 Semaphores

23.4.4 Lock-free synchronization

24: Issues with Parallelizing Software

24.1 Thread safety and reentrancy

24.2 Performance issues

24.2.1 Bandwidth concerns

24.2.2 Thread dependencies

24.2.3 Cache thrashing

24.2.4 False sharing

24.2.5 Deadlock and livelock

24.3 Profiling in SMP systems

25: Power Management

25.1 Power and clocking

25.1.1 Standby mode

25.1.2 Dormant mode

25.1.3 Assembly language power instructions

25.1.4 Dynamic Voltage and Frequency Scaling

26: Security

26.1 TrustZone hardware architecture

26.1.1 Multi-processor systems with security extensions

26.1.2 Interaction of Normal and Secure worlds

27: Virtualization

27.1 ARMv7-A Virtualization Extensions

27.1.1 Privilege model in ARMv7-A Virtualization Extensions

27.1.2 Hypervisor mode

27.1.3 Memory translation

27.2 Hypervisor exception model

27.3 Relationship between virtualization and ARM Security Extensions

28: Introducing big.LITTLE

28.1 big.LITTLE configuration

28.2 Structure of a big.LITTLE system

28.3 Execution models in big.LITTLE

28.3.1 big.LITTLE migration models

28.3.2 Cluster migration

28.3.3 CPU migration

28.4 big.LITTLE MP operation

29: Debug

29.1 ARM debug hardware

29.1.1 Debug events

29.2 ARM trace hardware

29.2.1 CoreSight

29.3 Debug monitor

29.4 Debugging Linux applications

29.5 DS-5 debug and trace

29.5.1 Debugging Linux applications using DS-5

29.5.2 Debugging Linux kernel modules

29.5.3 Debugging Linux kernels using DS-5

29.5.4 Debugging multi-threaded applications using DS-5

29.5.5 Debugging shared libraries

29.5.6 Trace support in DS-5

A: Instruction Summary

A.1 Instruction Summary

A.1.1 ADC

A.1.2 ADD

A.1.3 ADR

A.1.4 ADRL

A.1.5 AND

A.1.6 ASR

A.1.7 B

A.1.8 BFC

A.1.9 BFI

A.1.10 BIC

A.1.11 BKPT

A.1.12 BL

A.1.13 BLX

A.1.14 BX

A.1.15 BXJ

A.1.16 CBNZ

A.1.17 CBZ

A.1.18 CDP

A.1.19 CDP2

A.1.20 CHKA

A.1.21 CLREX

A.1.22 CLZ

A.1.23 CMN

A.1.24 CMP

A.1.25 CPS

A.1.26 DBG

A.1.27 DMB

A.1.28 DSB

A.1.29 ENTERX

A.1.30 EOR

A.1.31 ERET

A.1.32 HB

A.1.33 ISB

A.1.34 IT

A.1.35 LDC

A.1.36 LDC2

A.1.37 LDM

A.1.38 LDR

A.1.39 LDR (pseudo-instruction)

A.1.40 LDRD

A.1.41 LDREX

A.1.42 LEAVEX

A.1.43 LSL

A.1.44 LSR

A.1.45 MCR

A.1.46 MCR2

A.1.47 MCRR

A.1.48 MCRR2

A.1.49 MLA

A.1.50 MLS

A.1.51 MOV

A.1.52 MOVT

A.1.53 MOV32

A.1.54 MRC

A.1.55 MRC2

A.1.56 MRRC

A.1.57 MRRC2

A.1.58 MRS

A.1.59 MSR

A.1.60 MUL

A.1.61 MVN

A.1.62 NOP

A.1.63 ORN

A.1.64 ORR

A.1.65 PKHBT

A.1.66 PKHTB

A.1.67 PLD

A.1.68 PLDW

A.1.69 PLI

A.1.70 POP

A.1.71 PUSH

A.1.72 QADD

A.1.73 QADD8

A.1.74 QADD16

A.1.75 QASX

A.1.76 QDADD

A.1.77 QDSUB

A.1.78 QSAX

A.1.79 QSUB

A.1.80 QSUB8

A.1.81 QSUB16

A.1.82 RBIT

A.1.83 REV

A.1.84 REV16

A.1.85 REVSH

A.1.86 RFE

A.1.87 ROR

A.1.88 RRX

A.1.89 RSB

A.1.90 RSC

A.1.91 SADD8

A.1.92 SADD16

A.1.93 SASX

A.1.94 SBC

A.1.95 SBFX

A.1.96 SDIV

A.1.97 SEL

A.1.98 SETEND

A.1.99 SEV

A.1.100 SHADD8

A.1.101 SHADD16

A.1.102 SHASX

A.1.103 SHSAX

A.1.104 SHSUB8

A.1.105 SHSUB16

A.1.106 SMC

A.1.107 SMLAxy

A.1.108 SMLAD

A.1.109 SMLAL

A.1.110 SMLALxy

A.1.111 SMLALD

A.1.112 SMLAWy

A.1.113 SMLSLD

A.1.114 SMMLA

A.1.115 SMMLS

A.1.116 SMMUL

A.1.117 SMUAD

A.1.118 SMUSD

A.1.119 SMULxy

A.1.120 SMULL

A.1.121 SMULWy

A.1.122 SRS

A.1.123 SSAT

A.1.124 SSAT16

A.1.125 SSAX

A.1.126 SSUB8

A.1.127 SSUB16

A.1.128 STC

A.1.129 STC2

A.1.130 STM

A.1.131 STR

A.1.132 STRD

A.1.133 STREX

A.1.134 SUB

A.1.135 SVC

A.1.136 SWP

A.1.137 SXT

A.1.138 SXTA

A.1.139 SYS

A.1.140 TBB

A.1.141 TBH

A.1.142 TEQ

A.1.143 TST

A.1.144 UADD8

A.1.145 UADD16

A.1.146 UASX

A.1.147 UBFX

A.1.148 UDIV

A.1.149 UHADD8

A.1.150 UHADD16

A.1.151 UHASX

A.1.152 UHSAX

A.1.153 UHSUB8

A.1.154 UHSUB16

A.1.155 UMAAL

A.1.156 UMLAL

A.1.157 UMULL

A.1.158 UQADD8

A.1.159 UQADD16

A.1.160 UQASX

A.1.161 UQSAX

A.1.162 UQSUB8

A.1.163 UQSUB16

A.1.164 USAD8

A.1.165 USADA8

A.1.166 USAT

A.1.167 USAT16

A.1.168 USAX

A.1.169 USUB8

A.1.170 USUB16

A.1.171 UXT

A.1.172 UXTA

A.1.173 WFE

A.1.174 WFI

A.1.175 YIELD

B: NEON and VFP Instruction Summary

B.1 NEON general data processing instructions

B.1.1 VCVT (fixed-point or integer to floating-point)

B.1.2 VCVT (between half-precision and single-precision floating-point)

B.1.3 VDUP

B.1.4 VEXT

B.1.5 VMOV

B.1.6 VMVN

B.1.7 VMOVL, V{Q}MOVN, VQMOVUN

B.1.8 VREV

B.1.9 VSWP

B.1.10 VTBL

B.1.11 VTBX

B.1.12 VTRN

B.1.13 VUZP

B.1.14 VZIP

B.2 NEON shift instructions

B.2.1 VSHL, VQSHL, VQSHLU, and VSHLL (by immediate)

B.2.2 V{Q}{R}SHL

B.2.3 V{R}SHR{N}, V{R}SRA

B.2.4 VQ{R}SHR{U}N

B.2.5 VSLI

B.2.6 VSRI

B.3 NEON logical and compare operations

B.3.1 VACGE and VACGT

B.3.2 VAND

B.3.3 VBIC (immediate)

B.3.4 VBIC (register)

B.3.5 VBIF

B.3.6 VBIT

B.3.7 VBSL

B.3.8 VCEQ, VCGE, VCGT, VCLE, and VCLT

B.3.9 VEOR

B.3.10 VMOV

B.3.11 VMVN

B.3.12 VORN

B.3.13 VORR (immediate)

B.3.14 VORR (register)

B.3.15 VTST

B.4 NEON arithmetic instructions

B.4.1 VABA{L}

B.4.2 VABD{L}

B.4.3 V{Q}ABS

B.4.4 V{Q}ADD, VADDL, VADDW

B.4.5 V{R}ADDHN

B.4.6 VCLS

B.4.7 VCLZ

B.4.8 VCNT

B.4.9 V{R}HADD

B.4.10 VHSUB

B.4.11 VMAX and VMIN

B.4.12 V{Q}NEG

B.4.13 VPADD{L}, VPADAL

B.4.14 VPMAX and VPMIN

B.4.15 VRECPE

B.4.16 VRECPS

B.4.17 VRSQRTE

B.4.18 VRSQRTS

B.4.19 V{Q}SUB, VSUBL and VSUBW

B.4.20 V{R}SUBHN

B.5 NEON multiply instructions

B.5.1 VFMA, VFMS

B.5.2 VMUL{L}, VMLA{L}, and VMLS{L}

B.5.3 VMUL{L}, VMLA{L}, and VMLS{L} (by scalar)

B.5.4 VQ{R}DMULH (by vector or by scalar)

B.5.5 VQDMULL, VQDMLAL, and VQDMLSL (by vector or by scalar)

B.6 NEON load and store element and structure instructions

B.6.1 VLDn and VSTn (single n-element structure to one lane)

B.6.2 VLDn (single n-element structure to all lanes)

B.6.3 VLDn and VSTn (multiple n-element structures)

B.6.4 VLDR and VSTR

B.6.5 VLDM, VSTM, VPOP, and VPUSH

B.6.6 VMOV (between two ARM registers and an extension register)

B.6.7 VMOV (between an ARM register and a NEON scalar)

B.6.8 VMRS and VMSR

B.7 VFP instructions

B.7.1 VABS

B.7.2 VADD

B.7.3 VCMP

B.7.4 VCVT (between single-precision and double-precision)

B.7.5 VCVT (between floating-point and integer)

B.7.6 VCVT (between floating-point and fixed-point)

B.7.7 VCVTB, VCVTT (half-precision extension)

B.7.8 VDIV

B.7.9 VFMA, VFNMA, VFMS, VFNMS

B.7.10 VMOV

B.7.11 VMOV

B.7.12 VMUL, VMLA, VMLS, VNMUL, VNMLA, and VNMLS

B.7.13 VNEG

B.7.14 VSQRT

B.7.15 VSUB

B.8 NEON and VFP pseudo-instructions

B.8.1 VACLE and VACLT

B.8.2 VAND (immediate)

B.8.3 VCLE and VCLT

B.8.4 VLDR pseudo-instruction

B.8.5 VLDR and VSTR (post-increment and pre-decrement)

B.8.6 VMOV2

B.8.7 VORN

C: Building Linux for ARM Systems

C.1 Building the Linux kernel

C.2 Creating the Linux filesystem

C.3 Putting it together

Index

Cortex-A Series Programmer’s Guide

Copyright © 2011, 2012 ARM. All rights reserved.

Release Information

The following changes have been made to this book.

Change history

Issue A, 25 March 2011, Non-Confidential: First release.
Issue B, 10 August 2011, Non-Confidential: Second release. Virtualization chapter added. Updated to include Cortex-A15 processor, and LPAE. Corrected and revised throughout.
Issue C, 25 June 2012, Non-Confidential: Updated for third release. Updated to include Cortex-A7 processor, and big.LITTLE. Index added. Corrected and revised throughout.

Proprietary Notice

This Cortex-A Series Programmer’s Guide is protected by copyright and the practice or implementation of the information herein may be protected by one or more patents or pending applications. No part of this Cortex-A Series Programmer’s Guide may be reproduced in any form by any means without the express prior written permission of ARM. No license, express or implied, by estoppel or otherwise to any intellectual property rights is granted by this Cortex-A Series Programmer’s Guide.

Your access to the information in this Cortex-A Series Programmer’s Guide is conditional upon your acceptance that you will not use or permit others to use the information for the purposes of determining whether implementations of the information herein infringe any third party patents.

This Cortex-A Series Programmer’s Guide is provided “as is”. ARM makes no representations or warranties, either express or implied, included but not limited to, warranties of merchantability, fitness for a particular purpose, or non-infringement, that the content of this Cortex-A Series Programmer’s Guide is suitable for any particular purpose or that any practice or implementation of the contents of the Cortex-A Series Programmer’s Guide will not infringe any third party patents, copyrights, trade secrets, or other rights.

This Cortex-A Series Programmer’s Guide may include technical inaccuracies or typographical errors.

To the extent not prohibited by law, in no event will ARM be liable for any damages, including without limitation any direct loss, lost revenue, lost profits or data, special, indirect, consequential, incidental or punitive damages, however caused and regardless of the theory of liability, arising out of or related to any furnishing, practicing, modifying or any use of this Programmer’s Guide, even if ARM has been advised of the possibility of such damages.

The information provided herein is subject to U.S. export control laws, including the U.S. Export Administration Act and its associated regulations, and may be subject to export or import regulations in other countries. You agree to comply fully with all laws and regulations of the United States and other countries (“Export Laws”) to assure that neither the information herein, nor any direct products thereof are; (i) exported, directly or indirectly, in violation of Export Laws, either to any countries that are subject to U.S. export restrictions or to any end user who has been prohibited from participating in the U.S. export transactions by any federal agency of the U.S. government; or (ii) intended to be used for any purpose prohibited by Export Laws, including, without limitation, nuclear, chemical, or biological weapons proliferation.

Words and logos marked with ® or ™ are registered trademarks or trademarks of ARM Limited, except as otherwise stated below in this proprietary notice. Other brands and names mentioned herein may be the trademarks of their respective owners.

Copyright © 2011, 2012 ARM Limited, 110 Fulbourn Road Cambridge, CB1 9NJ, England

This document is Non-Confidential but any disclosure by you is subject to you providing notice to and the acceptance by the recipient of, the conditions set out above.

In this document, where the term ARM is used to refer to the company it means “ARM or any of its subsidiaries as appropriate”.

Web Address: http://www.arm.com

ARM DEN0013C ID071612. Copyright © 2011, 2012 ARM. All rights reserved. Non-Confidential

Contents Cortex-A Series Programmer’s Guide Chapter 1 Chapter 2 Chapter 3 Preface References ....................................................................................................................... x Typographical conventions .............................................................................................. xi Feedback on this book .................................................................................................... xii Glossary ......................................................................................................................... xiii Introduction 1.1 1.2 1.3 History ........................................................................................................................... 1-2 System-on-Chip (SoC) .................................................................................................. 1-3 Embedded systems ...................................................................................................... 1-4 ARM Architecture and Processors 2.1 2.2 2.3 2.4 2.5 Architecture versions .................................................................................................... 2-3 Architecture history and extensions .............................................................................. 2-4 Key architectural points of ARM Cortex-A series processors ....................................... 2-8 Processors and pipelines .............................................................................................. 2-9 The Cortex-A series processors ................................................................................. 2-11 Tools, Operating Systems and Boards 3.1 3.2 3.3 3.4 3.5 Linux distributions ......................................................................................................... 3-2 Useful tools ................................................................................................................... 
3-6 Software toolchains for ARM processors ...................................................................... 3-8 ARM DS-5 ................................................................................................................... 3-11 Example platforms ...................................................................................................... 3-13 Chapter 4 ARM Registers, Modes and Instruction Sets 4.1 4.2 Instruction sets .............................................................................................................. 4-2 Modes ........................................................................................................................... 4-3 ARM DEN0013C ID071612 Copyright © 2011, 2012 ARM. All rights reserved. Non-Confidential iii

Chapter 5 Chapter 6 Chapter 7 Chapter 8 Chapter 9 Chapter 10 Contents 4.3 4.4 4.5 Registers ....................................................................................................................... 4-4 Instruction pipelines ...................................................................................................... 4-7 Branch prediction ........................................................................................................ 4-10 Introduction to Assembly Language 5.1 5.2 5.3 5.4 5.5 5.6 Comparison with other assembly languages ................................................................ 5-2 Instruction sets .............................................................................................................. 5-4 Introduction to the GNU Assembler .............................................................................. 5-5 ARM tools assembly language ..................................................................................... 5-9 Interworking ................................................................................................................ 5-11 Identifying assembly code .......................................................................................... 5-12 ARM/Thumb Unified Assembly Language Instructions 6.1 6.2 6.3 6.4 6.5 6.6 6.7 6.8 Instruction set basics .................................................................................................... 6-2 Data processing operations .......................................................................................... 6-6 Multiplication operations ............................................................................................... 6-9 Memory instructions .................................................................................................... 6-10 Branches ..................................................................................................................... 
6-13 Integer SIMD instructions ........................................................................................... 6-14 Saturating arithmetic ................................................................................................... 6-18 Miscellaneous instructions .......................................................................................... 6-19 Floating-Point 7.1 7.2 7.3 7.4 7.5 Floating-point basics and the IEEE-754 standard ........................................................ 7-2 VFP support in GCC ..................................................................................................... 7-8 VFP support in the ARM Compiler ................................................................................ 7-9 VFP support in Linux .................................................................................................. 7-10 Floating-point optimization .......................................................................................... 7-11 Introducing NEON 8.1 8.2 SIMD ............................................................................................................................. 8-2 NEON architecture overview ........................................................................................ 8-4 Caches 9.1 9.2 9.3 9.4 9.5 9.6 9.7 9.8 9.9 9.10 9.11 Why do caches help? ................................................................................................... 9-3 Cache drawbacks ......................................................................................................... 9-4 Memory hierarchy ......................................................................................................... 9-5 Cache architecture ........................................................................................................ 9-6 Cache policies ............................................................................................................ 
9-12 Write and Fetch buffers .............................................................................................. 9-14 Cache performance and hit rate ................................................................................. 9-15 Invalidating and cleaning cache memory .................................................................... 9-16 Point of coherency and unification .............................................................................. 9-17 Level 2 cache controller .............................................................................................. 9-18 Parity and ECC in caches ........................................................................................... 9-19 Memory Management Unit 10.1 10.2 10.3 10.4 10.5 10.6 10.7 10.8 10.9 Virtual memory ............................................................................................................ 10-3 Level 1 page tables ..................................................................................................... 10-4 Level 2 page tables ..................................................................................................... 10-7 The Translation Lookaside Buffer ............................................................................. 10-10 TLB coherency .......................................................................................................... 10-11 Choice of page sizes ................................................................................................ 10-12 Memory attributes ..................................................................................................... 10-13 Multi-tasking and OS usage of page tables .............................................................. 10-16 Large Physical Address Extensions ......................................................................... 10-19 ARM DEN0013C ID071612 Copyright © 2011, 2012 ARM. All rights reserved. Non-Confidential iv

11: Memory Ordering

11.1 ARM memory ordering model

11.2 Memory barriers

11.3 Cache coherency implications

12: Exception Handling

12.1 Types of exception

12.2 Exception mode summary

12.3 Entering an exception handler

12.4 Exit from an exception handler

12.5 Vector table

12.6 Return instruction

13: Interrupt Handling

13.1 External interrupt requests

13.2 Generic Interrupt Controller

14: Other Exception Handlers

14.1 Abort handler

14.2 Undefined instruction handling

14.3 SVC exception handling

14.4 Linux exception program flow

15: Boot Code

15.1 Booting a bare-metal system

15.2 Configuration

15.3 Booting Linux

16: Porting

16.1 Endianness

16.2 Alignment

16.3 Miscellaneous C porting issues

16.4 Porting ARM assembly code to ARMv7

16.5 Porting ARM code to Thumb

17: Application Binary Interfaces

17.1 Procedure Call Standard

17.2 Mixing C and assembly code

18: Profiling

18.1 Profiler output

19: Optimizing Code to Run on ARM Processors

19.1 Compiler optimizations

19.2 ARM memory system optimization

19.3 Source code modifications

20: Writing NEON Code

20.1 NEON C Compiler and assembler

20.2 Optimizing NEON assembler code

20.3 NEON power saving

21: Introduction to Multi-processing

21.1 Multi-processing ARM systems

21.2 Symmetric multi-processing

21.3 Asymmetric multi-processing

22: SMP Architectural Considerations

22.1 Cache coherency

22.2 TLB and cache maintenance broadcast

22.3 Handling interrupts in an SMP system

22.4 Exclusive accesses

22.5 Booting SMP systems

22.6 Private memory region

23: Parallelizing Software

23.1 Decomposition methods

23.2 Threading models

23.3 Threading libraries

23.4 Synchronization mechanisms in the Linux kernel

24: Issues with Parallelizing Software

24.1 Thread safety and reentrancy

24.2 Performance issues

24.3 Profiling in SMP systems

25: Power Management

25.1 Power and clocking

26: Security

26.1 TrustZone hardware architecture

27: Virtualization

27.1 ARMv7-A Virtualization Extensions

27.2 Hypervisor exception model

27.3 Relationship between virtualization and ARM Security Extensions

28: Introducing big.LITTLE

28.1 big.LITTLE configuration

28.2 Structure of a big.LITTLE system

28.3 Execution models in big.LITTLE

28.4 big.LITTLE MP operation

29: Debug

29.1 ARM debug hardware

29.2 ARM trace hardware

29.3 Debug monitor

29.4 Debugging Linux applications

29.5 DS-5 debug and trace

Appendix A: Instruction Summary

A.1 Instruction Summary

Appendix B: NEON and VFP Instruction Summary

B.1 NEON general data processing instructions

B.2 NEON shift instructions

B.3 NEON logical and compare operations

B.4 NEON arithmetic instructions

B.5 NEON multiply instructions

B.6 NEON load and store element and structure instructions

B.7 VFP instructions

B.8 NEON and VFP pseudo-instructions

Appendix C: Building Linux for ARM Systems

C.1 Building the Linux kernel

C.2 Creating the Linux filesystem

C.3 Putting it together

Preface

This book provides an introduction to ARM technology for programmers using ARM Cortex-A series processors that conform to the ARMv7-A architecture. The v7 refers to version 7 of the architecture, while the A indicates the architecture profile that describes Application processors. This includes the Cortex-A5, Cortex-A7, Cortex-A8, Cortex-A9 and Cortex-A15 processors.

The book complements rather than replaces other ARM documentation that is available for Cortex-A series processors, such as the ARM Technical Reference Manuals (TRMs) for the processors themselves, documentation for individual devices or boards and, most importantly, the ARM Architecture Reference Manual (or the “ARM ARM”).

The purpose of this book is to bring together information from a wide variety of sources to provide a single guide for programmers who want to develop applications for the latest Cortex-A series of processors. We will cover hardware concepts such as caches and Memory Management Units, but only where this is valuable to the application writer. The book is intended to be useful to both assembly language and C programmers. We will look at how complex operating systems, such as Linux, make use of ARM features, and how to take full advantage of the many advanced capabilities of the ARM processor, in particular writing software for multi-processing and using the SIMD capabilities of the device.

Although much of the book is also applicable to other ARM processors, we do not explicitly cover processors that implement older versions of the architecture. The Cortex-R series and Cortex-M series processors are mentioned but not described. Our intention is to provide an approachable introduction to the ARM architecture, covering the feature set in detail and providing practical advice on writing both C and assembly language programs to run efficiently on a Cortex-A series processor.

This is not an introductory level book. We assume knowledge of the C programming language and of microprocessors, but no ARM-specific background. In the allotted space, we cannot hope to cover every topic in detail. In some chapters, we suggest further reading (referring either to books or websites) that can give a deeper level of background to the topic in hand, but in this book we will focus on the ARM-specific detail. We do not assume the use of any particular toolchain.
