Multicore DSP_From Algorithms to Real-time Implementation.pdf

Multicore DSP
From Algorithms to Real-time Implementation on the TMS320C66x SoC

Naim Dahnoun
University of Bristol, UK

This edition first published 2018
© 2018 John Wiley & Sons Ltd

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

The right of Naim Dahnoun to be identified as the author of this work has been asserted in accordance with law.

Registered Office(s)
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

Editorial Office
The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty
While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

Library of Congress Cataloging-in-Publication data applied for

ISBN: 9781119003823

Cover design by Wiley
Cover image: © matejmo/Gettyimages

Set in 10/12pt Warnock by SPi Global, Pondicherry, India

10 9 8 7 6 5 4 3 2 1

I dedicate this book to my children Zahra, Yasmin and Riyad and in memory of my parents

Contents

Preface
Acknowledgements
Foreword
About the Companion Website

1 Introduction to DSP
1.1 Introduction
1.2 Multicore processors
1.2.1 Can any algorithm benefit from a multicore processor?
1.2.2 How many cores do I need for my application?
1.3 Key applications of high-performance multicore devices
1.4 FPGAs, Multicore DSPs, GPUs and Multicore CPUs
1.5 Challenges faced for programming a multicore processor
1.6 Texas Instruments DSP roadmap
1.7 Conclusion
References

2 The TMS320C66x architecture overview
2.1 Overview
2.2 The CPU
2.2.1 Cross paths
2.2.1.1 Data cross paths
2.2.1.2 Address cross paths
2.2.2 Register file A and file B
2.2.2.1 Operands
2.2.3 Functional units
2.2.3.1 Condition registers
2.2.3.2 .L units
2.2.3.3 .M units
2.2.3.4 .S units
2.2.3.5 .D units
2.3 Single instruction, multiple data (SIMD) instructions
2.3.1 Control registers
2.4 The KeyStone memory
2.4.1 Using the internal memory
2.4.2 Memory protection and extension
2.4.3 Memory throughput
2.5 Peripherals
2.5.1 Navigator
2.5.2 Enhanced Direct Memory Access (EDMA) Controller
2.5.3 Universal Asynchronous Receiver/Transmitter (UART)
2.5.4 General purpose input–output (GPIO)
2.5.5 Internal timers
2.6 Conclusion
References

3 Software development tools and the TMS320C6678 EVM
3.1 Introduction
3.2 Software development tools
3.2.1 Compiler
3.2.2 Assembler
3.2.3 Linker
3.2.3.1 Linker command file
3.2.4 Compile, assemble and link
3.2.5 Using the Real-Time Software Components (RTSC) tools
3.2.5.1 Platform update using the XDCtools
3.2.6 KeyStone Multicore Software Development Kit
3.3 Hardware development tools
3.3.1 EVM features
3.4 Laboratory experiments based on the C6678 EVM: introduction to Code Composer Studio (CCS)
3.4.1 Software and hardware requirements
3.4.1.1 Key features
3.4.1.2 Download sites
3.4.2 Laboratory experiments with the CCS6
3.4.2.1 Introduction to CCS
3.4.2.2 Implementation of a DOTP algorithm
3.4.3 Profiling using the clock
3.4.4 Considerations when measuring time
3.5 Loading different applications to different cores
3.6 Conclusion
References

4 Numerical issues
4.1 Introduction
4.2 Fixed- and floating-point representations
4.2.1 Fixed-point arithmetic
4.2.1.1 Unsigned integer
4.2.1.2 Signed integer
4.2.1.3 Fractional numbers
4.2.2 Floating-point arithmetic
4.2.2.1 Special numbers for the 32-bit and 64-bit floating-point formats
4.3 Dynamic range and accuracy
4.4 Laboratory exercise
4.5 Conclusion
References

5 Software optimisation
5.1 Introduction
5.2 Hindrance to software scalability for a multicore processor
5.3 Single-core code optimisation procedure
5.3.1 The C compiler options
5.4 Interfacing C with intrinsics, linear assembly and assembly
5.4.1 Intrinsics
5.4.2 Interfacing C and assembly
5.5 Assembly optimisation
5.5.1 Parallel instructions
5.5.2 Removing the NOPs
5.5.3 Loop unrolling
5.5.4 Double-Word Access
5.5.5 Optimisation summary
5.6 Software pipelining
5.6.1 Software-pipelining procedure
5.6.1.1 Writing linear assembly code
5.6.1.2 Creating a dependency graph
5.6.1.3 Resource allocation
5.6.1.4 Scheduling table
5.6.1.5 Generating assembly code
5.7 Linear assembly
5.7.1 Hand optimisation of the dotp function using linear assembly
5.8 Avoiding memory banks
5.9 Optimisation using the tools
5.10 Laboratory experiments
5.11 Conclusion
References

6 The TMS320C66x interrupts
6.1 Introduction
6.1.1 Chip-level interrupt controller
6.2 The interrupt controller
6.3 Laboratory experiment
6.3.1 Experiment 1: Using the GPIOs to trigger some functions
6.3.2 Experiment 2: Using the console to trigger an interrupt
6.4 Conclusion
References

7 Real-time operating system: TI-RTOS
7.1 Introduction
7.2 TI-RTOS
7.3 Real-time scheduling
7.3.1 Hardware interrupts (Hwis)
7.3.1.1 Setting an Hwi
7.3.1.2 Hwi hook functions
7.3.2 Software interrupts (Swis), including clock, periodic or single-shot functions
7.3.3 Tasks
7.3.3.1 Task hook functions
7.3.4 Idle functions
7.3.5 Clock functions
7.3.6 Timer functions
7.3.7 Synchronisation
7.3.7.1 Semaphores
7.3.7.2 Semaphore_pend
7.3.7.3 Semaphore_post
7.3.7.4 How to configure the semaphores
7.3.8 Events
7.3.9 Summary
7.4 Dynamic memory management
7.4.1 Stack allocation
7.4.2 Heap allocation
7.4.3 Heap implementation
7.4.3.1 HeapMin implementation
7.4.3.2 HeapMem implementation
7.4.3.3 HeapBuf implementation
7.4.3.4 HeapMultiBuf implementation
7.5 Laboratory experiments
7.5.1 Lab 1: Manual setup of the clock (part 1)
7.5.2 Lab 2: Manual setup of the clock (part 2)
7.5.3 Lab 3: Using Hwis, Swis, tasks and clocks
7.5.4 Lab 4: Using events
7.5.5 Lab 5: Using the heaps
7.6 Conclusion
References
References (further reading)

8 Enhanced Direct Memory Access (EDMA3) controller
8.1 Introduction
8.2 Type of DMAs available
8.3 EDMA controllers architecture
8.3.1 The EDMA3 Channel Controller (EDMA3CC)
8.3.2 The EDMA3 transfer controller (EDMA3TC)
8.3.3 EDMA prioritisation
8.3.3.1 Trigger source priority
8.3.3.2 Channel priority
8.3.3.3 Dequeue priority
8.3.3.4 System (transfer controller) priority
8.4 Parameter RAM (PaRAM)
8.4.1 Channel options parameter (OPT)
8.5 Transfer synchronisation dimensions
8.5.1 A – Synchronisation
8.5.2 AB – Synchronisation
8.6 Simple EDMA transfer
8.7 Chaining EDMA transfers
8.8 Linked EDMAs
8.9 Laboratory experiments