introduction to audio analysis: a matlab approach.pdf

List of Tables
List of Figures
1 Introduction
1.1 The MATLAB Audio Analysis Library
1.2 Outline of Chapters
1.3 A Note on Exercises
2 Getting Familiar with Audio Signals
2.1 Sampling
2.2 Playback
2.3 Mono and Stereo Audio Signals
2.4 Reading and Writing Audio Files
2.5 Reading Audio Files in Blocks
3 Signal Transforms and Filtering Essentials
4 Audio Features
5 Audio Classification
6 Audio Segmentation
7 Audio Alignment and Temporal Modeling
8 Music Information Retrieval
Appendix A: The Matlab Audio Analysis Library
Appendix B: Audio-Related Libraries and Software
B.1 MATLAB
B.2 Python
B.3 C/C++
Appendix C: Audio Datasets
Bibliography
Introduction to AUDIO ANALYSIS: A MATLAB Approach
THEODOROS GIANNAKOPOULOS
AGGELOS PIKRAKIS
Amsterdam • Boston • Heidelberg • London • New York • Oxford • Paris • San Diego • San Francisco • Singapore • Sydney • Tokyo
Academic Press is an imprint of Elsevier
Academic Press is an imprint of Elsevier
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, UK
225 Wyman Street, Waltham, MA 02451, USA
525 B Street, Suite 1800, San Diego, CA 92101-4495, USA
First edition 2014
Copyright © 2014 Elsevier Ltd. All rights reserved.
MATLAB® is a registered trademark of The MathWorks, Inc. For MATLAB and Simulink product information, please contact: The MathWorks, Inc., 3 Apple Hill Drive, Natick, MA, 01760-2098 USA. Tel: 508-647-7000. Fax: 508-647-7001. E-mail: info@mathworks.com. Web: mathworks.com
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means electronic, mechanical, photocopying, recording or otherwise without the prior written permission of the publisher. Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone (+44) (0) 1865 843830; fax (+44) (0) 1865 853333; email permissions@elsevier.com. Alternatively you can submit your request online by visiting the Elsevier website at http://elsevier.com/locate/permissions, and selecting Obtaining permission to use Elsevier material.
Notice
No responsibility is assumed by the publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. Because of rapid advances in the medical sciences, in particular, independent verification of diagnoses and drug dosages should be made.
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress
ISBN: 978-0-08-099388-1
For information on all Academic Press publications visit our web site at books.elsevier.com
Printed and bound in United States of America
14 15 16 17 18 10 9 8 7 6 5 4 3 2 1
PREFACE
This book attempts to provide a gentle introduction to the field of audio analysis using the MATLAB programming environment as the vehicle of presentation. Audio analysis is a multidisciplinary field, which requires the reader to be familiar with concepts from diverse research disciplines, including digital signal processing and machine learning. As a result, it is a great challenge to write a book that can provide sufficient coverage of the important concepts in the field of audio analysis and, at the same time, be accessible to readers who do not necessarily possess the required scientific background.
Our main goal has been to provide a standalone introduction, involving a balanced presentation of theoretical descriptions and reproducible MATLAB examples. Our philosophy is that readers with diverse scientific backgrounds can gain an understanding of the field of audio analysis if they are provided with basic theory, in conjunction with reproducible experiments that can help them deal with the theory from a more practical perspective. In addition, this type of approach allows the reader to acquire certain technical skills that are useful in the context of developing real-world audio analysis applications. To this end, we also provide an accompanying software library, which can be downloaded from the companion site and includes the MATLAB functions and related data files that have been used throughout the text.
We believe that this book is suitable for students, researchers, and professionals alike, who need to develop practical skills, along with a basic understanding of the field. The book does not assume previous knowledge of digital signal processing and machine learning concepts, as it provides introductory material for the necessary topics for both disciplines. We expect that, after reading this book, the reader will feel comfortable with various key processing stages of the audio analysis chain, including audio content creation, representation, feature extraction, classification, segmentation, sequence alignment and temporal modeling. Furthermore, we believe that the study of the presented case studies will provide further insight into the development of real-world applications.
This book is the product of several years of teaching and research and reflects our teaching philosophy, which has been shaped through our interaction with our students and colleagues, to whom we are both grateful. We hope that the book will prove useful to all readers who are making their first steps in the field of audio analysis. Although we have made an effort to eliminate errors during the writing stage, we encourage the reader to contact us with any comments and suggestions for improvement, in either the text or the accompanying software library.
Theodoros Giannakopoulos and Aggelos Pikrakis
Athens, 2013
For access to the software library and other supporting materials, please visit the companion website at: http://booksite.elsevier.com/9780080993881
ACKNOWLEDGMENTS
This book has improved thanks to the support of a number of colleagues, students, and friends, who have provided generous feedback and constructive comments during the writing process. Above all, T. Giannakopoulos would like to thank his wife, Maria, and his daughter, Eleni, for always being cheerful and supportive. A. Pikrakis would like to thank his family for their patience and generous support and dedicates this book to all the teachers who have shaped his life.
LIST OF TABLES
Table 1.1 Difficulty Levels of the Exercises
Table 2.1 Execution Times for Different Loading Techniques
Table 2.2 Sound Recording Using the Data Acquisition Toolbox
Table 4.1 Class Descriptions for the Multi-Class Task of Movie Segments
Table 5.1 Classification Tasks and Files
Table 5.2 Row-Wise Normalized Confusion Matrix for the 8-Class Audio Segment Classification Task
Table 5.3 Row-Wise Normalized Confusion Matrix for the Speech vs Music Binary Classification Task
Table 5.4 Row-Wise Normalized Confusion Matrix for the 3-Class Musical Genre Classification Task
Table 5.5 Row-Wise Normalized Confusion Matrix for the Speech vs Non-Speech Classification Task
Table A.1 List of All Functions Included in the MATLAB Audio Analysis Library Provided with the Book
Table A.2 List of Data Files that are Available in the Library that Accompanies the Book
Table B.1 MATLAB Libraries: Audio and Speech
Table B.2 MATLAB Libraries: Pattern Recognition and Machine Learning
Table B.3 A List of Python Packages and Libraries that can be Used for Audio Analysis and Pattern Recognition Applications
Table B.4 Representative Audio Analysis and Pattern Recognition Libraries and Packages Written in C++
Table C.1 A Short List of Available Datasets for Selected Audio Analysis Tasks
LIST OF FIGURES
Figure 2.1 A synthetic audio signal.
Figure 2.2 A stereo audio signal.
Figure 2.3 Short-term processing of an audio signal.
Figure 3.1 Plots of the magnitude of the spectrum of a signal consisting of three frequencies at 200, 500, and 1200 Hz.
Figure 3.2 A synthetic signal consisting of three frequencies is corrupted by additive noise.
Figure 3.3 The spectrogram of a speech signal.
Figure 3.4 Spectrograms of a synthetic, frequency-modulated signal for three short-term frame lengths.
Figure 3.5 Spectrum representations of (a) an analog signal, (b) a sampled version when the sampling frequency exceeds the Nyquist rate, and (c) a sampled version with insufficient sampling frequency. In the last case, the shifted versions of the analog spectrum are overlapping, hence the aliasing effect.
Figure 3.6 Spectral representations of the same three-tone (200, 500 and 3000 Hz) signal for two different sampling frequencies (8 kHz and 4 kHz).
Figure 3.7 Frequency response of a pre-emphasis filter for a = −0.95.
Figure 3.8 An example of the application of a lowpass filter on a synthetic signal consisting of three tones.
Figure 3.9 Example of a simple speech denoising technique applied on a segment of the diarizationExample.wav file, found in the data folder of the library of the book.
Figure 4.1 Mid-term feature extraction: each mid-term segment is short-term processed and statistics are computed based on the extracted feature sequence.
Figure 4.2 Plotting the results of featureExtractionFile(), using plotFeaturesFile(), for the six feature statistics drawn from the 6th adopted audio feature.
Figure 4.3 Histograms of the standard deviation by mean ratio (σ²/μ) of the short-term energy for two classes: music and speech.
Figure 4.4 Example of a speech segment and the respective sequence of ZCR values.
Figure 4.5 Histograms of the standard deviation of the ZCR for music and speech classes.