
Digital Speech Processing Using Matlab

Preface
Acknowledgments
Contents
m-Files
1 Pattern Recognition for Speech Detection
1.1 Introduction
1.2 Back-Propagation Neural Network
1.2.1 Back-Propagation Algorithm
1.2.2 ANN Illustration
1.3 Support Vector Machine
1.3.1 Dual Problem to Solve (1.25)–(1.28)
1.3.2 "Kernel-Trick" for Nonlinear Separation in SVM
1.3.3 Illustration for Support Vector Machine
1.4 Hidden Markov Model
1.4.1 Baum–Welch Technique to Obtain the Unknown Parameters in HMM
1.4.2 Steps to Compute the Unknown Parameters of HMM Using Expectation–Maximization Algorithm
1.4.3 Viterbi Algorithm to Compute the Generating Probability of the Arbitrary Speech Segment WORDX
1.4.4 Isolated Word Recognition Using HMM
1.4.5 Alignment Method to Model HMM
1.4.6 Illustration of Hidden Markov Model
1.5 Gaussian Mixture Model
1.5.1 Steps Involved to Model GMM Using Expectation–Maximization Algorithm
1.5.2 Isolated Word Recognition Using GMM
1.5.3 Illustration of GMM
1.6 Unsupervised Learning System
1.6.1 Need for Unsupervised Learning System
1.6.2 k-Means Algorithm
1.6.3 Illustration of k-Means Algorithm
1.6.4 Fuzzy k-Means Algorithm
1.6.5 Steps Involved in Fuzzy k-Means Clustering
1.6.6 Illustration of Fuzzy k-Means Algorithm
1.6.7 Kohonen Self-Organizing Map
1.6.8 Illustration of KSOM
1.7 Dimensionality Reduction Techniques
1.7.1 Principal Component Analysis
1.7.2 Illustration of PCA Using 2D to 1D Conversion
1.7.3 Illustration of PCA
1.7.4 Linear Discriminant Analysis
1.7.5 Small Sample Size Problem in LDA
1.7.6 Null-Space LDA
1.7.7 Kernel LDA
1.7.8 Kernel-Trick to Execute LDA in the Higher-Dimensional Space
1.7.9 Illustration of Dimensionality Reduction Using LDA
1.8 Independent Component Analysis
1.8.1 Solving ICA Bases Using Kurtosis Measurement
1.8.2 Steps to Obtain the ICA Bases
1.8.3 Illustration of Dimensionality Reduction Using ICA
2 Speech Production Model
2.1 Introduction
2.2 1-D Sound Waves
2.2.1 Physics on Sound Wave Travelling Through the Tube with Uniform Cross-Sectional Area A
2.2.2 Solution to (2.9) and (2.18)
2.3 Vocal Tract Model as the Cascade Connections of Identical Length Tubes with Different Cross-Sections
2.4 Modelling the Vocal Tract from the Speech Signal
2.4.1 Autocorrelation Method
2.4.2 Auto Covariance Method
2.5 Lattice Structure to Obtain Excitation Source for the Typical Speech Signal
2.5.1 Computation of Lattice Co-efficient from LPC Co-efficients
3 Feature Extraction of the Speech Signal
3.1 Endpoint Detection
3.2 Dynamic Time Warping
3.3 Linear Predictive Co-efficients
3.4 Poles of the Vocal Tract
3.5 Reflection Co-efficients
3.6 Log Area Ratio
3.7 Cepstrum
3.8 Line Spectral Frequencies
3.9 Mel-Frequency Cepstral Co-efficients
3.9.1 Gibbs Phenomenon
3.9.2 Discrete Cosine Transformation
3.10 Spectrogram
3.10.1 Time Resolution Versus Frequency Resolution in Spectrogram
3.11 Discrete Wavelet Transformation
3.12 Pitch Frequency Estimation
3.12.1 Autocorrelation Approach
3.12.2 Homomorphic Filtering Approach
3.13 Formant Frequency Estimation
3.13.1 Formant Extraction Using Vocal Tract Model
3.13.2 Formant Extraction Using Homomorphic Filtering
4 Speech Compression
4.1 Uniform Quantization
4.2 Nonuniform Quantization
4.3 Adaptive Quantization
4.4 Differential Pulse Code Modulation
4.4.1 Illustrations of the Prediction of Speech Signal Using lpc
4.5 Code-Excited Linear Prediction
4.5.1 Estimation of the Delay Constant D
4.5.2 Estimation of the Gain Constants G1 and G2
4.6 Assessment of the Quality of the Compressed Speech Signal
A Constrained Optimization Using Lagrangian Techniques
A.1 Constrained Optimization with Equality Constraints
A.2 Constrained Optimization with Inequality Constraints
A.3 Kuhn--Tucker Conditions
B Expectation–Maximization Algorithm
C Diagonalization of the Matrix
C.1 Positive Definite Matrix
D Condition Number
D.1 Preemphasis
E Spectral Flatness
E.1 Demonstration on Spectral Flatness
F Functional Blocks of the Vocal Tract and the Ear
F.1 Vocal Tract Model
F.2 Ear Model
About the Author
About the Book
Index
Signals and Communication Technology
For further volumes: http://www.springer.com/series/4748

E. S. Gopi
Digital Speech Processing Using Matlab
E. S. Gopi
Electronics and Communication Engineering
National Institute of Technology
Trichy, Tamil Nadu
India

ISSN 1860-4862        ISSN 1860-4870 (electronic)
ISBN 978-81-322-1676-6        ISBN 978-81-322-1677-3 (eBook)
DOI 10.1007/978-81-322-1677-3
Springer New Delhi Heidelberg New York Dordrecht London

Library of Congress Control Number: 2013953196

© Springer India 2014

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher's location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)
Dedicated to my wife G. Viji, my son A. G. Vasig and my daughter A. G. Desna
Preface

Most applications of digital speech processing involve speech or speaker pattern recognition. Understanding the practical implementation of speech and speaker recognition techniques requires a grounding in both digital speech processing and pattern recognition, and this book aims to give a balanced treatment of the two. It covers speech processing concepts such as the speech production model, speech feature extraction, and speech compression, together with the basic pattern recognition techniques applied to speech signals, such as PCA, LDA, ICA, SVM, HMM, GMM, BPN, and KSOM. The book is written to suit beginners doing basic research in digital speech processing. Almost all the topics are illustrated using Matlab for better understanding.
Acknowledgments

I would like to thank Profs. S. Soundararajan (Director, NITT, Trichy), M. Chidambaram (IITM, Chennai), K. M. M. Prabhu (IITM, Chennai), P. Palanisamy, P. Somaskandan, B. Venkataramani, and S. Raghavan (NITT, Trichy) for their support. I would also like to thank all those who helped, directly or indirectly, in bringing out this book successfully. Special thanks to my parents, Mr. E. Sankara Subbu and Mrs. E. S. Meena.

E. S. Gopi