Springer Handbook of Speech Processing

Springer Handbooks provide a concise compilation of approved key information on methods of research, general principles, and functional relationships in physical sciences and engineering. The world’s leading experts in the fields of physics and engineering will be assigned by one or several renowned editors to write the chapters comprising each volume. The content is selected by these experts from Springer sources (books, journals, online content) and other systematic and approved recent publications of physical and technical information. The volumes are designed to be useful as readable desk reference books to give a fast and comprehensive overview and easy retrieval of essential reliable key information, including tables, graphs, and bibliographies. References to extensive sources are provided.

Springer Handbook of Speech Processing

Jacob Benesty, M. Mohan Sondhi, Yiteng Huang (Eds.)

With DVD-ROM, 456 Figures and 113 Tables

Editors:

Jacob Benesty
INRS-EMT, University of Quebec
800 de la Gauchetiere Ouest, Suite 6900
Montreal, Quebec, H5A 1K6, Canada
benesty@emt.inrs.ca

M. Mohan Sondhi
Avayalabs Research
233 Mount Airy Road
Basking Ridge, NJ 07920, USA
mms@research.avayalabs.com

Yiteng Huang
Bell Laboratories, Alcatel-Lucent
600 Mountain Avenue
Murray Hill, NJ 07974, USA
arden_huang@ieee.org

Library of Congress Control Number: 2007931999
ISBN: 978-3-540-49125-5
e-ISBN: 978-3-540-49127-9

This work is subject to copyright. All rights reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law.

Springer is a part of Springer Science+Business Media
springer.com

© Springer-Verlag Berlin Heidelberg 2008

The use of designations, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Product liability: The publisher cannot guarantee the accuracy of any information about dosage and application contained in this book. In every individual case the user must check such information by consulting the relevant literature.

Typesetting and production: LE-TEX Jelonek, Schmidt & Vöckler GbR, Leipzig
Senior Manager Springer Handbook: Dr. W. Skolaut, Heidelberg
Typography and layout: schreiberVIS, Seeheim
Illustrations: Hippmann GbR, Schwarzenbruck
Cover design: eStudio Calamar Steinen, Barcelona
Cover production: WMXDesign GmbH, Heidelberg
Printing and binding: Stürtz GmbH, Würzburg

Printed on acid free paper
SPIN 11544036 60/3180/YL 5 4 3 2 1 0

Foreword

J. L. Flanagan, Professor Emeritus, Electrical and Computer Engineering, Rutgers University

Over the past three decades digital signal processing has emerged as a recognized discipline. Much of the impetus for this advance stems from research in representation, coding, transmission, storage and reproduction of speech and image information. In particular, interest in voice communication has stimulated central contributions to digital filtering and discrete-time spectral transforms.

This dynamic development was built upon the convergence of three then-evolving technologies: (i) sampled-data theory and representation of information signals (which led directly to digital telecommunication that provides signal quality independent of transmission distance); (ii) electronic binary computation (aided in early implementation by pulse-circuit techniques from radar design); and, (iii) invention of solid-state devices for exquisite control of electronic current (transistors – which now, through microelectronic materials, scale to systems of enormous size and complexity). This timely convergence was soon followed by optical fiber methods for broadband information transport.

These advances impact an important aspect of human activity – information exchange. And, over man’s existence, speech has played a principal role in human communication. Now, speech is playing an increasing role in human interaction with complex information systems. Automatic services of great variety exploit the comfort of voice exchange, and, in the corporate sector, sophisticated audio/video teleconferencing is reducing the necessity of expensive, time-consuming business travel. In each instance an overarching target is a user environment that captures some of the naturalness and spatial realism of face-to-face communication. Again, speech is a core element, and new understanding from diverse research sectors can be brought to bear.

Editors-in-Chief Benesty, Sondhi and Huang have organized a timely engineering handbook to answer this need. They have assembled a remarkable compendium of current knowledge in speech processing. And, this accumulated understanding can be focused upon enlarging the human capacity to deal with a world ever increasing in complexity. Benesty, Sondhi and Huang are renowned researchers in their own right, and they have attracted an international cadre of over 80 fellow authors and collaborators who constitute a veritable Who’s Who of world leaders in speech processing research.

The resulting book provides under one cover authoritative treatments that commence with the basic physics and psychophysics of speech and hearing, and range through the related topics of computational tools, coding, synthesis, recognition, and signal enhancement, concluding with discussions on capture and projection of sound in enclosures.

The book can be expected to become a valuable resource for researchers, engineers and speech scientists throughout the global community. It should equally serve teachers and students in human communication, especially delimiting knowledge frontiers where graduate thesis research may be appropriate.

Warren, New Jersey
October 2007
Jim Flanagan

Preface

The achievement of this Springer Handbook is the result of a wonderful journey that started in March 2005 at the 30th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Two of the editors-in-chief (Benesty and Huang) met in one of the long corridors of the Pennsylvania Convention Center in Philadelphia with Dr Dieter Merkle from Springer. Together we had a very nice discussion about the conference and immediately an idea came up for a handbook. After a short discussion we converged without too much hesitation on a handbook of speech processing. It was quite surprising to see that, even after 30 years of ICASSP and more than half a century of research in this fundamental area, there was still no major book summarizing the important aspects of speech processing. We thought that the time was ripe for such a large project. Soon after we got home, a third editor-in-chief (Sondhi) joined the efforts.

We had a very clear objective in our minds: to summarize, in a reasonable number of pages, the most important and useful aspects of speech processing. The content was then organized accordingly. This task was not easy since we had to find a good balance between feasible ideas and new trends. As we all know, practical ideas can be viewed as old stuff while emerging ideas can be criticized for not having passed the test of time; we hope that we have succeeded in finding a good compromise. For this we relied on many authors who are well established and are recognized as experts in their field, from all over the world, and from academia as well as from industry.

From simple consumer products such as cell phones and MP3 players to more sophisticated projects such as human–machine interfaces and robots that can obey orders, speech technologies are now everywhere. We believe that it is just a matter of time before more applications of the science of speech become impossible to miss in our daily life. So we believe that this Springer Handbook will play a fundamental role in the sustainable progress of speech research and development.

This handbook is targeted at three categories of readers: graduate students of speech processing, professors and researchers in academia and research labs who are active in this field, and engineers in industry who need to understand or implement specific algorithms for their speech-related products. The handbook could also be used as a text for one or more graduate courses on signal processing for speech and various aspects of speech processing and applications.

For the completion of such an ambitious project we have many people to thank. First, we would like to thank the many authors who did a terrific job in delivering very high-quality chapters. Second, we are very grateful to the members of the editorial board who helped us so much in organizing the content and structure of this book, taking part in all phases of this project from conception to completion. Third, we would like to thank all the reviewers, who helped us to improve the quality of the material. Last, but not least, we would like to thank the Springer team for their availability and very professional work. In particular, we appreciated the help of Dieter Merkle, Christoph Baumann, Werner Skolaut, Petra Jantzen, and Claudia Rau.

We hope this Springer Handbook will inspire many great minds to find new research ideas or to implement algorithms in products.

Montreal, Basking Ridge, Murray Hill
October 2007
Jacob Benesty
M. Mohan Sondhi
Yiteng Huang

List of Editors

Editors-in-Chief
Jacob Benesty, Montreal
M. Mohan Sondhi, Basking Ridge
Yiteng (Arden) Huang, Murray Hill

Part Editors
Part A: Production, Perception, and Modeling of Speech
M. M. Sondhi, Basking Ridge
Part B: Signal Processing for Speech
Y. Huang, Murray Hill; J. Benesty, Montreal
Part C: Speech Coding
W. B. Kleijn, Stockholm
Part D: Text-to-Speech Synthesis
S. Narayanan, Los Angeles
Part E: Speech Recognition
L. Rabiner, Piscataway; B.-H. Juang, Atlanta
Part F: Speaker Recognition
S. Parthasarathy, Sunnyvale
Part G: Language Recognition
C.-H. Lee, Atlanta
Part H: Speech Enhancement
J. Chen, Murray Hill; S. Gannot, Ramat-Gan; J. Benesty, Montreal
Part I: Multichannel Speech Processing
J. Benesty, Montreal; I. Cohen, Haifa; Y. Huang, Murray Hill

List of Authors

Alex Acero
Microsoft Research
One Microsoft Way
Redmond, WA 98052, USA
e-mail: alexac@microsoft.com

Jont B. Allen
University of Illinois
ECE
Urbana, IL 61801, USA
e-mail: JontAllen@ieee.org

Jacob Benesty
University of Quebec
INRS-EMT
800 de la Gauchetiere Ouest
Montreal, Quebec H5A 1K6, Canada
e-mail: benesty@emt.inrs.ca

Frédéric Bimbot
IRISA (CNRS & INRIA) - METISS
Pièce C 320 - Campus Universitaire de Beaulieu
35042 Rennes, France
e-mail: bimbot@irisa.fr

Thomas Brand
Carl von Ossietzky Universität Oldenburg
Sektion Medizinphysik
Haus des Hörens, Marie-Curie-Str. 2
26121 Oldenburg, Germany
e-mail: thomas.brand@uni-oldenburg.de

Nick Campbell
Knowledge Creating Communication Research Centre
Acoustics & Speech Research Project, Spoken Language Communication Group
2-2-2 Hikaridai
619-0288 Keihanna Science City, Japan
e-mail: nick@nict.go.jp

William M. Campbell
MIT Lincoln Laboratory
Information Systems Technology Group
244 Wood Street
Lexington, MA 02420-9108, USA
e-mail: wcampbell@ll.mit.edu

Rolf Carlson
Royal Institute of Technology (KTH)
Department of Speech, Music and Hearing
Lindstedtsvägen 24
10044 Stockholm, Sweden
e-mail: rolf@speech.kth.se

Jingdong Chen
Bell Laboratories
Alcatel-Lucent
600 Mountain Ave
Murray Hill, NJ 07974, USA
e-mail: jingdong@research.bell-labs.com

Juin-Hwey Chen
Broadcom Corp.
5300 California Avenue
Irvine, CA 92617, USA
e-mail: rchen@broadcom.com

Israel Cohen
Technion–Israel Institute of Technology
Department of Electrical Engineering
Technion City
Haifa 32000, Israel
e-mail: icohen@ee.technion.ac.il

Jordan Cohen
SRI International
300 Ravenswood Drive
Menlo Park, CA 94019, USA
e-mail: jrc@speech.sri.com

Corinna Cortes
Google, Inc.
Google Research
76 9th Avenue, 4th Floor
New York, NY 10011, USA
e-mail: corinna@google.com

Eric J. Diethorn
Avaya Labs Research
Multimedia Technologies Research Department
233 Mt. Airy Road
Basking Ridge, NJ 07920, USA
e-mail: ejd@avaya.com
