Springer Handbook
of Speech Processing
Springer Handbooks provide a concise compilation of approved key information on
methods of research, general principles, and functional relationships in physical
sciences and engineering. The world’s leading experts in the fields of physics and
engineering will be assigned by one or several renowned editors to write the chapters
comprising each volume. The content is selected by these experts from Springer sources
(books, journals, online content) and other systematic and approved recent publications
of physical and technical information.
The volumes are designed to be useful as readable desk reference books to give a fast
and comprehensive overview and easy retrieval of essential reliable key information,
including tables, graphs, and bibliographies. References to extensive sources are
provided.
Jacob Benesty, M. Mohan Sondhi, Yiteng Huang
(Eds.)
With DVD-ROM, 456 Figures and 113 Tables
Editors:
Jacob Benesty
INRS-EMT, University of Quebec
800 de la Gauchetiere Ouest, Suite 6900
Montreal, Quebec, H5A 1K6, Canada
benesty@emt.inrs.ca
M. Mohan Sondhi
Avaya Labs Research
233 Mount Airy Road
Basking Ridge, NJ 07920, USA
mms@research.avayalabs.com
Yiteng Huang
Bell Laboratories, Alcatel-Lucent
600 Mountain Avenue
Murray Hill, NJ 07974, USA
arden_huang@ieee.org
Library of Congress Control Number:
2007931999
e-ISBN: 978-3-540-49127-9
ISBN: 978-3-540-49125-5
This work is subject to copyright. All rights reserved, whether the whole
or part of the material is concerned, specifically the rights of translation,
reprinting, reuse of illustrations, recitation, broadcasting, reproduction on
microfilm or in any other way, and storage in data banks. Duplication of
this publication or parts thereof is permitted only under the provisions of
the German Copyright Law of September 9, 1965, in its current version,
and permission for use must always be obtained from Springer-Verlag.
Violations are liable for prosecution under the German Copyright Law.
Springer is a part of Springer Science+Business Media
springer.com
© Springer-Verlag Berlin Heidelberg 2008
The use of designations, trademarks, etc. in this publication does not imply,
even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general
use.
Product liability: The publisher cannot guarantee the accuracy of any
information about dosage and application contained in this book. In every
individual case the user must check such information by consulting the
relevant literature.
Typesetting and production:
LE-TEX Jelonek, Schmidt&Vöckler GbR, Leipzig
Senior Manager Springer Handbook: Dr. W. Skolaut, Heidelberg
Typography and layout: schreiberVIS, Seeheim
Illustrations: Hippmann GbR, Schwarzenbruck
Cover design: eStudio Calamar Steinen, Barcelona
Cover production: WMXDesign GmbH, Heidelberg
Printing and binding: Stürtz GmbH, Würzburg
Printed on acid free paper
SPIN 11544036
60/3180/YL
5 4 3 2 1 0
Foreword
J. L. Flanagan
Professor Emeritus
Electrical and Computer Engineering
Rutgers University
Over the past three decades digital signal processing has emerged as a recognized
discipline. Much of the impetus for this advance stems from research in representation,
coding, transmission, storage and reproduction of speech and image information. In
particular, interest in voice communication has stimulated central contributions to
digital filtering and discrete-time spectral transforms.
This dynamic development was built upon the convergence of three then-evolving
technologies: (i) sampled-data theory and representation of information signals (which
led directly to digital telecommunication that provides signal quality independent of
transmission distance); (ii) electronic binary computation (aided in early implementation
by pulse-circuit techniques from radar design); and, (iii) invention of solid-state
devices for exquisite control of electronic current (transistors – which now, through
microelectronic materials, scale to systems of enormous size and complexity). This timely
convergence was soon followed by optical fiber methods for broadband information
transport.
These advances impact an important aspect of human activity – information exchange.
And, over man’s existence, speech has played a principal role in human
communication. Now, speech is playing an increasing role in human interaction with
complex information systems. Automatic services of great variety exploit the comfort
of voice exchange, and, in the corporate sector, sophisticated audio/video teleconferencing
is reducing the necessity of expensive, time-consuming business travel. In each
instance an overarching target is a user environment that captures some of the naturalness
and spatial realism of face-to-face communication. Again, speech is a core
element, and new understanding from diverse research sectors can be brought to bear.
Editors-in-Chief Benesty, Sondhi and Huang have organized a timely engineering
handbook to answer this need. They have assembled a remarkable compendium
of current knowledge in speech processing. And, this accumulated understanding can
be focused upon enlarging the human capacity to deal with a world ever increasing in
complexity. Benesty, Sondhi and Huang are renowned researchers in their own right,
and they have attracted an international cadre of over 80 fellow authors and collaborators
who constitute a veritable Who’s Who of world leaders in speech processing
research. The resulting book provides under one cover authoritative treatments that
commence with the basic physics and psychophysics of speech and hearing, and range
through the related topics of computational tools, coding, synthesis, recognition, and
signal enhancement, concluding with discussions on capture and projection of sound
in enclosures. The book can be expected to become a valuable resource for researchers,
engineers and speech scientists throughout the global community. It should equally
serve teachers and students in human communication, especially delimiting knowledge
frontiers where graduate thesis research may be appropriate.
Warren, New Jersey
October 2007
Jim Flanagan
Preface
The achievement of this Springer Handbook is the result of a wonderful journey that
started in March 2005 at the 30th International Conference on Acoustics, Speech, and
Signal Processing (ICASSP). Two of the editors-in-chief (Benesty and Huang) met in
one of the long corridors of the Pennsylvania Convention Center in Philadelphia with
Dr Dieter Merkle from Springer. Together we had a very nice discussion about the conference
and immediately an idea came up for a handbook. After a short discussion we
converged without too much hesitation on a handbook of speech processing. It was
quite surprising to see that, even after 30 years of ICASSP and more than half a century
of research in this fundamental area, there was still no major book summarizing the important
aspects of speech processing. We thought that the time was ripe for such a large
project. Soon after we got home, a third editor-in-chief (Sondhi) joined the efforts.
We had a very clear objective in our minds: to summarize, in a reasonable number
of pages, the most important and useful aspects of speech processing. The content was
then organized accordingly. This task was not easy since we had to find a good balance
between feasible ideas and new trends. As we all know, practical ideas can be viewed
as old stuff while emerging ideas can be criticized for not having passed the test of
time; we hope that we have succeeded in finding a good compromise. For this we relied
on many authors who are well established and are recognized as experts in their field,
from all over the world, and from academia as well as from industry.
From simple consumer products such as cell phones and MP3 players to more sophisticated
projects such as human–machine interfaces and robots that can obey
orders, speech technologies are now everywhere. We believe that it is just a matter of
time before more applications of the science of speech become impossible to miss in
our daily life. So we believe that this Springer Handbook will play a fundamental role
in the sustainable progress of speech research and development.
This handbook is targeted at three categories of readers: graduate students of speech
processing, professors and researchers in academia and research labs who are active
in this field, and engineers in industry who need to understand or implement specific
algorithms for their speech-related products. The handbook could also be used as a text
for one or more graduate courses on signal processing for speech and various aspects
of speech processing and applications.
For the completion of such an ambitious project we have many people to thank.
First, we would like to thank the many authors who did a terrific job in delivering very
high-quality chapters. Second, we are very grateful to the members of the editorial
board who helped us so much in organizing the content and structure of this book, taking
part in all phases of this project from conception to completion. Third, we would
like to thank all the reviewers, who helped us to improve the quality of the material.
Last, but not least, we would like to thank the Springer team for their availability
and very professional work. In particular, we appreciated the help of Dieter Merkle,
Christoph Baumann, Werner Skolaut, Petra Jantzen, and Claudia Rau.
We hope this Springer Handbook will inspire many great minds to find new research
ideas or to implement algorithms in products.
Montreal, Basking Ridge, Murray Hill
October 2007
Jacob Benesty
M. Mohan Sondhi
Yiteng Huang
List of Editors
Editors-in-Chief
Jacob Benesty, Montreal
M. Mohan Sondhi, Basking Ridge
Yiteng (Arden) Huang, Murray Hill
Part Editors
Part A: Production, Perception, and Modeling of Speech
M. M. Sondhi, Basking Ridge
Part B: Signal Processing for Speech
Y. Huang, Murray Hill; J. Benesty, Montreal
Part C: Speech Coding
W. B. Kleijn, Stockholm
Part D: Text-to-Speech Synthesis
S. Narayanan, Los Angeles
Part E: Speech Recognition
L. Rabiner, Piscataway; B.-H. Juang, Atlanta
Part F: Speaker Recognition
S. Parthasarathy, Sunnyvale
Part G: Language Recognition
C.-H. Lee, Atlanta
Part H: Speech Enhancement
J. Chen, Murray Hill; S. Gannot, Ramat-Gan; J. Benesty, Montreal
Part I: Multichannel Speech Processing
J. Benesty, Montreal; I. Cohen, Haifa; Y. Huang, Murray Hill
List of Authors
Alex Acero
Microsoft Research
One Microsoft Way
Redmond, WA 98052, USA
e-mail: alexac@microsoft.com
Jont B. Allen
University of Illinois
ECE
Urbana, IL 61801, USA
e-mail: JontAllen@ieee.org
Jacob Benesty
University of Quebec
INRS-EMT
800 de la Gauchetiere Ouest
Montreal, Quebec H5A 1K6, Canada
e-mail: benesty@emt.inrs.ca
Frédéric Bimbot
IRISA (CNRS & INRIA) - METISS
Pièce C 320 - Campus Universitaire de Beaulieu
35042 Rennes, France
e-mail: bimbot@irisa.fr
Thomas Brand
Carl von Ossietzky Universität Oldenburg
Sektion Medizinphysik
Haus des Hörens, Marie-Curie-Str. 2
26121 Oldenburg, Germany
e-mail: thomas.brand@uni-oldenburg.de
Nick Campbell
Knowledge Creating Communication Research Centre
Acoustics & Speech Research Project, Spoken Language Communication Group
2-2-2 Hikaridai
619-0288 Keihanna Science City, Japan
e-mail: nick@nict.go.jp
William M. Campbell
MIT Lincoln Laboratory
Information Systems Technology Group
244 Wood Street
Lexington, MA 02420-9108, USA
e-mail: wcampbell@ll.mit.edu
Rolf Carlson
Royal Institute of Technology (KTH)
Department of Speech, Music and Hearing
Lindstedtsvägen 24
10044 Stockholm, Sweden
e-mail: rolf@speech.kth.se
Jingdong Chen
Bell Laboratories
Alcatel-Lucent
600 Mountain Ave
Murray Hill, NJ 07974, USA
e-mail: jingdong@research.bell-labs.com
Juin-Hwey Chen
Broadcom Corp.
5300 California Avenue
Irvine, CA 92617, USA
e-mail: rchen@broadcom.com
Israel Cohen
Technion–Israel Institute of Technology
Department of Electrical Engineering
Technion City
Haifa 32000, Israel
e-mail: icohen@ee.technion.ac.il
Jordan Cohen
SRI International
300 Ravenswood Drive
Menlo Park, CA 94019, USA
e-mail: jrc@speech.sri.com
Corinna Cortes
Google, Inc.
Google Research
76 9th Avenue, 4th Floor
New York, NY 10011, USA
e-mail: corinna@google.com
Eric J. Diethorn
Avaya Labs Research
Multimedia Technologies Research Department
233 Mt. Airy Road
Basking Ridge, NJ 07920, USA
e-mail: ejd@avaya.com