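Classification number:
UDC:
University code: 11845
Security classification:
Student ID: 1121104010
Guangdong University of Technology Doctoral Dissertation
(Doctor of Engineering)
Research on Robustness Techniques for Voiceprint Recognition and Their Applications
Zhang Jing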
Supervisor name and title: Prof. Yu Simin
Major or field: Control Theory and Control Engineering
School: Faculty of Automation
Date of defense: December 2015
A Dissertation Submitted to Guangdong University of Technology
for the Degree of Doctor of Philosophy
(PhD in Engineering Science)
Research on Voiceprint Recognition Robustness
Technology and Its Application
Ph.D. Candidate: ZHANG Jing
Supervisor: Prof. YU Si-min
December 2015
Faculty of Automation
Guangdong University of Technology
Guangzhou, Guangdong, P. R. China, 510006
Abstract
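Voiceprint recognition compares a speaker's voice with the voiceprints enrolled in a database
in order to verify or identify the user, that is, to decide whether the speaker is the claimed
person or which member of a group the speaker is. The security it offers is comparable to that
of other biometric technologies (fingerprint, hand geometry, and iris), yet it needs only a
telephone or a microphone, so data acquisition is very convenient and inexpensive. It is the
most economical, reliable, simple, and secure means of identification and has strong market
prospects. However, low recognition rates under background noise and poor real-time
performance hold it back from practical use, so improving system robustness and real-time
performance is the key to the wide adoption of voiceprint recognition.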
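A voiceprint recognition system consists mainly of speech signal preprocessing and endpoint
detection, feature parameter extraction, and the training and matching of voiceprint models.
With the aim of improving system robustness and real-time performance, this dissertation
studies the techniques and algorithms behind each of these parts in depth and validates the
proposed algorithms on two application systems developed for this purpose. The experimental
results show that the proposed algorithms effectively improve robustness and real-time
performance. The dissertation has six chapters. Chapter 1 reviews the state of research on the
techniques and algorithms behind each component of a voiceprint recognition system. Starting
from Chapter 2, the main research is presented in five parts, one chapter each: Chapters 2, 3,
and 4 address system robustness, Chapter 5 addresses real-time performance, and Chapter 6
presents two application systems developed by the author, which apply the techniques studied
and verify their effect on robustness and real-time performance.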
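Chapter 2 studies preprocessing and endpoint detection for noisy speech. Because speech and
noise look visibly different on a spectrogram, the dissertation adopts spectrogram-based
endpoint detection. The technical difficulty of this approach lies in expressing that visual
difference as a mathematical quantity. Since autocorrelation coefficients can describe image
texture, the dissertation uses the autocorrelation function to capture the difference and
proposes a column-autocorrelation spectrogram detection method. The distribution of the
spectrogram autocorrelation function yields the boundary point that separates speech from
noise, which serves as the endpoint detection threshold for noisy speech. Because the
dissertation uses wideband spectrograms, whose frequency resolution is poor, noise still
remains in the speech columns after column-autocorrelation detection. To remove noise further
in different frequency bands, the dissertation exploits the multi-resolution property of
empirical mode decomposition (EMD): the noisy speech is first decomposed into different
frequency scales and then analyzed with the column-autocorrelation spectrogram. Experiments
show that the resulting noise reduction is satisfactory.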
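Chapter 3 studies the extraction of speaker feature parameters. The ideal feature parameters
for voiceprint recognition reflect only the speaker's characteristics, carry no semantic
information, and have a small data volume. Experiments show that distinguishing speaker
identity requires parameters that cover both glottal and vocal tract characteristics. The
dissertation therefore combines vocal tract and glottal features so that speakers are well
separated. After analyzing and comparing the commonly used vocal tract and glottal features,
it selects Mel-frequency cepstral coefficients (MFCC) to represent the vocal tract and the
pitch period to represent the glottis, and combines the two as follows: the number of Mel
filters in the MFCC triangular filter bank and the center frequency of each filter are
determined dynamically by the pitch period. The result is called the pitch-period-based MFCC
feature parameter. To raise the recognition rate further, Delta features are introduced to
capture the time-varying information between speech frames, extending the pitch-period-based
MFCC feature parameters. The extended parameters are more expressive, but the higher
dimensionality increases the cost of subsequent computation, so the dissertation proposes a
block-merging mapping algorithm for dimensionality reduction. Experiments show that the
extracted feature parameters and the processing method help improve system robustness.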
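Chapter 4 studies voiceprint recognition models. For text-dependent voiceprint recognition, it
focuses on the hidden Markov model (HMM), covering the solution of problems that arise during
implementation and an analysis of its robustness. For text-independent voiceprint recognition,
it focuses on the Gaussian mixture model (GMM) and improves the model in both the training
stage and the recognition stage. In the training stage, a k-means algorithm based on a
proximity rule is proposed to obtain the initial GMM values, which overcomes the poor overall
performance that traditional methods suffer from by focusing too heavily on a few indicators;
in addition, simplifying the derivation of the expectation-maximization (EM) algorithm
improves training speed and recognition rate. In the recognition stage, an entropy-based
frame-matching weighting method is proposed to keep bad frames from affecting the decision,
which improves system robustness.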
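Chapter 5 studies methods for improving the efficiency of voiceprint recognition systems.
Starting from the idea of model clustering, it proposes a fast GMM recognition method based on
model-growth clustering and a fast HMM recognition method based on statistical grouping of
feature parameters. The model-growth clustering algorithm grows multiple classes out of a
single initial class to cluster the speaker models; its core components are a grouping
strategy based on the concept of closeness, a similarity criterion based on approximate
entropy, and the generation of class GMMs. For systems built on HMMs, whose structure differs
from that of GMMs, the feature parameter sequences are clustered into groups, and the HMMs
trained from the speech feature parameters in the same group are assigned to one group. This
groups the model library while sidestepping the difficulty that the HMM structure poses for
clustering the models directly. The core components are the k-means algorithm based on a
proximity rule, a second-pass smoothing grouping algorithm, a similarity criterion based on
dynamic time warping (DTW), and class selection based on feature parameters.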
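Chapter 6 presents two application systems for voiceprint recognition: an HMM-based voiceprint
sign-in system for mobile terminals and a GMM-based voiceprint lock for mobile phones. Both
use the robustness and real-time techniques and algorithms studied in the preceding chapters.
The chapter describes how the systems were designed and developed and reports experiments on
the environmental robustness and real-time performance of each system; the results confirm the
effectiveness of the techniques and algorithms studied.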
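Keywords: voiceprint recognition; multi-resolution spectrogram; feature parameter
optimization; feature parameter dimensionality reduction; GMM; HMM; clustering growth;
closeness; feature parameter grouping; robustness; real-time performance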
This work was supported by the Guangdong Province Science and Technology Project (20138040401015).
ABSTRACT
By comparing a speaker's voice with the voiceprints enrolled in a database, voiceprint
recognition verifies and identifies the user and determines whether the speaker is the
target person. The security that voiceprint recognition provides rivals that of other
biometric technologies (fingerprint, palm, and iris), and it needs no special equipment
beyond a phone or a microphone, so data acquisition is extremely convenient and
inexpensive. It is the most economical, reliable, convenient, and safe means of
identification, and the technology has strong development prospects. However, a low
recognition rate and limited real-time performance in environments with heavy background
noise restrict its practical application, so the key to applying voiceprint recognition
widely is to improve system robustness and real-time performance.
A voiceprint recognition system is mainly composed of speech signal preprocessing and
endpoint detection, feature parameter extraction, and the training and matching of the
voiceprint model. To improve the robustness and real-time performance of the system, the
dissertation studies the technology and algorithms of each of these parts in depth and
tests the ideas on two application systems. The experimental results show that the
proposed algorithms are effective. The dissertation is divided into six chapters. The
first chapter is an overview of voiceprint recognition that describes the current state of
research on its implementation technology and algorithms. Starting from the second
chapter, the main research content is presented in five parts: Chapters 2, 3, and 4 mainly
study the improvement of system robustness, Chapter 5 studies the improvement of real-time
performance, and Chapter 6 presents two application systems developed by the author, in
which the technology described in the dissertation is applied and its effect on robustness
and real-time performance is verified.
In the second chapter, speech preprocessing and endpoint detection in a noisy background
are studied. In view of the intuitive difference between speech and noise in a
spectrogram, the dissertation puts forward an endpoint detection method based on the
spectrogram. The technical difficulty of spectrogram endpoint detection is how to express
the visual differences mathematically. Drawing on the ability of autocorrelation to
characterize texture, the dissertation chooses the autocorrelation function to describe
the difference; the column autocorrelation function of the spectrogram describes it
clearly. From the distribution of the autocorrelation function of the speech and noise
spectra, the cut-off point that distinguishes speech from noise can be found and taken as
the endpoint detection threshold for noisy speech. Because the dissertation uses a
wideband spectrogram, the frequency resolution is poor, so after column autocorrelation
spectrogram detection the speech columns still contain residual noise. To remove noise
further in different frequency bands, the dissertation uses the multi-resolution property
of empirical mode decomposition (EMD): the noisy speech is decomposed into different
frequency scales before column autocorrelation spectrogram analysis. The experiments show
that the effect is satisfactory.
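As a rough illustration of the column-autocorrelation idea, the sketch below scores each
spectrogram column by its normalized autocorrelation along the frequency axis and marks
columns above a threshold as speech. The lag of 1, the threshold of 0.5, and the function
names are assumptions made for this example only; the dissertation derives its threshold
from the distribution of the autocorrelation function, which is not reproduced here.

```python
def column_autocorr(column, lag=1):
    """Normalized autocorrelation of one spectrogram column at a given lag.

    `column` holds the magnitudes of one frame across frequency bins.
    The lag value is an illustrative assumption, not taken from the dissertation.
    """
    n = len(column)
    if n <= lag:
        return 0.0
    mean = sum(column) / n
    centered = [v - mean for v in column]
    energy = sum(v * v for v in centered)
    if energy == 0.0:
        return 0.0  # flat column: no structure to correlate
    cross = sum(centered[i] * centered[i + lag] for i in range(n - lag))
    return cross / energy


def detect_speech_columns(spectrogram, threshold=0.5, lag=1):
    """Mark each column as speech (True) or noise (False).

    `threshold` is a hypothetical fixed value used only for illustration.
    """
    return [column_autocorr(col, lag) > threshold for col in spectrogram]
```

A column with a smooth spectral envelope scores high, while a column whose bins fluctuate
independently scores low, which is the kind of contrast the thresholding step relies on.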
In the third chapter, the extraction of speech feature parameters is studied. For a
voiceprint recognition system, the ideal speech feature parameters reflect the
characteristics of the speaker rather than semantic information and have a small data
volume. Experiments show that speech is produced when the excitation signal from the sound
source passes through the resonance of the vocal tract and is radiated from the mouth and
nose, so speaker identity is reflected in the characteristics of both the glottis and the
vocal tract. The dissertation therefore combines vocal tract and glottal characteristics
so that speakers can be distinguished well. After a comparative analysis of common vocal
tract and glottal characteristics, Mel-frequency cepstral coefficients (MFCC) are selected
to represent the vocal tract and the pitch to represent the glottis, and the two feature
parameters are combined as follows: the center frequency of each filter in the MFCC
triangular Mel filter bank is no longer fixed but is set according to the pitch frequency
at the corresponding point in the actual frequency domain, and the number of filters is
also dynamic. This feature is called the pitch-period-based MFCC feature parameter. To
further improve the recognition rate of the system, Delta features are introduced to
capture the time-varying information between speech frames, extending the
pitch-period-based MFCC feature parameters. The extended feature parameters are more
expressive, but the subsequent computation time increases, so the dissertation puts
forward a dimensionality reduction algorithm based on block-merging mapping. Experiments
show that the feature parameters and processing methods help improve the robustness of the
system.
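The Delta extension can be sketched with the regression formula commonly used for cepstral
features, d_t = sum_{n=1..N} n (c_{t+n} - c_{t-n}) / (2 sum_{n=1..N} n^2). Whether the
dissertation uses exactly this formula, the window size N = 2, and the edge handling
(repeating the first and last frames) are assumptions of this example.

```python
def delta(frames, N=2):
    """Delta coefficients of a sequence of feature vectors (lists of floats).

    Uses the common regression formula over a window of +/- N frames;
    frames beyond the edges are replaced by the first or last frame.
    """
    T = len(frames)
    dim = len(frames[0])
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        row = []
        for d in range(dim):
            acc = 0.0
            for n in range(1, N + 1):
                nxt = frames[min(t + n, T - 1)][d]
                prv = frames[max(t - n, 0)][d]
                acc += n * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def append_delta(frames, N=2):
    """Concatenate each static feature vector with its delta vector."""
    return [f + d for f, d in zip(frames, delta(frames, N))]
```

Appending deltas doubles the feature dimension, which is the growth in dimensionality that
motivates the reduction step described above.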
In the fourth chapter, voiceprint recognition models are studied. For text-dependent
voiceprint recognition, the Hidden Markov Model (HMM) is mainly studied, including the
solution of problems in the implementation process and an analysis of system robustness.
For text-independent voiceprint recognition, the Gaussian Mixture Model (GMM) is mainly
studied, and the GMM is improved in both the training stage and the recognition stage. In
the training stage, a k-means algorithm based on a proximity rule is presented for
obtaining the initial GMM values, which overcomes the disadvantage of traditional methods
that focus too much on a few indicators; by simplifying the derivation of the
expectation-maximization (EM) algorithm and adding a correction coefficient, the training
speed and recognition rate of the system are improved. In the recognition stage, to avoid
the influence of bad frames on the decision, a weighted frame scoring algorithm based on
entropy is put forward, which improves the robustness of the system.
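One way to picture entropy-based frame weighting is sketched below. For each frame, the
per-speaker log-likelihoods are turned into a posterior distribution; a frame whose
posterior is nearly uniform (high entropy) says little about the speaker and receives a
small weight. The weight formula 1 - H / log(S), the function names, and the input layout
are assumptions of this example and are not taken from the dissertation.

```python
import math


def frame_weights(frame_loglik):
    """Weight each frame by how decisively it separates the candidate speakers.

    `frame_loglik[t][s]` is the log-likelihood of frame t under speaker model s.
    At least two speaker models are required. The weight 1 - H / log(S) is an
    illustrative choice, not the dissertation's formula.
    """
    weights = []
    for row in frame_loglik:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]  # shift for numerical stability
        total = sum(exps)
        post = [e / total for e in exps]
        entropy = -sum(p * math.log(p) for p in post if p > 0.0)
        weights.append(1.0 - entropy / math.log(len(row)))
    return weights


def weighted_scores(frame_loglik):
    """Per-speaker score: entropy-weighted sum of frame log-likelihoods."""
    w = frame_weights(frame_loglik)
    n_speakers = len(frame_loglik[0])
    return [
        sum(w[t] * frame_loglik[t][s] for t in range(len(w)))
        for s in range(n_speakers)
    ]
```

The identified speaker would then be the index of the largest weighted score; frames that
all models explain equally well contribute almost nothing to that decision.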
In the fifth chapter, methods for improving the efficiency of voiceprint recognition
systems are studied. Based on the idea of model clustering, a fast GMM recognition method
based on model-growth clustering and a fast HMM recognition method based on statistical
grouping of feature parameters are proposed. The model-growth clustering algorithm grows
multiple classes from one initial class to cluster the speaker models; its core algorithms
are a symmetric grouping strategy based on the concept of density set, a similarity
criterion based on relative entropy, and the generation of class GMMs. For voiceprint
recognition systems implemented with HMMs, in view of the structural difference between
the HMM and the GMM, the strategy is to cluster and group the feature parameter sequences
and to gather the HMMs trained from speech feature parameters in the same group into one
group. This achieves the purpose of grouping the model library and avoids the difficulty
of clustering and grouping the models directly that the HMM structure causes. The core
algorithms are the k-means algorithm based on a proximity rule, a secondary smoothing
grouping algorithm, a similarity criterion based on dynamic time warping (DTW), and class
selection based on feature parameters.
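The DTW-based similarity criterion compares feature sequences of different lengths. The
sketch below is the textbook dynamic-programming form of DTW with a Euclidean local cost
and no path constraints; the function name and these choices are assumptions of this
example, and the dissertation's exact variant may differ.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors.

    Each sequence is a non-empty list of equal-dimension vectors. The local cost
    is the Euclidean distance, and steps may advance either sequence or both.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

A smaller distance means two feature sequences are more alike, so sequences could be
assigned to the group whose representative sequence is nearest under this measure.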
In the sixth chapter, two application systems of voiceprint recognition are presented: a
mobile terminal voiceprint sign-in system based on the HMM and a mobile phone voiceprint
lock system based on the GMM. Both systems use the real-time and robustness techniques and
algorithms described in the preceding chapters. The development of the systems is
described, and the robustness and real-time performance of the two systems are tested; the
results confirm the validity of the techniques and algorithms studied.
Keywords: voiceprint recognition; multi-resolution spectrogram; feature parameter
optimization; feature parameter dimensionality reduction; GMM model; HMM model;
clustering growth; density set; feature parameter grouping; robustness; real time