Research on Voiceprint Recognition Robust Technology and Its Application (声纹识别鲁棒性技术及应用研究.pdf)

Published: 2022-05-31. Uploaded by: admin. Category: manuals. File size: 18.13 MB. Format: PDF.

Cover

Abstract

English Abstract

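Chapter 1 Introduction

1.1 Research Background and Significance

1.1.1 Overview of Voiceprint Recognition Technology

1.1.2 Applications and Research Significance of Voiceprint Recognition

1.2 Research Developments in China and Abroad

1.2.1 Voice Endpoint Detection: Current Status and Analysis

1.2.2 Feature Extraction: Development Status and Analysis

1.2.3 Voiceprint Recognition Models: Development Status and Analysis

1.3 Practicality Research in Voiceprint Recognition: Current Status and Analysis

1.4 Practical Requirements

1.5 Main Research Content and Chapter Organization

1.5.1 Main Research Content

1.5.2 Chapter Organization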

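Chapter 2 Voice Endpoint Detection and Denoising

2.1 Introduction

2.2 Preprocessing of Noisy Speech

2.2.1 Sampling

2.2.2 Quantization

2.2.3 Pre-emphasis

2.2.4 Framing

2.2.5 Windowing

2.3 Endpoint Detection and Denoising Based on Multi-resolution Spectrograms

2.3.1 Spectrogram Preprocessing: Wiener Filtering

2.3.2 Spectrogram Endpoint Detection Based on the Autocorrelation Function

2.3.3 Multi-resolution Analysis and the EMD Algorithm

2.4 Experimental Results and Analysis

2.5 Chapter Summary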

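Chapter 3 Speaker Feature Parameter Extraction

3.1 Introduction

3.2 Mathematical Model of Speech Signal Production

3.3 Vocal-Tract-Based Speech Feature Parameters

3.3.1 LPCC Feature Parameters

3.3.2 MFCC Feature Parameters

3.4 Glottis-Based Speech Feature Parameters

3.4.1 Energy Parameters

3.4.2 Pitch Period

3.5 Feature Parameter Optimization

3.5.1 Pitch-Period-Based MFCC Feature Parameters (Pitch-MFCC)

3.5.2 Introducing Delta Features

3.5.3 Experimental Results for Feature Parameter Combinations

3.6 Dimensionality Reduction of Feature Parameters

3.7 Summary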

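Chapter 4 Speaker Voiceprint Recognition Models

4.1 Introduction

4.2 Voiceprint Recognition Models

4.2.1 Vector Quantization (VQ)

4.2.2 Artificial Neural Network (ANN)

4.2.3 Hidden Markov Model (HMM)

4.2.4 Gaussian Mixture Model (GMM)

4.3 Robustness of the HMM Voiceprint Recognition Model

4.4 Robustness of the GMM Voiceprint Recognition Model

4.4.1 Improvements to the EM Algorithm

4.4.2 Entropy-Based Matching Weight Algorithm

4.4.3 Experimental Verification

4.5 Chapter Summary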

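Chapter 5 Methods for Improving the Efficiency of Voiceprint Recognition Systems

5.1 Introduction

5.2 GMM-Based Model Growth Clustering Method (GCM)

5.2.1 Grouping Strategy

5.2.2 Similarity Criterion Based on Relative Entropy

5.2.3 Closeness-Based Clustering and Grouping Algorithm

5.2.4 Generating Class Representatives

5.2.5 Generating the Class GMM

5.2.6 Parameter Settings

5.2.7 Class Selection Strategy Based on Matching Subsets

5.2.8 Experimental Analysis

5.3 Fast HMM Recognition Algorithm Based on Feature Parameter Clustering

5.3.1 Model Library Grouping Strategy Based on Feature Parameters

5.3.2 k-means Clustering and Grouping Algorithm Based on the Adjacency Rule

5.3.3 Secondary Smoothing Grouping Algorithm

5.3.4 Feature-Parameter-Based Class Selection and Metrics

5.3.5 Experimental Analysis

5.4 Chapter Summary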

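Chapter 6 Voiceprint Recognition Application Systems

6.1 Introduction

6.2 HMM-Based Mobile Terminal Voiceprint Sign-in System

6.2.1 Development Background

6.2.2 System Development Environment

6.2.3 System Architecture

6.2.4 Module Design of the Sign-in System

6.2.5 System Development and Design Process

6.3 GMM-Based Mobile Phone Voiceprint Lock System

6.3.1 Application Background

6.3.2 System Development Environment

6.3.3 Development and Design of the Mobile Phone Voiceprint Lock System

6.4 System Function Testing

6.4.1 Environmental Adaptability and Robustness Experiments

6.4.2 Real-Time Performance Experiments

6.5 Chapter Summary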

Conclusion

References

Research Achievements During Doctoral Study

Declaration

Acknowledgments

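Classification No.:
UDC:
School code: 11845
Security level:
Student No.: 1121104010

Guangdong University of Technology Doctoral Dissertation (Doctor of Engineering)

Research on Voiceprint Recognition Robust Technology and Its Application

ZHANG Jing

Supervisor (name, title): Prof. YU Si-min
Discipline or field: Control Theory and Control Engineering
School: School of Automation
Defense date: December 2015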

A Dissertation Submitted to Guangdong University of Technology for the Degree of Doctor of Philosophy (PhD in Engineering Science)

Research on Voiceprint Recognition Robust Technology and Its Application

Ph.D. Candidate: ZHANG Jing
Supervisor: Prof. YU Si-min

December 2015

Faculty of Automation, Guangdong University of Technology, Guangzhou, Guangdong, P. R. China, 510006

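Abstract (translated from the Chinese)

Voiceprint recognition verifies and identifies a user by comparing the speaker's voice with the voiceprints enrolled in a database, determining whether the speaker is the claimed person or which member of a group the speaker is. The security it offers is comparable to that of other biometric technologies (fingerprint, palm shape, and iris), yet it needs only a telephone or a microphone, so data collection is very convenient and inexpensive. This makes it the most economical, reliable, simple, and secure means of identification, with strong market prospects. However, low recognition rates under background noise and poor real-time performance hinder its practical use; improving system robustness and real-time performance is the key to its widespread adoption.

A voiceprint recognition system consists mainly of speech signal preprocessing and endpoint detection, feature parameter extraction, and voiceprint model training and matching. With the goal of improving robustness and real-time performance, the thesis studies the techniques and algorithms for each of these parts in depth and validates the proposed algorithms on two application systems developed for this purpose. The experimental results show that the proposed algorithms effectively improve system robustness and real-time performance.

The thesis has six chapters. Chapter 1 surveys the state of research on the techniques and algorithms behind each component of a voiceprint recognition system. From Chapter 2 onward, the main research content is presented in five parts, one per chapter: Chapters 2, 3, and 4 address robustness, Chapter 5 addresses real-time performance, and Chapter 6 presents two application systems developed by the author, which apply the techniques above and verify that they improve robustness and real-time performance.

Chapter 2 studies the preprocessing of noisy speech and endpoint detection. Because speech and noise look visibly different on a spectrogram, the thesis adopts spectrogram-based endpoint detection. The technical difficulty lies in expressing that visual difference as a mathematical quantity. Since autocorrelation coefficients can describe image texture, the thesis uses the autocorrelation function to describe the difference and proposes a column-autocorrelation spectrogram detection method: from the distribution of the spectrogram's autocorrelation function, it finds the boundary that separates speech from noise and uses it as the endpoint detection threshold for noisy speech. Because the thesis uses a wideband spectrogram, whose frequency resolution is poor, noise still remains in the speech columns after column-autocorrelation detection. To remove noise further in different frequency bands, the thesis exploits the multi-resolution property of empirical mode decomposition (EMD): the noisy speech is first decomposed into different frequency scales, and each is then analyzed with the column-autocorrelation spectrogram method. Experiments show that the denoising effect on noisy speech is satisfactory.

Chapter 3 studies the extraction of speaker feature parameters. Ideal feature parameters for voiceprint recognition reflect only the speaker's characteristics, not semantic content, and have a small data volume. Experiments show that distinguishing speaker identity requires parameters that cover both glottal and vocal-tract features, so the thesis combines the two to obtain good separability between speakers. After analyzing and comparing commonly used vocal-tract and glottal features, it selects Mel-frequency cepstral coefficients (MFCC) to represent the vocal tract and the pitch period to represent the glottis, and combines them as follows: the number of Mel filters in the MFCC triangular filter bank and the center frequency of each filter are determined dynamically by the pitch period. The result is called the pitch-period-based MFCC feature parameter. To raise the recognition rate further, Delta features are introduced to capture the time-varying information between speech frames, extending the pitch-period-based MFCC parameters. The extended parameters are more expressive, but the higher dimensionality increases the complexity of subsequent computation, so the thesis proposes a block-merging mapping algorithm for dimensionality reduction. Experiments show that the extracted feature parameters and processing methods help improve system robustness.

Chapter 4 studies voiceprint recognition models. For text-dependent voiceprint recognition, it mainly studies the hidden Markov model (HMM), including solutions to problems that arise in implementation and an analysis of its robustness. For text-independent voiceprint recognition, it mainly studies the Gaussian mixture model (GMM), improving it in both the training stage and the recognition stage. In the training stage, a k-means algorithm based on an adjacency rule is proposed to obtain the initial GMM values, overcoming the weakness of traditional methods, which focus too heavily on a few indicators and so degrade overall system performance; the derivation of the expectation-maximization (EM) algorithm is also simplified, which improves training speed and recognition rate. In the recognition stage, an entropy-based frame matching weight method is proposed to keep bad frames from affecting the decision, improving system robustness.

Chapter 5 studies methods for improving the efficiency of voiceprint recognition systems. Starting from the idea of model clustering, it proposes a fast GMM recognition method based on model growth clustering and a fast HMM recognition method based on statistical grouping of feature parameters. The model growth clustering algorithm grows multiple classes from a single initial class to cluster the speaker models; its core algorithms are a grouping strategy based on the concept of closeness, a similarity criterion based on relative entropy, and the generation of the class GMM. For systems built on HMMs, whose structure differs from that of GMMs, the feature parameter sequences are clustered into groups, and the HMMs trained from the feature parameters in one group are placed in the same group. This groups the model library while avoiding the difficulty of clustering HMMs directly, which their structure causes. The core algorithms are the k-means algorithm based on the adjacency rule, the secondary smoothing grouping algorithm, a DTW-based similarity criterion, and feature-parameter-based class selection.

Chapter 6 presents two voiceprint recognition application systems: an HMM-based mobile terminal voiceprint sign-in system and a GMM-based mobile phone voiceprint lock system. Both use the robustness and real-time techniques and algorithms studied in the preceding chapters. The chapter describes the development and design of the systems and reports experiments on the environmental robustness and real-time performance of each; the results confirm the effectiveness of the techniques and algorithms studied.

Keywords: voiceprint recognition; multi-resolution spectrogram; feature parameter optimization; feature parameter dimensionality reduction; GMM; HMM; clustering growth; closeness; feature parameter grouping; robustness; real-time performance

This research was supported by a Guangdong Province science and technology project (20138040401015).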
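The Delta extension described for Chapter 3 can be illustrated with the regression formula that is commonly used to derive delta coefficients from cepstral frames. The abstract does not state which formula or window the thesis uses, so the sketch below is only an assumption: the function names `delta` and `append_delta`, the half-window `n = 2`, and the clamping of indices at the utterance edges are all hypothetical choices made for illustration.

```python
def delta(frames, n=2):
    """Regression-based delta features (a common textbook form, assumed here).

    frames: list of equal-length feature vectors, one per speech frame.
    n: half-width of the regression window (assumed value, not from the thesis).
    """
    if not frames:
        return []
    num_frames = len(frames)
    dim = len(frames[0])
    # Normalizer of the regression formula: 2 * sum(k^2 for k = 1..n).
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out = []
    for t in range(num_frames):
        row = []
        for d in range(dim):
            acc = 0.0
            for k in range(1, n + 1):
                # Clamp indices so that edge frames reuse the first or last frame.
                ahead = frames[min(t + k, num_frames - 1)][d]
                behind = frames[max(t - k, 0)][d]
                acc += k * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def append_delta(frames, n=2):
    """Extend each static feature vector with its delta coefficients."""
    return [static + dyn for static, dyn in zip(frames, delta(frames, n))]
```

Appending the deltas doubles the dimension of every frame, which is the growth in dimensionality that motivates the dimensionality reduction step summarized for the same chapter.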

ABSTRACT

Voiceprint recognition technology checks and identifies a user's identity by comparing the speaker's voice with the voiceprints enrolled in a database, and determines whether the speaker is the target person. The security that voiceprint recognition provides can rival that of other biometric technologies (fingerprint, palm, and iris); it needs no special equipment beyond a phone or a microphone, and data acquisition is extremely convenient and low in cost, which makes it the most economical, reliable, convenient, and safe means of identification. The technology currently has great development prospects. However, a low recognition rate and limited real-time performance in environments with strong background noise restrict its practical application, so the key to wide adoption of voiceprint recognition is to improve system robustness and real-time performance.

A voiceprint recognition system is mainly composed of speech signal preprocessing and endpoint detection, feature parameter extraction, and the training and matching of the voiceprint model. To improve the robustness and real-time performance of the system, the thesis studies the techniques and algorithms for each of these parts in depth and further tests the ideas on two application systems. The experimental results show that the proposed algorithms are effective.

The thesis is divided into six chapters. The first chapter is an overview of voiceprint recognition that describes the current research status of the implementation techniques and algorithms. Starting from the second chapter, the main research content unfolds in five parts: chapters two, three, and four mainly study the improvement of system robustness; chapter five studies the improvement of real-time performance; and chapter six presents two application systems developed by the author, in which the techniques described in the thesis are applied and the robustness and real-time performance of the systems are verified.

In the second chapter, speech preprocessing and endpoint detection against background noise are studied. In view of the intuitive difference between speech and noise in a spectrogram, the thesis puts forward an endpoint detection method based on the spectrogram. The technical difficulty of spectrogram endpoint detection is how to express this visual difference. According to the characteristics of autocorrelation, the thesis chooses the autocorrelation function to describe the difference; the column autocorrelation function of the spectrogram describes it clearly. From the distribution of the autocorrelation function of the speech and noise spectra, the cut-off point between speech and noise can be found and taken as the endpoint detection threshold for noisy speech. Because the thesis uses a wideband spectrogram, the frequency resolution is somewhat poor, so after column autocorrelation spectrogram detection the speech columns still contain residual noise. To further remove noise at different frequencies, the thesis uses the multi-resolution property of empirical mode decomposition to decompose the noisy speech into different frequency scales before column autocorrelation spectrogram analysis. The experiments show that the effect is satisfactory.

In the third chapter, the extraction of speech feature parameters is studied. For a voiceprint recognition system, the ideal speech feature parameters reflect the characteristics of the speaker rather than the semantic information, and have a small data volume. Speech is an excitation signal from the sound source that passes through the resonance of the vocal tract and is radiated from the mouth and nose, and experiments show that speaker identity is reflected in the characteristics of both the glottis and the vocal tract. The thesis therefore proposes to combine vocal tract characteristics and glottal characteristics so as to distinguish well between speakers. After a comparative analysis of common vocal tract and glottal characteristics, the Mel-frequency cepstral coefficients (MFCC) are selected to represent the vocal tract and the pitch is selected to represent the glottis, and the two feature parameters are combined as follows: the center frequency of each filter in the MFCC triangular Mel filter bank is no longer fixed but follows the pitch frequency at the corresponding point in the actual frequency domain, and the number of filters is also dynamic. This feature is called the MFCC feature parameter based on the pitch period. To further improve the recognition rate of the voiceprint recognition system, Delta features are introduced to capture the time-varying elements between speech frames, extending the pitch-based MFCC feature parameters. The extended feature parameters are more expressive, but the subsequent calculation time increases, so the thesis puts forward a dimension reduction algorithm based on block-merging mapping. Experiments show that the feature parameters and processing methods help improve the robustness of the system.

In the fourth chapter, voiceprint recognition models are studied. For text-dependent voiceprint recognition, the thesis mainly studies the hidden Markov model (HMM), including solutions to problems in the process of implementation and an analysis of system robustness. For text-independent voiceprint recognition, it mainly studies the Gaussian mixture model (GMM) and improves the GMM in both the training stage and the recognition stage. In the training stage, a k-means algorithm based on adjacency rules is presented for obtaining the initial GMM values, which overcomes the disadvantage of traditional methods that focus too much on a few indicators; by simplifying the derivation of the expectation-maximization (EM) algorithm and adding a correction coefficient, the training speed and the recognition rate of the system are improved. In the recognition stage, to avoid the influence of bad frames on the decision, a weighted frame scoring algorithm based on entropy is put forward, which improves the robustness of the system.

In the fifth chapter, methods of improving the efficiency of voiceprint recognition systems are studied. Based on the idea of model clustering, a fast GMM recognition method based on model growth clustering is proposed, together with a fast HMM recognition method based on statistical grouping of feature parameters. The clustering strategy of the model growth clustering algorithm is to grow multiple classes from one initial class in order to cluster the speaker models; its core algorithms are a symmetric grouping strategy based on the concept of the density set, a similarity criterion based on relative entropy, and the generation of the class GMM. For voiceprint recognition systems implemented with HMMs, in view of the structural difference between the HMM and the GMM, the strategy is to cluster and group the feature parameter sequences and to place the HMMs trained from the speech feature parameters of one group into the same group. This achieves the purpose of grouping the model library and avoids the difficulty of clustering and grouping the models directly, which is caused by the HMM structure. The core algorithms are the k-means algorithm based on adjacency rules, the secondary smoothing grouping algorithm, the similarity criterion based on DTW (dynamic time warping), and class selection based on feature parameters.

In the sixth chapter, two voiceprint recognition application systems are presented: a mobile terminal voiceprint sign-in system based on HMM and a mobile phone voiceprint lock system based on GMM. Both systems use the real-time and robustness techniques and algorithms described in the preceding chapters. The development of the systems is introduced, and the robustness and real-time performance of the two application systems are tested; the results prove the validity of the techniques and algorithms studied.

Keywords: voiceprint recognition; multi-resolution spectrogram; feature parameter optimization; feature parameter dimension reduction; GMM model; HMM model; clustering growth; density set; feature parameter grouping; robustness; real time
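The DTW-based similarity criterion named among the core algorithms of the HMM grouping method builds on dynamic time warping, which aligns two feature sequences of different lengths and returns the cost of the best alignment. The following is a minimal sketch of textbook DTW, not the thesis's own criterion: the function name `dtw_distance`, the Euclidean local cost, and the absence of any path constraint or length normalization are assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW cost between two sequences of feature vectors.

    a, b: non-empty lists of equal-dimension vectors (for example MFCC frames).
    The local cost is the Euclidean distance, an assumed choice.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    # cost[i][j] is the best accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            # Allowed steps: advance in a only, in b only, or in both.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]
```

A smaller cost means the two feature sequences are more alike, so a grouping step could place sequences with a low mutual DTW cost in the same group; how the thesis turns the cost into a grouping decision is not stated in the abstract.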
