CN102509547B - Method and system for voiceprint recognition based on vector quantization - Google Patents

Method and system for voiceprint recognition based on vector quantization

Info

Publication number
CN102509547B
CN102509547B CN201110450364A
Authority
CN
China
Prior art keywords
codebook
speaker
speech
codeword
step
Prior art date
Application number
CN 201110450364
Other languages
Chinese (zh)
Other versions
CN102509547A (en)
Inventor
霍春宝
赵立辉
崔文翀
张彩娟
曹景胜
Original Assignee
辽宁工业大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 辽宁工业大学
Priority to CN 201110450364 priority Critical patent/CN102509547B/en
Publication of CN102509547A publication Critical patent/CN102509547A/en
Application granted granted Critical
Publication of CN102509547B publication Critical patent/CN102509547B/en

Abstract

A voiceprint recognition method and system based on vector quantization, with good recognition performance and noise immunity, good recognition results, little modeling data, fast decision speed, and low complexity. The specific steps are as follows: speech signal acquisition; speech signal preprocessing; speech feature parameter extraction, using MFCC parameters with an MFCC order of 12 to 16; template training, using the LBG clustering algorithm to build one codebook for each speaker in the system and storing it in the speech database as that speaker's voice template; voiceprint identification, comparing the feature parameters of the collected speech signal to be recognized with the speaker voice templates already established in the database and deciding according to a weighted Euclidean distance measure: if a speaker template gives the minimum average distance measure for the feature vector X of the speech to be recognized, that speaker is taken as recognized.

Description

Method and system for voiceprint recognition based on vector quantization

Technical Field

[0001] The present invention belongs to speech signal processing technology, and in particular relates to a voiceprint recognition method and system based on vector quantization that uses a speaker's speech signal to identify the speaker's identity.

Background Art

[0002] In recent years, with the wide application of information processing and artificial intelligence technology, and the urgent demand for fast and effective identity verification, traditional password-based authentication has gradually lost its position, while in the field of biometrics, identification technology based on the speaker's voice has become increasingly popular.

[0003] Physiological differences in each person's vocal organs and acquired behavioral differences lead to different pronunciation styles and speaking habits, which makes it possible to identify a person by his or her voice. Besides the advantages of not being forgotten, requiring no memorization, and being easy to use, voiceprint recognition has the following characteristics. First, its authentication mode is easy to accept: the "password" is the voice itself, obtained simply by speaking. Second, the text to be recognized can be random and is hard to steal, so security is relatively high. Third, the terminal device used for recognition is a microphone or telephone, which is inexpensive and easy to combine with existing communication systems. The application prospects of voiceprint recognition are therefore very broad: in economic activities, it can support bank remittances, balance inquiries, transfers, and so on; in security applications, personnel at secret sites can be checked by voice, with the system responding only to particular speakers; in forensic identification, the true identity of the perpetrator among suspects can be judged from on-the-spot recordings; in biomedicine, a system can be made to respond only to a patient's commands, enabling the user to control a prosthesis.

[0004] The key technologies of voiceprint recognition are mainly speech feature parameter extraction and model matching. Speech feature parameters fall roughly into two categories. One category consists of low-level features that mainly reflect the physiological characteristics of the speaker's vocal organs, such as Mel-frequency cepstral coefficients (MFCC), extracted according to the human ear's sensitivity to speech signals of different frequencies, and linear prediction cepstral coefficients (LPCC), obtained from the all-pole model of the speech signal. The other category consists of high-level features that mainly reflect the speaker's language habits and pronunciation characteristics, such as prosodic features, which reflect the cadence of the speaker's voice, and phone features, which reflect the statistical regularities of phonemes in the speaker's habitual expressions. LPCC is based on a speech production model and is easily affected by the assumptions of that model; high-level features, although used in some literature, do not yield a very high recognition rate.

[0005] The model matching methods proposed for the various speech feature parameters mainly include the dynamic time warping (DTW) method, the vector quantization (VQ) method, the Gaussian mixture model (GMM) method, and the artificial neural network (ANN) method. The DTW model depends on the temporal order of the parameters and has poor real-time performance, making it suitable for speaker recognition based on isolated words. GMM is mainly used for speaker recognition on large amounts of speech and requires more model training data, longer training and recognition times, and more memory. In the ANN model, the training algorithm for designing the optimal topology does not necessarily guarantee convergence, and the problem of overfitting can occur. In VQ-based speaker recognition, template matching does not depend on the temporal order of the parameters, real-time performance is good, little modeling data is needed, decisions are fast, and complexity is low. The principle of speaker recognition based on the vector quantization model is to quantize each speaker's speech feature parameters into a codebook, stored in the speech database as that speaker's voice template; at recognition time, the feature vectors of the speech to be recognized are compared with each speaker's voice template already in the database, the total average quantization distortion of each is computed, and the voice template with the minimum distortion is taken as the recognition result.
The shortcoming, however, is that the speech signal follows an elliptical normal distribution and the components of the vectors are not equally distributed, which is not well reflected in the Euclidean distance measure of a conventional VQ-based speaker recognition system.

Summary of the Invention

[0006] The technical problem to be solved by the present invention is to provide a voiceprint recognition method and system based on vector quantization, with good recognition performance and noise immunity, good recognition results, little modeling data, fast decision speed, and low complexity.

[0007] A voiceprint recognition method based on vector quantization, with the following specific steps:

[0008] 1. Speech signal acquisition: a telephone of an integrated program-controlled switching experiment box is used as the terminal device for collecting speech, and the speech signal is collected through a voice card;

[0009] 2. Speech signal preprocessing: the extracted speech signal is divided into frames and windowed by computer; in framing, each frame contains 256 sampling points with a frame shift of 128 sampling points, and the window function applied is a Hamming window; endpoint detection uses a method combining short-time energy and short-time zero-crossing rate; pre-emphasis uses an emphasis coefficient between 0.90 and 1.00;

[0010] 3. Speech feature parameter extraction: MFCC parameters are used, with an MFCC order of 12 to 16;

[0011] 4. Template training: the LBG clustering algorithm is used to build one codebook for each speaker in the system, stored in the speech database as that speaker's voice template;

[0012] 5. Voiceprint identification: the feature parameters of the collected speech signal to be recognized are compared with the speaker voice templates established in the database through steps 1 to 4, and a decision is made according to a weighted Euclidean distance measure: if a speaker template gives the minimum average distance measure for the feature vector X of the speech to be recognized, that speaker is taken as recognized.

[0013] The above speech feature parameter extraction steps are as follows:

[0014] (1) The preprocessed speech signal is subjected to a short-time Fourier transform to obtain its spectrum X(k); the DFT of the speech signal is:

[0015] X(k) = Σ_{n=0}^{N−1} x(n) e^(−j2πnk/N), k = 0, 1, ..., N−1 (1)

[0016] where x(n) is the input speech signal in units of frames and N is the number of Fourier transform points, taken as 256;

[0017] (2) The square of the spectrum, i.e. the energy spectrum |X(k)|², is computed; the spectrum of the speech signal is then smoothed by the Mel-frequency filter bank to eliminate harmonics and highlight the formants of the original speech;

[0018] The Mel-frequency filter bank is a set of triangular band-pass filters with center frequencies f(q), q = 1, 2, ..., Q, where Q is the number of triangular band-pass filters; the Mel filters H_q(k) are expressed as:

[0019] H_q(k) = (k − f(q−1)) / (f(q) − f(q−1)) for f(q−1) ≤ k ≤ f(q); H_q(k) = (f(q+1) − k) / (f(q+1) − f(q)) for f(q) ≤ k ≤ f(q+1); H_q(k) = 0 otherwise (2)

[0020] (3) The logarithm of the Mel spectrum output by the filter bank is taken, which compresses the dynamic range of the speech spectrum and converts the multiplicative noise components in the frequency domain into additive components; the log Mel spectrum S(q) is:

[0021] S(q) = ln( Σ_{k=0}^{N−1} |X(k)|² H_q(k) ), q = 1, 2, ..., Q (3)

[0022] (4) Discrete cosine transform (DCT)

[0023] The log Mel spectrum S(q) obtained from formula (3) is transformed to the time domain, and the result is the Mel-frequency cepstral coefficients (MFCC); the n-th coefficient C(n) is computed as:

[0024] C(n) = Σ_{q=1}^{Q} S(q) cos( πn(q − 0.5) / Q ), n = 1, 2, ..., L (4)

[0025] where L is the order of the MFCC parameters and Q is the number of Mel filters; L is taken as 12 to 16 and Q as 23 to 26;
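The triangular Mel filter bank described above can be sketched in Python as follows. This is a minimal illustration only: the equal spacing of center frequencies on the Mel scale and the FFT-bin mapping are assumptions, since the text gives only the triangular shape, Q = 23–26, N = 256, and an 8 kHz sampling rate.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(Q=25, n_fft=256, sr=8000):
    """Q triangular band-pass filters whose centers are equally spaced on the
    Mel scale between 0 Hz and sr/2; returns a (Q, n_fft//2 + 1) matrix."""
    n_bins = n_fft // 2 + 1
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), Q + 2)
    bin_pts = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    H = np.zeros((Q, n_bins))
    for q in range(1, Q + 1):
        lo, c, hi = bin_pts[q - 1], bin_pts[q], bin_pts[q + 1]
        for k in range(lo, c):            # rising edge of the triangle
            H[q - 1, k] = (k - lo) / max(c - lo, 1)
        for k in range(c, hi):            # falling edge of the triangle
            H[q - 1, k] = (hi - k) / max(hi - c, 1)
    return H
```

Each row is one triangular filter; multiplying the filter-bank matrix by a frame's power spectrum yields the Q Mel band energies used in formula (3).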

[0026] The specific steps of the LBG clustering algorithm used in the above template training are as follows:

[0027] (1) Obtain all training vectors X in the input feature vector set S, and give the codeword Y1(0) of an initial codebook by the splitting codebook method;

[0028] (2) Using a small threshold ε (ε = 0.01), split each codeword in two according to the following rule:

[0029] Y+ = Y(1 + ε), Y− = Y(1 − ε) (5)

[0030] After splitting, the codewords Y1(1), Y2(1) of the new codebook are obtained;

[0031] (3) According to the nearest-neighbor criterion, find the nearest codeword for each training vector among the codewords of the new codebook, and finally divide S into m subsets; that is, when X ∈ S_m,

[0032] d(X, Y_m) ≤ d(X, Y_i), i = 1, 2, ..., M, i ≠ m (6)

[0033] where M is the number of codewords in the current initial codebook;

[0034] (4) Compute the centroid of the feature vectors in each subset and replace the codeword of that subset with this centroid, thus obtaining a new codebook;

[0035] (5) Through the iterative computation of steps (3) and (4), the codewords Y1, Y2 of the new codebook are obtained;

[0036] (6) Step (2) is then repeated, splitting each newly obtained codeword in two, and steps (3) and (4) are iterated again; this continues until the required number of codewords in the codebook is M = 2^r, where r is an integer, so r rounds of the above loop are needed in total until clustering is complete, at which point the centroids of the clusters are the required codewords.

[0037] The initial codebook in the above LBG clustering algorithm is initialized by the splitting codebook method, as follows:

[0038] (1) The mean of the feature vectors extracted from all frames is taken as the codeword Y(0) of the initial codebook;

[0039] (2) Y(0) is split according to the following rule to form 2m codewords:

[0040] Y_m+ = Y_m(1 + ε), Y_m− = Y_m(1 − ε) (7)

[0041] where m varies from 1 to the current number of codewords in the codebook and ε is the splitting parameter, taken as ε = 0.01;

[0042] (3) All feature vectors are clustered according to the new codewords, and the total distance measures D and D′ are then computed:

[0043] D = Σ_{X ∈ S} min_m d(X, Y_m) (8)

[0044] where D′ is the total distance measure of the next iteration, computed in the same way from the updated codewords, and d(X, Y_m) is the distance measure between the training feature vector X and the trained codeword Y_m;

[0045] The relative distance measure is computed:

σ = |D − D′| / D′ (9)

[0046] If σ < ε (ε < 10⁻⁵), the iteration is stopped and the current codebook is the designed codebook; otherwise, go to the next step.

[0047] (4) The new centroid of each region is recomputed;

[0048] (5) Steps (3) and (4) are repeated until an optimal codebook of 2m codewords is formed;

[0049] (6) Steps (2), (3), and (4) are repeated until a codebook of M codewords is formed;

[0050] In the above discrete cosine transform, L = 13 and Q = 25.

[0051] A voiceprint recognition system based on vector quantization, composed as follows:

[0052] a speech signal acquisition module, a speech signal preprocessing module, a speech feature parameter extraction module, a voice template training module, and a voiceprint recognition module.

[0053] Compared with the prior art, the beneficial effects of the present invention are:

[0054] The speech signal is collected through a voice card, the collected speech signal is preprocessed using speech signal processing technology, and its feature parameters are then extracted; vector quantization is applied to the obtained speech feature parameters to build voice models, thereby constructing a speaker recognition system. MFCC parameters are used, which have good recognition performance and noise immunity and can adequately simulate the perception of the human ear; the speaker information most useful for speaker recognition is contained between the 2nd and 16th orders of the MFCC parameters. By adopting the vector quantization (VQ) method, the system has good recognition performance and noise immunity, strong real-time performance, good recognition results, little modeling data, a simple algorithm, fast decision speed, and low complexity.

Brief Description of the Drawings

[0055] Fig. 1 is a system block diagram of the present invention;

[0056] Fig. 2 is the main flowchart of the present invention;

[0057] Fig. 3 is a flowchart of the LBG algorithm;

[0058] Fig. 4 is the human-computer interaction interface of VQ-based voiceprint recognition.

Detailed Description of the Embodiments

[0059] As shown in Fig. 1, the voiceprint recognition system based on vector quantization accomplishes recognition of the speaker's voice through a combination of hardware and software, and is composed as follows:

[0060] a speech signal acquisition module, a speech signal preprocessing module, a speech feature parameter extraction module, a voice model training module, and a voiceprint recognition module.

[0061] As shown in Figs. 2 and 3, the specific steps of the voiceprint recognition method based on vector quantization are as follows:

[0062] 1. Speech signal acquisition

[0063] Speech signal acquisition converts the original analog speech signal into a digital signal, with the channel number and sampling frequency set. In the present invention, an SHT-8B/PCI voice card produced by Hangzhou Synway is used to collect the speech signal; the channel number is 2 (the voice card's default channel number) and the sampling frequency is 8 kHz (the voice card's default sampling frequency). The recognition terminal is a telephone of the laboratory's integrated program-controlled switching experiment box; the switching mode of the experiment box is space-division switching, and the speech channel is channel A-2 (there are four channels in total: A-1, A-2, B-1, and B-2; the present invention selects A-2 at random, which has no effect on the experimental results).

[0064] 2. Speech signal preprocessing

[0065] (1) Windowing and framing

[0066] The time-varying characteristics of the speech signal mean that it must be processed over short segments of speech, so it is divided into frames; to ensure that framing does not cause loss of information, a certain overlap between adjacent frames, i.e. the frame shift, must be guaranteed, with the ratio of frame shift to frame length generally between 0 and 1/2. The frame length used in the present invention is 256 sampling points and the frame shift is 128 sampling points. The window function w(n) is the Hamming window function, which has good smoothing characteristics:

[0067] w(n) = 0.54 − 0.46 cos(2πn/(N−1)), 0 ≤ n ≤ N−1; w(n) = 0 otherwise (10)

[0068] where N is the window length, 256 points in the present invention.
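The framing and Hamming windowing described above can be illustrated with a short Python sketch (NumPy is used here for convenience; frame length 256 and frame shift 128 follow the text):

```python
import numpy as np

def frame_signal(x, frame_len=256, frame_shift=128):
    """Split a 1-D signal into overlapping frames and apply a Hamming window
    w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1)) to each frame."""
    n_frames = 1 + (len(x) - frame_len) // frame_shift
    w = np.hamming(frame_len)
    frames = np.stack([x[i * frame_shift : i * frame_shift + frame_len] * w
                       for i in range(n_frames)])
    return frames  # shape: (n_frames, frame_len)
```

With a 50% overlap (shift = half the frame length), each sample is covered by two windows, which avoids the information loss at frame boundaries mentioned above.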

[0069] (2) Endpoint detection

[0070] The present invention performs endpoint detection on the speech signal using a method combining short-time energy and short-time average zero-crossing rate, so as to determine the starting and ending points of the speech signal. Short-time energy detects voiced sounds, and the zero-crossing rate detects unvoiced sounds. Suppose x(n) is the speech signal and w(n) is the Hamming window function; then the short-time energy E_n is defined as

[0071] E_n = Σ_m [x(m) w(n − m)]² (11)

[0072] where E_n denotes the short-time energy when the window function is applied starting at the n-th point of the speech signal.

[0073] The short-time average zero-crossing rate is:

[0074] Z_n = (1/2) Σ_m |sgn(x(m)) − sgn(x(m−1))| w(n − m) (12)

[0075] where N is the length of the window function and sgn(·) is the sign function, i.e.

sgn(x) = 1 for x ≥ 0; sgn(x) = −1 for x < 0
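A minimal Python sketch of endpoint detection combining short-time energy and zero-crossing rate might look as follows. The thresholds are illustrative assumptions, since the text does not specify them; a frame is marked active when either measure exceeds its threshold, and the first and last active frames bound the utterance.

```python
import numpy as np

def short_time_energy(frames):
    """E_n: sum of squared windowed samples per frame."""
    return np.sum(frames ** 2, axis=1)

def zero_crossing_rate(frames):
    """Z_n: half the number of sign changes per frame, normalized by frame length."""
    signs = np.sign(frames)
    signs[signs == 0] = 1            # treat sgn(0) as +1, as in formula (12)
    return 0.5 * np.sum(np.abs(np.diff(signs, axis=1)), axis=1) / frames.shape[1]

def detect_endpoints(frames, energy_thresh, zcr_thresh):
    """Return (first, last) indices of speech frames, or None for pure silence.
    Energy catches voiced frames, ZCR catches unvoiced (fricative) frames."""
    e = short_time_energy(frames)
    z = zero_crossing_rate(frames)
    active = (e > energy_thresh) | (z > zcr_thresh)
    idx = np.flatnonzero(active)
    return (int(idx[0]), int(idx[-1])) if idx.size else None
```

In practice the thresholds would be calibrated from a few leading frames of background noise.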

[0076] (3) Pre-emphasis

[0077] Since the average power spectrum of the speech signal is affected by glottal excitation and lip-nose radiation, the high-frequency end falls off at about 6 dB/octave above roughly 800 Hz, so pre-emphasis is applied to boost the high-frequency part of the speech signal and flatten the signal spectrum. Pre-emphasis is implemented with a digital filter that boosts high frequencies at 6 dB/octave, generally a first-order digital filter H(z), i.e.

[0078] H(z) = 1 − u z⁻¹ (13)

[0079] where the system's recognition rate is highest for u between 0.90 and 1.00; the present invention takes u = 0.97.
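The filter H(z) = 1 − u z⁻¹ with u = 0.97 corresponds to the difference equation y(n) = x(n) − u·x(n−1), which can be sketched as:

```python
import numpy as np

def pre_emphasis(x, u=0.97):
    """First-order pre-emphasis: y[n] = x[n] - u*x[n-1] (y[0] = x[0]),
    i.e. filtering with H(z) = 1 - u*z^-1 to boost high frequencies."""
    return np.append(x[0], x[1:] - u * x[:-1])
```

A constant (DC) input is attenuated to 1 − u = 0.03 of its amplitude, while rapid sample-to-sample changes pass almost unchanged, which is exactly the spectral flattening described above.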

[0080] 3. Speech feature parameter extraction

[0081] Speech feature parameter extraction extracts, from the speaker's speech signal, parameters that can reflect the speaker's individuality. The specific process is as follows:

[0082] (1) The preprocessed speech signal is subjected to a short-time Fourier transform (DFT) to obtain its spectrum X(k). The DFT of the speech signal is:

[0083] X(k) = Σ_{n=0}^{N−1} x(n) e^(−j2πnk/N), k = 0, 1, ..., N−1 (14)

[0084] where x(n) is the input speech signal in units of frames and N is the number of Fourier transform points, taken as 256.

[0085] (2) The square of the spectrum X(k), i.e. the energy spectrum |X(k)|², is computed; the result is then passed through the Mel filters to smooth the spectrum of the speech signal, eliminate harmonics, and highlight the formants of the original speech.

[0086] The Mel-frequency filter bank is a set of triangular band-pass filters with center frequencies f(q), q = 1, 2, ..., Q, where Q is the number of triangular band-pass filters; the Mel filters H_q(k) are expressed as:

[0087] H_q(k) = (k − f(q−1)) / (f(q) − f(q−1)) for f(q−1) ≤ k ≤ f(q); H_q(k) = (f(q+1) − k) / (f(q+1) − f(q)) for f(q) ≤ k ≤ f(q+1); H_q(k) = 0 otherwise (15)

[0088] (3) The logarithm of the filter bank output is taken, which compresses the dynamic range of the speech spectrum and converts the multiplicative noise components in the frequency domain into additive components; the resulting log Mel spectrum S(q) is:

[0089] S(q) = ln( Σ_{k=0}^{N−1} |X(k)|² H_q(k) ), q = 1, 2, ..., Q (16)

[0090] (4) Discrete cosine transform (DCT)

[0091] The log Mel spectrum S(q) obtained in the above step is transformed to the time domain, and the result is the Mel-frequency cepstral coefficients (MFCC). The n-th coefficient C(n) is computed as:

[0092] C(n) = Σ_{q=1}^{Q} S(q) cos( πn(q − 0.5) / Q ), n = 1, 2, ..., L (17)

[0093] where L is the order of the MFCC and Q is the number of Mel filters, both usually determined by experiment. This embodiment takes L = 13 and Q = 25, without being limited thereto.
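Steps (1)–(4) can be combined into a single-frame MFCC sketch in Python. The √(2/Q) DCT normalization and the small constant added before the logarithm (to guard against log of zero) are common conventions and assumptions here, not taken from the text:

```python
import numpy as np

def mfcc_from_frame(frame, mel_filters, L=13):
    """One frame -> L MFCCs: DFT -> power spectrum |X(k)|^2 ->
    Mel filter bank -> log -> DCT (the chain of steps (1)-(4))."""
    n_fft = len(frame)
    spec = np.fft.rfft(frame, n_fft)          # X(k), k = 0..N/2
    power = np.abs(spec) ** 2                 # energy spectrum |X(k)|^2
    mel_energies = mel_filters @ power        # filter-bank outputs
    S = np.log(mel_energies + 1e-10)          # log Mel spectrum S(q)
    Q = len(S)
    n = np.arange(1, L + 1)[:, None]          # n = 1..L
    q = np.arange(1, Q + 1)[None, :]          # q = 1..Q
    C = np.sqrt(2.0 / Q) * np.sum(S * np.cos(np.pi * n * (q - 0.5) / Q), axis=1)
    return C                                  # L MFCC coefficients
```

Here `mel_filters` is any (Q, N/2 + 1) triangular filter-bank matrix; applying this to every frame of an utterance yields the feature vector sequence used for training and recognition.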

[0094] 4. Template training

[0095] (1) Basic principle

[0096] In voiceprint recognition, a vector quantization codebook is generally used as the speaker's voice template: the speech of each speaker in the system is quantized into a codebook and stored in the speech database as that speaker's voice template. At recognition time, feature parameters are extracted from the input speech feature vector sequence, the total average quantization distortion of these feature parameters against each voice template is computed, and the speaker whose template yields the minimum total average distortion is the recognition result.

[0097] (2) Distance measure

[0098] Let the K-dimensional feature vector of the unknown pattern be X, compared with a K-dimensional codeword vector Y in the codebook, and let x_k and y_k denote the same-dimension components of X and Y; the Euclidean distance measure d(X, Y) is:

[0099] d(X, Y) = Σ_{k=1}^{K} (x_k − y_k)² (18)

[0100] In the conventional Euclidean distance measure, the components of the feature vector are equally weighted, which achieves good recognition results only when the natural distribution of the feature vectors is spherical or nearly so, that is, when the distributions of the components of the feature vector are nearly equal. The speech signal, however, follows an elliptical normal distribution and the components of the vectors are not equally distributed; they are not well reflected in the Euclidean distance measure, and if the Euclidean distance measure is used directly to make decisions on the speaker, the recognition rate of the system will suffer.

[0101] The present invention uses 13th-order MFCC. To reflect the different contributions of the components in clustering, a weighted Euclidean distance measure is used, assigning different weights to differently distributed vector components: widely scattered components are given small weights, and concentrated components are given large weights. The degree of dispersion of the distribution is measured by the Euclidean distance from the vector to the cluster center (the vector mean); the weighting factor w(k) is:

[0102] w(k) = (1/d_k) / Σ_{j=1}^{K} (1/d_j), k = 1, 2, ..., K (19)

where d_k is the Euclidean distance of the k-th component from the cluster center.

[0103] K in the above formula is the dimensionality of the feature vector. In training and recognition, the Euclidean distances obtained are sorted in descending order and then pre-emphasized with the weighting factors; this process is essentially equivalent to using the unweighted Euclidean distance in training and recognition while pre-emphasizing each dimensional component of the feature vector with a scale factor. In this way, high-ranking vectors of a destructive nature, such as outliers or noise, are given very small weights, while good low-ranking vectors are given larger weights, so that the contribution of each vector to recognition is well reflected.
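A sketch of the weighted Euclidean distance and a dispersion-based weighting in Python. The exact weighting formula used here (inverse mean-squared deviation about the cluster mean, normalized to sum to one) is an assumption, chosen to match the stated idea that scattered components get small weights and concentrated components get large weights:

```python
import numpy as np

def weighted_euclidean(x, y, w):
    """Weighted distance measure: d(x, y) = sum_k w[k] * (x[k] - y[k])**2."""
    return float(np.sum(w * (x - y) ** 2))

def dispersion_weights(vectors):
    """Weight each dimension inversely to its spread about the cluster mean:
    widely scattered components -> small weight, concentrated -> large weight.
    `vectors` has shape (n_vectors, K)."""
    spread = np.mean((vectors - vectors.mean(axis=0)) ** 2, axis=0)
    w = 1.0 / (spread + 1e-10)   # small constant avoids division by zero
    return w / w.sum()           # normalize so the weights sum to 1
```

With unit weights this reduces to the plain Euclidean measure of formula (18), so the weighting can be switched on and off for comparison experiments.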

[0104] (3) Template training

[0105] The present invention uses the LBG algorithm based on the splitting method, with the following specific steps:

[0106] 1) Obtain all training vectors X in the input feature vector set S, and give the codeword Y1(0) of an initial codebook by the splitting codebook method (a codebook being a set of vectors, i.e. a set of codewords);

[0107] 2) Using a small threshold ε (ε = 0.01), split each codeword in two according to the following rule:

[0108] Y+ = Y(1 + ε), Y− = Y(1 − ε) (20)

[0109] After splitting, the codewords Y1(1), Y2(1) of the new codebook are obtained;

[0110] 3) According to the nearest-neighbor criterion, find the nearest codeword for each training vector among the codewords of the new codebook, and finally divide S into m subsets; that is, when X ∈ S_m,

[0111] d(X, Y_m) ≤ d(X, Y_i), i = 1, 2, ..., M, i ≠ m (21)

[0112] where M is the number of codewords in the current initial codebook;

[0113] 4) Compute the centroid of the feature vectors in each subset and replace the codeword of that subset with this centroid, thus obtaining a new codebook; [0114] 5) Through the iterative computation of steps 3) and 4), the codewords Y1, Y2 of the new codebook are obtained;

[0115] 6) Step 2) is then repeated, splitting each newly obtained codeword in two, and steps 3) and 4) are iterated again; this continues until the required number of codewords in the codebook is M = 2^r (r is an integer), so r rounds of the above loop are needed in total until clustering is complete, at which point the centroids of the clusters are the required codewords.
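The six steps above can be sketched as a compact LBG implementation in Python. M is assumed to be a power of two, per M = 2^r, and the inner-loop stopping rule uses the relative drop in total distortion, analogous to the relative distance measure of the initialization procedure:

```python
import numpy as np

def lbg(train, M, eps=0.01, tol=1e-5):
    """LBG codebook training: start from the global centroid, repeatedly split
    every codeword into Y(1+eps) and Y(1-eps), then refine with
    nearest-neighbour assignment and centroid updates until the total
    distortion stops improving. `train` has shape (n_vectors, K)."""
    codebook = train.mean(axis=0, keepdims=True)      # step 1): initial codeword
    while codebook.shape[0] < M:
        codebook = np.vstack([codebook * (1 + eps),   # step 2): split each codeword
                              codebook * (1 - eps)])
        prev_D = np.inf
        while True:
            # step 3): assign each training vector to its nearest codeword
            d = ((train[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
            labels = d.argmin(axis=1)
            D = d[np.arange(len(train)), labels].sum()    # total distortion
            if prev_D - D <= tol * (D + 1e-12):           # relative improvement test
                break
            prev_D = D
            # step 4): replace each codeword by the centroid of its subset
            for m in range(codebook.shape[0]):
                members = train[labels == m]
                if len(members):
                    codebook[m] = members.mean(axis=0)
    return codebook
```

Because each outer pass doubles the codebook, r = log2(M) rounds of the loop produce the M codewords, matching step 6).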

[0116] The initial codebook in the above LBG clustering algorithm is initialized by the splitting codebook method, as follows:

[0117] ① The mean of the feature vectors extracted from all frames is taken as the codeword Y(0) of the initial codebook;

[0118] ② Y(0) is split according to the following rule to form 2m codewords:

[0119] Y_m+ = Y_m(1 + ε), Y_m− = Y_m(1 − ε) (22)

[0120] where m varies from 1 to the current number of codewords in the codebook and ε is the splitting parameter; the present invention takes ε = 0.01;

[0121] ③ All feature vectors are clustered according to the new codewords, and the total distance measures D and D′ are then computed:

[0122] D = Σ_{X ∈ S} min_m d(X, Y_m) (23)

[0123] where D′ is the total distance measure of the next iteration, computed in the same way from the updated codewords, and d(X, Y_m) is the distance measure between the training feature vector X and the trained codeword Y_m.

[0124] The relative distance measure is computed:

σ′ = |D − D′| / D′ (24)

[0125] If σ′ < ε (ε < 10⁻⁵), the iteration is stopped and the current codebook is the designed codebook; otherwise, go to the next step;

[0126] ④ Recompute the new centroid of each region;

[0127] ⑤ Repeat ③ and ④ until an optimal codebook of 2m codewords is formed;

[0128] ⑥ Repeat ②, ③ and ④ until a codebook of M codewords is formed;
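Steps ③–⑤ of the initialization, with the relative-distance stopping rule σ' < ε, might look like the following sketch (all names are hypothetical; squared Euclidean distance is assumed for the distance measure):

```python
def dist(x, y):
    # squared Euclidean distance between two K-dimensional vectors
    return sum((a - b) ** 2 for a, b in zip(x, y))

def centroid(vectors):
    k, n = len(vectors[0]), len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(k)]

def total_distortion(training, book):
    # D: sum over all training vectors of the distance to the nearest codeword
    return sum(min(dist(x, y) for y in book) for x in training)

def refine(training, book, eps=1e-5, max_iter=100):
    """Steps 3-5: re-cluster and re-centre until the relative distance
    measure sigma' = |D - D'| / D' falls below eps."""
    D_prev = total_distortion(training, book)
    for _ in range(max_iter):
        cells = [[] for _ in book]
        for x in training:
            m = min(range(len(book)), key=lambda i: dist(x, book[i]))
            cells[m].append(x)
        book = [centroid(c) if c else y for c, y in zip(cells, book)]
        D = total_distortion(training, book)
        if D == 0 or abs(D_prev - D) / D < eps:   # sigma' < eps: codebook is designed
            break
        D_prev = D
    return book
```

Each assignment-plus-recentring pass can only lower the total distortion, so the relative measure shrinks toward zero and the loop terminates.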

[0129] 5. Voiceprint identification

[0130] (1) Extract the feature vector sequence X = (X1, X2, ..., X_T) of length T from the speech signal of the speaker to be identified. The codebooks in the speech database formed during the training phase are (Y1, Y2, ..., Y_N), where N is the number of speakers.

[0131] (2) Compute the distance measure between the feature vectors and the speech templates of the speakers already in the database, i.e. compute d(X_j, Y_m^(i)):

[0132] d(X_j, Y_m^(i)) = Σ_{k=1..K} w(k) · (X_j(k) − Y_m^(i)(k))²

[0133] where j (j = 1, 2, ..., T) indexes the frame feature vectors of X, m denotes the m-th codeword of the i-th speaker (M codewords in all), and K is the dimensionality of the feature vector. The weighting factor w(k) is:

[0134]

Figure CN102509547BD00121

[0135] (3) Compute the average distance measure D_i from X to the i-th codebook:

[0136]

D_i = (1/T) · Σ_{j=1..T} min_{1≤m≤M} d(X_j, Y_m^(i))

[0137] (4) Compute D_{i+1} in the same way, obtaining all of D_1, D_2, ..., D_N.

[0138] (5) Find the i for which D_i is smallest; that i identifies the sought speaker.
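Identification steps (2)–(5) reduce to an argmin over per-speaker average distances. A hedged sketch follows; the patent's weighting-factor formula w(k) is not legible in this text, so a uniform w(k) = 1 placeholder is used, and all names are illustrative:

```python
def weighted_dist(x, y, w):
    # weighted Euclidean distance measure d(X_j, Y_m) over K dimensions
    return sum(wk * (a - b) ** 2 for wk, a, b in zip(w, x, y))

def avg_distance(X, codebook, w):
    # D_i: mean over the T frames of the distance to the nearest codeword
    return sum(min(weighted_dist(x, y, w) for y in codebook) for x in X) / len(X)

def identify(X, codebooks, w):
    """Return the index i of the speaker codebook with the minimum
    average distance measure D_i (closed-set decision rule)."""
    scores = [avg_distance(X, cb, w) for cb in codebooks]
    return min(range(len(scores)), key=scores.__getitem__)
```

Because the decision is a plain argmin, this is a closed-set identifier: it always names the nearest enrolled speaker, even for an unknown voice.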

[0139] The present system performs closed-set identification; that is, every speaker to be identified belongs to the known set of speakers. The human-machine interface for speaker recognition is shown in Fig. 4. In the interface of the voiceprint recognition system, the "voice-card status display" list view shows the voice channel numbers currently available on the voice card and their channel states; the "speech sample library" list view shows the number of speaker samples in the current speech sample library and the speakers' names. The "voiceprint recognition parameter settings" panel shows the parameters to be set for speech acquisition, including the training duration (default 23 s), the test duration (default 15 s), and the number of candidates (default 1).

[0140] A concrete example follows. Suppose the speech sample library stores the speech of 100 persons in advance; the process of identifying the voice of Zhang XX when he places a telephone call is as follows.

[0141] 1. If Zhang XX does not belong to the known speech sample library

[0142] (1) Speech signal acquisition: a telephone of the integrated program-controlled switching experiment box serves as the terminal device for collecting speech, and the speech is captured through the voice card;

[0143] First, set the "training duration" parameter for the training speech to be collected (range: 10–39 s), then enter the speaker's name "Zhang XX" in the name edit box and click the "Add speaker" button. After the speaker is added, click "OK" and dial the telephone of the program-controlled switching experiment box (number: 8700). Once the call is connected, the state of voice-card channel 2 (the default channel) is updated to "recording", and the voice card begins collecting speech. When the collected speech reaches the preset training duration, the call hangs up automatically;

[0144] (2) Speech signal preprocessing: the extracted speech signal is framed and windowed using a computer with VC software; each frame contains 256 sampling points with a frame shift of 128 sampling points, and the applied window function is a Hamming window. Endpoint detection uses a method combining short-time energy and short-time zero-crossing rate; pre-emphasis uses a coefficient of 0.97;
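The preprocessing chain described above (pre-emphasis with coefficient 0.97, 256-sample frames with a 128-sample shift, Hamming window) can be sketched as follows; the helper names are illustrative and the endpoint-detection step is omitted:

```python
import math

def pre_emphasis(signal, alpha=0.97):
    # s'[n] = s[n] - alpha * s[n-1]; the first sample is passed through
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def frame_and_window(signal, frame_len=256, frame_shift=128):
    """Split into overlapping frames and apply a Hamming window to each."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, frame_shift):
        frame = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames
```

With a 50 % frame overlap, a signal of L samples yields floor((L − 256) / 128) + 1 frames.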

[0145] (3) Speech feature parameter extraction: 13th-order MFCC parameters are extracted using a computer with VC software;
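MFCC parameters are computed on a mel-scaled filterbank. The patent does not give filterbank details here, so the following is only a sketch of the standard mel mapping used to place triangular-filter centers for an MFCC front end such as the 13th-order one above; the filter count and frequency range are assumptions:

```python
import math

def hz_to_mel(f):
    # standard mel scale: mel = 2595 * log10(1 + f / 700)
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    # inverse of hz_to_mel
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_centers(n_filters=26, f_low=0.0, f_high=4000.0):
    """Centers of n_filters triangular filters, equally spaced on the mel scale."""
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (n_filters + 1)
    return [mel_to_hz(m_low + (i + 1) * step) for i in range(n_filters)]
```

Equal spacing on the mel axis packs the filters densely at low frequencies, which is what makes the resulting cepstral coefficients perceptually motivated.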

[0146] (4) Template training: the codebook is initialized with the codeword-splitting method, and the LBG clustering algorithm then builds one codebook for each speaker in the system, which is stored in the speech database as that speaker's template;

[0147] (5) Speaker recognition

[0148] First, set the "test duration" parameter for the test speech to be collected (range: 5–20 s), dial the telephone of the program-controlled switching experiment box (number: 8700), and collect the speech with the voice card (channel 2). When the collected speech reaches the preset test duration, the call hangs up automatically;

[0149] The software then disables the "perform speaker identification" button and applies the operations of steps (2) and (3) to the speaker's speech; finally the extracted speech of the speaker under test is compared with the templates in the library. Click the "perform speaker identification" button and select the number of candidates to display (range 1–3). If some speaker template gives the minimum average distance measure for the feature vector X of the speech to be identified, the speaker is considered recognized, and the identification result "Zhang XX" and the recognition score are displayed in the "speaker identification" list view.

[0150] 2. If Zhang XX belongs to the known speech sample library

[0151] If Zhang XX belongs to the known speech sample library, speaker identification proceeds directly: first, set the "test duration" parameter for the test speech to be collected (range: 5–20 s), dial the telephone of the program-controlled switching experiment box (number: 8700), and collect the speech with the voice card (channel 2). When the collected speech reaches the preset test duration, the call hangs up automatically;

[0152] The software then disables the "perform speaker identification" button and applies the operations of steps (2) and (3) to the speaker's speech; finally the extracted speech of the speaker under test is compared with the templates in the library. If some speaker template gives the minimum average distance measure for the feature vector X of the speech to be identified, the speaker is considered recognized, and the identification result "Zhang XX" and the recognition score are displayed in the "speaker identification" list view.

Claims (2)

1. A voiceprint recognition method based on vector quantization, characterized in that the specific steps are as follows: (1) speech signal acquisition: a telephone of the integrated program-controlled switching experiment box serves as the terminal device for collecting speech, and the speech signal is captured through the voice card; (2) speech signal preprocessing: the extracted speech signal is framed and windowed by a computer; each frame contains 256 sampling points with a frame shift of 128 sampling points, and the applied window function is a Hamming window; endpoint detection uses a method combining short-time energy and short-time zero-crossing rate; pre-emphasis uses a coefficient in the range 0.9–1.0; (3) speech feature parameter extraction: MFCC parameters are used, with an MFCC order of 12–16; (4) template training: the LBG clustering algorithm builds one codebook for each speaker in the system, which is stored in the speech database as that speaker's template; the specific steps of the LBG clustering algorithm are as follows: (4.1) take all training vectors X of the input feature vector set S, and obtain the codeword Y^(0) of an initial codebook by the codeword-splitting method; (4.2) using a small threshold λ, split Y^(0) into two; the splitting follows the rule:
Y⁺ = (1 + λ)·Y^(0),  Y⁻ = (1 − λ)·Y^(0)
After the splitting, the codewords of the new codebook are obtained; (4.3) according to the nearest-neighbor criterion, find for each training vector the nearest codeword of the new codebook, finally partitioning S into m subsets; that is, X ∈ S_i when d(X, Y_i) < d(X, Y_m), m = 1, ..., M, m ≠ i (6), where M is the number of codewords in the current initial codebook; (4.4) compute the centroid of the feature vectors in each subset and replace that subset's codeword with the centroid, yielding a new codebook; (4.5) iterate steps 3 and 4 to obtain the codewords of the new codebook; (4.6) then repeat step 2, splitting each newly obtained codeword into two, and again iterate steps 3 and 4; continue in this way until the required number of codewords, M = 2^r (r an integer), is reached; in all, r rounds of the above loop are needed before clustering is complete, at which point the centroids of the clusters are the desired codewords. (5) Voiceprint identification: the feature parameters of the collected speech signal to be identified are compared with the speaker templates established in steps 1–4 in the library, and a decision is made according to a weighted Euclidean distance measure; if some speaker template gives the minimum average distance measure for the feature vector X of the speech to be identified, the speaker is considered recognized.
2. The voiceprint recognition method based on vector quantization according to claim 1, characterized in that the initial codebook of the LBG clustering algorithm is initialized by the codeword-splitting method, the specific process being as follows: (1) take the mean of the feature vectors of all extracted frames as the codeword Y^(0) of the initial codebook; (2) split each current codeword according to the following rule to form 2m codewords;
Y_m⁺ = (1 + λ)·Y_m,  Y_m⁻ = (1 − λ)·Y_m
where m runs from 1 to the current number of codewords in the codebook and λ is the splitting parameter, taken as λ = 0.01; (3) cluster all the feature vectors according to the new codewords, then compute the total distance measures D and D':
D = Σ_j min_m d(X_j, Y_m)
D' is the total distance measure for the next iteration, and d(X, Y_m) is the distance measure between the training feature vector X and the trained codeword Y_m; compute the relative distance measure:
σ' = |D − D'| / D'
If σ' < ε (ε < 10⁻⁵), stop the iteration; the current codebook is the designed codebook; otherwise, go to the next step; (4) recompute the new centroid of each region; (5) repeat steps 3 and 4 until an optimal codebook of 2m codewords is formed; (6) repeat steps 2, 3 and 4 until a codebook of M codewords is formed.
CN 201110450364 2011-12-29 2011-12-29 Method and system for voiceprint recognition based on vector quantization CN102509547B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110450364 CN102509547B (en) 2011-12-29 2011-12-29 Method and system for voiceprint recognition based on vector quantization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201110450364 CN102509547B (en) 2011-12-29 2011-12-29 Method and system for voiceprint recognition based on vector quantization

Publications (2)

Publication Number Publication Date
CN102509547A CN102509547A (en) 2012-06-20
CN102509547B true CN102509547B (en) 2013-06-19

Family

ID=46221622

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110450364 CN102509547B (en) 2011-12-29 2011-12-29 Method and system for voiceprint recognition based on vector quantization

Country Status (1)

Country Link
CN (1) CN102509547B (en)

Families Citing this family (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103794207A (en) * 2012-10-29 2014-05-14 西安远声电子科技有限公司 Dual-mode voice identity recognition method
CN103714826B (en) * 2013-12-18 2016-08-17 讯飞智元信息科技有限公司 Formant automatic matching method towards vocal print identification
CN103794219B (en) * 2014-01-24 2016-10-05 华南理工大学 A kind of Codebook of Vector Quantization based on the division of M code word generates method
CN104485102A (en) * 2014-12-23 2015-04-01 智慧眼(湖南)科技发展有限公司 Voiceprint recognition method and device
CN105989842B (en) * 2015-01-30 2019-10-25 福建星网视易信息系统有限公司 The method, apparatus for comparing vocal print similarity and its application in digital entertainment VOD system
CN104994400A (en) * 2015-07-06 2015-10-21 无锡天脉聚源传媒科技有限公司 Method and device for indexing video by means of acquisition of host name
CN106340298A (en) * 2015-07-06 2017-01-18 南京理工大学 Voiceprint unlocking method integrating content recognition and speaker recognition
CN105304087B (en) * 2015-09-15 2017-03-22 北京理工大学 Voiceprint recognition method based on zero-crossing separating points
CN105355206A (en) * 2015-09-24 2016-02-24 深圳市车音网科技有限公司 Voiceprint feature extraction method and electronic equipment
CN105355195A (en) * 2015-09-25 2016-02-24 小米科技有限责任公司 Audio frequency recognition method and audio frequency recognition device
CN106920558A (en) * 2015-12-25 2017-07-04 展讯通信(上海)有限公司 Keyword recognition method and device
CN106971711A (en) * 2016-01-14 2017-07-21 芋头科技(杭州)有限公司 A kind of adaptive method for recognizing sound-groove and system
CN106971726A (en) * 2016-01-14 2017-07-21 芋头科技(杭州)有限公司 A kind of adaptive method for recognizing sound-groove and system based on code book
CN106971735B (en) * 2016-01-14 2019-12-03 芋头科技(杭州)有限公司 A kind of method and system regularly updating the Application on Voiceprint Recognition of training sentence in caching
CN106971729A (en) * 2016-01-14 2017-07-21 芋头科技(杭州)有限公司 A kind of method and system that Application on Voiceprint Recognition speed is improved based on sound characteristic scope
CN106971712A (en) * 2016-01-14 2017-07-21 芋头科技(杭州)有限公司 A kind of adaptive rapid voiceprint recognition methods and system
CN106981287A (en) * 2016-01-14 2017-07-25 芋头科技(杭州)有限公司 A kind of method and system for improving Application on Voiceprint Recognition speed
CN106057212B (en) * 2016-05-19 2019-04-30 华东交通大学 Driving fatigue detection method based on voice personal characteristics and model adaptation
CN106448682A (en) * 2016-09-13 2017-02-22 Tcl集团股份有限公司 Open-set speaker recognition method and apparatus
CN106847292B (en) * 2017-02-16 2018-06-19 平安科技(深圳)有限公司 Method for recognizing sound-groove and device
CN107039036A (en) * 2017-02-17 2017-08-11 南京邮电大学 A kind of high-quality method for distinguishing speek person based on autocoding depth confidence network
CN107068154A (en) * 2017-03-13 2017-08-18 平安科技(深圳)有限公司 The method and system of authentication based on Application on Voiceprint Recognition
CN108022584A (en) * 2017-11-29 2018-05-11 芜湖星途机器人科技有限公司 Office Voice identifies optimization method
CN108417226A (en) * 2018-01-09 2018-08-17 平安科技(深圳)有限公司 Speech comparison method, terminal and computer readable storage medium
CN108460081B (en) * 2018-01-12 2019-07-12 平安科技(深圳)有限公司 Voice data base establishing method, voiceprint registration method, apparatus, equipment and medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011004098A1 (en) 2009-07-07 2011-01-13 France Telecom Allocation of bits in an enhancement coding/decoding for improving a hierarchical coding/decoding of digital audio signals
CN102231277A (en) 2011-06-29 2011-11-02 电子科技大学 Method for protecting mobile terminal privacy based on voiceprint recognition

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhang Caijuan, Huo Chunbao, Wu Feng, Wei Chunli, "Application of an improved K-means algorithm in voiceprint recognition", Journal of Liaoning University of Technology, 2011, Vol. 31, No. 5, Sections 1-4.

Also Published As

Publication number Publication date
CN102509547A (en) 2012-06-20

Legal Events

Date Code Title Description
C06 Publication
C10 Request of examination as to substance
C14 Granted
C17 Cessation of patent right