CN111785285A - Voiceprint recognition method for home multi-feature parameter fusion

- Publication number: CN111785285A
- Application number: CN202010439120.7A
- Authority: CN (China)
- Prior art keywords: feature parameters, voiceprint recognition, recognition method, filter
- Prior art date: 2020-05-22
- Legal status: Pending (status assumed by Google Patents; not a legal conclusion)
Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
  - G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
  - G10L17/06—Decision making techniques; Pattern matching strategies
    - G10L17/10—Multimodal systems, i.e. based on the integration of multiple recognition engines or fusion of expert systems
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
  - G10L25/03—Speech or voice analysis techniques characterised by the type of extracted parameters
    - G10L25/24—the extracted parameters being the cepstrum
Abstract
The invention discloses a voiceprint recognition method based on multi-feature parameter fusion for home applications, comprising the following steps: computing the MFCC, GFCC and LPCC feature parameters extracted from the speech signal; training three Gaussian mixture models with the MFCC, GFCC and LPCC feature parameters respectively; weighting and fusing the outputs of the three Gaussian mixture models, making a soft decision against a set threshold, and using stochastic gradient descent to obtain the optimal weight coefficients and output the final recognition result. By fusing the MFCC, GFCC and LPCC feature parameters, the invention remedies the inability of any single feature parameter to express the speaker's characteristics well, thereby substantially improving the accuracy of voiceprint recognition.
Description
Technical Field

The invention belongs to the field of voiceprint recognition, and in particular relates to a voiceprint recognition method based on multi-feature parameter fusion for home applications.
Background

Voiceprint recognition, also known as speaker recognition, includes speaker identification and speaker verification. Its applications are wide-ranging, spanning finance, military security, medicine and home security. In many voiceprint recognition systems, beyond the preprocessing stage, the choice of feature parameters and the model matching are critical to recognition accuracy.

A single traditional feature parameter cannot express the speaker's voice characteristics well and may cause overfitting; moreover, MFCC feature parameters are easy to imitate. Going beyond single features, many researchers concatenate GFCC and MFCC directly into a new feature vector, which invites the curse of dimensionality and increases the computational load of the system. Current home voiceprint recognition algorithms therefore cannot express speaker characteristics adequately, and their recognition accuracy needs improvement.
Summary of the Invention

Purpose of the invention: to overcome the deficiencies of the prior art, a voiceprint recognition method based on multi-feature parameter fusion for home applications is provided, which effectively solves the problem that a single feature parameter cannot fully express the speaker's voice characteristics and improves the accuracy of voiceprint recognition.

Technical solution: to achieve the above purpose, the present invention provides a voiceprint recognition method based on multi-feature parameter fusion for home applications, comprising the following steps:
S1: compute the MFCC, GFCC and LPCC feature parameters extracted from the speech signal;

S2: train three Gaussian mixture models using the MFCC, GFCC and LPCC feature parameters respectively;

S3: weight and fuse the outputs of the three Gaussian mixture models, make a soft decision against a set threshold, and use stochastic gradient descent to obtain the optimal weight coefficients and output the final recognition result.
Further, in step S1 the speech signal undergoes a preprocessing operation before feature parameter extraction.

Further, in step S1 the preprocessing operation includes sampling and quantization, pre-emphasis, framing and windowing, and endpoint detection.
Further, the MFCC feature parameters in step S1 are extracted as follows:

A1) preprocess the input speech signal to generate a time-domain signal, and apply a fast Fourier transform or discrete Fourier transform to each frame to obtain the linear speech spectrum;

A2) pass the linear spectrum through a Mel filter bank to generate the Mel spectrum, and take the logarithmic energy of the Mel spectrum to generate the corresponding log spectrum;

A3) convert the log spectrum into the MFCC feature parameters using the discrete cosine transform.
Further, the GFCC feature parameters in step S1 are extracted as follows:

B1) preprocess the speech signal to generate a time-domain signal, and apply a fast Fourier transform or discrete Fourier transform to obtain the discrete spectrum;

B2) square the discrete spectrum to generate the speech energy spectrum, and filter it with a Gammatone filter bank;

B3) apply exponential compression to the output of each Gammatone filter to obtain a set of exponential energy spectra;

B4) convert the exponential energy spectra into the GFCC feature parameters using the discrete cosine transform.
Further, the LPCC feature parameters in step S1 are extracted as follows:

C1) define the system function of the vocal tract model;

C2) define the impulse response of the system function and compute its complex cepstrum;

C3) compute the LPCC feature parameters from the relationship between the complex cepstrum and the cepstral coefficients.
Further, the recognition result in step S3 is determined as follows: when the weighted fusion result is greater than or equal to the threshold, the speaker is recognized as the target speaker; otherwise, the speaker is recognized as a non-target speaker.

Beneficial effects: compared with the prior art, the present invention fuses the MFCC, GFCC and LPCC feature parameters, remedying the inability of a single feature parameter to express the speaker's characteristics well, and thereby substantially improving voiceprint recognition accuracy.
Brief Description of the Drawings

Figure 1 is a block diagram of the overall structure of the method of the present invention;

Figure 2 is a flowchart of MFCC feature parameter extraction;

Figure 3 is a flowchart of GFCC feature parameter extraction.
Detailed Description

The present invention is further clarified below in conjunction with the accompanying drawings and specific embodiments. It should be understood that these embodiments are intended only to illustrate the invention and not to limit its scope; after reading this disclosure, modifications of various equivalent forms made by those skilled in the art all fall within the scope defined by the claims appended to this application.
As shown in Figure 1, the present invention provides a voiceprint recognition method based on multi-feature parameter fusion for home applications, comprising the following steps (a preprocessing sketch follows this list):

1) Preprocess the input speaker's speech; the preprocessing includes sampling and quantization, pre-emphasis, windowing and framing, and endpoint detection. The purpose of preprocessing is to eliminate interference from the vocal organs and the speech acquisition equipment and to improve the recognition rate of the system.

2) Compute the MFCC, GFCC and LPCC feature parameters extracted from the speech signal;

3) Train three Gaussian mixture models, GMM model A, GMM model B and GMM model C, using the MFCC, GFCC and LPCC feature parameters respectively;

4) Weight and fuse the outputs of GMM model A, GMM model B and GMM model C, make a soft decision against a set threshold, and use stochastic gradient descent to obtain the optimal weight coefficients and output the final recognition result.
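As a concrete illustration of step 1, the following Python sketch implements the preprocessing chain under assumptions the patent does not state: a 0.97 pre-emphasis coefficient, 25 ms frames with a 10 ms hop, a Hamming window, and a crude short-time-energy threshold standing in for the endpoint detector.

```python
import numpy as np

def preprocess(signal, sr=16000, alpha=0.97, frame_len=0.025, hop_len=0.010):
    """Pre-emphasis, framing, Hamming windowing and a simple
    energy-based endpoint detector. All parameter values here are
    illustrative assumptions, not taken from the patent."""
    # Pre-emphasis: y[n] = x[n] - alpha * x[n-1].
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])

    # Split into overlapping frames and apply a Hamming window.
    n_frame, n_hop = int(sr * frame_len), int(sr * hop_len)
    n_frames = 1 + max(0, (len(emphasized) - n_frame) // n_hop)
    idx = np.arange(n_frame)[None, :] + n_hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(n_frame)

    # Endpoint detection: keep frames whose short-time energy exceeds
    # a fixed fraction of the maximum frame energy.
    energy = np.sum(frames ** 2, axis=1)
    return frames[energy > 0.05 * energy.max()]
```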
As shown in Figure 2, the MFCC feature parameters in this embodiment are extracted as follows:

A1) Preprocess the input speech signal s(n) to generate a time-domain signal x(n) (signal sequence length N = 256). Then apply a fast Fourier transform or discrete Fourier transform to each frame of the speech signal to obtain the linear speech spectrum X(k), which can be expressed as:

X(k) = Σ_{n=0}^{N-1} x(n) e^{-j2πnk/N}, 0 ≤ k ≤ N-1   (1)

A2) Pass the linear spectrum X(k) through a Mel filter bank to generate the Mel spectrum, then take the logarithmic energy of the Mel spectrum to generate the corresponding log spectrum S(m).

Here the Mel filter bank is a set of triangular band-pass filters H_m(k) with 0 ≤ m ≤ M, where M is the number of filters, typically 20 to 28. The transfer function of each band-pass filter can be expressed as:

H_m(k) = 0, k < f(m-1)
H_m(k) = (k - f(m-1)) / (f(m) - f(m-1)), f(m-1) ≤ k ≤ f(m)
H_m(k) = (f(m+1) - k) / (f(m+1) - f(m)), f(m) < k ≤ f(m+1)
H_m(k) = 0, k > f(m+1)   (2)

In equation (2), f(m) is the center frequency of the m-th filter.

The logarithm of the Mel energy spectrum is taken to improve the performance of the voiceprint recognition system. The mapping from the linear speech spectrum X(k) to the log spectrum S(m) is:

S(m) = ln( Σ_{k=0}^{N-1} |X(k)|² H_m(k) ), 0 ≤ m ≤ M   (3)

A3) Convert the log spectrum S(m) into the MFCC feature parameters using the discrete cosine transform (DCT); the n-th feature component C(n) of the MFCC feature parameters is:

C(n) = Σ_{m=1}^{M} S(m) cos( πn(m - 0.5)/M ), n = 1, 2, …, L   (4)

where L is the number of cepstral coefficients retained.

The MFCC feature parameters obtained through the above steps reflect only the static characteristics of the speech signal; dynamic characteristic parameters can be obtained by computing their first-order and second-order differences.
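A minimal NumPy/SciPy sketch of equations (1) through (4) follows. The filter count (24), cepstral dimension (13) and the HTK-style Mel scale used to place the center frequencies f(m) are illustrative assumptions; the patent fixes none of them. `preprocess` refers to the sketch given earlier.

```python
import numpy as np
from scipy.fftpack import dct

def mfcc(frames, sr=16000, n_filters=24, n_ceps=13):
    """MFCC per equations (1)-(4): FFT -> Mel filter bank -> log -> DCT."""
    N = frames.shape[1]                                   # e.g. N = 256
    power = np.abs(np.fft.rfft(frames, N)) ** 2           # |X(k)|^2, eq. (1)

    # Triangular Mel filter bank H_m(k), eq. (2), with center
    # frequencies f(m) spaced uniformly on the Mel scale.
    to_mel = lambda f: 2595 * np.log10(1 + f / 700)
    from_mel = lambda m: 700 * (10 ** (m / 2595) - 1)
    edges = from_mel(np.linspace(0, to_mel(sr / 2), n_filters + 2))
    bins = np.floor((N + 1) * edges / sr).astype(int)
    H = np.zeros((n_filters, N // 2 + 1))
    for m in range(1, n_filters + 1):
        lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
        H[m - 1, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)
        H[m - 1, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)

    S = np.log(power @ H.T + 1e-10)                       # log Mel spectrum, eq. (3)
    return dct(S, axis=1, norm='ortho')[:, 1:n_ceps + 1]  # DCT, eq. (4)
```

First- and second-order differences of these coefficients can then be appended to capture the dynamic characteristics mentioned above.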
In this embodiment, a Gammatone filter is used in extracting the GFCC (Gammatone frequency cepstral coefficient) feature parameters; it is designed as follows:

The Gammatone filter bank simulates the auditory characteristics of the basilar membrane of the cochlea; its time-domain expression is:

g_i(t) = t^{n-1} e^{-2π b_i t} cos(2π f_i t + φ_i) U(t), 1 ≤ i ≤ N   (5)

where N is the number of filters; n is the filter order, usually taken as 4; i is the filter index; f_i is the center frequency of the i-th filter; U(t) is the unit step function; b_i is the attenuation factor of the i-th filter; and φ_i is the phase of the i-th filter, usually taken as 0.

The bandwidth of each filter is related to the critical bands of human hearing; according to psychoacoustic theory, the critical band can be expressed by the equivalent rectangular bandwidth:

EBR(f) = 24.7 × (4.37f/1000 + 1)   (6)

The attenuation factor b_i of the filter is related to the bandwidth and determines the decay rate of the impulse response:

b_i = 1.019 EBR(f_i)   (7)

The time-domain impulse response of the Gammatone filter is an analog function; to facilitate computation it must be discretized. Taking the Laplace transform of equation (5) gives its transfer function (8), which is then sampled to obtain the discrete impulse response g_i(n). The output of the Gammatone filter bank is obtained by convolving the input speech signal s(n) with g_i(n).
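The following sketch builds a bank of sampled Gammatone impulse responses from equations (5) through (7). It discretizes by directly sampling the analog impulse response rather than via the Laplace-domain route in the text, and the filter count, band edges and ERB-rate spacing of the center frequencies are illustrative assumptions.

```python
import numpy as np

def gammatone_bank(sr=16000, n_filters=32, f_lo=80.0, f_hi=7600.0, dur=0.032):
    """Sampled Gammatone impulse responses g_i(n), order n = 4, phase 0."""
    t = np.arange(int(sr * dur)) / sr
    erb = lambda f: 24.7 * (4.37 * f / 1000 + 1)            # eq. (6)
    # Center frequencies spaced uniformly on the ERB-rate scale.
    erb_rate = lambda f: 21.4 * np.log10(4.37 * f / 1000 + 1)
    rates = np.linspace(erb_rate(f_lo), erb_rate(f_hi), n_filters)
    fc = (10 ** (rates / 21.4) - 1) * 1000 / 4.37
    bank = []
    for fi in fc:
        bi = 1.019 * erb(fi)                                # eq. (7)
        g = t ** 3 * np.exp(-2 * np.pi * bi * t) * np.cos(2 * np.pi * fi * t)
        bank.append(g / np.abs(g).sum())                    # crude gain normalization
    return fc, np.array(bank)
```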
The extraction of the GFCC feature parameters is similar to that of the MFCC feature parameters; it simply replaces the traditional Mel filter bank with the Gammatone filter bank, thereby exploiting the Gammatone filter's modeling of the cochlear basilar membrane and applying a well-matched nonlinear processing to the speech signal.

Based on the above Gammatone filter, as shown in Figure 3, the GFCC (Gammatone frequency cepstral coefficient) feature parameters are extracted as follows:

B1) First preprocess the input speech signal s(n) to generate a time-domain signal x(n), then apply a fast Fourier transform or discrete Fourier transform to obtain the discrete spectrum X(k):

X(k) = Σ_{n=0}^{N-1} x(n) e^{-j2πnk/N}, 0 ≤ k ≤ N-1   (9)

B2) Square the discrete spectrum X(k) to generate the speech energy spectrum, then filter it with the Gammatone filter bank.

B3) To further improve the performance of the voiceprint recognition system, apply exponential compression to the output of each filter, obtaining a set of exponential energy spectra s_1, s_2, …, s_M:

s_m = E_m^{e(f)}, m = 1, 2, …, M   (10)

where E_m is the output energy of the m-th filter, e(f) is the exponential compression value, and M is the number of filter channels.

B4) Finally, convert the exponential energy spectra into the GFCC feature parameters using the discrete cosine transform (DCT):

GFCC(n) = Σ_{m=1}^{M} s_m cos( πn(m - 0.5)/M ), n = 1, 2, …, L   (11)

where L is the dimension of the feature parameters.
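With the filter bank above, steps B1) through B4) can be sketched as below. Filtering is done in the frequency domain by weighting the energy spectrum with each filter's squared magnitude response, which plays the role of the time-domain convolution; the compression exponent e(f) is not fixed by the patent, so a constant 0.33 is assumed here.

```python
import numpy as np
from scipy.fftpack import dct

def gfcc(frames, bank, e=0.33, n_ceps=13):
    """GFCC per steps B1)-B4): energy spectrum -> Gammatone filtering
    -> exponential compression -> DCT. The constant exponent e is an
    assumption standing in for the patent's unspecified e(f)."""
    N = frames.shape[1]
    energy = np.abs(np.fft.rfft(frames, N)) ** 2      # speech energy spectrum
    resp = np.abs(np.fft.rfft(bank, N)) ** 2          # filter magnitude responses
    s = (energy @ resp.T) ** e                        # compressed spectra s_1..s_M
    return dct(s, axis=1, norm='ortho')[:, :n_ceps]   # eq. (11)
```

Typical use, building on the earlier sketches: `fc, bank = gammatone_bank()` followed by `feats = gfcc(preprocess(x), bank)`.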
In this embodiment, the LPCC (linear prediction cepstral coefficient) feature parameters are extracted as follows:

Suppose the system function of the vocal tract model is:

H(z) = 1 / (1 - Σ_{i=1}^{p} a_i z^{-i})   (12)

where p in equation (12) is the order of the predictor.

Let h(n) be the impulse response of H(z) and ĥ(n) the complex cepstrum of h(n), so that:

Ĥ(z) = log H(z) = Σ_{n=1}^{∞} ĥ(n) z^{-n}   (13)

Combining equations (12) and (13), differentiating both sides with respect to z^{-1}, and simplifying gives:

(1 - Σ_{i=1}^{p} a_i z^{-i}) Σ_{n=1}^{∞} n ĥ(n) z^{-n+1} = Σ_{i=1}^{p} i a_i z^{-i+1}   (14)

Equating the coefficients of each power of z^{-1} on the two sides of equation (14) yields the complex cepstrum:

ĥ(1) = a_1;
ĥ(n) = a_n + Σ_{k=1}^{n-1} (k/n) ĥ(k) a_{n-k}, 1 < n ≤ p   (15)

From the relationship between the complex cepstrum and the cepstral coefficients (16), the linear prediction cepstral coefficients can be computed recursively:

c(1) = a_1;
c(n) = a_n + Σ_{k=1}^{n-1} (k/n) c(k) a_{n-k}, 1 < n ≤ p;
c(n) = Σ_{k=n-p}^{n-1} (k/n) c(k) a_{n-k}, n > p   (17)

where c(n) are the linear prediction cepstral coefficients (LPCC) and a_n are the linear prediction coefficients.
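A sketch of the recursion in equation (17) follows. The patent does not say how the linear prediction coefficients a_n are estimated; here librosa's Burg-method LPC is used, with its sign convention converted to the H(z) = 1/(1 - Σ a_i z^{-i}) form assumed above, and the predictor order p = 12 is an illustrative choice.

```python
import numpy as np
import librosa

def lpcc(frame, p=12, n_ceps=13):
    """LPCC via the recursion of equation (17)."""
    # librosa.lpc returns [1, b_1, ..., b_p] for A(z) = 1 + sum b_k z^-k,
    # so negate to get a_i in H(z) = 1 / (1 - sum a_i z^-i).
    a = -librosa.lpc(frame.astype(float), order=p)[1:]
    c = np.zeros(n_ceps)
    for n in range(1, n_ceps + 1):
        acc = sum((k / n) * c[k - 1] * a[n - k - 1]
                  for k in range(max(1, n - p), n))
        c[n - 1] = (a[n - 1] if n <= p else 0.0) + acc
    return c
```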
In step 4 of this embodiment, the number of mixture components of GMM model A, GMM model B and GMM model C is set to 1024 in each case. The outputs of the three models are a, b and c respectively; the three results are fused with weight coefficients ω_i, giving the final result D = ω_1·a + ω_2·b + ω_3·c. A threshold γ is set: when D is greater than or equal to γ, the speaker is recognized as the target speaker; otherwise the speaker is recognized as a non-target speaker.
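A minimal sketch of the fusion and decision stage follows. The patent does not state which loss its stochastic gradient descent minimizes; here the weights are fitted by SGD on a cross-entropy loss over a sigmoid of the fused margin D - γ, which is one plausible reading of the soft decision, and the learning rate and epoch count are arbitrary.

```python
import numpy as np

def decide(scores, w, gamma):
    """Soft-decision fusion: target speaker iff D = w1*a + w2*b + w3*c >= gamma."""
    return float(np.dot(w, scores)) >= gamma

def fit_weights(S, y, gamma=0.0, lr=0.01, epochs=100):
    """SGD for the weight coefficients w_i. S: (n_samples, 3) array of
    the three GMM scores (a, b, c); y: 0/1 target-speaker labels.
    Cross-entropy on sigmoid(D - gamma) is an assumed surrogate loss."""
    w = np.full(3, 1.0 / 3.0)
    for _ in range(epochs):
        for s, t in zip(np.asarray(S, float), np.asarray(y, float)):
            p = 1.0 / (1.0 + np.exp(-(np.dot(w, s) - gamma)))  # P(target)
            w -= lr * (p - t) * s                              # one SGD step
    return w
```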
Claims (8)
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010439120.7A | 2020-05-22 | 2020-05-22 | Voiceprint recognition method for home multi-feature parameter fusion |

Applications Claiming Priority (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010439120.7A | 2020-05-22 | 2020-05-22 | Voiceprint recognition method for home multi-feature parameter fusion |

Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN111785285A | 2020-10-16 |

Family

ID=72754331

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202010439120.7A | Voiceprint recognition method for home multi-feature parameter fusion | 2020-05-22 | 2020-05-22 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN111785285A (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101436405A (en) * | 2008-12-25 | 2009-05-20 | 北京中星微电子有限公司 | Method and system for recognizing speaking people |
CN104835498A (en) * | 2015-05-25 | 2015-08-12 | 重庆大学 | Voiceprint identification method based on multi-type combination characteristic parameters |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112712820A (en) * | 2020-12-25 | 2021-04-27 | 广州欢城文化传媒有限公司 | Tone classification method, device, equipment and medium |
CN112542174A (en) * | 2020-12-25 | 2021-03-23 | 南京邮电大学 | VAD-based multi-dimensional characteristic parameter voiceprint identification method |
CN112885355A (en) * | 2021-01-25 | 2021-06-01 | 上海头趣科技有限公司 | Speech recognition method based on multiple features |
CN113257266A (en) * | 2021-05-21 | 2021-08-13 | 特斯联科技集团有限公司 | Complex environment access control method and device based on voiceprint multi-feature fusion |
CN113393847A (en) * | 2021-05-27 | 2021-09-14 | 杭州电子科技大学 | Voiceprint recognition method based on fusion of Fbank features and MFCC features |
CN113393847B (en) * | 2021-05-27 | 2022-11-15 | 杭州电子科技大学 | Voiceprint recognition method based on fusion of Fbank features and MFCC features |
CN113177536A (en) * | 2021-06-28 | 2021-07-27 | 四川九通智路科技有限公司 | Vehicle collision detection method and device based on deep residual shrinkage network |
CN113177536B (en) * | 2021-06-28 | 2021-09-10 | 四川九通智路科技有限公司 | Vehicle collision detection method and device based on deep residual shrinkage network |
CN113612738A (en) * | 2021-07-20 | 2021-11-05 | 深圳市展韵科技有限公司 | Voiceprint real-time authentication encryption method, voiceprint authentication equipment and controlled equipment |
CN113823290A (en) * | 2021-08-31 | 2021-12-21 | 杭州电子科技大学 | Multi-feature fusion voiceprint recognition method |
CN113823293A (en) * | 2021-09-28 | 2021-12-21 | 武汉理工大学 | A method and system for speaker recognition based on speech enhancement |
CN113823293B (en) * | 2021-09-28 | 2024-04-26 | 武汉理工大学 | Speaker recognition method and system based on voice enhancement |
CN116386647A (en) * | 2023-05-26 | 2023-07-04 | 北京瑞莱智慧科技有限公司 | Audio verification method, related device, storage medium and program product |
CN116386647B (en) * | 2023-05-26 | 2023-08-22 | 北京瑞莱智慧科技有限公司 | Audio verification method, related device, storage medium and program product |
Similar Documents

| Publication | Title |
|---|---|
| CN111785285A (en) | Voiceprint recognition method for home multi-feature parameter fusion |
| CN107886967B (en) | Bone conduction voice enhancement method using a deep bidirectional gated recurrent neural network |
| CN110619885A (en) | Generative adversarial network speech enhancement method based on a deep fully convolutional neural network |
| CN108520753B (en) | Speech lie detection method based on a convolutional bidirectional long short-term memory network |
| CN111653289B (en) | Playback voice detection method |
| CN109192200B (en) | Speech recognition method |
| CN103310789A (en) | Sound event recognition method based on optimized parallel model combination |
| CN108564956B (en) | Voiceprint recognition method and device, server and storage medium |
| CN108630209A (en) | Marine organism recognition method based on feature fusion and a deep belief network |
| CN111785262B (en) | Speaker age and gender classification method based on a residual network and fused features |
| CN111986679A (en) | Speaker verification method, system and storage medium for complex acoustic environments |
| CN108564965A (en) | Anti-noise speech recognition system |
| CN110648684A (en) | WaveNet-based bone conduction speech enhancement waveform generation method |
| CN112735477B (en) | Voice emotion analysis method and device |
| CN101419800B (en) | Emotional speaker recognition method based on spectrum shifting |
| Jing et al. | Speaker recognition based on principal component analysis of LPCC and MFCC |
| Gamit et al. | Isolated words recognition using MFCC, LPC and neural network |
| CN106653004A (en) | Speaker recognition feature extraction method using perceptual-spectrum-regularized cochlear filter coefficients |
| CN111243621A (en) | Construction method of a GRU-SVM deep learning model for synthetic speech detection |
| CN112863517B (en) | Speech recognition method based on the convergence rate of the perceptual spectrum |
| Gandhiraj et al. | Auditory-based wavelet packet filterbank for speech recognition using neural network |
| CN114038469A (en) | Speaker recognition method based on a multi-class spectral feature attention fusion network |
| Nirjon et al. | sMFCC: exploiting sparseness in speech for fast acoustic feature extraction on mobile devices--a feasibility study |
| Yadav et al. | Speaker identification system using wavelet transform and VQ modeling technique |
| CN107871498A (en) | Hybrid feature combination algorithm based on Fisher's criterion to improve the speech recognition rate |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20201016 |