CN105590628A - Adaptive adjustment-based Gaussian mixture model voice identification method - Google Patents

Adaptive adjustment-based Gaussian mixture model voice identification method

Info

Publication number
CN105590628A
Authority
CN
China
Prior art date
Legal status
Pending
Application number
CN201510977077.9A
Other languages
Chinese (zh)
Inventor
沈希忠
包玲玲
Current Assignee
Shanghai Institute of Technology
Original Assignee
Shanghai Institute of Technology
Priority date
Filing date: 2015-12-22
Publication date: 2016-05-18
Application filed by Shanghai Institute of Technology
Priority to CN201510977077.9A
Publication of CN105590628A
Legal status: Pending


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification techniques
    • G10L17/04: Training, enrolment or model building
    • G10L17/06: Decision making techniques; Pattern matching strategies

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Game Theory and Decision Science (AREA)
  • Stereophonic System (AREA)

Abstract

The invention relates to an adaptive adjustment-based Gaussian mixture model voice identification method. It uses the sum of absolute values of probability differences to improve the traditional Gaussian mixture model, dynamically adjusting the contribution each Gaussian subcomponent makes when fitting the voice signal features. Each Gaussian subcomponent is used to the fullest and its information fully expressed, thereby improving speaker verification performance.

Description

Human voice recognition method based on an adaptively adjusted Gaussian mixture model
Technical Field
The invention relates to human voice recognition technology, in particular to a human voice recognition method based on an adaptively adjusted Gaussian mixture model.
Background
Human voice recognition is a technology that identifies a speaker from his or her voice using signal processing and probability theory. It mainly comprises two steps: training a speaker model and recognizing the speaker's voice.
The feature parameters commonly used for human voice recognition include Mel-frequency cepstral coefficients (MFCC), linear predictive coding coefficients (LPCC) and perceptually weighted linear prediction coefficients (PLP). Common recognition algorithms include the support vector machine (SVM), the Gaussian mixture model (GMM) and vector quantization (VQ). The Gaussian mixture model is widely applied in the field of speech recognition.
The mixture degree of the traditional Gaussian mixture model is fixed, while the voice characteristics of human speech are diverse: some Gaussian subcomponents in the feature distribution carry little information and others carry much. This can cause over-fitting or under-fitting, reducing the speaker verification rate.
Disclosure of Invention
Aiming at the voice recognition problems of the traditional Gaussian mixture model, the invention provides a voice recognition method based on an adaptively adjusted Gaussian mixture model. On the basis of the traditional model, the mixture degree and the Gaussian subcomponents are adjusted adaptively, improving the recognition probability.
The technical scheme of the invention is as follows: a human voice recognition method based on an adaptively adjusted Gaussian mixture model, comprising the following steps:
1) training by using the voice characteristic parameters of the speaker to generate a traditional Gaussian mixture model corresponding to the speaker;
2) calculating the probability of each frame of data generated by each Gaussian sub-component in the Gaussian mixture model, and then calculating the sum of absolute values of probability differences of the same frame of data generated by different Gaussian sub-components;
3) comparing the minimum of the sum values obtained in step 2) with a set low threshold θ3; if it is less than θ3, merging the two Gaussian subcomponents corresponding to the minimum to obtain a new Gaussian subcomponent;
4) comparing the maximum of the obtained sum values with a set high threshold θ1; if it is greater than θ1, reassigning the weights of the two Gaussian subcomponents corresponding to the maximum to obtain two new Gaussian subcomponents;
5) comparing the maximum Gaussian subcomponent weight with a set threshold θ2; if it is greater than θ2, splitting that Gaussian subcomponent to obtain two new Gaussian subcomponents;
6) replacing the original Gaussian subcomponents with the newly obtained ones and obtaining the final optimized Gaussian model through multiple iterations; then inputting the feature parameters of the voice to be recognized, calculating the probability that each Gaussian mixture model generates the voice signal, and judging the speaker corresponding to the largest probability to be the target speaker, i.e. the true speaker of the tested voice.
The expression for the absolute value of the probability difference for the same frame signal in step 2) is:

$$\mathrm{p\_diff} = \left|\frac{\pi_a N(x_i\mid\mu_a,\sigma_a)}{\sum_{j=1}^{K}\pi_j N(x_i\mid\mu_j,\sigma_j)} - \frac{\pi_b N(x_i\mid\mu_b,\sigma_b)}{\sum_{j=1}^{K}\pi_j N(x_i\mid\mu_j,\sigma_j)}\right|$$

Let λ_n = {π_n, μ_n, σ_n} denote the n-th Gaussian subcomponent, where π_n is its weight and μ_n and σ_n are its mean (expectation) and covariance matrix. Each frame of data is fitted by K Gaussian subcomponents; there are L frames in total, and x_i (i = 1, 2, …, L) is the i-th input frame of the speech signal. a and b are the ordinal numbers of two different Gaussian subcomponents: π_a is the weight of the a-th subcomponent, N(x_i | μ_a, σ_a) is its probability density, and μ_a and σ_a are its mean and covariance matrix. The subscript j denotes the j-th Gaussian subcomponent and the subscript b the b-th.
The merging in step 3) is performed as follows:

$$\pi_T = \pi_a + \pi_b,\quad \omega_a = \frac{\pi_a}{\pi_T},\ \omega_b = \frac{\pi_b}{\pi_T},\quad \mu_T = \omega_a\mu_a + \omega_b\mu_b,\quad \sigma_T = \omega_a\sigma_a + \omega_b\sigma_b$$

where a and b are the serial numbers of the a-th and b-th Gaussian subcomponents and T is the serial number of the new merged subcomponent; the new Gaussian subcomponent λ_T replaces the original subcomponents λ_a and λ_b.
In step 4), the weights of the two Gaussian subcomponents a and b are redistributed to obtain two new Gaussian subcomponents:

$$\pi_T = \alpha_1(\pi_a + \pi_b),\quad \pi_{T+1} = \alpha_2(\pi_a + \pi_b)$$

where

$$\alpha_1 = \frac{\gamma_a}{\gamma_a+\gamma_b},\quad \alpha_2 = \frac{\gamma_b}{\gamma_a+\gamma_b},\quad \gamma_a = \frac{\pi_a N(x_i\mid\mu_a,\sigma_a)}{\sum_{j=1}^{K}\pi_j N(x_i\mid\mu_j,\sigma_j)},\quad \gamma_b = \frac{\pi_b N(x_i\mid\mu_b,\sigma_b)}{\sum_{j=1}^{K}\pi_j N(x_i\mid\mu_j,\sigma_j)}$$

The means and covariance matrices of the two Gaussian distributions remain unchanged.
In step 5) the Gaussian subcomponent is split as follows:

$$\pi_T = \tfrac{1}{2}\pi_a,\quad \pi_{T+1} = \tfrac{1}{2}\pi_a,\quad \mu_T = \mu_a + \tau E,\quad \mu_{T+1} = \mu_a - \tau E,\quad \sigma_T = (1+\beta)^{-1}\sigma_a,\quad \sigma_{T+1} = (1+\beta)^{-1}\sigma_a$$

where τ is determined by the maximum element on the diagonal of σ_a, E = [1, 1, …, 1] is an all-ones vector, and β is a splitting parameter. The two new Gaussian subcomponents λ_T and λ_{T+1} replace the original subcomponent λ_a.
The beneficial effects of the invention are as follows: the human voice recognition method based on an adaptively adjusted Gaussian mixture model improves the traditional Gaussian mixture model by using the sum of absolute values of probability differences and dynamically adjusts the contribution of each Gaussian subcomponent when fitting the speech signal features. Each Gaussian subcomponent is used to the fullest and useful information fully expressed, thereby improving speaker verification performance.
Drawings
FIG. 1 is a schematic diagram of a training process of adaptively adjusting a Gaussian mixture model according to the present invention;
FIG. 2 is a schematic flow chart of Gaussian subcomponent weight assignment in accordance with the present invention;
FIG. 3 is a schematic flow diagram of the improved Gaussian subcomponent splitting of the present invention;
FIG. 4 is a flow chart illustrating the improved Gaussian subcomponent combination of the present invention.
Detailed Description
The experimental data in this embodiment consists of recorded voices of 43 participants (23 women, 20 men), sampled at 8000 Hz. Each participant recorded 5 voice segments in a quiet environment; each segment is a four-character idiom.
A certain amount of speech from different speakers is used for training to obtain the traditional Gaussian mixture model corresponding to each speaker, and the different traditional Gaussian mixture models are then optimized according to the adaptive adjustment rules.
In the training process, three randomly selected voice segments per speaker are used to train the optimized Gaussian mixture models corresponding to the different speakers.
During testing, the recognition rate of each optimized Gaussian mixture model is tested using the speakers' remaining voice segments.
As shown in fig. 1, the flow chart of adaptively adjusting the gaussian mixture model training process includes the following steps:
the method comprises the steps of preprocessing a voice signal, wherein the preprocessing step comprises end point detection, framing, windowing and extracting a characteristic parameter, namely a Mel cepstrum coefficient, and a 12-dimensional Mel cepstrum coefficient (MFCC) is selected in the experiment.
The extracted MFCC parameters are trained with the EM algorithm to obtain the traditional Gaussian mixture model corresponding to the speaker. The mixture degree of the traditional Gaussian mixture model is K: the model is a linear superposition of K Gaussian subcomponents, and its probability density is

$$p(x) = \sum_{n=1}^{K} p(n)\,p(x\mid n) = \sum_{n=1}^{K}\pi_n N(x\mid\mu_n,\sigma_n)$$

$$N(x\mid\mu,\sigma) = \frac{1}{(2\pi)^{D/2}\,|\sigma|^{1/2}}\exp\!\left[-\frac{1}{2}(x-\mu)^{\mathrm{T}}\sigma^{-1}(x-\mu)\right]$$

where π_n is the weight of the n-th Gaussian subcomponent and N(x | μ_n, σ_n) is its probability density function; in this embodiment K = 16. μ and σ are the mean (expectation) and covariance matrix of a Gaussian subcomponent, and D is the dimension of the data x. λ_n = {π_n, μ_n, σ_n} denotes the n-th Gaussian subcomponent, where n takes any integer value from 1 to K. The probability that the speaker to be identified belongs to the current model is obtained by calculating p(x).
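The two densities above translate directly into numpy; a minimal sketch (function names are mine, not the patent's):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """N(x | mu, sigma): multivariate normal density for a D-dimensional x."""
    D = x.shape[0]
    diff = x - mu
    norm = (2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(sigma))
    return float(np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff) / norm)

def gmm_pdf(x, pis, mus, sigmas):
    """p(x) = sum over n of pi_n * N(x | mu_n, sigma_n)."""
    return sum(p * gaussian_pdf(x, m, s) for p, m, s in zip(pis, mus, sigmas))
```

For example, a 1-D standard normal evaluated at its mean gives 1/sqrt(2π).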
Let the i-th frame of the speaker's data be x_i (i = 1, 2, …, L). The estimation steps of the EM algorithm are as follows:
Step 1: on the first execution, initialize the Gaussian mixture model parameters {π, μ, σ}; otherwise use the parameters obtained in the previous iteration. Then estimate the probability γ(i, n) that each frame of data was generated by each of the K Gaussian subcomponents (i.e., the probability that the i-th frame was generated by the n-th subcomponent):

$$\gamma(i,n) = \frac{\pi_n N(x_i\mid\mu_n,\sigma_n)}{\sum_{j=1}^{K}\pi_j N(x_i\mid\mu_j,\sigma_j)}$$

where j denotes the j-th Gaussian subcomponent and n the n-th, with K subcomponents in total; i denotes the i-th frame of the speaker's data, with L frames in total.
Step 2: estimate the Gaussian model parameters from the result of Step 1:

$$\mu_n = \frac{1}{\Delta}\sum_{i=1}^{L}\gamma(i,n)\,x_i$$

$$\sigma_n = \frac{1}{\Delta}\sum_{i=1}^{L}\gamma(i,n)(x_i-\mu_n)(x_i-\mu_n)^{\mathrm{T}}$$

$$\pi_n = \frac{\Delta}{L},\qquad \text{where}\ \Delta = \sum_{i=1}^{L}\gamma(i,n)$$
Step 3: repeat Steps 1 and 2 until the value of the likelihood function stabilizes.
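The three steps above are standard EM for a GMM. A compact numpy sketch under simplifying assumptions (random initialization from the data, a small ridge on the covariances for numerical stability, a fixed iteration count instead of a likelihood-stability check; helper and variable names are mine):

```python
import numpy as np

def em_gmm(X, K, n_iter=30, seed=0):
    """Fit a K-component full-covariance GMM to frame data X of shape (L, D)."""
    rng = np.random.default_rng(seed)
    L, D = X.shape
    pis = np.full(K, 1.0 / K)
    mus = X[rng.choice(L, K, replace=False)].astype(float)
    sigmas = np.array([np.cov(X.T).reshape(D, D) + 1e-6 * np.eye(D)
                       for _ in range(K)])
    for _ in range(n_iter):
        # E-step: responsibilities gamma(i, n), as in Step 1
        dens = np.empty((L, K))
        for n in range(K):
            diff = X - mus[n]
            inv = np.linalg.inv(sigmas[n])
            norm = (2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(sigmas[n]))
            dens[:, n] = pis[n] * np.exp(-0.5 * np.sum(diff @ inv * diff,
                                                       axis=1)) / norm
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M-step: pi_n = Delta / L, mu_n and sigma_n as in Step 2
        delta = gamma.sum(axis=0)
        pis = delta / L
        mus = (gamma.T @ X) / delta[:, None]
        for n in range(K):
            diff = X - mus[n]
            sigmas[n] = ((gamma[:, n, None] * diff).T @ diff / delta[n]
                         + 1e-6 * np.eye(D))
    return pis, mus, sigmas
```

A production system would iterate on the log-likelihood as the patent describes and guard against degenerate components; this sketch keeps only the core update equations.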
The obtained traditional Gaussian mixture model is then optimized.
Using the parameters of the trained traditional Gaussian model, the probability that each frame of data is generated by each of the K Gaussian subcomponents is calculated. With L frames of data this yields a K × L matrix; for example, the entry in row 1, column 2 is the probability that the 2nd frame was generated by the 1st Gaussian subcomponent. Then, for each pair of different Gaussian subcomponents, the absolute value of the difference between the probabilities they assign to the same frame is computed, and these absolute differences are summed over all frames. The expression for the absolute probability difference of the same frame generated by the a-th and b-th Gaussian subcomponents is

$$\mathrm{p\_diff} = \left|\frac{\pi_a N(x_i\mid\mu_a,\sigma_a)}{\sum_{j=1}^{K}\pi_j N(x_i\mid\mu_j,\sigma_j)} - \frac{\pi_b N(x_i\mid\mu_b,\sigma_b)}{\sum_{j=1}^{K}\pi_j N(x_i\mid\mu_j,\sigma_j)}\right|$$

where j denotes the j-th Gaussian subcomponent, a and b the a-th and b-th, with K subcomponents in total; i denotes the i-th frame of the speaker's data, with L frames in total.
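Given the K × L responsibility matrix described above, the pairwise sums of absolute probability differences can be sketched as follows (gamma plays the role of that matrix; names are mine):

```python
import numpy as np
from itertools import combinations

def pairwise_abs_diff_sums(gamma):
    """gamma: (K, L) matrix, gamma[n, i] = probability that frame i was
    generated by subcomponent n. Returns, for every pair (a, b) of
    subcomponents, the sum over all frames of |p_diff|."""
    K = gamma.shape[0]
    return {(a, b): float(np.abs(gamma[a] - gamma[b]).sum())
            for a, b in combinations(range(K), 2)}

# Toy example: components 0 and 1 behave identically (sum 0, merge candidate),
# component 2 behaves very differently from both.
sums = pairwise_abs_diff_sums(np.array([[0.5, 0.5],
                                        [0.5, 0.5],
                                        [1.0, 0.0]]))
```

The minimum and maximum of these sums are the quantities compared against θ3 and θ1 in the following paragraphs.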
The minimum of the sum values obtained in the previous step is compared with the low threshold θ3. If it is less than θ3, the two Gaussian subcomponents are considered to fit the same part of the speech signal features, i.e., their information overlaps, so they are merged into one new Gaussian subcomponent:

$$\pi_T = \pi_a + \pi_b,\quad \omega_a = \frac{\pi_a}{\pi_T},\ \omega_b = \frac{\pi_b}{\pi_T},\quad \mu_T = \omega_a\mu_a + \omega_b\mu_b,\quad \sigma_T = \omega_a\sigma_a + \omega_b\sigma_b$$

where a and b are the serial numbers of the a-th and b-th Gaussian subcomponents and T is the serial number of the new merged subcomponent. The low threshold in this step is an empirical value obtained after many experiments.
The newly added Gaussian subcomponent λ_T replaces the original subcomponents λ_a and λ_b, so the mixture degree of the Gaussian mixture model decreases by one.
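The merge rule translates directly into code; a sketch (note that, as written in the patent, σ_T is a plain weighted average of the two covariances):

```python
import numpy as np

def merge_components(pi_a, mu_a, sigma_a, pi_b, mu_b, sigma_b):
    """Merge two Gaussian subcomponents into one, following the formulas above."""
    pi_T = pi_a + pi_b
    w_a, w_b = pi_a / pi_T, pi_b / pi_T
    mu_T = w_a * mu_a + w_b * mu_b
    sigma_T = w_a * sigma_a + w_b * sigma_b
    return pi_T, mu_T, sigma_T

pi_T, mu_T, sigma_T = merge_components(
    0.3, np.array([0.0]), np.array([[1.0]]),
    0.1, np.array([4.0]), np.array([[2.0]]))
```

In the model, the pair (pi_T, mu_T, sigma_T) then replaces the two merged entries in the component lists.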
The maximum of the sum values obtained above is compared with the high threshold θ1. If it is greater than θ1, the two Gaussian subcomponents are considered to fit different parts of the speech signal features, in which case their weights are reassigned:

$$\pi_T = \alpha_1(\pi_a+\pi_b),\quad \pi_{T+1} = \alpha_2(\pi_a+\pi_b)$$

where

$$\alpha_1 = \frac{\gamma_a}{\gamma_a+\gamma_b},\quad \alpha_2 = \frac{\gamma_b}{\gamma_a+\gamma_b},\quad \gamma_a = \frac{\pi_a N(x_i\mid\mu_a,\sigma_a)}{\sum_{j=1}^{K}\pi_j N(x_i\mid\mu_j,\sigma_j)},\quad \gamma_b = \frac{\pi_b N(x_i\mid\mu_b,\sigma_b)}{\sum_{j=1}^{K}\pi_j N(x_i\mid\mu_j,\sigma_j)}$$

The means and covariance matrices of the two Gaussian subcomponents remain unchanged.
The high threshold in this step is an empirical value taken after a number of experiments.
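The weight reassignment above is a one-line computation per component; a sketch (gamma_a and gamma_b are the responsibilities defined above, evaluated for the frame in question):

```python
def reassign_weights(pi_a, pi_b, gamma_a, gamma_b):
    """Redistribute the combined weight of subcomponents a and b in
    proportion to their responsibilities; means and covariances stay
    unchanged, as the description states."""
    alpha_1 = gamma_a / (gamma_a + gamma_b)
    alpha_2 = gamma_b / (gamma_a + gamma_b)
    return alpha_1 * (pi_a + pi_b), alpha_2 * (pi_a + pi_b)

# The component with responsibility 0.9 absorbs most of the shared weight 0.4.
new_a, new_b = reassign_weights(0.2, 0.2, 0.9, 0.1)
```

Since alpha_1 + alpha_2 = 1, the total weight pi_a + pi_b is conserved.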
The maximum Gaussian subcomponent weight is compared with the weight threshold θ2. If it is greater than θ2, the subcomponent contains too much information and needs to be split:

$$\pi_T = \tfrac{1}{2}\pi_a,\quad \pi_{T+1} = \tfrac{1}{2}\pi_a,\quad \mu_T = \mu_a + \tau E,\quad \mu_{T+1} = \mu_a - \tau E,\quad \sigma_T = (1+\beta)^{-1}\sigma_a,\quad \sigma_{T+1} = (1+\beta)^{-1}\sigma_a$$

where τ is determined by the maximum element on the diagonal of σ_a, E = [1, 1, …, 1] is an all-ones vector, and β is a splitting parameter. The two new Gaussian subcomponents λ_T and λ_{T+1} replace the original subcomponent λ_a, and the mixture degree of the Gaussian mixture model increases by one.
The weight threshold in this step is an empirical value taken after a number of experiments.
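A sketch of the split rule. The patent's exact expressions for τ and β do not survive in this text, so here τ is assumed to be the square root of the largest diagonal entry of σ_a and β a fixed constant; both are assumptions of this sketch, not the patent's definitions:

```python
import numpy as np

def split_component(pi_a, mu_a, sigma_a, beta=0.5):
    """Split one overweight Gaussian subcomponent into two, following the
    formulas above. tau and beta here are illustrative assumptions."""
    tau = np.sqrt(np.max(np.diag(sigma_a)))  # assumed form of tau
    E = np.ones_like(mu_a)                   # all-ones vector
    comp1 = (0.5 * pi_a, mu_a + tau * E, sigma_a / (1.0 + beta))
    comp2 = (0.5 * pi_a, mu_a - tau * E, sigma_a / (1.0 + beta))
    return comp1, comp2

# Splitting a weight-0.4 component centred at the origin with unit covariance.
c1, c2 = split_component(0.4, np.zeros(2), np.eye(2), beta=0.5)
```

The two offspring share the parent's weight equally, sit symmetrically about the parent's mean, and carry a shrunken covariance.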
An iteration count M is preset; the above steps are executed repeatedly with the new Gaussian subcomponents, and after M executions the optimized Gaussian mixture model is obtained. Each speaker's model is optimized in this way, yielding an optimized Gaussian mixture model for each speaker. In this embodiment, M = 10.
For the voice signal x to be recognized, the probability that it is generated by each of the different Gaussian mixture models is calculated; the target speaker corresponding to the largest probability is the true speaker of the tested voice.
For example, if the probability that a segment of speech to be recognized is generated by the 3rd Gaussian mixture model is the largest, the speech was uttered by the 3rd speaker.
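The final decision rule, choosing the speaker model under which the test frames are most likely, can be sketched as follows, with each model given as a (weights, means, covariances) tuple (names are mine):

```python
import numpy as np

def identify_speaker(frames, models):
    """Return the index of the GMM giving the test frames the highest
    total log-likelihood."""
    def total_log_lik(pis, mus, sigmas):
        ll = 0.0
        for x in frames:
            p = 0.0
            for pi_n, mu, s in zip(pis, mus, sigmas):
                d, D = x - mu, len(x)
                norm = (2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(s))
                p += pi_n * np.exp(-0.5 * d @ np.linalg.inv(s) @ d) / norm
            ll += np.log(p + 1e-300)  # guard against log(0)
        return ll
    return int(np.argmax([total_log_lik(*m) for m in models]))

# Two toy one-component "speakers" centred at 0 and at 10.
model0 = ([1.0], [np.array([0.0])], [np.array([[1.0]])])
model1 = ([1.0], [np.array([10.0])], [np.array([[1.0]])])
who = identify_speaker([np.array([9.5]), np.array([10.2])], [model0, model1])
```

Summing log-probabilities rather than multiplying raw probabilities avoids numerical underflow over many frames; the argmax is unchanged.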

Claims (5)

1. A human voice recognition method based on an adaptively adjusted Gaussian mixture model, characterized by comprising the following steps:
1) training by using the voice characteristic parameters of the speaker to generate a traditional Gaussian mixture model corresponding to the speaker;
2) calculating the probability of each frame of data generated by each Gaussian sub-component in the Gaussian mixture model, and then calculating the sum of absolute values of probability differences of the same frame of data generated by different Gaussian sub-components;
3) comparing the minimum of the sum values obtained in step 2) with a set low threshold θ3; if it is less than θ3, merging the two Gaussian subcomponents corresponding to the minimum to obtain a new Gaussian subcomponent;
4) comparing the maximum of the obtained sum values with a set high threshold θ1; if it is greater than θ1, reassigning the weights of the two Gaussian subcomponents corresponding to the maximum to obtain two new Gaussian subcomponents;
5) comparing the maximum Gaussian subcomponent weight with a set threshold θ2; if it is greater than θ2, splitting that Gaussian subcomponent to obtain two new Gaussian subcomponents;
6) replacing the original Gaussian subcomponents with the newly obtained ones and obtaining the final optimized Gaussian model through multiple iterations; then inputting the feature parameters of the voice to be recognized, calculating the probability that each Gaussian mixture model generates the voice signal, and judging the speaker corresponding to the largest probability to be the target speaker, i.e. the true speaker of the tested voice.
2. The human voice recognition method based on an adaptively adjusted Gaussian mixture model according to claim 1, characterized in that the expression for the absolute value of the probability difference for the same frame signal in step 2) is

$$\mathrm{p\_diff} = \left|\frac{\pi_a N(x_i\mid\mu_a,\sigma_a)}{\sum_{j=1}^{K}\pi_j N(x_i\mid\mu_j,\sigma_j)} - \frac{\pi_b N(x_i\mid\mu_b,\sigma_b)}{\sum_{j=1}^{K}\pi_j N(x_i\mid\mu_j,\sigma_j)}\right|$$

where λ_n = {π_n, μ_n, σ_n} denotes the n-th Gaussian subcomponent, π_n is its weight, and μ_n and σ_n are its mean (expectation) and covariance matrix; each frame of data is fitted by K Gaussian subcomponents and there are L frames in total; x_i (i = 1, 2, …, L) is the i-th input frame of the speech signal; a and b are the ordinal numbers of two different Gaussian subcomponents, π_a is the weight of the a-th subcomponent, N(x_i | μ_a, σ_a) is its probability density, and μ_a and σ_a are its mean and covariance matrix; the subscript j denotes the j-th Gaussian subcomponent and the subscript b the b-th.
3. The human voice recognition method based on an adaptively adjusted Gaussian mixture model according to claim 2, characterized in that the merging in step 3) is performed as follows:

$$\pi_T = \pi_a + \pi_b,\quad \omega_a = \frac{\pi_a}{\pi_T},\ \omega_b = \frac{\pi_b}{\pi_T},\quad \mu_T = \omega_a\mu_a + \omega_b\mu_b,\quad \sigma_T = \omega_a\sigma_a + \omega_b\sigma_b$$

where a and b are the serial numbers of the a-th and b-th Gaussian subcomponents and T is the serial number of the new merged subcomponent; the new Gaussian subcomponent λ_T replaces the original subcomponents λ_a and λ_b.
4. The human voice recognition method based on an adaptively adjusted Gaussian mixture model according to claim 2, characterized in that step 4) redistributes the weights of the two Gaussian subcomponents a and b to obtain two new Gaussian subcomponents:

$$\pi_T = \alpha_1(\pi_a + \pi_b),\quad \pi_{T+1} = \alpha_2(\pi_a + \pi_b)$$

where

$$\alpha_1 = \frac{\gamma_a}{\gamma_a+\gamma_b},\quad \alpha_2 = \frac{\gamma_b}{\gamma_a+\gamma_b},\quad \gamma_a = \frac{\pi_a N(x_i\mid\mu_a,\sigma_a)}{\sum_{j=1}^{K}\pi_j N(x_i\mid\mu_j,\sigma_j)},\quad \gamma_b = \frac{\pi_b N(x_i\mid\mu_b,\sigma_b)}{\sum_{j=1}^{K}\pi_j N(x_i\mid\mu_j,\sigma_j)}$$

The means and covariance matrices of the two Gaussian distributions remain unchanged.
5. The human voice recognition method based on an adaptively adjusted Gaussian mixture model according to claim 2, characterized in that in step 5) the Gaussian subcomponent is split as follows:

$$\pi_T = \tfrac{1}{2}\pi_a,\quad \pi_{T+1} = \tfrac{1}{2}\pi_a,\quad \mu_T = \mu_a + \tau E,\quad \mu_{T+1} = \mu_a - \tau E,\quad \sigma_T = (1+\beta)^{-1}\sigma_a,\quad \sigma_{T+1} = (1+\beta)^{-1}\sigma_a$$

where τ is determined by the maximum element on the diagonal of σ_a, E = [1, 1, …, 1] is an all-ones vector, and β is a splitting parameter. The two new Gaussian subcomponents λ_T and λ_{T+1} replace the original subcomponent λ_a.
CN201510977077.9A 2015-12-22 2015-12-22 Adaptive adjustment-based Gaussian mixture model voice identification method Pending CN105590628A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510977077.9A CN105590628A (en) 2015-12-22 2015-12-22 Adaptive adjustment-based Gaussian mixture model voice identification method


Publications (1)

Publication Number Publication Date
CN105590628A true CN105590628A (en) 2016-05-18

Family

ID=55930150

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510977077.9A Pending CN105590628A (en) 2015-12-22 2015-12-22 Adaptive adjustment-based Gaussian mixture model voice identification method

Country Status (1)

Country Link
CN (1) CN105590628A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102324232A (en) * 2011-09-12 2012-01-18 辽宁工业大学 Method for recognizing sound-groove and system based on gauss hybrid models
CN102360418A (en) * 2011-09-29 2012-02-22 山东大学 Method for detecting eyelashes based on Gaussian mixture model and maximum expected value algorithm
CN102820033A (en) * 2012-08-17 2012-12-12 南京大学 Voiceprint identification method
CN104485108A (en) * 2014-11-26 2015-04-01 河海大学 Noise and speaker combined compensation method based on multi-speaker model


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
XIONG Huaqiao: "Research on Speaker Recognition Methods Based on Model Clustering", China Master's Theses Full-text Database, Information Science and Technology Series *
WANG Yunqi et al.: "Adaptive Gaussian Mixture Model and Its Application to Speaker Recognition", Communications Technology (《通信技术》) *
WANG Yunqi: "Adaptive Gaussian Mixture Model and Its Application to Speaker Recognition", China Master's Theses Full-text Database, Information Science and Technology Series *

Similar Documents

Publication Publication Date Title
CN109817246B (en) Emotion recognition model training method, emotion recognition device, emotion recognition equipment and storage medium
CN108447490B (en) Voiceprint recognition method and device based on memorability bottleneck characteristics
US10176811B2 (en) Neural network-based voiceprint information extraction method and apparatus
US9400955B2 (en) Reducing dynamic range of low-rank decomposition matrices
CN108417201B (en) Single-channel multi-speaker identity recognition method and system
JP2016057461A (en) Speaker indexing device, speaker indexing method, and computer program for speaker indexing
CN104485108A (en) Noise and speaker combined compensation method based on multi-speaker model
CN112053694A (en) Voiceprint recognition method based on CNN and GRU network fusion
Agrawal et al. Prosodic feature based text dependent speaker recognition using machine learning algorithms
CN110634476A (en) Method and system for rapidly building robust acoustic model
JP2010078650A (en) Speech recognizer and method thereof
Shabani et al. Speech recognition using principal components analysis and neural networks
Schwartz et al. USSS-MITLL 2010 human assisted speaker recognition
Shi et al. Deep neural network and noise classification-based speech enhancement
CN109360573A (en) Livestock method for recognizing sound-groove, device, terminal device and computer storage medium
WO2021229643A1 (en) Sound signal conversion model learning device, sound signal conversion device, sound signal conversion model learning method, and program
Koolagudi et al. Speaker recognition in the case of emotional environment using transformation of speech features
Yamamoto et al. Denoising autoencoder-based speaker feature restoration for utterances of short duration.
Herrera-Camacho et al. Design and testing of a corpus for forensic speaker recognition using MFCC, GMM and MLE
CN105590628A (en) Adaptive adjustment-based Gaussian mixture model voice identification method
Nijhawan et al. Real time speaker recognition system for hindi words
Dey et al. Content normalization for text-dependent speaker verification
Islam et al. Bangla dataset and MMFCC in text-dependent speaker identification.
Kannadaguli et al. Comparison of artificial neural network and gaussian mixture model based machine learning techniques using ddmfcc vectors for emotion recognition in kannada
Zhipeng et al. Voiceprint recognition based on BP Neural Network and CNN

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160518
