CN103456302A - Emotion speaker recognition method based on emotion GMM model weight synthesis - Google Patents

Emotion speaker recognition method based on emotion GMM model weight synthesis

Info

Publication number
CN103456302A
Authority
CN
China
Prior art keywords
emotion
speaker
model
neutral
weight
Prior art date
Legal status
Granted
Application number
CN2013103945338A
Other languages
Chinese (zh)
Other versions
CN103456302B (en)
Inventor
杨莹春
陈力
吴朝晖
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN201310394533.8A
Publication of CN103456302A
Application granted
Publication of CN103456302B
Legal status: Active

Abstract

The invention discloses an emotion speaker recognition method based on emotion GMM model weight synthesis. The method comprises the following steps: (1) for each speaker, a neutral GMM model is built, and the different emotion GMM models are derived from it according to the corresponding neutral-emotion weight transformation model; (2) the speech of the speaker to be identified is collected, speech features are extracted, and the extracted features are scored against all the emotion GMM models obtained in step (1); (3) all scores are compared, and the speaker corresponding to the highest-scoring emotion GMM model is taken as the speaker to be identified. By building a neutral-emotion weight model for each speaker, the method improves the robustness of recognition to emotional variation in the speaker's voice and the accuracy of speaker recognition, while only neutral speech needs to be collected from the speaker.

Description

Emotional speaker recognition method based on emotion GMM model weight synthesis
Technical field
The present invention relates to signal processing and pattern recognition, and more specifically to an emotional speaker recognition method based on emotion GMM model weight synthesis.
Background technology
Speaker recognition refers to the technology of identifying a speaker's identity from collected speech by means of signal processing and pattern recognition, and mainly comprises two steps: speaker model training and recognition of the test speech. Emotional speaker recognition addresses the performance degradation of speaker recognition systems caused by the emotional mismatch between an enrolled speaker's training speech and the test speech. The method proposed in this patent improves the recognition performance of the system by building virtual emotion models for each speaker.
At present, the short-time speech features most commonly used for speaker recognition include Mel-frequency cepstral coefficients (MFCC), linear prediction cepstral coefficients (LPCC) and perceptual linear prediction coefficients (PLP). The main speaker recognition algorithms include vector quantization (VQ), the universal background model approach (GMM-UBM) and support vector machines (SVM); among these, GMM-UBM is applied very widely throughout the speaker recognition field.
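For illustration, the sketch below shows short-time MFCC feature extraction of the kind typically used as the front end of a GMM-UBM system; librosa and the frame settings (13 coefficients, 25 ms windows with a 10 ms shift, 0.97 pre-emphasis) are assumptions of the sketch and are not specified by the patent.

```python
# Minimal sketch of short-time MFCC extraction (front end for a GMM-UBM system).
# Assumes librosa; the frame and filter settings are illustrative, not from the patent.
import numpy as np
import librosa

def extract_mfcc(wav_path, sr=16000, n_mfcc=13):
    """Return an (n_frames, n_mfcc) matrix of MFCC vectors for one utterance."""
    signal, _ = librosa.load(wav_path, sr=sr)
    # Pre-emphasis boosts the high-frequency part of the signal.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    mfcc = librosa.feature.mfcc(y=emphasized, sr=sr, n_mfcc=n_mfcc,
                                n_fft=400, hop_length=160)  # 25 ms window, 10 ms shift
    return mfcc.T
```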
In emotional speaker recognition the training speech is generally neutral, because in real applications a user typically provides only neutral speech to train his or her own model. At test time, however, the speech may carry various emotions such as happiness or sadness, and traditional speaker recognition systems cannot handle this mismatch between the training and testing conditions.
Summary of the invention
The invention provides an emotional speaker recognition method based on emotion GMM model weight synthesis. By building a neutral-emotion weight model for each speaker, the method improves the robustness of recognition to emotional variation and the accuracy of speaker recognition while requiring only the speaker's neutral speech to be collected.
An emotional speaker recognition method based on emotion GMM model weight synthesis comprises the following steps:
(1) For each speaker, build the speaker's neutral GMM model and, according to the corresponding neutral-emotion weight transformation model, derive the different emotion GMM models.
The emotions referred to in the invention can be chosen freely, for example happiness, anger, panic, sadness or depression. The more emotion categories are selected, the more accurate the final recognition result, but the computational cost also increases correspondingly; in practice an appropriate number of emotion categories can be selected as needed, and one emotion GMM model is built for each selected emotion.
(2) Collect the speech of the speaker to be identified, extract speech features, and compute a score for the extracted features against every emotion GMM model obtained in step (1).
In this step, a corresponding neutral GMM model and emotion GMM models have been built in step (1) for every speaker to be identified; a speaker for whom no such models were built in step (1) cannot be identified.
(3) Compare all scores; the speaker corresponding to the emotion GMM model with the highest score is the speaker to be identified.
A mapping exists between the weights of each speaker's neutral model and the weights of the corresponding emotion models; using this mapping, the emotion models can be computed directly from the neutral model. The neutral-emotion weight transformation model can be built with any of various prior-art algorithms, as long as a mapping between the neutral model and the emotion models can be established; preferably, the neutral-emotion weight transformation model is built using a radial basis function neural network or sparse representation.
Preferably, the neutral-emotion weight transformation model is built through the following steps:
1-1. In the development set, extract the short-time speech features of the different speakers under all emotional states and train an emotion-independent Gaussian mixture universal background model with the EM algorithm;
1-2. Using this universal background model, obtain the neutral GMM model of each speaker in the development set by mean adaptation and weight adaptation;
1-3. Using the neutral GMM model of step 1-2, obtain the emotion GMM models under the various emotional states by weight adaptation;
1-4. Using the weights of the neutral GMM model of step 1-2 and the weights of the emotion GMM models of step 1-3, train a radial basis function neural network or a sparse representation model to obtain the neutral-emotion weight transformation model.
The development set in the present invention refers to a set of speakers chosen arbitrarily before the invention is applied; the speakers in the subsequent recognition process are not necessarily the same as those in the development set and may or may not overlap with them.
Preferably, when a radial basis function neural network is used to obtain the neutral-emotion weight transformation model, the steps are specifically as follows: in the development set, use each speaker's neutral GMM weight sequence and the corresponding weight sequence of each of that speaker's emotion GMMs, and train, with the orthogonal least squares algorithm, the mapping between the neutral GMM weight sequence and each emotion GMM weight sequence, i.e. the neutral-emotion weight transformation model.
Preferably, when sparse representation is used to obtain the neutral-emotion weight transformation model, the steps are specifically as follows: in the development set, use each speaker's neutral GMM weight sequence and the corresponding weight sequence of each of that speaker's emotion GMMs to build a neutral-emotion aligned dictionary, i.e. the neutral-emotion weight transformation model.
By building a neutral-emotion weight model for each speaker, the emotional speaker recognition method based on emotion GMM model weight synthesis of the present invention improves the robustness of recognition to emotional variation and the accuracy of speaker recognition while requiring only the speaker's neutral speech to be collected.
Brief description of the drawings
Fig. 1 is the flowchart of the emotional speaker recognition method based on emotion GMM model weight synthesis of the present invention;
Fig. 2 is the structural diagram of the radial basis function neural network in the emotional speaker recognition method based on emotion GMM model weight synthesis of the present invention;
Fig. 3 is the structural diagram of the neutral-emotion aligned dictionary in the emotional speaker recognition method based on emotion GMM model weight synthesis of the present invention.
Embodiment
The emotional speaker recognition method based on emotion GMM model weight synthesis of the present invention is described in detail below with reference to the accompanying drawings.
The experimental data in the present invention come from the Chinese emotional speech database (MASC), which was recorded in a quiet environment with an Olympus DM-20 voice recorder and consists of 68 native Chinese speakers, 45 male and 23 female. The recognition method provided by the invention allows many choices; in this embodiment, for ease of description and to provide concrete test results, five emotional states were chosen, namely neutral, panic, happiness, anger and sadness, and each speaker has speech under all five emotional states. Under the neutral emotion each speaker reads 2 paragraphs (about 30 s of recording) and reads 5 words and 20 sentences 3 times each; under each of the other emotional states each speaker reads 5 words and 20 sentences 3 times each. For each speaker, the words and sentences read under the neutral state and under the other emotional states are identical, and all speakers read the same words and sentences.
The tests in the present invention were run on a Lenovo workstation configured with an E5420 CPU at 2.5 GHz and 4 GB of memory; the experiments were implemented in the Visual Studio environment.
As shown in Fig. 1, an emotional speaker recognition method based on emotion GMM model weight synthesis comprises the following steps:
(1) For each speaker, build the speaker's neutral GMM model and, according to the corresponding neutral-emotion weight transformation model, derive the different emotion GMM models.
In the tests, the speech of several arbitrarily chosen speakers forms the development set; in general no fewer than 10 speakers are chosen. For example, the speech of the first 18 speakers is used as the development set, which contains all of the recordings of these 18 speakers under the neutral state and the other emotional states, and the UBM (the Gaussian mixture universal background model of the prior art) is trained on this development set.
In the tests, the speakers in the development set are removed and the remaining speakers form the evaluation set; the neutral GMM model of each speaker in the evaluation set is obtained from the UBM trained on the development set by mean adaptation and weight adaptation.
The neutral-emotion weight transformation model in this step is built through the following steps:
1-1. In the development set, extract the short-time speech features of the different speakers under all emotional states and train an emotion-independent Gaussian mixture universal background model with the EM algorithm.
The speech signals of the different speakers in the development set under the neutral and the other emotional states are first preprocessed; preprocessing comprises sampling and quantization, removal of the DC drift, pre-emphasis (boosting the high-frequency part of the signal) and windowing (dividing a speech signal into short segments), and short-time speech features are extracted from each segment.
The short-time speech features of all speakers are used to train, with the EM algorithm, the emotion-independent Gaussian mixture universal background model UBM λ(x), whose expression is as follows:
\lambda(x) = \sum_{i=1}^{n} \omega_i \, \Phi(\mu_i, \Sigma_i; x)
where ω_i denotes the weight of the i-th Gaussian component;
Φ denotes the Gaussian density function;
μ_i denotes the mean of the i-th Gaussian component;
Σ_i denotes the variance of the i-th Gaussian component;
x denotes the short-time speech feature;
n denotes the number of Gaussian components, which can be adjusted as needed and is usually set to 512.
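As an illustration of step 1-1, the sketch below trains the emotion-independent UBM by EM, using scikit-learn's GaussianMixture as a stand-in; the library choice, the diagonal covariances and the stopping settings are assumptions, since the patent only specifies EM training with n Gaussian components (typically 512).

```python
# Minimal sketch of UBM training: λ(x) = Σ_i ω_i Φ(μ_i, Σ_i; x), estimated by EM.
# scikit-learn's GaussianMixture is an assumed stand-in for the patent's EM training.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_ubm(features, n_components=512):
    """features: (n_frames_total, dim) array pooled over all development speakers and emotions."""
    ubm = GaussianMixture(n_components=n_components, covariance_type="diag",
                          max_iter=200, reg_covar=1e-4)
    ubm.fit(features)                                   # EM estimation of ω_i, μ_i, Σ_i
    return ubm.weights_, ubm.means_, ubm.covariances_   # (n,), (n, dim), (n, dim)
```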
1-2. Using this universal background model, obtain the neutral GMM model of each speaker in the development set by mean adaptation and weight adaptation.
For each speaker in the development set, the speaker's neutral GMM model is obtained from the speech under the neutral emotion by mean adaptation and weight adaptation. The prior art usually adapts only the means; in the present invention both the means and the weights are adapted, and the weight adaptation is implemented with the same method as the mean adaptation.
1-3. Using the neutral GMM model of step 1-2, obtain the emotion GMM models under the various emotional states (one emotion GMM model per emotional state) by weight adaptation; in this step the weight adaptation uses the same method as in step 1-2 (a sketch of this adaptation is given below).
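For steps 1-2 and 1-3, the sketch below adapts the UBM means and weights to one speaker's data in the style of relevance-factor MAP adaptation; the relevance factor r = 16 and the final weight renormalization are assumptions, since the patent only states that the weights are adapted with the same method as the means.

```python
# Minimal sketch of adapting UBM weights and means to a speaker's (neutral or emotional)
# feature set. Relevance-factor MAP adaptation is an assumed concrete realization.
import numpy as np
from scipy.stats import multivariate_normal

def map_adapt(features, weights, means, covars, r=16.0):
    """features: (T, dim); weights: (n,); means, covars: (n, dim) diagonal covariances."""
    T, n = len(features), len(weights)
    # Posterior responsibility γ_t(i) of each Gaussian component for each frame.
    log_like = np.stack([multivariate_normal.logpdf(features, means[i], np.diag(covars[i]))
                         for i in range(n)], axis=1)            # (T, n)
    log_post = np.log(weights) + log_like
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)
    n_i = post.sum(axis=0)                                       # soft counts per component
    E_x = (post.T @ features) / np.maximum(n_i[:, None], 1e-10)  # first-order statistics
    alpha = n_i / (n_i + r)                                      # data-dependent interpolation
    new_means = alpha[:, None] * E_x + (1 - alpha[:, None]) * means
    new_weights = alpha * (n_i / T) + (1 - alpha) * weights
    new_weights /= new_weights.sum()                             # keep a valid weight vector
    return new_weights, new_means
```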
1-4. Using the weights of the neutral GMM model of step 1-2 and the weights of the emotion GMM models of step 1-3, train a radial basis function neural network or a sparse representation model to obtain the neutral-emotion weight transformation model.
In the tests, both embodiments, the radial basis function neural network and the sparse representation model, are used to obtain the neutral-emotion weight transformation model, and their test results are compared.
When the radial basis function neural network is used to obtain the neutral-emotion weight transformation model, the steps are specifically as follows: in the development set, use each speaker's neutral GMM weight sequence and the corresponding weight sequence of each of that speaker's emotion GMMs, and train, with the orthogonal least squares algorithm, the mapping between the neutral GMM weight sequence and each emotion GMM weight sequence, i.e. the neutral-emotion weight transformation model.
The weight sequence of each development-set speaker's neutral GMM model is denoted [ω_{N,1}, ω_{N,2}, ..., ω_{N,n}], where N denotes the neutral emotional state and n is the number of Gaussian components; the weight sequence of the corresponding emotion GMM model of this speaker is denoted [ω_{E,1}, ω_{E,2}, ..., ω_{E,n}], where E denotes the emotional state and n is the number of Gaussian components.
As shown in Fig. 2, the radial basis function neural network consists of an input layer, a hidden layer and an output layer. The input layer takes the weight sequence of the neutral GMM model, the output layer produces the weight sequence of an emotion GMM model (one output weight sequence for each emotional state of each speaker), and the hidden-layer activation function K(x) is a radial basis function with the following expression:
K(x) = e^{-\left\| \frac{x - \nu}{\theta} \right\|^{2}}
where x is the input of the input layer, i.e. the weight sequence of the neutral GMM model;
ν is the mean (center) of the radial basis function;
θ is the variance (spread) of the radial basis function.
When the radial basis function neural network is trained, ν and θ are computed by K-means clustering, and the weights w between the hidden layer and the output layer are computed by the orthogonal least squares algorithm; these weights w constitute the neutral-emotion weight transformation model (for the detailed computation see [R. J. Schilling, J. J. Carroll and A. F. Al-Ajlouni, "Approximation of nonlinear systems with radial basis function neural networks," IEEE Transactions on Neural Networks, vol. 12, no. 1, pp. 21-28, 2001]).
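The sketch below illustrates this RBF mapping: K-means supplies the centers ν and a common spread θ, and the hidden-to-output weights w are fitted by least squares. Plain numpy least squares is used here instead of the orthogonal least squares algorithm of the cited paper, and the hidden-layer size and the spread heuristic are assumptions.

```python
# Minimal sketch of the neutral-to-emotion weight mapping with an RBF network.
# K-means gives the centers; ordinary least squares stands in for orthogonal least squares.
import numpy as np
from sklearn.cluster import KMeans

def rbf_design(X, centers, theta):
    """Hidden-layer activations K(x) = exp(-||x - ν||^2 / θ^2) for each row of X."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / theta ** 2)

def train_rbf(neutral_weights, emotion_weights, n_hidden=10):
    """Both inputs: (n_speakers, n_components) weight sequences from the development set."""
    n_hidden = min(n_hidden, len(neutral_weights))
    km = KMeans(n_clusters=n_hidden, n_init=10).fit(neutral_weights)
    centers = km.cluster_centers_
    theta = np.mean(np.linalg.norm(neutral_weights - centers[km.labels_], axis=1)) + 1e-8
    H = rbf_design(neutral_weights, centers, theta)
    W, *_ = np.linalg.lstsq(H, emotion_weights, rcond=None)   # hidden-to-output weights w
    return centers, theta, W

def predict_emotion_weights(neutral_w, centers, theta, W):
    """Map one neutral weight sequence to a virtual emotion weight sequence."""
    w_e = rbf_design(np.atleast_2d(neutral_w), centers, theta) @ W
    w_e = np.clip(w_e, 1e-8, None)
    return (w_e / w_e.sum(axis=1, keepdims=True)).ravel()     # renormalize to sum to 1
```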
When sparse representation is used to obtain the neutral-emotion weight transformation model, the steps are specifically as follows: in the development set, use each speaker's neutral GMM weight sequence and the corresponding weight sequence of each of that speaker's emotion GMMs to build a neutral-emotion aligned dictionary, i.e. the neutral-emotion weight transformation model.
As shown in Fig. 3, each dashed box in Fig. 3 is one neutral-emotion aligned dictionary, in which each column consists of one speaker's neutral GMM weight sequence together with one of that speaker's emotion GMM weight sequences; each speaker thus corresponds to 4 neutral-emotion aligned dictionaries (one per non-neutral emotional state).
In Fig. 3, the upper half D_N contains the neutral GMM weight sequences of all speakers in the development set, the lower half D_E contains the emotion GMM weight sequences of all speakers in the development set, and M is the number of speakers in the development set.
After the neutral-emotion weight transformation model has been obtained on the development set, a corresponding neutral GMM model and emotion GMM models are built for each speaker in the evaluation set (the set of the 50 speakers that remain after removing the 18 development-set speakers from the 68 speakers); the building procedure differs according to how the neutral-emotion weight transformation model was obtained.
When the radial basis function neural network is used to obtain the neutral-emotion weight transformation model, each evaluation-set speaker's neutral GMM model is first computed from the UBM of step (1) by mean adaptation and weight adaptation. The neutral GMM weight sequence is denoted ω_{N,enroll} and the emotion GMM weight sequence ω_{E,enroll}; the virtual emotion weight sequence ω_{E,enroll} is computed as
\omega_{E,enroll} = \sum_{j=1}^{C} w_j K_j(\omega_{N,enroll})
where C is the number of hidden-layer neurons, K_j is the j-th hidden-layer activation function, and w_j is the weight between the j-th hidden-layer neuron and the output layer.
When sparse representation is used to obtain the neutral-emotion weight transformation model, each evaluation-set speaker's neutral GMM model is first computed from the UBM of step (1) by mean adaptation and weight adaptation. From the neutral GMM weight sequence [ω_{N,1}, ω_{N,2}, ..., ω_{N,n}] (denoted ω_N) and the neutral GMM weight dictionary D_N, the sparse coefficient vector B is obtained by solving
\arg\min_{B} \|B\|_1 \quad \text{subject to} \quad \|D_N B - \omega_N\| \le \epsilon
where ε is the error bound, which can be set according to the specific situation and is set to 1.3 in this embodiment; for the detailed computation see [J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, "Robust face recognition via sparse representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210-227, 2009].
The virtual emotion weight sequence is then computed as ω_{E,enroll} = D_E B.
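The sketch below illustrates the sparse-representation variant: the development speakers' weight sequences are stacked column-wise into D_N and D_E, a sparse code B is computed for an enrolled speaker's neutral weights, and the virtual emotion weights are synthesized as ω_{E,enroll} = D_E B. A Lasso formulation is used as a stand-in for the constrained ℓ1 problem, so the regularization strength and the final renormalization are assumptions.

```python
# Minimal sketch of the sparse-representation weight synthesis.
# Lasso (penalized ℓ1) stands in for the constrained problem min ||B||_1 s.t. ||D_N B - ω_N|| <= ε.
import numpy as np
from sklearn.linear_model import Lasso

def build_dictionaries(neutral_weights, emotion_weights):
    """Each column of D_N / D_E is one development speaker's weight sequence."""
    D_N = np.asarray(neutral_weights).T          # (n_components, M speakers)
    D_E = np.asarray(emotion_weights).T
    return D_N, D_E

def synthesize_emotion_weights(omega_N, D_N, D_E, alpha=1e-3):
    """omega_N: enrolled speaker's neutral weight sequence, shape (n_components,)."""
    lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)
    lasso.fit(D_N, omega_N)                       # sparse code B over development speakers
    B = lasso.coef_
    omega_E = D_E @ B                             # ω_E,enroll = D_E · B
    omega_E = np.clip(omega_E, 1e-8, None)
    return omega_E / omega_E.sum()                # renormalize to a valid weight vector
```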
For each speaker in the evaluation set, the means and variances of the Gaussian components of the speaker's neutral GMM model, together with the virtual emotion weights, form the speaker's corresponding emotion GMM model:
\lambda_E(x) = \sum_{i=1}^{n} \omega_{E,enroll,i} \, \Phi(\mu_{N,i}, \Sigma_{N,i}; x)
where Φ denotes the Gaussian density function;
μ_{N,i} denotes the mean of the i-th Gaussian component under the neutral emotional state;
Σ_{N,i} denotes the variance of the i-th Gaussian component under the neutral emotional state;
ω_{E,enroll,i} denotes the weight of the i-th Gaussian component in the virtual emotion weight sequence ω_{E,enroll};
x denotes the short-time speech feature;
n denotes the number of Gaussian components, set to 512 in this embodiment.
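The sketch below assembles the virtual emotion GMM λ_E(x): the Gaussian means and variances are taken unchanged from the speaker's neutral model and only the weights are replaced by the synthesized virtual emotion weights; the dictionary-based model container is an assumption of the sketch.

```python
# Minimal sketch of forming λ_E(x) = Σ_i ω_E,enroll,i Φ(μ_N,i, Σ_N,i; x).
def make_emotion_gmm(neutral_model, virtual_weights):
    """neutral_model: {'weights', 'means', 'covars'}; virtual_weights: synthesized ω_E,enroll."""
    return {
        "weights": virtual_weights,           # ω_E,enroll,i  (replaced)
        "means":   neutral_model["means"],    # μ_N,i         (unchanged)
        "covars":  neutral_model["covars"],   # Σ_N,i         (unchanged)
    }
```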
After the neutral GMM models and emotion GMM models of all speakers in the evaluation set have been built, speaker identification begins.
(2) Collect the speech of the speaker to be identified, extract the short-time speech features, and compute a score for the extracted short-time speech features against every emotion GMM model obtained in step (1).
In this step, a corresponding neutral GMM model and emotion GMM models have been built in step (1) for every speaker to be identified.
The speech to be identified is scored by likelihood against all neutral GMM models and all emotion GMM models in the evaluation set; for the k-th speaker's models in the evaluation set, the likelihood scores of a short-time speech feature x_t of the speech to be identified are computed with the following formulas:
s_{N,k} = \sum_{i=1}^{n} \omega_{N,i,k} \, N(x_t; \mu_{N,i,k}, \Sigma_{N,i,k})
where s_{N,k} is the score of the speech to be identified on the k-th speaker's neutral GMM model;
ω_{N,i,k} is the weight of the i-th Gaussian component in the k-th speaker's neutral GMM model;
x_t is the short-time speech feature;
μ_{N,i,k} is the mean of the i-th Gaussian component under the k-th speaker's neutral emotional state;
Σ_{N,i,k} is the variance of the i-th Gaussian component under the k-th speaker's neutral emotional state;
n is the number of Gaussian components, set to 512 in this embodiment.
s_{E,k} = \sum_{i=1}^{n} \omega_{E,i,k} \, N(x_t; \mu_{N,i,k}, \Sigma_{N,i,k})
where s_{E,k} is the score of the speech to be identified on the k-th speaker's emotion GMM model;
ω_{E,i,k} is the weight of the i-th Gaussian component in the k-th speaker's emotion GMM model;
x_t is the short-time speech feature;
μ_{N,i,k} is the mean of the i-th Gaussian component under the k-th speaker's neutral emotional state;
Σ_{N,i,k} is the variance of the i-th Gaussian component under the k-th speaker's neutral emotional state;
n is the number of Gaussian components, set to 512 in this embodiment.
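The sketch below illustrates the likelihood scoring of step (2). The patent's formulas give the score of a single frame x_t; averaging the per-frame log-likelihoods over the utterance, as done here, is a common convention and is an assumption of the sketch, as is the dictionary-based model container.

```python
# Minimal sketch of scoring speech against one GMM (neutral or emotional).
import numpy as np
from scipy.stats import multivariate_normal

def gmm_log_likelihood(features, model):
    """features: (T, dim); model: {'weights', 'means', 'covars'} with diagonal covariances."""
    n = len(model["weights"])
    log_comp = np.stack([np.log(model["weights"][i]) +
                         multivariate_normal.logpdf(features, model["means"][i],
                                                    np.diag(model["covars"][i]))
                         for i in range(n)], axis=1)          # (T, n)
    frame_scores = np.logaddexp.reduce(log_comp, axis=1)      # log Σ_i ω_i N(x_t; μ_i, Σ_i)
    return frame_scores.mean()                                # utterance-level score
```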
(3) Compare all scores; the speaker corresponding to the emotion GMM model with the highest score is the speaker to be identified.
For the k-th speaker's models, the final score S_k of the short-time speech feature x_t of the speech to be identified is the maximum of the likelihood scores over that speaker's neutral GMM model and emotion GMM models:
S_k = \max(s_{N,k}, s_{E,k})
For example, if for the k-th speaker's models a segment of speech to be identified obtains its maximum score on the happiness emotion model, the happiness score is taken as S_k.
The speaker model on which the utterance to be identified obtains the maximum score gives the final recognition result, as shown below:
id = \arg\max_{k} S_k
where id is the index of the speaker model with the maximum score.
For example, if a segment of speech to be identified obtains its largest S_k on the 20th speaker's models, the recognition result is that the speech to be identified was uttered by the 20th speaker.
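The sketch below illustrates the decision rule of step (3), reusing the gmm_log_likelihood helper from the previous sketch: for each enrolled speaker k the maximum of the neutral-model and emotion-model scores gives S_k, and the speaker with the largest S_k is returned. The list-of-dictionaries container for the enrolled models is an assumption.

```python
# Minimal sketch of the final decision: S_k = max(s_N,k, s_E,k), id = argmax_k S_k.
import numpy as np

def identify_speaker(features, speaker_models):
    """speaker_models: list; each entry holds one speaker's neutral GMM and emotion GMMs."""
    final_scores = []
    for spk in speaker_models:
        scores = [gmm_log_likelihood(features, spk["neutral"])]
        scores += [gmm_log_likelihood(features, m) for m in spk["emotions"]]
        final_scores.append(max(scores))          # S_k
    return int(np.argmax(final_scores))           # index of the identified speaker
```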
All sentences under the five kinds of emotional speech in the evaluation set are tested; the test speech amounts to 15,000 utterances (50 evaluation speakers × 5 emotions × 60 sentences, where the 60 sentences are 20 sentences each repeated 3 times). The experiments simulate the speaker identification process, and the experimental results are compared with those of the baseline GMM-UBM system in Table 1.
Table 1
Emotion category      Baseline GMM-UBM    RBF neural network    Sparse representation
Neutral               90.87%              95.23%                96.47%
Anger                 41.83%              51.97%                50.27%
Happiness             44.80%              53.57%                51.20%
Panic                 39.20%              46.70%                45.57%
Sadness               65.80%              69.60%                67.70%
Average               56.50%              63.41%                62.24%
As can be seen from Table 1, the proposed method can synthesize speakers' emotion models effectively, and the recognition accuracy under the various emotional states is greatly improved; the overall recognition accuracy is improved by 6.91% and 5.74% for the radial basis function neural network and the sparse representation respectively, which shows that the method considerably improves the accuracy and robustness of emotional speaker recognition.

Claims (5)

1. An emotional speaker recognition method based on emotion GMM model weight synthesis, characterized in that the steps are as follows:
(1) For each speaker, build the speaker's neutral GMM model and, according to the corresponding neutral-emotion weight transformation model, derive the different emotion GMM models;
(2) Collect the speech of the speaker to be identified, extract speech features, and compute a score for the extracted features against every emotion GMM model obtained in step (1);
(3) Compare all scores; the speaker corresponding to the emotion GMM model with the highest score is the speaker to be identified.
2. The emotional speaker recognition method based on emotion GMM model weight synthesis according to claim 1, characterized in that the neutral-emotion weight transformation model is built using a radial basis function neural network or sparse representation.
3. The emotional speaker recognition method based on emotion GMM model weight synthesis according to claim 2, characterized in that the neutral-emotion weight transformation model is built through the following steps:
1-1. In the development set, extract the short-time speech features of the different speakers under all emotional states and train an emotion-independent Gaussian mixture universal background model with the EM algorithm;
1-2. Using this universal background model, obtain the neutral GMM model of each speaker in the development set by mean adaptation and weight adaptation;
1-3. Using the neutral GMM model of step 1-2, obtain the emotion GMM models under the various emotional states by weight adaptation;
1-4. Using the weights of the neutral GMM model of step 1-2 and the weights of the emotion GMM models of step 1-3, train a radial basis function neural network or a sparse representation model to obtain the neutral-emotion weight transformation model.
4. The emotional speaker recognition method based on emotion GMM model weight synthesis according to claim 3, characterized in that, when a radial basis function neural network is used to obtain the neutral-emotion weight transformation model, the steps are specifically as follows: in the development set, use each speaker's neutral GMM weight sequence and the corresponding weight sequence of each of that speaker's emotion GMMs, and train, with the orthogonal least squares algorithm, the mapping between the neutral GMM weight sequence and each emotion GMM weight sequence, i.e. the neutral-emotion weight transformation model.
5. The emotional speaker recognition method based on emotion GMM model weight synthesis according to claim 3, characterized in that, when sparse representation is used to obtain the neutral-emotion weight transformation model, the steps are specifically as follows: in the development set, use each speaker's neutral GMM weight sequence and the corresponding weight sequence of each of that speaker's emotion GMMs to build a neutral-emotion aligned dictionary, i.e. the neutral-emotion weight transformation model.
CN201310394533.8A 2013-09-02 2013-09-02 Emotional speaker recognition method based on emotion GMM model weight synthesis Active CN103456302B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310394533.8A CN103456302B (en) 2013-09-02 2013-09-02 Emotional speaker recognition method based on emotion GMM model weight synthesis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310394533.8A CN103456302B (en) 2013-09-02 2013-09-02 Emotional speaker recognition method based on emotion GMM model weight synthesis

Publications (2)

Publication Number Publication Date
CN103456302A true CN103456302A (en) 2013-12-18
CN103456302B CN103456302B (en) 2016-04-20

Family

ID=49738601

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310394533.8A Active CN103456302B (en) 2013-09-02 2013-09-02 Emotional speaker recognition method based on emotion GMM model weight synthesis

Country Status (1)

Country Link
CN (1) CN103456302B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101178897A (en) * 2007-12-05 2008-05-14 浙江大学 Speaker recognition method using the fundamental frequency envelope to eliminate the influence of emotional speech
CN101226743A (en) * 2007-12-05 2008-07-23 浙江大学 Speaker recognition method based on conversion between neutral and emotional voiceprint models
CN101226742A (en) * 2007-12-05 2008-07-23 浙江大学 Voiceprint recognition method based on emotion compensation
US20100217595A1 (en) * 2009-02-24 2010-08-26 Korea Institute Of Science And Technology Method For Emotion Recognition Based On Minimum Classification Error

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105139855A (en) * 2014-05-29 2015-12-09 哈尔滨理工大学 Speaker identification method with two-stage sparse decomposition and device
CN104167208A (en) * 2014-08-08 2014-11-26 中国科学院深圳先进技术研究院 Speaker recognition method and device
CN104167208B (en) * 2014-08-08 2017-09-15 中国科学院深圳先进技术研究院 Speaker recognition method and device
CN104464724A (en) * 2014-12-08 2015-03-25 南京邮电大学 Speaker recognition method for deliberately pretended voices
CN108831435A (en) * 2018-06-06 2018-11-16 安徽继远软件有限公司 Emotional speech synthesis method based on multi-emotion speaker adaptation
CN108831435B (en) * 2018-06-06 2020-10-16 安徽继远软件有限公司 Emotional voice synthesis method based on multi-emotion speaker self-adaption
CN109491338A (en) * 2018-11-09 2019-03-19 南通大学 Sparse-GMM-based quality-related fault diagnosis method for multimode processes
CN110060657A (en) * 2019-04-04 2019-07-26 南京邮电大学 Multi-to-multi voice conversion method based on SN
CN110060692A (en) * 2019-04-19 2019-07-26 山东优化信息科技有限公司 Voiceprint recognition system and recognition method thereof
CN113327620A (en) * 2020-02-29 2021-08-31 华为技术有限公司 Voiceprint recognition method and device
WO2021169365A1 (en) * 2020-02-29 2021-09-02 华为技术有限公司 Voiceprint recognition method and device

Also Published As

Publication number Publication date
CN103456302B (en) 2016-04-20


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant