CN102201237A - Emotional speaker identification method based on reliability detection of fuzzy support vector machine - Google Patents
- Publication number: CN102201237A (application CN201110121720XA / CN201110121720A)
- Authority: CN (China)
- Legal status: Granted
Abstract
The invention discloses an emotional speaker recognition method based on reliability detection with fuzzy support vector machines. Speech component features are extracted and combined with the corresponding weights in the universal background model (UBM) to form UBM-component features; the resulting component features serve as fuzzy membership degrees for building a fuzzy support vector machine model on each UBM component; the fuzzy support vector machine models perform reliability detection to obtain reliable features; and the reliable features are scored to identify the speaker. The method improves the robustness of the speaker recognition system and its speaker-identification performance.
Description
Technical Field
The invention relates to signal processing and pattern recognition, and in particular to an emotional speaker recognition method based on reliability feature detection with fuzzy support vector machines.
Background Art
Speaker recognition is the technology of identifying a speaker from his or her voice using signal processing and pattern recognition methods. It mainly comprises two steps: speaker model training and speech testing.
At present, the main features used in speaker recognition include Mel-frequency cepstral coefficients (MFCC), linear predictive cepstral coefficients (LPCC), and perceptually weighted linear prediction coefficients. The main speaker recognition algorithms include vector quantization (VQ), the universal background model approach (GMM-UBM), support vector machines (SVM), and so on. Among these, the GMM-UBM approach is very widely used throughout the speaker recognition field.
In emotional speaker recognition, the training speech is usually neutral, because in real applications users generally provide only neutral speech to train their models. At test time, however, the speech may carry various emotions, such as happiness or sadness. Traditional speaker recognition systems cannot handle this mismatch between training and test conditions; emotional speaker recognition therefore has to address the performance degradation caused by the emotional inconsistency between a speaker's training and test utterances.
Experimental observation shows that differences in a speaker's vocal production across emotional states lead to differences in the spatial distribution of the speech features. Relative to a model trained on neutral speech, emotional speech features are therefore mismatched and can be regarded as unreliable; removing them at the test stage helps improve the system's recognition performance.
Summary of the Invention
To address the deficiencies of the prior art, the present invention proposes an emotional speaker recognition method based on reliability feature detection with fuzzy support vector machines, which reduces the degree of model mismatch by removing emotional speech features from the test speech, thereby improving the robustness of the speaker recognition system and the performance of speaker recognition.
To solve the above technical problem, the technical solution of the present invention is as follows.
An emotional speaker recognition method based on reliability detection with fuzzy support vector machines comprises the following steps:
1) Extract speech component features and combine them with the corresponding weights in the UBM to form universal background model component features;
2) Use the UBM-component features obtained in step 1) as fuzzy membership degrees and build a fuzzy support vector machine model on each UBM component;
3) Perform reliability detection with the fuzzy support vector machine models of step 2) to obtain reliable features;
4) Score the reliable features of step 3) to identify the speaker.
As an optional scheme, extracting the speech component features comprises the following steps:
1) Collect the speech signal and apply signal preprocessing;
2) Extract features from the preprocessed speech signal.
The feature extraction uses a Mel-frequency cepstral coefficient (MFCC) method and/or a linear predictive cepstral coefficient (LPCC) method.
The preprocessing comprises, in order: sampling and quantization, DC-offset (zero-drift) removal, pre-emphasis, and windowing.
As an optional scheme, forming the UBM-component features comprises the following steps:
1) Randomly divide the collected speech signals into a development set and an evaluation set;
2) Select all speech in the development set, extract features, and use them to train the universal background model (UBM) via the EM algorithm;
3) For each test utterance, compute a weight on each Gaussian component of the universal background model;
4) Combine the features of step 2) with the weights of step 3) to form the UBM-component features.
As an optional scheme, the fuzzy support vector machine model consists of a two-class reliable/unreliable fuzzy support vector machine classifier on each Gaussian component; the positive samples of these classifiers are taken from the neutral speech of the development set and the negative samples from its emotional speech.
As an optional scheme, performing reliability detection with the above fuzzy support vector machines comprises the following steps:
1) Compute the reliability score of the test feature x_t on each Gaussian component via f_c(x_t) = w_c · φ(x_t) + b_c, where w_c and b_c are the parameters of the classification surface on the c-th Gaussian component;
2) Compute the weighted reliability score of the test feature x_t over all Gaussian components via S(x_t) = Σ_c γ_c(x_t) f_c(x_t), where γ_c(x_t) is the weight feature;
3) Judge reliability from the result of step 2): if the score is greater than the set threshold, the feature is kept as reliable; otherwise it is discarded.
As an optional scheme, identifying the speaker from the above features comprises the following steps:
1) Train a Gaussian mixture model for each speaker, adapting the speaker model by the maximum a posteriori (MAP) method;
2) Obtain the likelihood score of the test feature x_t on the k-th speaker model λ_k via p(x_t | λ_k) = Σ_c w_c^(k) N(x_t; μ_c^(k), Σ_c^(k)), and obtain the whole-utterance score via L_k = Σ_{t: S(x_t) > θ} log p(x_t | λ_k), where θ is the feature-reliability detection threshold set in the experiment and N(·) is the Gaussian probability density;
3) Identify the speaker with the maximum score in step 2), i.e. k* = argmax_k L_k, where k* denotes the speaker identity.
The beneficial effect of the present invention is that, by removing the unreliable features in a speech passage that are most affected by emotional change, the robustness of the speaker recognition system is improved, as is its speaker-identification performance.
Brief Description of the Drawings
Figure 1 is a schematic diagram of the emotional speaker recognition method based on reliability detection with fuzzy support vector machines.
Detailed Description
The present invention is further described below with reference to the accompanying drawing and specific embodiments.
As shown in Figure 1, the emotional speaker recognition method based on reliability detection with fuzzy support vector machines mainly comprises four steps:
1) Extract speech component features and combine them with the corresponding weights in the UBM to form UBM-component features;
2) Use the UBM-component features obtained in step 1) as fuzzy membership degrees and build the fuzzy support vector machine model on each UBM component (UCFSVM);
3) Perform reliability detection with the UCFSVM of step 2), judging by the magnitude of the score S(x_t) to obtain the reliable features;
4) Score the reliable features of step 3) to identify the speaker.
The UBM-component feature extraction comprises the following.
The speech signal is collected and preprocessed; the preprocessing steps are sampling and quantization, DC-offset removal, pre-emphasis, and windowing.
Features are then extracted from the preprocessed speech, using a Mel-frequency cepstral coefficient (MFCC) method, a linear predictive cepstral coefficient (LPCC) method, or both.
For each utterance, a feature sequence X = {x_1, x_2, …, x_T} is obtained, where each frame feature x_t is a D-dimensional vector and T is the total number of feature frames in the utterance.
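As an illustration of this front end, the following Python sketch (not part of the patent text; the pre-emphasis coefficient 0.97, the 512-point FFT, and the 10 ms hop are assumed values) extracts such a (T, D) MFCC sequence with librosa:

```python
# Illustrative sketch: preprocessing and MFCC extraction.
import numpy as np
import librosa

def extract_mfcc(wav_path, sr=16000, n_mfcc=13):
    signal, sr = librosa.load(wav_path, sr=sr)
    signal = signal - np.mean(signal)                    # zero-drift (DC offset) removal
    signal = np.append(signal[0],
                       signal[1:] - 0.97 * signal[:-1])  # pre-emphasis
    # framing and windowing happen inside librosa's STFT
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc,
                                n_fft=512, hop_length=160)
    return mfcc.T  # shape (T, D): T frames of D-dimensional features
```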
All UBM training speech is used to train the GMM-UBM via the EM algorithm. For the features x_t of each test utterance, a weight is computed on each Gaussian component of the UBM. Let the UBM parameters be λ = {w_c, μ_c, Σ_c}, c = 1, …, C, where w_c, μ_c and Σ_c denote the weight, mean and covariance of the c-th component, respectively. The posterior probability that feature x_t belongs to the c-th Gaussian component can be expressed as:

γ_c(x_t) = w_c N(x_t; μ_c, Σ_c) / Σ_{j=1}^{C} w_j N(x_t; μ_j, Σ_j)

where N(·) denotes the probability density of the Gaussian distribution.
This posterior probability can also be understood as the weight with which the feature belongs to the c-th component; combining the original feature with the weight forms the new UBM-component feature.
The feature formed in step (1) above carries the feature's weight on each UBM component, so the newly constructed weight feature serves both as the fuzzy membership degree when training the fuzzy support vector machines and as the importance weight of each Gaussian component when computing the reliability score.
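A minimal sketch of this step, assuming scikit-learn's GaussianMixture stands in for the UBM (the component count of 64 and the random stand-in data are assumptions):

```python
# Minimal sketch: train a stand-in UBM and compute the per-component
# posteriors gamma_c(x_t) that serve as the weight features.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
dev_features = rng.standard_normal((5000, 13))  # stand-in for pooled development-set MFCCs

ubm = GaussianMixture(n_components=64, covariance_type='diag', random_state=0)
ubm.fit(dev_features)

def component_weights(X):
    """Posterior gamma of each frame on each UBM component, shape (T, C)."""
    return ubm.predict_proba(X)
```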
Building the fuzzy support vector machine model on the UBM components:
On the basis of the UBM, a two-class reliable/unreliable fuzzy support vector machine is trained for each Gaussian component. Neutral features are regarded as reliable and emotional features as unreliable; the positive samples are the neutral speech of the development set and the negative samples its emotional speech. The fuzzy membership degree of each sample is the weight feature mentioned in step (1).
The fuzzy support vector machine is trained as follows. Given a membership-labelled training set {(x_i, y_i, s_i)}, i = 1, …, N, each training datum x_i is regarded as unreliable speech with label y_i = -1 if it is emotional speech, and has label y_i = +1 if it is neutral speech.

The problem of optimizing the hyperplane is equivalent to:

min_{w, b, ξ}  (1/2)‖w‖² + C Σ_{i=1}^{N} s_i ξ_i
s.t.  y_i (w · φ(x_i) + b) ≥ 1 - ξ_i,  ξ_i ≥ 0,  i = 1, …, N

where C is a constant, φ(x_i) denotes the mapping of x_i from the input space to the feature space, the membership s_i represents the degree to which the datum x_i belongs to its class, and w and b are respectively the linear coefficients and offset of the classification hyperplane w · φ(x) + b = 0. This problem can be solved using the theory of linear inequalities (Chun-Fu Lin, Sheng-De Wang, "Fuzzy Support Vector Machines," IEEE Transactions on Neural Networks, 13(2):464-471, March 2002).

The above problem can be converted into its dual form:

max_α  Σ_{i=1}^{N} α_i - (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} α_i α_j y_i y_j K(x_i, x_j)
s.t.  Σ_{i=1}^{N} y_i α_i = 0,  0 ≤ α_i ≤ s_i C

Meanwhile, according to the Kuhn-Tucker conditions:

w = Σ_{i=1}^{N} α_i y_i φ(x_i)
α_i [y_i (w · φ(x_i) + b) - 1 + ξ_i] = 0
(s_i C - α_i) ξ_i = 0

From the two expressions above, the classification-surface parameters on each Gaussian component, w_c and b_c, can be solved for.
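A minimal training sketch, assuming the fuzzy memberships s_i can be folded into scikit-learn's per-sample weights (for SVC, sample_weight rescales each sample's penalty to s_i · C, which matches the primal above); the RBF kernel and C = 1 are assumed settings:

```python
# Minimal sketch: one reliable/unreliable fuzzy SVM per UBM component,
# with the component posterior gamma[:, c] used as fuzzy membership s_i.
import numpy as np
from sklearn.svm import SVC

def train_component_fsvms(X, y, gamma, n_components):
    """X: (N, D) frames; y: +1 neutral / -1 emotional; gamma: (N, C) posteriors."""
    models = []
    for c in range(n_components):
        clf = SVC(kernel='rbf', C=1.0)
        # sample_weight = s_i turns the penalty C into s_i * C, as in the FSVM primal;
        # a small floor avoids degenerate all-zero weights on sparse components
        clf.fit(X, y, sample_weight=gamma[:, c] + 1e-6)
        models.append(clf)
    return models
```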
Feature Reliability Detection Based on the Fuzzy Support Vector Machines
For a test speech feature x_t, its reliability score must be computed; if the score is too low, the feature is discarded. The computation has two steps. First, the reliability score of the feature on the fuzzy support vector machine of a single UBM Gaussian component is obtained as f_c(x_t) = w_c · φ(x_t) + b_c. Second, the weighted sum of the reliability scores over all UBM Gaussian components is computed as:

S(x_t) = Σ_{c=1}^{C} γ_c(x_t) f_c(x_t)

where γ_c(x_t) is the weight of the feature on the c-th Gaussian component, with the meaning given above. This score determines whether the feature is reliable: if it is greater than the threshold, the feature is considered reliable; otherwise it is discarded.
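A sketch of this two-step score under the same assumptions as above (the default threshold of 0 is an assumed, experiment-tuned value):

```python
# Sketch: per-component SVM scores f_c(x_t) combined with the weights
# gamma_c(x_t), then thresholding to keep only reliable frames.
import numpy as np

def reliability_scores(X, models, gamma):
    """S(x_t) = sum_c gamma_c(x_t) * f_c(x_t), one score per frame."""
    f = np.column_stack([m.decision_function(X) for m in models])  # (T, C)
    return np.sum(gamma * f, axis=1)                               # (T,)

def keep_reliable(X, models, gamma, theta=0.0):
    S = reliability_scores(X, models, gamma)
    return X[S > theta]  # frames scoring above the threshold are kept
```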
Reliable-Feature Score Calculation
After the reliable-feature detection of step (3) above, the score of the whole utterance must be computed.
First, a Gaussian mixture model is trained for each speaker, with the speaker model adapted by the maximum a posteriori (MAP) method.
Next, for the k-th speaker model λ_k, the likelihood score of a test speech feature x_t is computed as:

p(x_t | λ_k) = Σ_{c=1}^{C} w_c^(k) N(x_t; μ_c^(k), Σ_c^(k))

For the whole test utterance, the score is computed as:

L_k = Σ_{t: S(x_t) > θ} log p(x_t | λ_k)

where θ is the reliability threshold set in the experiment: a feature's score is kept if its reliability score is greater than the threshold, and discarded otherwise.
Finally, the target speaker of the utterance is chosen as the speaker with the maximum score, k* = argmax_k L_k.
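A minimal sketch of this scoring stage, assuming mean-only MAP adaptation of the stand-in UBM from earlier (the relevance factor r = 16 is an assumed value) and scikit-learn's score_samples for the per-frame log-likelihood:

```python
# Sketch: mean-only MAP adaptation of the UBM to each speaker, then
# log-likelihood scoring of the reliable frames and an argmax decision.
import copy
import numpy as np

def map_adapt(ubm, X, r=16.0):
    gamma = ubm.predict_proba(X)                         # (T, C) responsibilities
    n_c = gamma.sum(axis=0)                              # soft counts per component
    ex = gamma.T @ X / np.maximum(n_c[:, None], 1e-10)   # per-component data means
    alpha = (n_c / (n_c + r))[:, None]                   # adaptation coefficients
    spk = copy.deepcopy(ubm)
    spk.means_ = alpha * ex + (1.0 - alpha) * ubm.means_  # adapt means only
    return spk

def identify(X_reliable, speaker_models):
    """k* = argmax_k sum_t log p(x_t | lambda_k) over the reliable frames."""
    scores = [m.score_samples(X_reliable).sum() for m in speaker_models]
    return int(np.argmax(scores))
```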
Experimental Results
The database used in the experiments is the Mandarin Affective Speech Corpus (MASC), recorded with an Olympus DM-20 voice recorder in a quiet environment. It contains 68 native Mandarin speakers, 45 male and 23 female. Each speaker produced speech in five emotional states: neutral, anger, elation, panic and sadness. Each speaker read two neutral paragraphs under the neutral condition and, for each emotion, spoke 5 words and 20 sentences three times each.
The experiments were run on an IBM server configured with an E5420 CPU at 2.5 GHz and 4 GB of memory.
In the experiments, the speech of the first 18 speakers served as the development set: their neutral paragraph speech was used to train the UBM, and their sentence utterances under the five emotions were used to train the fuzzy support vector machine models. The remaining 50 speakers formed the evaluation set, and each speaker's GMM was adapted from that speaker's neutral paragraphs. All sentences under the five emotional states were used for testing, giving 15,000 test utterances in total (50 speakers × 5 emotions × 20 sentences × 3 repetitions). The experiments simulate the speaker identification task; the results are compared with those of the GMM-UBM baseline in Table 1.
Table 1. Comparison of the effect of the proposed method with the baseline experiment.
The above experimental results show that the method effectively detects the reliable features in an utterance; under every emotional state, the identification accuracy is considerably improved, and the overall identification accuracy rises by 3.64%. This indicates that the method is of great help in improving the performance and robustness of the speaker recognition system.
The above is only a preferred embodiment of the present invention. It should be pointed out that those of ordinary skill in the art may make several improvements and refinements without departing from the concept of the present invention, and such improvements and refinements shall also be regarded as falling within the protection scope of the present invention.
Claims (6)
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201110121720XA (granted as CN102201237B) | 2011-05-12 | 2011-05-12 | Emotional speaker identification method based on reliability detection of fuzzy support vector machine |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN102201237A | 2011-09-28 |
| CN102201237B | 2013-03-13 |
Legal Events

| Code | Event |
|---|---|
| C06 / PB01 | Publication |
| C10 / SE01 | Entry into substantive examination |
| C14 / GR01 | Grant of patent or utility model |
| CF01 | Termination of patent right due to non-payment of annual fee (granted publication date: 2013-03-13) |