CN111583966A - Cross-database speech emotion recognition method and device based on joint distribution least squares regression


Info

Publication number
CN111583966A
Authority
CN
China
Prior art keywords
speech
database
squares regression
matrix
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010372728.2A
Other languages
Chinese (zh)
Other versions
CN111583966B (en)
Inventor
宗源
江林
张佳成
郑文明
江星洵
刘佳腾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University
Priority to CN202010372728.2A
Publication of CN111583966A
Application granted
Publication of CN111583966B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/63 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, specially adapted for particular use, for comparison or discrimination, for estimating an emotional state
    • G10L25/18 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, characterised by the type of extracted parameters, the extracted parameters being spectral information of each sub-band
    • G10L25/21 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, characterised by the type of extracted parameters, the extracted parameters being power information
    • G10L25/24 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, characterised by the type of extracted parameters, the extracted parameters being the cepstrum
    • G10L25/27 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, characterised by the analysis technique

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Child & Adolescent Psychology (AREA)
  • General Health & Medical Sciences (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a cross-database speech emotion recognition method and device based on joint distribution least squares regression. The method includes: (1) acquiring a training database and a test database, where the training database contains a number of speech segments with corresponding speech emotion category labels, while the test database contains only speech segments to be recognized; (2) processing each speech segment with a set of acoustic low-level descriptors and applying statistical functionals, taking each resulting statistic as an emotional feature and concatenating the features into the feature vector of the segment; (3) establishing a least squares regression model based on joint distribution and training it jointly on the training and test databases to obtain a sparse projection matrix; (4) for each speech segment to be recognized, obtaining its feature vector as in step (2) and applying the learned sparse projection matrix to obtain the corresponding speech emotion category label. The invention adapts to different environments and achieves higher accuracy.

Description

Method and device for cross-database speech emotion recognition based on joint distribution least squares regression

Technical Field

The present invention relates to speech emotion recognition, and in particular to a cross-database speech emotion recognition method and device based on joint distribution least squares regression.

Background

The purpose of speech emotion recognition is to give machines enough intelligence to infer a speaker's emotional state (such as happiness, fear or sadness) from speech. It is an important part of human-computer interaction and has great research potential and development prospects. For example, detecting a driver's mental state from voice, facial expression and behavior information can remind the driver in time to concentrate and avoid dangerous driving; detecting the interlocutor's vocal emotion during human-computer interaction can make the conversation smoother and better attuned to the interlocutor's state of mind; wearable devices can give more timely and considerate feedback based on the wearer's emotional state. In fields as varied as classroom teaching and caregiving, speech emotion recognition is playing an increasingly important role.

Traditional speech emotion recognition is trained and tested on the same speech database, so the training and test data follow the same distribution. In real life, however, a trained model must face different environments, and the acoustic background is mixed with various kinds of noise. Cross-database speech emotion recognition therefore faces great challenges. How to make a trained model adapt well to different environments has become a problem to be solved in academia and industry.

Summary of the Invention

Purpose of the invention: In view of the problems in the prior art, the present invention provides a cross-database speech emotion recognition method and device based on joint distribution least squares regression that adapts well to different environments and gives more accurate recognition results.

Technical solution: The cross-database speech emotion recognition method based on joint distribution least squares regression according to the present invention includes:

(1) Acquire two speech databases, used as the training database and the test database respectively, where the training database contains a number of speech segments with corresponding speech emotion category labels, while the test database contains only speech segments to be recognized;

(2) Process each speech segment with a set of acoustic low-level descriptors and apply statistical functionals; take each resulting statistic as an emotional feature and concatenate the features into the feature vector of the corresponding speech segment;

(3) Establish a least squares regression model based on joint distribution and train it jointly on the labeled training database and the unlabeled test database to obtain a sparse projection matrix linking speech segments to speech emotion category labels;

(4) For each speech segment to be recognized in the test database, obtain its feature vector as in step (2) and apply the learned sparse projection matrix to obtain the corresponding speech emotion category label.

Further, step (2) specifically includes:

(2-1) For each speech segment, compute 16 acoustic low-level descriptors and their corresponding delta coefficients. The 16 descriptors are: zero-crossing rate of the time signal, root-mean-square frame energy, fundamental frequency, harmonics-to-noise ratio, and Mel-frequency cepstral coefficients (MFCC) 1-12;

(2-2) For each speech segment, apply 12 statistical functionals to each of the 16 acoustic low-level descriptors. The 12 functionals are: mean, standard deviation, kurtosis, skewness, maximum, minimum, relative position, relative range, and two linear regression coefficients together with their mean square error;

(2-3) Take each resulting statistic as an emotional feature and concatenate the emotional features into the feature vector of the corresponding speech segment.

Further, the least squares regression model established in step (3) is:

$$\min_{P}\ \left\|P^{T}X_{s}-L_{s}\right\|_{F}^{2}+\lambda\left\|P\right\|_{2,1}+\mu\left(\left\|\frac{1}{n}P^{T}X_{s}\mathbf{1}_{n}-\frac{1}{m}P^{T}X_{t}\mathbf{1}_{m}\right\|_{2}^{2}+\sum_{c=1}^{C}\left\|\frac{1}{n_{c}}P^{T}X_{s}^{c}\mathbf{1}_{n_{c}}-\frac{1}{m_{c}}P^{T}X_{t}^{c}\mathbf{1}_{m_{c}}\right\|_{2}^{2}\right)$$

where min_P denotes finding the matrix P that minimizes the bracketed expression, L_s ∈ R^{C×n} is the speech emotion category label matrix of the training-database speech segments (each column is a label vector), C is the number of speech emotion classes, n is the number of training-database speech segments, X_s ∈ R^{d×n} is the feature matrix of the training-database speech segments, d is the dimension of the feature vectors, P ∈ R^{d×C} is the sparse projection matrix, P^T is the transpose of P, ||·||_F^2 is the squared Frobenius norm, λ and μ are trade-off coefficients controlling the regularization terms, 1_n and 1_m are all-ones column vectors, X_t ∈ R^{d×m} is the feature matrix of the test-database speech segments, m is the number of test-database speech segments, X_s^c and X_t^c are the feature matrices of the speech segments whose emotion category is class c in the training database and the test database respectively, n_c and m_c are the numbers of such segments in the training database and the test database respectively, and ||·||_{2,1} is the L2,1 norm.
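For readers who want to check the objective numerically, the following is a minimal NumPy sketch that evaluates the reconstructed objective above for given matrices; the exact form of the joint distribution term, the function names and the variable names are illustrative assumptions, not the patented implementation.

```python
import numpy as np

def l21_norm(P):
    # sum of the l2 norms of the columns of P (column-vector convention used in the text)
    return np.sum(np.linalg.norm(P, axis=0))

def jdlsr_objective(P, Xs, Ls, Xt, Lt_pseudo, lam, mu):
    """Evaluate the (assumed) JDLSR objective.

    Xs: d x n source features, Ls: C x n one-hot source labels,
    Xt: d x m target features, Lt_pseudo: C x m one-hot pseudo-labels.
    """
    C = Ls.shape[0]
    fit = np.linalg.norm(P.T @ Xs - Ls, 'fro') ** 2           # regression term
    sparsity = lam * l21_norm(P)                               # l2,1 sparsity term
    # marginal alignment: difference of the projected overall means
    align = np.sum((P.T @ Xs.mean(axis=1) - P.T @ Xt.mean(axis=1)) ** 2)
    # conditional (class-wise) alignment using pseudo-labels on the target side
    for c in range(C):
        s_idx = Ls[c] > 0.5
        t_idx = Lt_pseudo[c] > 0.5
        if s_idx.any() and t_idx.any():
            diff = P.T @ Xs[:, s_idx].mean(axis=1) - P.T @ Xt[:, t_idx].mean(axis=1)
            align += np.sum(diff ** 2)
    return fit + sparsity + mu * align
```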

Further, the method of jointly training the model on the labeled training database and the unlabeled test database described in step (3) specifically includes:

(3-1) Convert the least squares regression model into the equivalent constrained form by introducing an auxiliary variable Q:

$$\min_{P,Q}\ \left\|Q^{T}X_{s}-L_{s}\right\|_{F}^{2}+\lambda\left\|P\right\|_{2,1}+\mu\left(\left\|\frac{1}{n}Q^{T}X_{s}\mathbf{1}_{n}-\frac{1}{m}Q^{T}X_{t}\mathbf{1}_{m}\right\|_{2}^{2}+\sum_{c=1}^{C}\left\|\frac{1}{n_{c}}Q^{T}X_{s}^{c}\mathbf{1}_{n_{c}}-\frac{1}{m_{c}}Q^{T}X_{t}^{c}\mathbf{1}_{m_{c}}\right\|_{2}^{2}\right)$$

$$\text{s.t.}\quad P=Q$$

where, for brevity, the μ-weighted joint distribution alignment term is denoted μD(Q) below;

(3-2) Using the converted least squares regression model, estimate the pseudo-label matrix $\hat{L}_t$ formed by the speech emotion category pseudo-labels of all speech segments in the test database;

(3-3) From the pseudo-label matrix $\hat{L}_t$, determine $X_t^c$ and m_c for each class by counting, and then compute the class-wise alignment quantities of the joint distribution term;

(3-4) Based on these quantities, solve the converted least squares regression model with the augmented Lagrange multiplier method to obtain the projection matrix estimate $\hat{P}$;

(3-5) Using the projection matrix estimate $\hat{P}$, update the pseudo-label matrix $\hat{L}_t$ by:

$$Z=\hat{P}^{T}X_{t}$$

$$\hat{L}_{t}(k,i)=\begin{cases}1, & k=\arg\max_{j}Z(j,i)\\ 0, & \text{otherwise}\end{cases}$$

where Z is an intermediate auxiliary variable, Z(j,i) is its element in row j and column i, arg max_j Z(j,i) returns the row index j of the largest element in column i, and $\hat{L}_t$(k,i) is the element in row k and column i of the pseudo-label matrix $\hat{L}_t$;

(3-6) With the updated pseudo-label matrix $\hat{L}_t$, return to step (3-3); once the preset number of iterations is reached, take the projection matrix estimate $\hat{P}$ obtained at the end of the loop as the learned projection matrix P.
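The alternation of steps (3-2) through (3-6) can be sketched as the following outer loop; `solve_with_alm` stands for the augmented Lagrange multiplier solver of step (3-4) (a sketch of which is given after step (3-4-5) below), all names and the fixed iteration count are assumptions, and the small ridge term in the initialization is added only for numerical stability.

```python
import numpy as np

def pseudo_labels(P, Xt, num_classes):
    # one-hot pseudo-labels: arg-max of the projected target features (step (3-5))
    Z = P.T @ Xt                          # C x m score matrix
    Lt = np.zeros((num_classes, Xt.shape[1]))
    Lt[np.argmax(Z, axis=0), np.arange(Xt.shape[1])] = 1.0
    return Lt

def train_jdlsr(Xs, Ls, Xt, lam, mu, solve_with_alm, n_outer=10):
    """Outer loop of steps (3-2)-(3-6): alternate pseudo-labeling and projection learning."""
    C = Ls.shape[0]
    # step (3-2-1): initial projection from the unregularized least squares fit
    # (a small ridge is added here as an assumption, purely for numerical stability)
    P = np.linalg.solve(Xs @ Xs.T + 1e-6 * np.eye(Xs.shape[0]), Xs @ Ls.T)
    Lt = pseudo_labels(P, Xt, C)          # step (3-2-2)
    for _ in range(n_outer):
        P = solve_with_alm(Xs, Ls, Xt, Lt, lam, mu)   # step (3-4)
        Lt = pseudo_labels(P, Xt, C)                  # step (3-5)
    return P
```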

Further, step (3-2) specifically includes:

(3-2-1) Using the converted least squares regression model without the regularization terms, obtain the initial value $\hat{P}_0$ of the projection matrix estimate:

$$\hat{P}_{0}=\left(X_{s}X_{s}^{T}\right)^{-1}X_{s}L_{s}^{T}$$

(3-2-2) From the initial projection matrix $\hat{P}_0$, obtain the initial value of the pseudo-label matrix by:

$$Z=\hat{P}_{0}^{T}X_{t}$$

$$\hat{L}_{t}^{0}(k,i)=\begin{cases}1, & k=\arg\max_{j}Z(j,i)\\ 0, & \text{otherwise}\end{cases}$$

where Z is an intermediate auxiliary variable and $\hat{L}_t^0$(k,i) is the element in row k and column i of the initial pseudo-label matrix $\hat{L}_t^0$.

Further, step (3-4) specifically includes:

(3-4-1) Obtain the augmented Lagrangian of the converted least squares regression model:

$$\mathcal{L}(P,Q,T)=\left\|Q^{T}X_{s}-L_{s}\right\|_{F}^{2}+\mu D(Q)+\lambda\left\|P\right\|_{2,1}+\mathrm{tr}\left(T^{T}(P-Q)\right)+\frac{k}{2}\left\|P-Q\right\|_{F}^{2}$$

where T is the Lagrange multiplier, k > 0 is a regularization parameter, and tr(·) denotes the trace of a matrix;

(3-4-2) Keep P, T and k fixed and update Q:

Extracting the part of the augmented Lagrangian related to the variable Q gives:

$$\min_{Q}\ \left\|Q^{T}X_{s}-L_{s}\right\|_{F}^{2}+\mu\,\mathrm{tr}\left(Q^{T}MQ\right)-\mathrm{tr}\left(T^{T}Q\right)+\frac{k}{2}\left\|P-Q\right\|_{F}^{2}$$

where D(Q) has been written as tr(Q^T M Q) with M = aa^T + Σ_c a_c a_c^T, a being the overall mean-difference vector and a_c the class-wise mean-difference vectors of the training and test feature matrices. Setting the derivative with respect to Q to zero and solving gives:

$$Q=\left(2X_{s}X_{s}^{T}+2\mu M+kI\right)^{-1}\left(2X_{s}L_{s}^{T}+kP+T\right)$$

(3-4-3) Keep Q, T and k fixed and update P:

Extracting the part of the augmented Lagrangian related to the variable P gives:

$$\min_{P}\ \lambda\left\|P\right\|_{2,1}+\mathrm{tr}\left(T^{T}P\right)+\frac{k}{2}\left\|P-Q\right\|_{F}^{2}$$

Solving this proximal problem column by column gives:

$$P_{i}=\max\left(1-\frac{\lambda}{k\left\|Q_{i}-T_{i}/k\right\|_{2}},\,0\right)\left(Q_{i}-T_{i}/k\right)$$

where P_i is the i-th column vector of P, T_i is the i-th column vector of T, and Q_i is the i-th column vector of Q;

(3-4-4) Keep Q and P fixed and update T and k:

T = T + k(P - Q)

k = min(ρk, k_max)

where k_max is the preset maximum value of k and ρ > 1 is a scaling factor;

(3-4-5) Check for convergence:

Check whether ||P - Q||∞ < ε holds. If not, return to step (3-4-2); if it holds, or if the number of iterations exceeds the preset value, take the current value of P as the desired sparse projection matrix. Here ||·||∞ denotes the largest element of the matrix and ε is the convergence threshold.
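A compact sketch of the inner solver of steps (3-4-1) through (3-4-5) follows, under the same reconstruction assumptions as above (alignment matrix M built from the overall and class-wise mean differences, column-wise shrinkage for the l2,1 proximal step); parameter defaults are illustrative.

```python
import numpy as np

def alignment_matrix(Xs, Ls, Xt, Lt):
    # M = a a^T + sum_c a_c a_c^T from overall and class-wise mean differences (assumed form)
    a = Xs.mean(axis=1) - Xt.mean(axis=1)
    M = np.outer(a, a)
    for c in range(Ls.shape[0]):
        s, t = Ls[c] > 0.5, Lt[c] > 0.5
        if s.any() and t.any():
            ac = Xs[:, s].mean(axis=1) - Xt[:, t].mean(axis=1)
            M += np.outer(ac, ac)
    return M

def solve_with_alm(Xs, Ls, Xt, Lt, lam, mu, k=1.0, rho=1.1, k_max=1e6,
                   eps=1e-6, max_iter=200):
    """Augmented Lagrange multiplier solver for the converted model (sketch of step (3-4))."""
    d, C = Xs.shape[0], Ls.shape[0]
    M = alignment_matrix(Xs, Ls, Xt, Lt)
    P = np.zeros((d, C)); Q = np.zeros((d, C)); T = np.zeros((d, C))
    A = 2 * Xs @ Xs.T + 2 * mu * M
    B = 2 * Xs @ Ls.T
    for _ in range(max_iter):
        Q = np.linalg.solve(A + k * np.eye(d), B + k * P + T)    # step (3-4-2)
        V = Q - T / k                                            # step (3-4-3): column-wise shrinkage
        norms = np.linalg.norm(V, axis=0, keepdims=True)
        P = np.maximum(1.0 - lam / (k * np.maximum(norms, 1e-12)), 0.0) * V
        T = T + k * (P - Q)                                      # step (3-4-4)
        k = min(rho * k, k_max)
        if np.max(np.abs(P - Q)) < eps:                          # step (3-4-5)
            break
    return P
```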

Further, the speech emotion category labels of the test database in step (4) are computed using the following formulas:

$$Z=P^{T}X_{t}$$

$$j^{*}=\arg\max_{j}Z(j,i)$$

where P is the final learned projection matrix, X_t is the set of feature vectors of the test-database speech segments, i.e. the feature vectors of the speech segments to be recognized, Z is an intermediate auxiliary variable, and j* is the speech emotion category label of the i-th speech segment to be recognized.
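As a brief usage sketch (variable names assumed), the labeling of step (4) reduces to an arg-max over the projected test features:

```python
import numpy as np

def predict_labels(P, Xt):
    # step (4): project test features and take the class with the largest response per segment
    Z = P.T @ Xt                  # C x m score matrix
    return np.argmax(Z, axis=0)   # predicted emotion class index for each test segment
```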

The cross-database speech emotion recognition device based on joint distribution least squares regression according to the present invention includes a processor and a computer program stored in a memory and executable on the processor, the processor implementing the above method when executing the program.

Beneficial effects: Compared with the prior art, the present invention has the significant advantage that the cross-database speech emotion recognition method and device learn across databases, and therefore adapt well to different environments and give more accurate recognition results.

Description of the Drawings

FIG. 1 is a schematic flowchart of the cross-database speech emotion recognition method based on joint distribution least squares regression provided by the present invention.

Detailed Description of the Embodiments

This embodiment provides a cross-database speech emotion recognition method based on joint distribution least squares regression, which, as shown in FIG. 1, includes the following steps:

(1) Acquire two speech databases, used as the training database and the test database respectively, where the training database contains a number of speech segments with corresponding speech emotion category labels, while the test database contains only speech segments to be recognized.

In this embodiment, three speech emotion databases commonly used in emotional speech recognition are employed: Berlin, eNTERFACE and CASIA. Because the three databases contain different emotion categories, the data are selected for each pairwise comparison. When Berlin and eNTERFACE are compared, 375 and 1077 samples are selected respectively, covering 5 emotion categories (anger, fear, happiness, disgust, sadness); when Berlin and CASIA are compared, 408 and 1000 samples are selected respectively, covering the same 5 categories; when eNTERFACE and CASIA are compared, 1072 and 1000 samples are selected respectively, again covering the same 5 categories.
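The six cross-database settings used below can be enumerated as ordered (source, target) pairs of the three corpora; this is a trivial illustration, with names assumed:

```python
from itertools import permutations

# Each ordered pair gives one cross-database setting
# (source = training corpus, target = test corpus).
CORPORA = ["Berlin", "eNTERFACE", "CASIA"]

def cross_corpus_settings():
    # 6 ordered (source, target) pairs, e.g. ("Berlin", "eNTERFACE"), ...
    return list(permutations(CORPORA, 2))
```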

(2) Process each speech segment with a set of acoustic low-level descriptors and apply statistical functionals; take each resulting statistic as an emotional feature and concatenate the features into the feature vector of the corresponding speech segment.

This step specifically includes:

(2-1) For each speech segment, compute 16 acoustic low-level descriptors and their corresponding delta coefficients. The 16 descriptors are: zero-crossing rate of the time signal, root-mean-square frame energy, fundamental frequency, harmonics-to-noise ratio, and Mel-frequency cepstral coefficients (MFCC) 1-12. The descriptors come from the feature set provided by the INTERSPEECH 2009 Emotion Challenge;

(2-2) For each speech segment, use the openSMILE toolkit to apply 12 statistical functionals to each of the 16 acoustic low-level descriptors. The 12 functionals are: mean, standard deviation, kurtosis, skewness, maximum, minimum, relative position, relative range, and two linear regression coefficients together with their mean square error;

(2-3) Take each resulting statistic as an emotional feature and concatenate the 16 × 2 × 12 = 384 emotional features into the feature vector of the corresponding speech segment, as sketched below.
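In practice the 384-dimensional feature vector is produced by openSMILE with the INTERSPEECH 2009 Emotion Challenge configuration; the sketch below only illustrates the structure of the computation (12 functionals over 16 low-level descriptors and their deltas) on a precomputed descriptor matrix, and the exact functional definitions and names are illustrative assumptions rather than the toolkit's implementation.

```python
import numpy as np

def functionals(x):
    """12 statistics of one low-level descriptor contour (assumed definitions)."""
    t = np.arange(len(x))
    slope, offset = np.polyfit(t, x, 1)               # linear regression coefficients
    mse = np.mean((offset + slope * t - x) ** 2)       # regression mean square error
    return np.array([
        x.mean(), x.std(),
        ((x - x.mean()) ** 4).mean() / (x.std() ** 4 + 1e-12),   # kurtosis
        ((x - x.mean()) ** 3).mean() / (x.std() ** 3 + 1e-12),   # skewness
        x.max(), x.min(),
        np.argmax(x) / max(len(x) - 1, 1),             # relative position of the maximum
        np.argmin(x) / max(len(x) - 1, 1),             # relative position of the minimum
        x.max() - x.min(),                             # range
        slope, offset, mse,
    ])

def segment_features(lld):
    """lld: 16 x T matrix of low-level descriptors for one speech segment.
    Returns the 16 x 2 x 12 = 384-dimensional feature vector (LLDs and their deltas)."""
    delta = np.diff(lld, axis=1, prepend=lld[:, :1])   # simple delta coefficients
    feats = [functionals(row) for row in lld] + [functionals(row) for row in delta]
    return np.concatenate(feats)
```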

(3) Establish a least squares regression model based on joint distribution and train it jointly on the labeled training database and the unlabeled test database to obtain a sparse projection matrix linking speech segments to speech emotion category labels.

The established least squares regression model is:

$$\min_{P}\ \left\|P^{T}X_{s}-L_{s}\right\|_{F}^{2}+\lambda\left\|P\right\|_{2,1}+\mu\left(\left\|\frac{1}{n}P^{T}X_{s}\mathbf{1}_{n}-\frac{1}{m}P^{T}X_{t}\mathbf{1}_{m}\right\|_{2}^{2}+\sum_{c=1}^{C}\left\|\frac{1}{n_{c}}P^{T}X_{s}^{c}\mathbf{1}_{n_{c}}-\frac{1}{m_{c}}P^{T}X_{t}^{c}\mathbf{1}_{m_{c}}\right\|_{2}^{2}\right)$$

where min_P denotes finding the matrix P that minimizes the bracketed expression, L_s ∈ R^{C×n} is the speech emotion category label matrix of the training-database speech segments (each column is a label vector), C is the number of speech emotion classes, n is the number of training-database speech segments, X_s ∈ R^{d×n} is the feature matrix of the training-database speech segments, d is the dimension of the feature vectors, P ∈ R^{d×C} is the sparse projection matrix, P^T is the transpose of P, ||·||_F^2 is the squared Frobenius norm, λ and μ are trade-off coefficients controlling the regularization terms, 1_n and 1_m are all-ones column vectors, X_t ∈ R^{d×m} is the feature matrix of the test-database speech segments, m is the number of test-database speech segments, X_s^c and X_t^c are the feature matrices of the speech segments whose emotion category is class c in the training database and the test database respectively, n_c and m_c are the numbers of such segments in the training database and the test database respectively, and ||·||_{2,1} is the L2,1 norm.

The method of jointly training the model on the labeled training database and the unlabeled test database specifically includes:

(3-1) Convert the least squares regression model into the equivalent constrained form by introducing an auxiliary variable Q:

$$\min_{P,Q}\ \left\|Q^{T}X_{s}-L_{s}\right\|_{F}^{2}+\lambda\left\|P\right\|_{2,1}+\mu\left(\left\|\frac{1}{n}Q^{T}X_{s}\mathbf{1}_{n}-\frac{1}{m}Q^{T}X_{t}\mathbf{1}_{m}\right\|_{2}^{2}+\sum_{c=1}^{C}\left\|\frac{1}{n_{c}}Q^{T}X_{s}^{c}\mathbf{1}_{n_{c}}-\frac{1}{m_{c}}Q^{T}X_{t}^{c}\mathbf{1}_{m_{c}}\right\|_{2}^{2}\right)$$

$$\text{s.t.}\quad P=Q$$

where, for brevity, the μ-weighted joint distribution alignment term is denoted μD(Q) below;

(3-2) Using the converted least squares regression model, estimate the pseudo-label matrix $\hat{L}_t$ formed by the speech emotion category pseudo-labels of all speech segments in the test database;

(3-3) From the pseudo-label matrix $\hat{L}_t$, determine $X_t^c$ and m_c for each class by counting, and then compute the class-wise alignment quantities of the joint distribution term;

(3-4) Based on these quantities, solve the converted least squares regression model with the augmented Lagrange multiplier method to obtain the projection matrix estimate $\hat{P}$;

(3-5) Using the projection matrix estimate $\hat{P}$, update the pseudo-label matrix $\hat{L}_t$ by:

$$Z=\hat{P}^{T}X_{t}$$

$$\hat{L}_{t}(k,i)=\begin{cases}1, & k=\arg\max_{j}Z(j,i)\\ 0, & \text{otherwise}\end{cases}$$

where Z is an intermediate auxiliary variable, Z(j,i) is its element in row j and column i, arg max_j Z(j,i) returns the row index j of the largest element in column i, and $\hat{L}_t$(k,i) is the element in row k and column i of the pseudo-label matrix $\hat{L}_t$;

(3-6) With the updated pseudo-label matrix $\hat{L}_t$, return to step (3-3); once the preset number of iterations is reached, take the projection matrix estimate $\hat{P}$ obtained at the end of the loop as the learned projection matrix P.

Further, step (3-2) specifically includes:

(3-2-1) Using the converted least squares regression model without the regularization terms, obtain the initial value $\hat{P}_0$ of the projection matrix estimate:

$$\hat{P}_{0}=\left(X_{s}X_{s}^{T}\right)^{-1}X_{s}L_{s}^{T}$$

(3-2-2) From the initial projection matrix $\hat{P}_0$, obtain the initial value of the pseudo-label matrix by:

$$Z=\hat{P}_{0}^{T}X_{t}$$

$$\hat{L}_{t}^{0}(k,i)=\begin{cases}1, & k=\arg\max_{j}Z(j,i)\\ 0, & \text{otherwise}\end{cases}$$

where Z is an intermediate auxiliary variable and $\hat{L}_t^0$(k,i) is the element in row k and column i of the initial pseudo-label matrix $\hat{L}_t^0$. Each column of the pseudo-label matrix $\hat{L}_t$ has a value of 1 only in the row of its corresponding category, and 0 in all other rows.

Step (3-4) specifically includes:

(3-4-1) Obtain the augmented Lagrangian of the converted least squares regression model:

$$\mathcal{L}(P,Q,T)=\left\|Q^{T}X_{s}-L_{s}\right\|_{F}^{2}+\mu D(Q)+\lambda\left\|P\right\|_{2,1}+\mathrm{tr}\left(T^{T}(P-Q)\right)+\frac{k}{2}\left\|P-Q\right\|_{F}^{2}$$

where T is the Lagrange multiplier, k > 0 is a regularization parameter, and tr(·) denotes the trace of a matrix;

(3-4-2) Keep P, T and k fixed and update Q:

Extracting the part of the augmented Lagrangian related to the variable Q gives:

$$\min_{Q}\ \left\|Q^{T}X_{s}-L_{s}\right\|_{F}^{2}+\mu\,\mathrm{tr}\left(Q^{T}MQ\right)-\mathrm{tr}\left(T^{T}Q\right)+\frac{k}{2}\left\|P-Q\right\|_{F}^{2}$$

where D(Q) has been written as tr(Q^T M Q) with M = aa^T + Σ_c a_c a_c^T, a being the overall mean-difference vector and a_c the class-wise mean-difference vectors of the training and test feature matrices. Setting the derivative with respect to Q to zero and solving gives:

$$Q=\left(2X_{s}X_{s}^{T}+2\mu M+kI\right)^{-1}\left(2X_{s}L_{s}^{T}+kP+T\right)$$

(3-4-3) Keep Q, T and k fixed and update P:

Extracting the part of the augmented Lagrangian related to the variable P gives:

$$\min_{P}\ \lambda\left\|P\right\|_{2,1}+\mathrm{tr}\left(T^{T}P\right)+\frac{k}{2}\left\|P-Q\right\|_{F}^{2}$$

Solving this proximal problem column by column gives:

$$P_{i}=\max\left(1-\frac{\lambda}{k\left\|Q_{i}-T_{i}/k\right\|_{2}},\,0\right)\left(Q_{i}-T_{i}/k\right)$$

where P_i is the i-th column vector of P, T_i is the i-th column vector of T, and Q_i is the i-th column vector of Q;

(3-4-4) Keep Q and P fixed and update T and k:

T = T + k(P - Q)

k = min(ρk, k_max)

where k_max is the preset maximum value of k and ρ > 1 is a scaling factor;

(3-4-5) Check for convergence:

Check whether ||P - Q||∞ < ε holds. If not, return to step (3-4-2); if it holds, or if the number of iterations exceeds the preset value, take the current value of P as the desired sparse projection matrix. Here ||·||∞ denotes the largest element of the matrix and ε is the convergence threshold.

(4) For each speech segment to be recognized in the test database, obtain its feature vector as in step (2) and apply the learned sparse projection matrix to obtain the corresponding speech emotion category label.

Specifically, the category label is computed using the following formulas:

$$Z=P^{T}X_{t}$$

$$j^{*}=\arg\max_{j}Z(j,i)$$

where P is the final learned projection matrix, X_t is the set of feature vectors of the test-database speech segments, i.e. the feature vectors of the speech segments to be recognized, Z is an intermediate auxiliary variable, and j* is the speech emotion category label of the i-th speech segment to be recognized.

This embodiment also provides a cross-database speech emotion recognition device based on joint distribution least squares regression, which includes a processor and a computer program stored in a memory and executable on the processor, the processor implementing the above method when executing the program.

To verify the effectiveness of the present invention, experiments were carried out on each pair of the Berlin, eNTERFACE and CASIA speech emotion databases. In each experiment, the two databases serve as the source domain and the target domain respectively: the source domain is used as the training set and provides both training data and labels, while the target domain is used as the test set and provides only test data without any labels. To measure recognition accuracy more reliably, two metrics are adopted: the unweighted average recall (UAR) and the weighted average recall (WAR). UAR is the average, over all classes, of the number of correct predictions in each class divided by the number of test samples in that class; WAR is the total number of correct predictions divided by the total number of test samples, regardless of the per-class sample counts. Considering UAR and WAR together effectively avoids the influence of class imbalance. For comparison, several classical and efficient subspace learning algorithms were selected: SVM, TCA, TKL, DaLSR and DoSL. The experimental results are shown in Table 1 below, where the proposed method is abbreviated as JDLSR, the data sets are given as source domain/target domain, E, B and C are abbreviations of the eNTERFACE, Berlin and CASIA data sets respectively, and the evaluation criterion is UAR/WAR.
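The two evaluation metrics can be computed as follows; this is a minimal sketch assuming integer class labels and is not tied to any particular toolkit.

```python
import numpy as np

def uar_war(y_true, y_pred):
    """Unweighted average recall (UAR) and weighted average recall (WAR, i.e. accuracy)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    uar = float(np.mean(recalls))              # average of per-class recalls
    war = float(np.mean(y_pred == y_true))     # overall accuracy
    return uar, war
```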

The experimental results show that the method proposed by the present invention achieves a high cross-database speech emotion recognition rate.

Table 1

[Table 1: recognition results (UAR/WAR) of SVM, TCA, TKL, DaLSR, DoSL and the proposed JDLSR on the six cross-database settings; the table is rendered as an image in the original publication.]

Claims (8)

1. A cross-database speech emotion recognition method based on joint distribution least squares regression, characterized in that the method comprises:

(1) acquiring two speech databases, used as a training database and a test database respectively, wherein the training database contains a number of speech segments with corresponding speech emotion category labels, and the test database contains only speech segments to be recognized;

(2) processing each speech segment with a set of acoustic low-level descriptors and applying statistical functionals, taking each resulting statistic as an emotional feature, and concatenating the emotional features into the feature vector of the corresponding speech segment;

(3) establishing a least squares regression model based on joint distribution and training it jointly on the labeled training database and the unlabeled test database to obtain a sparse projection matrix linking speech segments to speech emotion category labels;

(4) for each speech segment to be recognized in the test database, obtaining its feature vector as in step (2) and applying the learned sparse projection matrix to obtain the corresponding speech emotion category label.

2. The cross-database speech emotion recognition method based on joint distribution least squares regression according to claim 1, characterized in that step (2) specifically comprises:

(2-1) for each speech segment, computing 16 acoustic low-level descriptors and their corresponding delta coefficients, the 16 descriptors being: zero-crossing rate of the time signal, root-mean-square frame energy, fundamental frequency, harmonics-to-noise ratio, and Mel-frequency cepstral coefficients 1-12;

(2-2) for each speech segment, applying 12 statistical functionals to each of the 16 acoustic low-level descriptors, the 12 functionals being: mean, standard deviation, kurtosis, skewness, maximum, minimum, relative position, relative range, and two linear regression coefficients together with their mean square error;

(2-3) taking each resulting statistic as an emotional feature and concatenating the emotional features into the feature vector of the corresponding speech segment.

3. The cross-database speech emotion recognition method based on joint distribution least squares regression according to claim 1, characterized in that the least squares regression model established in step (3) is:

$$\min_{P}\ \left\|P^{T}X_{s}-L_{s}\right\|_{F}^{2}+\lambda\left\|P\right\|_{2,1}+\mu\left(\left\|\frac{1}{n}P^{T}X_{s}\mathbf{1}_{n}-\frac{1}{m}P^{T}X_{t}\mathbf{1}_{m}\right\|_{2}^{2}+\sum_{c=1}^{C}\left\|\frac{1}{n_{c}}P^{T}X_{s}^{c}\mathbf{1}_{n_{c}}-\frac{1}{m_{c}}P^{T}X_{t}^{c}\mathbf{1}_{m_{c}}\right\|_{2}^{2}\right)$$

where min_P denotes finding the matrix P that minimizes the bracketed expression, L_s ∈ R^{C×n} is the speech emotion category label matrix of the training-database speech segments, C is the number of speech emotion classes, n is the number of training-database speech segments, X_s ∈ R^{d×n} is the feature matrix of the training-database speech segments, d is the dimension of the feature vectors, P ∈ R^{d×C} is the sparse projection matrix, P^T is the transpose of P, ||·||_F^2 is the squared Frobenius norm, λ and μ are trade-off coefficients controlling the regularization terms, 1_n and 1_m are all-ones column vectors, X_t ∈ R^{d×m} is the feature matrix of the test-database speech segments, m is the number of test-database speech segments, X_s^c and X_t^c are the feature matrices of the speech segments whose emotion category is class c in the training database and the test database respectively, n_c and m_c are the numbers of such segments in the training database and the test database respectively, and ||·||_{2,1} is the L2,1 norm.

4. The cross-database speech emotion recognition method based on joint distribution least squares regression according to claim 3, characterized in that the method of jointly training the model on the labeled training database and the unlabeled test database described in step (3) specifically comprises:

(3-1) converting the least squares regression model into the equivalent constrained form by introducing an auxiliary variable Q:

$$\min_{P,Q}\ \left\|Q^{T}X_{s}-L_{s}\right\|_{F}^{2}+\lambda\left\|P\right\|_{2,1}+\mu\left(\left\|\frac{1}{n}Q^{T}X_{s}\mathbf{1}_{n}-\frac{1}{m}Q^{T}X_{t}\mathbf{1}_{m}\right\|_{2}^{2}+\sum_{c=1}^{C}\left\|\frac{1}{n_{c}}Q^{T}X_{s}^{c}\mathbf{1}_{n_{c}}-\frac{1}{m_{c}}Q^{T}X_{t}^{c}\mathbf{1}_{m_{c}}\right\|_{2}^{2}\right)$$

$$\text{s.t.}\quad P=Q$$

where, for brevity, the μ-weighted joint distribution alignment term is denoted μD(Q) below;

(3-2) using the converted least squares regression model, estimating the pseudo-label matrix $\hat{L}_t$ formed by the speech emotion category pseudo-labels of all speech segments in the test database;

(3-3) from the pseudo-label matrix $\hat{L}_t$, determining $X_t^c$ and m_c for each class by counting, and then computing the class-wise alignment quantities of the joint distribution term;

(3-4) based on these quantities, solving the converted least squares regression model with the augmented Lagrange multiplier method to obtain the projection matrix estimate $\hat{P}$;

(3-5) using the projection matrix estimate $\hat{P}$, updating the pseudo-label matrix $\hat{L}_t$ by:

$$Z=\hat{P}^{T}X_{t}$$

$$\hat{L}_{t}(k,i)=\begin{cases}1, & k=\arg\max_{j}Z(j,i)\\ 0, & \text{otherwise}\end{cases}$$

where Z is an intermediate auxiliary variable, Z(j,i) is its element in row j and column i, arg max_j Z(j,i) returns the row index j of the largest element in column i, and $\hat{L}_t$(k,i) is the element in row k and column i of the pseudo-label matrix $\hat{L}_t$;

(3-6) with the updated pseudo-label matrix $\hat{L}_t$, returning to step (3-3), and once the preset number of iterations is reached, taking the projection matrix estimate $\hat{P}$ obtained at the end of the loop as the learned projection matrix P.

5. The cross-database speech emotion recognition method based on joint distribution least squares regression according to claim 4, characterized in that step (3-2) specifically comprises:

(3-2-1) using the converted least squares regression model without the regularization terms, obtaining the initial value $\hat{P}_0$ of the projection matrix estimate:

$$\hat{P}_{0}=\left(X_{s}X_{s}^{T}\right)^{-1}X_{s}L_{s}^{T}$$

(3-2-2) from the initial projection matrix $\hat{P}_0$, obtaining the initial value of the pseudo-label matrix by:

$$Z=\hat{P}_{0}^{T}X_{t}$$

$$\hat{L}_{t}^{0}(k,i)=\begin{cases}1, & k=\arg\max_{j}Z(j,i)\\ 0, & \text{otherwise}\end{cases}$$

where Z is an intermediate auxiliary variable and $\hat{L}_t^0$(k,i) is the element in row k and column i of the initial pseudo-label matrix $\hat{L}_t^0$.

6. The cross-database speech emotion recognition method based on joint distribution least squares regression according to claim 4, characterized in that step (3-4) specifically comprises:

(3-4-1) obtaining the augmented Lagrangian of the converted least squares regression model:

$$\mathcal{L}(P,Q,T)=\left\|Q^{T}X_{s}-L_{s}\right\|_{F}^{2}+\mu D(Q)+\lambda\left\|P\right\|_{2,1}+\mathrm{tr}\left(T^{T}(P-Q)\right)+\frac{k}{2}\left\|P-Q\right\|_{F}^{2}$$

where T is the Lagrange multiplier, k > 0 is a regularization parameter, and tr(·) denotes the trace of a matrix;

(3-4-2) keeping P, T and k fixed and updating Q: extracting the part of the augmented Lagrangian related to the variable Q gives:

$$\min_{Q}\ \left\|Q^{T}X_{s}-L_{s}\right\|_{F}^{2}+\mu\,\mathrm{tr}\left(Q^{T}MQ\right)-\mathrm{tr}\left(T^{T}Q\right)+\frac{k}{2}\left\|P-Q\right\|_{F}^{2}$$

where D(Q) has been written as tr(Q^T M Q) with M = aa^T + Σ_c a_c a_c^T, a being the overall mean-difference vector and a_c the class-wise mean-difference vectors of the training and test feature matrices; solving it gives:

$$Q=\left(2X_{s}X_{s}^{T}+2\mu M+kI\right)^{-1}\left(2X_{s}L_{s}^{T}+kP+T\right)$$

(3-4-3) keeping Q, T and k fixed and updating P: extracting the part of the augmented Lagrangian related to the variable P gives:

$$\min_{P}\ \lambda\left\|P\right\|_{2,1}+\mathrm{tr}\left(T^{T}P\right)+\frac{k}{2}\left\|P-Q\right\|_{F}^{2}$$

and solving it column by column gives:

$$P_{i}=\max\left(1-\frac{\lambda}{k\left\|Q_{i}-T_{i}/k\right\|_{2}},\,0\right)\left(Q_{i}-T_{i}/k\right)$$

where P_i is the i-th column vector of P, T_i is the i-th column vector of T, and Q_i is the i-th column vector of Q;

(3-4-4) keeping Q and P fixed and updating T and k:

T = T + k(P - Q)

k = min(ρk, k_max)

where k_max is the preset maximum value of k and ρ > 1 is a scaling factor;

(3-4-5) checking for convergence: checking whether ||P - Q||∞ < ε holds; if not, returning to step (3-4-2); if it holds, or if the number of iterations exceeds the preset value, taking the current value of P as the desired sparse projection matrix, where ||·||∞ denotes the largest element of the matrix and ε is the convergence threshold.

7. The cross-database speech emotion recognition method based on joint distribution least squares regression according to claim 1, characterized in that the speech emotion category labels of the test database in step (4) are computed using:

$$Z=P^{T}X_{t}$$

$$j^{*}=\arg\max_{j}Z(j,i)$$

where P is the projection matrix learned in step (3), X_t is the set of feature vectors of the speech segments in the test database, i.e. the feature vectors of the speech segments to be recognized, Z is an intermediate auxiliary variable, and j* is the speech emotion category label of the speech segment to be recognized.

8. A cross-database speech emotion recognition device based on joint distribution least squares regression, comprising a processor and a computer program stored on a memory and executable on the processor, characterized in that the processor, when executing the program, implements the method of any one of claims 1 to 6.
CN202010372728.2A 2020-05-06 2020-05-06 Cross-database speech emotion recognition method and device based on joint distribution least square regression Active CN111583966B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010372728.2A CN111583966B (en) 2020-05-06 2020-05-06 Cross-database speech emotion recognition method and device based on joint distribution least square regression

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010372728.2A CN111583966B (en) 2020-05-06 2020-05-06 Cross-database speech emotion recognition method and device based on joint distribution least square regression

Publications (2)

Publication Number Publication Date
CN111583966A 2020-08-25
CN111583966B CN111583966B (en) 2022-06-28

Family

ID=72113186

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010372728.2A Active CN111583966B (en) 2020-05-06 2020-05-06 Cross-database speech emotion recognition method and device based on joint distribution least square regression

Country Status (1)

Country Link
CN (1) CN111583966B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112397092A (en) * 2020-11-02 2021-02-23 天津理工大学 Unsupervised cross-library speech emotion recognition method based on field adaptive subspace
CN113112994A (en) * 2021-04-21 2021-07-13 江苏师范大学 Cross-corpus emotion recognition method based on graph convolution neural network
CN115035915A (en) * 2022-05-31 2022-09-09 东南大学 Cross-database speech emotion recognition method and device based on implicit alignment subspace learning
CN115171662A (en) * 2022-06-29 2022-10-11 东南大学 Cross-library speech emotion recognition method and device based on CISF (common information System) model
CN115497508A (en) * 2022-08-23 2022-12-20 东南大学 CDAR model-based cross-library speech emotion recognition method and device


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120221333A1 (en) * 2011-02-24 2012-08-30 International Business Machines Corporation Phonetic Features for Speech Recognition
CN103594084A (en) * 2013-10-23 2014-02-19 江苏大学 Voice emotion recognition method and system based on joint penalty sparse representation dictionary learning
US9892726B1 (en) * 2014-12-17 2018-02-13 Amazon Technologies, Inc. Class-based discriminative training of speech models
CN110120231A (en) * 2019-05-15 2019-08-13 哈尔滨工业大学 Across corpus emotion identification method based on adaptive semi-supervised Non-negative Matrix Factorization
CN110390955A (en) * 2019-07-01 2019-10-29 东南大学 A cross-database speech emotion recognition method based on deep domain adaptive convolutional neural network
CN111048117A (en) * 2019-12-05 2020-04-21 南京信息工程大学 Cross-library speech emotion recognition method based on target adaptation subspace learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YUAN ZONG ET AL.: "Cross-Corpus Speech Emotion Recognition Based on Domain-adaptive Least Squares Regression", 《IEEE》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112397092A (en) * 2020-11-02 2021-02-23 天津理工大学 Unsupervised cross-library speech emotion recognition method based on field adaptive subspace
CN113112994A (en) * 2021-04-21 2021-07-13 江苏师范大学 Cross-corpus emotion recognition method based on graph convolution neural network
CN113112994B (en) * 2021-04-21 2023-11-07 江苏师范大学 Cross-corpus emotion recognition method based on graph convolutional neural network
CN115035915A (en) * 2022-05-31 2022-09-09 东南大学 Cross-database speech emotion recognition method and device based on implicit alignment subspace learning
CN115171662A (en) * 2022-06-29 2022-10-11 东南大学 Cross-library speech emotion recognition method and device based on CISF (common information System) model
CN115497508A (en) * 2022-08-23 2022-12-20 东南大学 CDAR model-based cross-library speech emotion recognition method and device

Also Published As

Publication number Publication date
CN111583966B (en) 2022-06-28

Similar Documents

Publication Publication Date Title
CN111583966B (en) Cross-database speech emotion recognition method and device based on joint distribution least square regression
Sincan et al. Autsl: A large scale multi-modal turkish sign language dataset and baseline methods
CN112800998B (en) Multi-mode emotion recognition method and system integrating attention mechanism and DMCCA
Yang et al. Visual goal-step inference using wikihow
CN108805009A (en) Classroom learning state monitoring method based on multimodal information fusion and system
Du et al. Spatio-temporal encoder-decoder fully convolutional network for video-based dimensional emotion recognition
CN111126263A (en) Electroencephalogram emotion recognition method and device based on double-hemisphere difference model
CN112397092A (en) Unsupervised cross-library speech emotion recognition method based on field adaptive subspace
Takano et al. Bigram-based natural language model and statistical motion symbol model for scalable language of humanoid robots
CN111048117A (en) Cross-library speech emotion recognition method based on target adaptation subspace learning
CN116029305A (en) Chinese attribute-level emotion analysis method, system, equipment and medium based on multitask learning
CN105426882A (en) Method for rapidly positioning human eyes in human face image
CN116110119A (en) Human behavior recognition method and system based on self-attention active contrast coding
CN108549703A (en) A kind of training method of the Mongol language model based on Recognition with Recurrent Neural Network
Wang et al. Early facial expression recognition using hidden markov models
Han et al. Towards hard few-shot relation classification
CN119323818A (en) Student emotion analysis method and system based on multi-mode dynamic memory big model
CN105632485A (en) Language distance relation obtaining method based on language identification system
Shu et al. Gaze behavior based depression severity estimation
Chai et al. Communication tool for the hard of hearings: A large vocabulary sign language recognition system
Ren et al. Subject-independent natural action recognition
Cui et al. SinKD: Sinkhorn Distance Minimization for Knowledge Distillation
CN118133837A (en) Thinking course learning method and related device based on education universe
Parvini et al. An algorithmic approach for static and dynamic gesture recognition utilising mechanical and biomechanical characteristics
CN110879966A (en) Student class attendance comprehension degree evaluation method based on face recognition and image processing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant