CN103810994B - Speech emotion inference method and system based on emotion context - Google Patents

Speech emotion inference method and system based on emotion context

Info

Publication number
CN103810994B
CN103810994B
Authority
CN
China
Prior art keywords
emotion
context
statement
speech
emotional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310401319.0A
Other languages
Chinese (zh)
Other versions
CN103810994A (en)
Inventor
毛启容
白李娟
王丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University
Original Assignee
Jiangsu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University filed Critical Jiangsu University
Priority to CN201310401319.0A priority Critical patent/CN103810994B/en
Publication of CN103810994A publication Critical patent/CN103810994A/en
Application granted Critical
Publication of CN103810994B publication Critical patent/CN103810994B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Machine Translation (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

The invention discloses a speech emotion inference method and system based on emotion context. The method includes: extracting context speech emotion features and traditional speech emotion features from adjacent emotion utterances, and building a context model and a traditional model respectively according to the feature categories; dividing the continuous speech to be analyzed into a sequence of emotionally relatively independent emotion utterances, then fusing the decision results of the context model and the traditional model for the current emotion utterance of the continuous speech with a fusion method based on an emotion interaction matrix, to obtain a preliminary recognition result; and adjusting the emotion category of each utterance with emotion context reasoning rules from the perspective of the whole continuous speech to be analyzed, to obtain the emotion category sequence of the continuous speech. The present invention uses an emotion reasoning algorithm based on emotion context and, by means of the emotion interaction matrix, analyzes and adjusts the emotional state of the emotion utterance to be analyzed, thereby improving the accuracy of continuous speech emotion recognition.

Description

Speech emotion inference method and system based on emotion context
Technical field
The present invention relates to speech signal processing, sentiment analysis and pattern recognition technologies, and in particular to a speech emotion inference method and system based on emotion context.
Background technology
The development of speech emotion recognition technology plays an important role in promoting the development and application of intelligent, humanized novel human-computer interaction technologies. In recent years, how to use computer technology to automatically identify the emotional state of a speaker from speech has attracted wide attention from researchers in many fields. In the field of speech emotion recognition, researchers have gradually begun to pay attention to the effect of contextual information on improving emotion recognition accuracy. Context here refers to information related to the emotional expression of the object to be analyzed, including personal information about the object itself (gender, age, culture, language, education level, conversation background, etc.) and its emotional states over the most recent period of time.
Prior art 1 analyzes the effect of contextual information such as gender, topic, speaker and spoken content on emotion recognition, but the analysis targets isolated, non-natural single sentences and does not describe or process emotional speech expressed continuously in a natural environment. Prior art 2 focuses on the contextual information carried between a word and its surroundings, proposing three classes of environmental features (context environment, dynamic environment and sentence global context), five kinds in total, and demonstrates experimentally the contribution of contextual information to improving emotion recognition accuracy; however, the proposed scheme requires building a large and rich emotion lexicon and requires recognizing the speaker's spoken content before emotion recognition, so the accuracy of content recognition affects the accuracy of emotion recognition, and recognizing the spoken content adds to the time complexity of emotion recognition. Prior art 3 relies only on acoustic features of speech, without recognizing the speaker's spoken content, to analyze how the emotional states of the two parties in a dialogue influence each other, and derives an emotion transfer matrix for the two parties.
However, in the prior art, emotion recognition for continuous speech analyzes only each current sentence. To overcome this defect, the present invention provides a speech emotion inference method and system based on emotion context. It exploits the fact that human emotional expression and change is a continuous process, i.e., that the current emotional state of the object to be analyzed is correlated with the emotional states it is about to express, and performs emotion recognition on the continuous speech of a single speaker. An emotion context feature extraction method and a speech emotion inference method based on emotion context are devised, so that the continuous speech emotion recognition rate is improved without requiring the speaker's spoken content to be recognized.
Summary of the invention
In view of the defect in the background art that emotion recognition of continuous speech analyzes only each current sentence, the present invention provides a speech emotion inference method and system based on emotion context. It devises an extraction method for speech emotion context features and establishes an efficient speech emotion inference model based on emotion context, forming a complete emotion-context-based speech emotion reasoning method, with the ultimate aim of improving the accuracy of continuous speech emotion recognition.
To achieve these goals, the technical solution provided by the embodiments of the present invention is as follows:
A speech emotion inference method based on emotion context, the method including:
S1: extracting context speech emotion features and traditional speech emotion features from adjacent emotion utterances, and building a context model and a traditional model respectively according to the feature categories;
S2: dividing the continuous speech to be analyzed into a sequence of emotionally relatively independent emotion utterances, extracting the context speech emotion features and traditional speech emotion features of the emotion utterances, and then performing recognition with the context model and the traditional model respectively, to obtain the decision vectors of the two models for the emotion utterance to be analyzed;
S3: fusing the decision results of the context model and the traditional model for the current emotion utterance of the continuous speech to be analyzed with a fusion method based on an emotion interaction matrix, to obtain a preliminary recognition result;
S4: adjusting the emotion category of each utterance with emotion context reasoning rules from the perspective of the whole continuous speech to be analyzed, to obtain the emotion category sequence of the continuous speech to be analyzed.
As a further improvement of the present invention, step S3 includes:
when the two largest classes of the decision vectors of the emotion utterance to be analyzed obtained with the traditional model and the context model are fused, an existing, statistically derived emotion interaction matrix is introduced and processed to obtain an emotion context interaction matrix; the context interaction matrix, together with the two decision vectors, performs fusion reasoning on the emotion category of the emotion utterance.
As a further improvement of the present invention, step S4 includes:
the emotion context reasoning rules exploit the continuity of human emotional expression and adjust the emotion category of the current emotion utterance according to the emotion categories of the preceding and following adjacent utterances.
As a further improvement of the present invention, the adjacent emotion utterances in step S1 are two consecutive (preceding and following) emotion utterances, namely the last 1/3 voiced segment of the preceding utterance and the whole of the following utterance.
As a further improvement of the present invention, the context speech emotion features include: context dynamic emotion features, context difference emotion features, context edge dynamic emotion features and context edge difference emotion features.
As a further improvement of the present invention, the context dynamic emotion features are the 33-dimensional speech emotion dynamic features, related to rate of change, mean change and covariance among the 101-dimensional traditional speech emotion features, computed over the last 1/3 voiced segment of the preceding utterance and the whole voiced segment of the following utterance of the adjacent emotion utterances.
As a further improvement of the present invention, the context difference emotion features are the features obtained by first extracting 101-dimensional traditional speech emotion features from the last 1/3 voiced segment of the preceding utterance and the whole voiced segment of the following utterance of the adjacent emotion utterances, respectively, and then taking the difference of the two.
As a further improvement of the present invention, the context edge dynamic emotion features are the 33-dimensional speech emotion dynamic features extracted from an edge-adjacent sentence composed of the last 1/3 voiced segment of the preceding utterance and the first 1/3 voiced segment of the following utterance of the adjacent emotion utterances.
As a further improvement of the present invention, the context edge difference emotion features are the features extracted from the edge-adjacent sentence by the context difference emotion feature extraction method.
Correspondingly, a speech emotion inference system based on emotion context, the system including:
a training unit, configured to extract context speech emotion features and traditional speech emotion features from adjacent emotion utterances, and to build a context model and a traditional model respectively according to the feature categories;
a recognition unit, configured to divide the continuous speech to be analyzed into a sequence of emotionally relatively independent emotion utterances, extract the context speech emotion features and traditional speech emotion features of each utterance, and then perform emotion recognition on the current utterance with the trained context model and traditional model respectively, to obtain the decision vectors of the current utterance on the two models;
a fusion recognition unit, configured to fuse the decision results of the context model and the traditional model for the current emotion utterance of the continuous speech to be analyzed, to obtain a preliminary recognition result; and
an adjustment unit, configured to adjust the emotion category of each utterance with emotion context reasoning rules from the perspective of the whole continuous speech to be analyzed, to obtain the emotion category sequence of the continuous speech to be analyzed.
The present invention has the following advantageous effects:
1. Context speech emotion features are extracted between consecutive emotion utterances and used to assist the traditional speech emotion features extracted from single emotion utterances, thereby improving the emotion recognition efficiency of continuous speech;
2. An existing, statistically derived emotion interaction matrix is used to fuse, through emotion reasoning, the emotional state of the utterance to be recognized based on context speech emotion features with its emotional state based on traditional speech emotion features, yielding a preliminary emotion recognition result for the utterance to be recognized;
3. Exploiting the stability of emotion change across consecutive emotion utterances, emotion context reasoning rules are formulated to perform a context-related adjustment on the whole continuous speech to be recognized.
Brief description of the drawings
Fig. 1 is a framework diagram of the speech emotion inference method based on emotion context in an embodiment of the present invention;
Fig. 2 is a flow chart of the emotion reasoning algorithm based on emotion context in an embodiment of the present invention.
Detailed description of the invention
The present invention is described below with reference to the embodiments shown in the drawings. These embodiments do not limit the invention; structural, methodological or functional variations made by those of ordinary skill in the art according to these embodiments are all included within the scope of the present invention.
The invention discloses a speech emotion inference method based on emotion context, including:
S1: extracting context speech emotion features and traditional speech emotion features from adjacent emotion utterances, and building a context model and a traditional model respectively according to the feature categories;
S2: dividing the continuous speech to be analyzed into a sequence of emotionally relatively independent emotion utterances, extracting the context speech emotion features and traditional speech emotion features of the emotion utterances, and then performing recognition with the context model and the traditional model respectively, to obtain the decision vectors of the two models for the emotion utterance to be analyzed;
S3: fusing the decision results of the context model and the traditional model for the current emotion utterance of the continuous speech to be analyzed with a fusion method based on an emotion interaction matrix, to obtain a preliminary recognition result;
S4: adjusting the emotion category of each utterance with emotion context reasoning rules from the perspective of the whole continuous speech to be analyzed, to obtain the emotion category sequence of the continuous speech to be analyzed.
Specifically, the method includes the following steps:
Step 1: training the speech emotion recognition model based on traditional speech emotion features.
Step 1.1: pre-processing the emotional speech signals in the training database, including pre-emphasis, windowing, framing and endpoint detection.
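For illustration, a minimal NumPy sketch of this pre-processing step is given below; the frame length, frame shift, pre-emphasis coefficient and energy threshold are assumed values chosen for the example, not parameters specified by the patent.

```python
import numpy as np

def preprocess(signal, sr, frame_len=0.025, frame_shift=0.010, alpha=0.97):
    """Pre-emphasis, framing, Hamming windowing and a crude energy-based
    endpoint detection (illustrative parameter values only)."""
    # Pre-emphasis: y[n] = x[n] - alpha * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])

    # Framing into overlapping frames
    flen, fshift = int(frame_len * sr), int(frame_shift * sr)
    n_frames = 1 + max(0, (len(emphasized) - flen) // fshift)
    frames = np.stack([emphasized[i * fshift:i * fshift + flen]
                       for i in range(n_frames)])

    # Hamming window applied to every frame
    frames = frames * np.hamming(flen)

    # Endpoint detection: drop frames whose short-time energy is below
    # a small fraction of the maximum frame energy
    energy = np.sum(frames ** 2, axis=1)
    return frames[energy > 0.02 * energy.max()]
```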
Step 1.2: extracting 101-dimensional commonly used traditional speech emotion features from the emotion utterances in the training set, including acoustic and prosodic features of speech such as Mel-cepstral coefficients, fundamental frequency, duration, intensity, amplitude, voice quality and formants.
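The following sketch shows how such utterance-level acoustic statistics could be assembled; the use of librosa and the particular feature set and statistics computed here are assumptions for illustration and do not reproduce the exact 101-dimensional feature set of Table 1.

```python
import numpy as np
import librosa

def utterance_features(y, sr):
    """Illustrative utterance-level statistics over frame-level acoustic
    features (MFCC, zero-crossing rate, energy, F0); not the patent's exact
    101-dimensional feature set."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=12)      # 12 x T
    zcr = librosa.feature.zero_crossing_rate(y)             # 1 x T
    rms = librosa.feature.rms(y=y)                          # 1 x T
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)           # T,

    feats = []
    for track in [*mfcc, zcr[0], rms[0], f0]:
        # max, min, mean and range of each frame-level track
        feats += [np.max(track), np.min(track), np.mean(track), np.ptp(track)]
    return np.array(feats)
```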
Step 1.3: normalizing the extracted features by the corresponding features of neutral utterances, and then performing feature selection with the SFFS (Sequential Forward Floating Search) method; 56 traditional speech emotion features remain after feature selection.
Step 1.4: training an SVM classifier with the 56-dimensional traditional speech emotion features of the emotion utterances in the training set, to obtain the speech emotion recognition model based on traditional speech emotion features.
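Steps 1.3 and 1.4 could be prototyped roughly as follows; the patent only names SFFS and SVM, so the choice of mlxtend's floating sequential selector and scikit-learn's SVC, as well as the interpretation of "normalized by the corresponding features of neutral utterances" as division by the neutral-utterance mean, are assumptions of this sketch.

```python
import numpy as np
from sklearn.svm import SVC
from mlxtend.feature_selection import SequentialFeatureSelector as SFS

def train_traditional_model(X, y, X_neutral, k_features=56):
    # Normalize each feature by the corresponding neutral-utterance feature
    # (one plausible reading: divide by the neutral mean).
    X_norm = X / (np.mean(X_neutral, axis=0) + 1e-8)

    # Sequential Forward Floating Search (SFFS) wrapped around an SVM
    svm = SVC(kernel="rbf", probability=True)
    sffs = SFS(svm, k_features=k_features, forward=True, floating=True,
               scoring="accuracy", cv=5)
    sffs.fit(X_norm, y)
    selected = list(sffs.k_feature_idx_)

    # Train the final SVM on the selected feature subset
    svm.fit(X_norm[:, selected], y)
    return svm, selected
```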
Step 2: training the speech emotion recognition model based on context speech emotion features.
Step 2.1: extracting context speech emotion features from the emotion utterances in the training set pre-processed in step 1.1, including: context dynamic emotion features, context difference emotion features, context edge dynamic emotion features and context edge difference emotion features, 268 dimensions in total.
Step 2.2: normalizing the context speech emotion features extracted in step 2.1 by the corresponding features of neutral utterances, and then performing feature selection with the SFFS (Sequential Forward Floating Search) method; 91 context speech emotion features remain after feature selection.
Step 2.3: training an SVM (Support Vector Machine) classifier with the 91-dimensional context speech emotion features extracted from the emotion utterances in the training set, to obtain the speech emotion recognition model based on context speech emotion features.
Step 3: identifying the emotional state of the emotion utterance to be recognized.
Step 3.1: pre-processing the continuous emotional speech signal to be recognized, including pre-emphasis, windowing, automatic segmentation, framing and endpoint detection.
Step 3.2: extracting, from the emotional speech signal to be recognized, the 56-dimensional traditional speech emotion features selected in steps 1.2-1.3.
Step 3.3: inputting them into the speech emotion recognition model based on traditional speech emotion features trained in step 1.4 for recognition; the recognition result obtained is denoted TP.
Step 3.4: extracting, from the emotional speech signal to be recognized, the 91-dimensional context speech emotion features selected in step 2.2.
Step 3.5: inputting them into the speech emotion recognition model based on context speech emotion features trained in step 2.3 for recognition; the recognition result obtained is denoted CP.
Step 4: according to the recognition result TP of the model based on traditional speech emotion features and the recognition result CP of the model based on context speech emotion features, fusing the recognition results of the two models with a fusion algorithm, to preliminarily obtain the emotion category of the speech signal to be recognized and the confidence of this result.
Step 5: using the reasoning rules based on emotion context, adjusting the emotional state of the emotion utterance to be analyzed according to the emotional states of the preceding and following utterances in the continuous speech, to obtain the final emotional state of the emotion utterance to be analyzed.
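Putting steps 3-5 together, the recognition flow can be summarized by the following skeleton; every helper name in it is a placeholder standing for an operation described in this specification, not an API defined by the patent.

```python
def infer_emotion_sequence(continuous_speech, sr, trad_model, ctx_model, IM):
    """High-level flow of steps 3-5 (illustrative skeleton only; the helper
    functions are placeholders for the operations described in the text)."""
    utterances = segment_by_energy_and_pause(continuous_speech, sr)   # step 3.1
    prelim = []
    for j, utt in enumerate(utterances):
        TP = trad_model.predict_proba(traditional_features(utt))      # steps 3.2-3.3
        CP = ctx_model.predict_proba(context_features_of(utterances, j))  # steps 3.4-3.5
        label, trust = fuse_with_interaction_matrix(TP, CP, IM, prelim)    # step 4
        prelim.append((label, trust))
    return apply_context_reasoning_rules(prelim)                       # step 5
```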
Correspondingly, the invention also discloses a speech emotion inference system based on emotion context, including:
a training unit, configured to extract context speech emotion features and traditional speech emotion features from adjacent emotion utterances, and to build a context model and a traditional model respectively according to the feature categories;
a recognition unit, configured to divide the continuous speech to be analyzed into a sequence of emotionally relatively independent emotion utterances, extract the context speech emotion features and traditional speech emotion features of each utterance, and then perform emotion recognition on the current utterance with the trained context model and traditional model respectively, to obtain the decision vectors of the current utterance on the two models;
a fusion recognition unit, configured to fuse the decision results of the context model and the traditional model for the current emotion utterance of the continuous speech to be analyzed, to obtain a preliminary recognition result; and an adjustment unit, configured to adjust the emotion category of each utterance with emotion context reasoning rules from the perspective of the whole continuous speech to be analyzed, to obtain the emotion category sequence of the continuous speech to be analyzed.
The present invention is further elaborated below with reference to the accompanying drawings and specific embodiments.
As shown in Fig. 1, the emotion inference system based on emotion context in an embodiment of the invention is mainly divided into four stages: a training stage, a recognition stage, a fusion recognition stage, and an emotion adjustment stage based on emotion context reasoning rules.
1. Training stage
The training stage builds the speech emotion recognition model based on traditional speech emotion features and the speech emotion recognition model based on context speech emotion features, and is divided into three steps:
(1) Emotional speech signal pre-processing.
This step uses traditional speech signal pre-processing methods to pre-process the emotional speech signal, including pre-emphasis, windowing, framing and endpoint detection.
(2) Extraction of traditional speech emotion features and training of the speech emotion recognition model based on traditional speech emotion features.
(2-1) For the current emotion utterance, acoustic and prosodic features of speech are extracted, including MFCC cepstral coefficients, fundamental frequency, duration, intensity, amplitude, voice quality and formants, together with statistical features such as the maximum, minimum and range of each of these features over the emotion utterance. The extraction methods of these features are not part of the present invention and are therefore not described in detail. The specific features are listed in Table 1.
Table 1 Description of the traditional speech emotion features
(2-2) The features extracted in step (2-1) are normalized by the features of neutral emotion, and then SFFS is used to perform feature selection on the 101-dimensional traditional speech emotion features; 56 dimensions remain after feature selection.
(2-3) The selected 56-dimensional traditional speech emotion features are used to train the speech emotion recognition model based on traditional speech emotion features; the recognition model in this embodiment is an SVM.
(3) Extraction of context speech emotion features and training of the speech emotion recognition model based on context speech emotion features.
(3-1) Context speech emotion features are extracted, including context dynamic emotion features, context difference emotion features, context edge dynamic emotion features and context edge difference emotion features.
(3-1-1) Extraction of context dynamic emotion features: from two consecutive emotion utterances, short-time energy, zero-crossing rate, Mel cepstral coefficients (the first 12 coefficients), fundamental frequency, voice quality, silence rate, the first three formant coefficients, and statistical features such as the maximum, minimum and range are extracted, 33 dimensions in total. The specific features are listed in Table 2.
Table 2 Description of the speech emotion dynamic features
(3-1-2) Extraction of context difference emotion features: from the two consecutive emotion utterances, 101-dimensional features are extracted for each utterance, including short-time energy, zero-crossing rate, Mel cepstral coefficients (the first 12 coefficients), fundamental frequency, voice quality, silence rate, the first three formant coefficients, and statistical features such as the maximum, minimum and range of these emotion features. The corresponding emotion features of the preceding utterance are then subtracted from the emotion features of the following emotion utterance to obtain the 101-dimensional context difference emotion features.
(3-1-3) Extraction of context edge dynamic emotion features: from two consecutive emotion utterances, a segment spanning from the last 1/3 voiced section of the preceding utterance to the end of the first 1/3 voiced section of the following utterance is taken, and its short-time energy, zero-crossing rate, Mel cepstral coefficients (the first 12 coefficients), fundamental frequency, voice quality, silence rate, the first three formant coefficients, and statistical features such as the maximum, minimum and range are extracted, 33 dimensions in total.
(3-1-4) Extraction of context edge difference emotion features: from the two consecutive emotion utterances, the last 1/3 voiced section of the preceding utterance and the first 1/3 voiced section of the following utterance are taken, and for each of the two segments the short-time energy, zero-crossing rate, Mel cepstral coefficients (the first 12 coefficients), fundamental frequency, voice quality, silence rate, the first three formant coefficients, and statistical features such as the maximum, minimum and range of these features are extracted, 101 dimensions per segment. The corresponding emotion features of the last 1/3 section of the preceding utterance are then subtracted from the 101-dimensional emotion features of the first 1/3 section of the following utterance, yielding the 101-dimensional context edge difference emotion features.
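A sketch of how these four feature groups could be assembled from two consecutive utterances is shown below; features_101 and features_33 are placeholders for the 101-dimensional and 33-dimensional extractors described above, and the exact segment boundaries are an assumption of this illustration.

```python
import numpy as np

def context_features(prev_utt, next_utt, features_101, features_33):
    """Assemble the four context feature groups (268 dims in total) from two
    consecutive utterances; `features_101` / `features_33` are placeholders."""
    third_prev = max(1, len(prev_utt) // 3)
    third_next = max(1, len(next_utt) // 3)
    prev_tail = prev_utt[-third_prev:]      # last 1/3 of the preceding utterance
    next_head = next_utt[:third_next]       # first 1/3 of the following utterance

    # (3-1-1) context dynamic features: 33-dim statistics over the last 1/3 of
    # the preceding utterance together with the whole following utterance
    dyn = features_33(np.concatenate([prev_tail, next_utt]))

    # (3-1-2) context difference features: following utterance minus preceding part
    diff = features_101(next_utt) - features_101(prev_tail)

    # (3-1-3) context edge dynamic features: 33-dim over the "edge sentence"
    # (last 1/3 of the preceding + first 1/3 of the following utterance)
    edge_dyn = features_33(np.concatenate([prev_tail, next_head]))

    # (3-1-4) context edge difference features: first 1/3 of the following
    # minus last 1/3 of the preceding utterance (101-dim each)
    edge_diff = features_101(next_head) - features_101(prev_tail)

    return np.concatenate([dyn, diff, edge_dyn, edge_diff])
```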
(3-2) The features extracted in steps (3-1-1), (3-1-2), (3-1-3) and (3-1-4) are normalized by the features of neutral emotion, and then SFFS is used to perform feature selection on the 268-dimensional context speech emotion features; 91 dimensions remain after feature selection.
(3-3) The selected 91-dimensional context speech emotion features are used to train the speech emotion recognition model based on context speech emotion features; the recognition model here is also an SVM.
2. Recognition stage
In the recognition stage, the corresponding features are extracted from the emotion utterance to be recognized and input into the models trained in the first stage, and the recognition result of the emotional state of this utterance on each model is computed. This stage is carried out in the following steps.
(1) The continuous emotional speech signal is segmented into utterances using a segmentation method based on the energy envelope and pause intervals.
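A minimal illustration of such energy-envelope / pause-based segmentation follows; the frame size, smoothing window, silence threshold and minimum pause length are assumed values, and a practical implementation would need tuning.

```python
import numpy as np

def segment_by_energy_and_pause(y, sr, min_pause=0.3, thresh_ratio=0.05):
    """Split continuous speech at pauses found in the smoothed short-time
    energy envelope (illustrative thresholds only)."""
    hop, frame = int(0.010 * sr), int(0.025 * sr)
    energy = np.array([np.sum(y[i:i + frame] ** 2)
                       for i in range(0, len(y) - frame, hop)])
    env = np.convolve(energy, np.ones(5) / 5, mode="same")   # smoothed envelope
    voiced = env >= thresh_ratio * env.max()

    # A pause = a run of non-voiced frames longer than min_pause seconds
    min_gap = int(min_pause * sr / hop)
    segments, seg_start, last_voiced = [], None, -min_gap
    for i, v in enumerate(voiced):
        if v:
            if seg_start is None or i - last_voiced > min_gap:
                if seg_start is not None:          # close the previous segment
                    segments.append(y[seg_start * hop:(last_voiced + 1) * hop])
                seg_start = i
            last_voiced = i
    if seg_start is not None:
        segments.append(y[seg_start * hop:(last_voiced + 1) * hop])
    return segments
```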
(2) The segmented emotional speech signal is pre-processed, using the same method as in step (1) of the training stage.
(3) Extraction of traditional speech features from the emotion utterance to be recognized and speech emotion recognition based on traditional speech emotion features.
(3-1) The 56-dimensional traditional speech emotion features (after feature selection) of the current utterance among the emotion utterances to be recognized are extracted, using the same method as in step (2-1) of the training stage.
(3-2) The emotional state of the current utterance among the emotion utterances to be recognized is identified.
The traditional speech emotion features of the current utterance extracted in step (3-1) of this stage are input into the speech emotion recognition model based on traditional speech emotion features trained in step (2-3) of the first stage, and the emotional state embodied by this emotion utterance to be recognized is computed.
(4) The context speech emotion features of the current utterance among the emotion utterances to be recognized are extracted, and the context speech emotion recognition model trained in step (3-3) of the training stage is used to identify the emotional state contained in the current utterance.
(4-1) Extraction of the context speech emotion features of the emotion utterance to be recognized, using the same extraction method as in step (3-1) of the training stage; only the 91 context speech emotion features remaining after feature selection are extracted in this step.
(4-2) The 91-dimensional context speech emotion features of the utterance to be recognized extracted in step (4-1) are input into the speech emotion recognition model based on context speech emotion features trained in step (3-3) of the first stage, and the emotional state contained in this emotion utterance to be recognized is obtained.
3. Fusion recognition stage
According to the emotional state of the emotion utterance to be recognized based on traditional speech emotion features obtained in step (3-2) of the recognition stage, and its emotional state based on context speech emotion features obtained in step (4-2), the recognition results of the two models are fused according to the following fusion method, to preliminarily obtain the final emotional state contained in the emotion utterance to be recognized.
Fusion method:
Let the test sample set be Test = {ts_1, ts_2, ..., ts_n}, let the emotion utterance currently to be recognized be ts_j, and let the final emotion category identified by the model for sample ts_j be denoted PreLabel(ts_j). Let E = {e_1, e_2, ..., e_m} be the target emotion category set with m emotion classes, and let the matrix IM denote the context interaction matrix, as shown in formula (1):
$$IM = \begin{pmatrix} IP_{(1,1)} & \cdots & IP_{(1,j)} & \cdots & IP_{(1,m)} \\ \vdots & & \vdots & & \vdots \\ IP_{(i,1)} & \cdots & IP_{(i,j)} & \cdots & IP_{(i,m)} \\ \vdots & & \vdots & & \vdots \\ IP_{(m,1)} & \cdots & IP_{(m,j)} & \cdots & IP_{(m,m)} \end{pmatrix} \qquad (1)$$
where the vector IM_i = (IP_(i,1), ..., IP_(i,j), ..., IP_(i,m)) is the emotion context interaction vector: when the emotion category of the previous emotion utterance is e_i, it gives the probabilities that the current emotion utterance belongs to each emotional state, and the element IP_(i,j) (also written c_(i,j) below) is the probability that the current emotion utterance belongs to emotion category e_j when the emotion category of the previous emotion utterance is e_i. Let TP denote the probability vector, sorted in descending order of probability, output for the emotion utterance to be recognized by the recognition model based on traditional emotion features, written TP = (tp_1, tp_2, tp_3, ..., tp_i, ..., tp_m), where tp_i is the probability, obtained from the model based on traditional speech emotion features, that the emotion utterance to be recognized belongs to emotion e_i. Let CP denote the probability vector, sorted in descending order of probability, output for the emotion utterance to be recognized by the recognition model based on context emotion features, written CP = (cp_1, cp_2, cp_3, ..., cp_i, ..., cp_m), where cp_i is the probability, obtained from the model based on context speech emotion features, that the emotion utterance to be recognized belongs to emotion e_i. Let TrustLevel(ts_j) ∈ {A_1, A_2, A_3} denote the trust level assigned to each reasoning result; the three grades satisfy A_1 > A_2 > A_3 in trustworthiness. The fusion method is carried out in two steps.
(1) Data preparation.
This part prepares the data for the fusion to be performed. Besides the probability vectors TP and CP, the corresponding emotion context interaction vector IM_i must be selected from the emotion context interaction matrix according to the emotion category of the previous emotion utterance ts_(j-1) of the emotion utterance ts_j currently to be recognized. The emotion context interaction matrix is refined from an emotion-change interaction matrix of two-person dialogues compiled from dialogue scenes in typical Chinese dramas; the interaction matrix is shown in Table 3. This emotion interaction matrix was compiled from 4000 dialogue segments totalling over a hundred hours, covering male-male, male-female and female-female dialogues.
Table 3 Interaction matrix
Table 3 records the emotion interaction rules of a dialogue between two people, A and B. The table is divided into a left part and a right part, each giving the distribution and probabilities of the above six emotions in one person when the emotion category of the other person is determined. The left part gives the emotion probability distribution of A given the emotional state of B; similarly, the right part gives the emotion distribution that B may exhibit given the emotional state of A. To eliminate the difference between the left and right distributions caused by individual differences between people, the distribution probabilities of each emotion pair in the two halves are averaged, yielding the interaction matrix shown in Table 4, which is defined as the context interaction matrix. The probabilities in the context interaction matrix are computed as in formula (2), where AIP_(i,j) is the probability that B shows emotion e_j when A shows emotion e_i, BIP_(i,j) is the probability that A shows emotion e_j when B shows emotion e_i, and IP_(i,j) is the probability that the following utterance expresses emotion e_j when the preceding one expresses emotion e_i.
Table 4 Context interaction matrix
$$IP_{(i,j)} = \frac{AIP_{(i,j)} + BIP_{(i,j)}}{2} \qquad (2)$$
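Assuming the two directional matrices AIP and BIP are available as m-by-m arrays, formula (2) reduces to an element-wise average, e.g.:

```python
import numpy as np

# AIP[i, j]: probability that B shows emotion e_j when A shows emotion e_i
# BIP[i, j]: probability that A shows emotion e_j when B shows emotion e_i
def context_interaction_matrix(AIP, BIP):
    """Element-wise average of the two directional matrices, formula (2)."""
    return (np.asarray(AIP, dtype=float) + np.asarray(BIP, dtype=float)) / 2.0
```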
(2) The data prepared in the previous step are fused by emotion reasoning with the emotion reasoning algorithm based on emotion context. The algorithm is described as follows:
Input: the emotion utterance ts_j to be recognized;
the probability output vector TP of the recognition model based on traditional emotion features;
the probability output vector CP of the recognition model based on context emotion features;
the emotion interaction vector IM_i.
Output: the preliminary emotion category PreLabel(ts_j) of emotion utterance ts_j.
The concrete reasoning process of the algorithm is as follows:
(1) If tp_1 and cp_1 have the same emotion category, the emotion category PreLabel(ts_j) of the sample under test is marked as the category of tp_1 (cp_1), the trust level is A_3, and the algorithm ends;
(2) Otherwise, tp_1 and cp_1 have different emotion categories, and the categories with the second-largest probability are brought in together with the largest ones to make the decision.
(2-1) If tp_2 and cp_2 have the same emotion category, there are the following three cases:
(2-1-1) If IP_(i,1) has the same emotion category as tp_2 (cp_2), the emotion category PreLabel(ts_j) of the sample under test is marked as the category of IP_(i,1), the trust level is A_2, and the algorithm ends;
(2-1-2) If IP_(i,1) has the same emotion category as tp_1 (or cp_1), the emotion category PreLabel(ts_j) of the sample under test is marked as the category of tp_1 (or cp_1), the trust level is A_2, and the algorithm ends;
(2-1-3) Otherwise, the confidences of the elements tp_1 and cp_1 are computed, the emotion category corresponding to the larger confidence is marked as the emotion category PreLabel(ts_j) of the sample under test, the trust level is A_1, and the algorithm ends.
(2-2) If tp_2 and cp_2 have different emotion categories, there are the following two cases:
① If tp_1 has the same emotion category as cp_2 (or tp_2 has the same emotion category as cp_1), the emotion category PreLabel(ts_j) of the sample under test is marked as the category of tp_1 (or cp_1), the trust level is A_2, and the algorithm ends;
② Otherwise, the confidences of the four elements tp_1, tp_2, cp_1 and cp_2 are computed, the emotion category corresponding to the largest confidence is marked as the emotion category PreLabel(ts_j) of the sample under test, the trust level is A_1, and the algorithm ends.
In the algorithm, the confidence of each element in the probability output vectors TP and CP of the two models is denoted conf_(i,j), representing the confidence that emotion utterance ts_i belongs to emotion category e_j; it is computed as in formula (4). This confidence consists of two parts: one part comes from the model's own output vector and is denoted Pconf_(i,j), computed as in formula (3); the other part comes from the context interaction matrix. In the formulas, p_(i,j) is the probability, output by the recognition model, that emotion utterance ts_i belongs to emotion category e_j.
$$Pconf_{(i,j)} = p_{(i,j)} - \frac{1}{m-1}\sum_{\substack{k=1\\ k\neq j}}^{m} p_{(i,k)} \qquad (3)$$
$$conf_{(i,j)} = c_{(i,j)} \cdot Pconf_{(i,j)} = c_{(i,j)}\left(p_{(i,j)} - \frac{1}{m-1}\sum_{\substack{k=1\\ k\neq j}}^{m} p_{(i,k)}\right) \qquad (4)$$
Compared with the previous practice of deciding solely by the emotion category with the maximum probability, the algorithm adds the second-largest-probability emotion category and the context interaction matrix to assist the final decision on the emotion category of the sample under test. Meanwhile, different trust levels are assigned to the different decision branches of each rule; the structure is shown in Fig. 2.
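The reasoning process above can be sketched as follows; the representation of TP and CP as (label, probability) pairs sorted by descending probability, the reading of IP_(i,1) as the class most probable given the previous emotion, and the tie-breaking by Python's tuple ordering are assumptions of this illustration, and the trust levels are assigned exactly as in the textual description above.

```python
def fuse_decisions(TP, CP, IM_prev):
    """Fusion reasoning sketch.
    TP / CP: lists of (label, probability) sorted by descending probability,
    from the traditional model and the context model respectively.
    IM_prev: dict mapping each emotion label to IP(i, label), the interaction
    probability given the previous utterance's emotion e_i.
    Returns (preliminary label, trust level)."""
    m = len(TP)

    def pconf(vec, idx):
        # formula (3): p minus the mean of the other class probabilities
        others = sum(p for k, (_, p) in enumerate(vec) if k != idx)
        return vec[idx][1] - others / (m - 1)

    def conf(vec, idx):
        # formula (4): interaction-matrix element times Pconf
        return IM_prev[vec[idx][0]] * pconf(vec, idx)

    tl1, tl2 = TP[0][0], TP[1][0]
    cl1, cl2 = CP[0][0], CP[1][0]
    im_top = max(IM_prev, key=IM_prev.get)   # class most likely given the context

    if tl1 == cl1:                                    # rule (1): top classes agree
        return tl1, "A3"
    if tl2 == cl2:                                    # rule (2-1): second classes agree
        if im_top == tl2:                             # (2-1-1)
            return tl2, "A2"
        if im_top in (tl1, cl1):                      # (2-1-2)
            return im_top, "A2"
        best = max([(conf(TP, 0), tl1), (conf(CP, 0), cl1)])   # (2-1-3)
        return best[1], "A1"
    if tl1 == cl2 or tl2 == cl1:                      # rule (2-2), case 1
        return (tl1 if tl1 == cl2 else cl1), "A2"
    candidates = [(conf(TP, 0), tl1), (conf(TP, 1), tl2),      # rule (2-2), case 2
                  (conf(CP, 0), cl1), (conf(CP, 1), cl2)]
    return max(candidates)[1], "A1"
```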
4. Emotion adjustment stage based on emotion context reasoning rules
For each emotion utterance to be recognized in the test set Test = {ts_1, ts_2, ..., ts_n}, a preliminary decision result is obtained by the above reasoning algorithm. Based on the facts that emotion rarely changes abruptly and that the different decision branches in the reasoning algorithm are assigned different trust levels, the recognition result of each emotion utterance in the test set Test is given a context-related adjustment according to the emotion context reasoning rules.
In the emotion context reasoning rules, the utterance currently to be adjusted is ts_j (j = 2, 3, ..., n), and the results of ts_(j-1) and ts_(j+1) are used to assist in adjusting the result of ts_j. When the emotion categories of ts_(j-1) and ts_(j+1) differ, no processing is done; otherwise, the emotion category of ts_j is adjusted according to the following three cases, where Label(ts_j) denotes the final emotion category of emotion utterance ts_j.
Rule 1: when the trust level of ts_j is A_1, the recognition result of ts_j itself is considered credible and no correction is made, i.e.:
$$TrustLevel(ts_j) = A_1 \;\Rightarrow\; Label(ts_j) = PreLabel(ts_j)$$
Rule 2: when the trust level of ts_j is A_2, the recognition result of ts_j is considered suspicious and is handled according to the results of ts_(j-1) and ts_(j+1): if neither the trust level of ts_(j-1) nor that of ts_(j+1) is A_3 (i.e. both are A_1 or A_2), the results of ts_(j-1) and ts_(j+1) are considered credible, and the result of ts_j is corrected to the same emotion category; otherwise, no correction is made.
Rule 3: when the trust level of ts_j is A_3, the recognition result of ts_j is considered untrustworthy, and its emotion category is corrected to the emotion category of ts_(j-1) (ts_(j+1)), i.e.:
$$TrustLevel(ts_j) = A_3 \;\Rightarrow\; Label(ts_j) = Label(ts_{j-1})$$
These rules make full use of the contextual information between utterances; starting from the consideration that emotion rarely changes abruptly, the preliminary results of the reasoning algorithm are adjusted again, yielding the final emotion category of each emotion utterance in the test set Test.
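Rules 1-3 can be sketched as a simple pass over the preliminary results; the tuple representation of (label, trust level) is an assumption of this illustration.

```python
def apply_context_reasoning_rules(prelim):
    """Adjust preliminary labels with the neighbouring utterances' results.
    prelim: list of (label, trust) tuples produced by the fusion stage."""
    final = [label for label, _ in prelim]
    for j in range(1, len(prelim) - 1):
        label, trust = prelim[j]
        prev_label, prev_trust = prelim[j - 1]
        next_label, next_trust = prelim[j + 1]
        if prev_label != next_label:        # neighbours disagree: no processing
            continue
        if trust == "A1":                   # rule 1: result trusted, keep it
            continue
        if trust == "A2":                   # rule 2: suspicious, follow neighbours
            if prev_trust != "A3" and next_trust != "A3":
                final[j] = prev_label       # only if both neighbours are themselves trusted
        elif trust == "A3":                 # rule 3: untrusted, take the neighbours' class
            final[j] = prev_label
    return final
```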
The present invention overcomes the limitation of existing speech emotion recognition methods that analyze only isolated emotion sentences. Starting from the characteristic that the emotion changes of adjacent emotion utterances are interrelated, it uses an emotion reasoning algorithm based on emotion context and, by means of the emotion interaction matrix, analyzes and adjusts the emotional state of the emotion utterance to be analyzed, thereby improving the accuracy of continuous speech emotion recognition.
Compared with the prior art, the present invention has the following advantages:
1. Context speech emotion features are successfully extracted between consecutive emotion utterances and used to assist the traditional speech emotion features extracted from single emotion utterances, thereby improving the emotion recognition efficiency of continuous speech;
2. The existing statistically derived emotion interaction matrix is used skilfully to fuse, through emotion reasoning, the emotional state of the utterance to be recognized based on context speech emotion features with its emotional state based on traditional speech emotion features, yielding a preliminary emotion recognition result for the emotion utterance to be recognized;
3. Exploiting the stability of emotion change across consecutive emotion utterances, emotion context reasoning rules are formulated to carry out a context-related adjustment on the whole continuous speech to be recognized.
It should be understood that although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution; this manner of presentation is adopted only for clarity. Those skilled in the art should take the specification as a whole; the technical solutions in the embodiments may also be appropriately combined to form other embodiments understandable to those skilled in the art.
The detailed descriptions listed above are only specific illustrations of feasible embodiments of the present invention and are not intended to limit the scope of protection of the present invention; all equivalent implementations or changes made without departing from the technical spirit of the present invention shall be included within the scope of protection of the present invention.

Claims (9)

1. A speech emotion inference method based on emotion context, characterized in that the method includes:
S1: extracting context speech emotion features and traditional speech emotion features from adjacent emotion utterances, and building a context model and a traditional model respectively according to the feature categories;
S2: dividing the continuous speech to be analyzed into a sequence of emotionally relatively independent emotion utterances, extracting the context speech emotion features and traditional speech emotion features of the emotion utterances, and then performing recognition with the context model and the traditional model respectively, to obtain the decision vectors of the two models for the emotion utterance to be analyzed;
S3: fusing the decision results of the context model and the traditional model for the current emotion utterance of the continuous speech to be analyzed with a fusion method based on an emotion interaction matrix, to obtain a preliminary recognition result; wherein, when the two largest classes of the decision vectors of the emotion utterance to be analyzed obtained with the traditional model and the context model are fused, an existing, statistically derived emotion interaction matrix is introduced and processed to obtain an emotion context interaction matrix, and the context interaction matrix, together with the two decision vectors, performs fusion reasoning on the emotion category of the emotion utterance;
S4: adjusting the emotion category of each utterance with emotion context reasoning rules from the perspective of the whole continuous speech to be analyzed, to obtain the emotion category sequence of the continuous speech to be analyzed.
2. The method according to claim 1, characterized in that step S4 includes:
the emotion context reasoning rules exploit the continuity of human emotional expression and adjust the emotion category of the current emotion utterance according to the emotion categories of the preceding and following adjacent utterances.
3. The method according to claim 1, characterized in that the adjacent emotion utterances in step S1 are two consecutive (preceding and following) emotion utterances, namely the last 1/3 voiced segment of the preceding utterance and the whole of the following utterance.
4. The method according to claim 3, characterized in that the context speech emotion features include: context dynamic emotion features, context difference emotion features, context edge dynamic emotion features and context edge difference emotion features.
5. The method according to claim 4, characterized in that the context dynamic emotion features are the 33-dimensional speech emotion dynamic features, related to rate of change, mean change and covariance among the 101-dimensional traditional speech emotion features, of the last 1/3 voiced segment of the preceding utterance and the whole voiced segment of the following utterance of the adjacent emotion utterances.
6. The method according to claim 4, characterized in that the context difference emotion features are the features obtained by first extracting 101-dimensional traditional speech emotion features from the last 1/3 voiced segment of the preceding utterance and the whole voiced segment of the following utterance of the adjacent emotion utterances, respectively, and then taking the difference of the two.
7. The method according to claim 5, characterized in that the context edge dynamic emotion features are the 33-dimensional speech emotion dynamic features extracted from an edge-adjacent sentence composed of the last 1/3 voiced segment of the preceding utterance and the first 1/3 voiced segment of the following utterance of the adjacent emotion utterances.
8. The method according to claim 7, characterized in that the context edge difference emotion features are the features extracted from the edge-adjacent sentence by the context difference emotion feature extraction method.
9. A speech emotion inference system based on emotion context, characterized in that the system includes:
a training unit, configured to extract context speech emotion features and traditional speech emotion features from adjacent emotion utterances, and to build a context model and a traditional model respectively according to the feature categories;
a recognition unit, configured to divide the continuous speech to be analyzed into a sequence of emotionally relatively independent emotion utterances, extract the context speech emotion features and traditional speech emotion features of the utterances respectively, and then perform emotion recognition on the current utterance with the trained context model and traditional model respectively, to obtain the decision vectors of the current utterance on the two models;
a fusion recognition unit, configured to fuse the decision results of the context model and the traditional model for the current emotion utterance of the continuous speech to be analyzed, to obtain a preliminary recognition result; wherein, when the two largest classes of the decision vectors of the emotion utterance to be analyzed obtained with the traditional model and the context model are fused, an existing, statistically derived emotion interaction matrix is introduced and processed to obtain an emotion context interaction matrix, and the context interaction matrix, together with the two decision vectors, performs fusion reasoning on the emotion category of the emotion utterance; and
an adjustment unit, configured to adjust the emotion category of each utterance with emotion context reasoning rules from the perspective of the whole continuous speech to be analyzed, to obtain the emotion category sequence of the continuous speech to be analyzed.
CN201310401319.0A 2013-09-05 2013-09-05 Speech emotional inference method based on emotion context and system Active CN103810994B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310401319.0A CN103810994B (en) 2013-09-05 2013-09-05 Speech emotional inference method based on emotion context and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310401319.0A CN103810994B (en) 2013-09-05 2013-09-05 Speech emotional inference method based on emotion context and system

Publications (2)

Publication Number Publication Date
CN103810994A CN103810994A (en) 2014-05-21
CN103810994B true CN103810994B (en) 2016-09-14

Family

ID=50707674

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310401319.0A Active CN103810994B (en) 2013-09-05 2013-09-05 Speech emotional inference method based on emotion context and system

Country Status (1)

Country Link
CN (1) CN103810994B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107305773B (en) * 2016-04-15 2021-02-09 美特科技(苏州)有限公司 Voice emotion recognition method
CN106598948B (en) * 2016-12-19 2019-05-03 杭州语忆科技有限公司 Emotion identification method based on shot and long term Memory Neural Networks combination autocoder
CN106991172B (en) * 2017-04-05 2020-04-28 安徽建筑大学 Method for establishing multi-mode emotion interaction database
CN108346436B (en) 2017-08-22 2020-06-23 腾讯科技(深圳)有限公司 Voice emotion detection method and device, computer equipment and storage medium
CN108039181B (en) * 2017-11-02 2021-02-12 北京捷通华声科技股份有限公司 Method and device for analyzing emotion information of sound signal
CN108664469B (en) * 2018-05-07 2021-11-19 首都师范大学 Emotion category determination method and device and server
CN109256150B (en) * 2018-10-12 2021-11-30 北京创景咨询有限公司 Speech emotion recognition system and method based on machine learning
CN110047517A (en) * 2019-04-24 2019-07-23 京东方科技集团股份有限公司 Speech-emotion recognition method, answering method and computer equipment
CN112418254A (en) * 2019-08-20 2021-02-26 北京易真学思教育科技有限公司 Emotion recognition method, device, equipment and storage medium
CN111274807B (en) * 2020-02-03 2022-05-10 华为技术有限公司 Text information processing method and device, computer equipment and readable storage medium
CN111754979A (en) * 2020-07-21 2020-10-09 南京智金科技创新服务中心 Intelligent voice recognition method and device
CN113689886B (en) * 2021-07-13 2023-05-30 北京工业大学 Voice data emotion detection method and device, electronic equipment and storage medium
CN113889150B (en) * 2021-10-15 2023-08-29 北京工业大学 Speech emotion recognition method and device
WO2024010485A1 (en) * 2022-07-07 2024-01-11 Nvidia Corporation Inferring emotion from speech in audio data using deep learning

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102881284A (en) * 2012-09-03 2013-01-16 江苏大学 Unspecific human voice and emotion recognition method and system

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102881284A (en) * 2012-09-03 2013-01-16 江苏大学 Unspecific human voice and emotion recognition method and system

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
An Enhanced Speech Emotion Recognition System Based on Discourse Information; Chun Chen et al.; Lecture Notes in Computer Science; 2006-12-31; vol. 3991; p. 452 *
Extraction and Analysis of Speech Emotion Features Based on Acoustic Context (基于声学上下文的语音情感特征提取与分析); Bai Lijuan et al.; Journal of Chinese Computer Systems (小型微型计算机系统); 2013-06-30; vol. 34, no. 6; pp. 1452-1455 *
Speech Emotion Recognition Based on Feature Space Decomposition and Fusion (基于特征空间分解与融合的语音情感识别); Huang Chengwei et al.; Signal Processing (信号处理); 2010-06-30; vol. 26, no. 6; pp. 835-842 *
Bai Lijuan et al. Extraction and Analysis of Speech Emotion Features Based on Acoustic Context. Journal of Chinese Computer Systems (小型微型计算机系统), 2013, vol. 34, no. 6, pp. 1451-1456. *

Also Published As

Publication number Publication date
CN103810994A (en) 2014-05-21

Similar Documents

Publication Publication Date Title
CN103810994B (en) Speech emotional inference method based on emotion context and system
CN108597541B (en) Speech emotion recognition method and system for enhancing anger and happiness recognition
Xie et al. Speech emotion classification using attention-based LSTM
CN108564942B (en) Voice emotion recognition method and system based on adjustable sensitivity
CN109599129B (en) Voice depression recognition system based on attention mechanism and convolutional neural network
CN107993665B (en) Method for determining role of speaker in multi-person conversation scene, intelligent conference method and system
CN102723078B (en) Emotion speech recognition method based on natural language comprehension
CN106503805A (en) A kind of bimodal based on machine learning everybody talk with sentiment analysis system and method
CN103345922B (en) A kind of large-length voice full-automatic segmentation method
CN103258532B (en) A kind of Chinese speech sensibility recognition methods based on fuzzy support vector machine
CN107220235A (en) Speech recognition error correction method, device and storage medium based on artificial intelligence
CN107393554A (en) In a kind of sound scene classification merge class between standard deviation feature extracting method
CN104008754B (en) Speech emotion recognition method based on semi-supervised feature selection
Semwal et al. Automatic speech emotion detection system using multi-domain acoustic feature selection and classification models
CN106875943A (en) A kind of speech recognition system for big data analysis
CN107221344A (en) A kind of speech emotional moving method
CN105931635A (en) Audio segmentation method and device
Szep et al. Paralinguistic Classification of Mask Wearing by Image Classifiers and Fusion.
CN106297769B (en) A kind of distinctive feature extracting method applied to languages identification
CN115147521A (en) Method for generating character expression animation based on artificial intelligence semantic analysis
Adiba et al. Towards immediate backchannel generation using attention-based early prediction model
CN105070300A (en) Voice emotion characteristic selection method based on speaker standardization change
Gupta On building spoken language understanding systems for low resourced languages
Shu et al. Time-frequency performance study on urban sound classification with convolutional neural network
Lingampeta et al. Human emotion recognition using acoustic features with optimized feature selection and fusion techniques

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant