CN103810994B - Speech emotion inference method and system based on emotion context - Google Patents

Speech emotion inference method and system based on emotion context

Info

Publication number
CN103810994B
CN103810994B
Authority
CN
China
Prior art keywords
emotion
context
statement
speech
emotional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310401319.0A
Other languages
Chinese (zh)
Other versions
CN103810994A (en)
Inventor
毛启容
白李娟
王丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University
Original Assignee
Jiangsu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University filed Critical Jiangsu University
Priority to CN201310401319.0A priority Critical patent/CN103810994B/en
Publication of CN103810994A publication Critical patent/CN103810994A/en
Application granted Critical
Publication of CN103810994B publication Critical patent/CN103810994B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Machine Translation (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

The invention discloses a speech emotion inference method and system based on emotion context. The method includes: extracting context speech emotion features and traditional speech emotion features from adjacent emotion utterances, and building a context model and a traditional model respectively according to the feature categories; dividing the continuous speech to be analyzed into a sequence of emotionally relatively independent emotion utterances, then fusing the decision results of the context model and the traditional model for the current emotion utterance of the continuous speech with a fusion method based on an emotion interaction matrix, to obtain a preliminary recognition result; and adjusting the emotion category of each utterance with emotion context reasoning rules from the perspective of the whole continuous speech to be analyzed, to obtain the emotion category sequence of the continuous speech. The present invention uses an emotion reasoning algorithm based on emotion context and, by means of the emotion interaction matrix, analyzes and adjusts the emotional state of the emotion utterance to be analyzed, thereby improving the accuracy of continuous speech emotion recognition.

Description

Speech emotion inference method and system based on emotion context
Technical field
The present invention relates to speech signal processing, sentiment analysis and pattern recognition technologies, and in particular to a speech emotion inference method and system based on emotion context.
Background technology
The development of speech emotion recognition technology plays an important role in promoting the development and application of intelligent, humanized novel human-computer interaction technologies. In recent years, how to use computer technology to automatically identify the emotional state of a speaker from speech has attracted wide attention from researchers in many fields. In the field of speech emotion recognition, researchers have gradually begun to pay attention to the effect of contextual information on improving emotion recognition accuracy. Context here refers to information related to the emotional expression of the object to be analyzed, including personal information about the object itself (gender, age, culture, language, education level, conversation background, etc.) and its emotional states over the most recent period of time.
Prior art 1 analyzes the effect of contextual information such as gender, topic, speaker and spoken content on emotion recognition, but the analysis targets isolated, non-natural single sentences and does not describe or process emotional speech expressed continuously in a natural environment. Prior art 2 focuses on the contextual information carried between a word and its surroundings, proposing three classes of environmental features (context environment, dynamic environment and sentence global context), five kinds in total, and demonstrates experimentally the contribution of contextual information to improving emotion recognition accuracy; however, the proposed scheme requires building a large and rich emotion lexicon and requires recognizing the speaker's spoken content before emotion recognition, so the accuracy of content recognition affects the accuracy of emotion recognition, and recognizing the spoken content adds to the time complexity of emotion recognition. Prior art 3 relies only on acoustic features of speech, without recognizing the speaker's spoken content, to analyze how the emotional states of the two parties in a dialogue influence each other, and derives an emotion transfer matrix for the two parties.
However, in the prior art, emotion recognition for continuous speech analyzes only each current sentence. To overcome this defect, the present invention provides a speech emotion inference method and system based on emotion context. It exploits the fact that human emotional expression and change is a continuous process, i.e., that the current emotional state of the object to be analyzed is correlated with the emotional states it is about to express, and performs emotion recognition on the continuous speech of a single speaker. An emotion context feature extraction method and a speech emotion inference method based on emotion context are devised, so that the continuous speech emotion recognition rate is improved without requiring the speaker's spoken content to be recognized.
Summary of the invention
In view of the defect in the background art that emotion recognition of continuous speech analyzes only each current sentence, the present invention provides a speech emotion inference method and system based on emotion context. It devises an extraction method for speech emotion context features and establishes an efficient speech emotion inference model based on emotion context, forming a complete emotion-context-based speech emotion reasoning method, with the ultimate aim of improving the accuracy of continuous speech emotion recognition.
To achieve these goals, the technical solution provided by the embodiments of the present invention is as follows:
A speech emotion inference method based on emotion context, the method including:
S1: extracting context speech emotion features and traditional speech emotion features from adjacent emotion utterances, and building a context model and a traditional model respectively according to the feature categories;
S2: dividing the continuous speech to be analyzed into a sequence of emotionally relatively independent emotion utterances, extracting the context speech emotion features and traditional speech emotion features of the emotion utterances, and then performing recognition with the context model and the traditional model respectively, to obtain the decision vectors of the two models for the emotion utterance to be analyzed;
S3: fusing the decision results of the context model and the traditional model for the current emotion utterance of the continuous speech to be analyzed with a fusion method based on an emotion interaction matrix, to obtain a preliminary recognition result;
S4: adjusting the emotion category of each utterance with emotion context reasoning rules from the perspective of the whole continuous speech to be analyzed, to obtain the emotion category sequence of the continuous speech to be analyzed.
As a further improvement of the present invention, step S3 includes:
when the two largest classes of the decision vectors of the emotion utterance to be analyzed obtained with the traditional model and the context model are fused, an existing, statistically derived emotion interaction matrix is introduced and processed to obtain an emotion context interaction matrix; the context interaction matrix, together with the two decision vectors, performs fusion reasoning on the emotion category of the emotion utterance.
As a further improvement of the present invention, step S4 includes:
the emotion context reasoning rules exploit the continuity of human emotional expression and adjust the emotion category of the current emotion utterance according to the emotion categories of the preceding and following adjacent utterances.
As a further improvement of the present invention, the adjacent emotion utterances in step S1 are two consecutive (preceding and following) emotion utterances, namely the last 1/3 voiced segment of the preceding utterance and the whole of the following utterance.
As a further improvement of the present invention, the context speech emotion features include: context dynamic emotion features, context difference emotion features, context edge dynamic emotion features and context edge difference emotion features.
As a further improvement of the present invention, the context dynamic emotion features are the 33-dimensional speech emotion dynamic features, related to rate of change, mean change and covariance among the 101-dimensional traditional speech emotion features, computed over the last 1/3 voiced segment of the preceding utterance and the whole voiced segment of the following utterance of the adjacent emotion utterances.
As a further improvement of the present invention, the context difference emotion features are the features obtained by first extracting 101-dimensional traditional speech emotion features from the last 1/3 voiced segment of the preceding utterance and the whole voiced segment of the following utterance of the adjacent emotion utterances, respectively, and then taking the difference of the two.
As a further improvement of the present invention, the context edge dynamic emotion features are the 33-dimensional speech emotion dynamic features extracted from an edge-adjacent sentence composed of the last 1/3 voiced segment of the preceding utterance and the first 1/3 voiced segment of the following utterance of the adjacent emotion utterances.
As a further improvement of the present invention, the context edge difference emotion features are the features extracted from the edge-adjacent sentence by the context difference emotion feature extraction method.
Correspondingly, a speech emotion inference system based on emotion context, the system including:
a training unit, configured to extract context speech emotion features and traditional speech emotion features from adjacent emotion utterances, and to build a context model and a traditional model respectively according to the feature categories;
a recognition unit, configured to divide the continuous speech to be analyzed into a sequence of emotionally relatively independent emotion utterances, extract the context speech emotion features and traditional speech emotion features of each utterance, and then perform emotion recognition on the current utterance with the trained context model and traditional model respectively, to obtain the decision vectors of the current utterance on the two models;
a fusion recognition unit, configured to fuse the decision results of the context model and the traditional model for the current emotion utterance of the continuous speech to be analyzed, to obtain a preliminary recognition result; and
an adjustment unit, configured to adjust the emotion category of each utterance with emotion context reasoning rules from the perspective of the whole continuous speech to be analyzed, to obtain the emotion category sequence of the continuous speech to be analyzed.
The present invention has the following advantageous effects:
1. Context speech emotion features are extracted between consecutive emotion utterances and used to assist the traditional speech emotion features extracted from single emotion utterances, thereby improving the emotion recognition efficiency of continuous speech;
2. An existing, statistically derived emotion interaction matrix is used to fuse, through emotion reasoning, the emotional state of the utterance to be recognized based on context speech emotion features with its emotional state based on traditional speech emotion features, yielding a preliminary emotion recognition result for the utterance to be recognized;
3. Exploiting the stability of emotion change across consecutive emotion utterances, emotion context reasoning rules are formulated to perform a context-related adjustment on the whole continuous speech to be recognized.
Brief description of the drawings
Fig. 1 is a framework diagram of the speech emotion inference method based on emotion context in an embodiment of the present invention;
Fig. 2 is a flow chart of the emotion reasoning algorithm based on emotion context in an embodiment of the present invention.
Detailed description of the invention
The present invention is described below with reference to the embodiments shown in the drawings. These embodiments do not limit the invention; structural, methodological or functional variations made by those of ordinary skill in the art according to these embodiments are all included within the scope of the present invention.
The invention discloses a speech emotion inference method based on emotion context, including:
S1: extracting context speech emotion features and traditional speech emotion features from adjacent emotion utterances, and building a context model and a traditional model respectively according to the feature categories;
S2: dividing the continuous speech to be analyzed into a sequence of emotionally relatively independent emotion utterances, extracting the context speech emotion features and traditional speech emotion features of the emotion utterances, and then performing recognition with the context model and the traditional model respectively, to obtain the decision vectors of the two models for the emotion utterance to be analyzed;
S3: fusing the decision results of the context model and the traditional model for the current emotion utterance of the continuous speech to be analyzed with a fusion method based on an emotion interaction matrix, to obtain a preliminary recognition result;
S4: adjusting the emotion category of each utterance with emotion context reasoning rules from the perspective of the whole continuous speech to be analyzed, to obtain the emotion category sequence of the continuous speech to be analyzed.
Specifically, the method includes the following steps:
Step 1: training the speech emotion recognition model based on traditional speech emotion features.
Step 1.1: pre-processing the emotional speech signals in the training database, including pre-emphasis, windowing, framing and endpoint detection.
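For illustration, a minimal NumPy sketch of this pre-processing step is given below; the frame length, frame shift, pre-emphasis coefficient and energy threshold are assumed values chosen for the example, not parameters specified by the patent.

```python
import numpy as np

def preprocess(signal, sr, frame_len=0.025, frame_shift=0.010, alpha=0.97):
    """Pre-emphasis, framing, Hamming windowing and a crude energy-based
    endpoint detection (illustrative parameter values only)."""
    # Pre-emphasis: y[n] = x[n] - alpha * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])

    # Framing into overlapping frames
    flen, fshift = int(frame_len * sr), int(frame_shift * sr)
    n_frames = 1 + max(0, (len(emphasized) - flen) // fshift)
    frames = np.stack([emphasized[i * fshift:i * fshift + flen]
                       for i in range(n_frames)])

    # Hamming window applied to every frame
    frames = frames * np.hamming(flen)

    # Endpoint detection: drop frames whose short-time energy is below
    # a small fraction of the maximum frame energy
    energy = np.sum(frames ** 2, axis=1)
    return frames[energy > 0.02 * energy.max()]
```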
Step 1.2: extracting 101-dimensional commonly used traditional speech emotion features from the emotion utterances in the training set, including acoustic and prosodic features of speech such as Mel-cepstral coefficients, fundamental frequency, duration, intensity, amplitude, voice quality and formants.
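The following sketch shows how such utterance-level acoustic statistics could be assembled; the use of librosa and the particular feature set and statistics computed here are assumptions for illustration and do not reproduce the exact 101-dimensional feature set of Table 1.

```python
import numpy as np
import librosa

def utterance_features(y, sr):
    """Illustrative utterance-level statistics over frame-level acoustic
    features (MFCC, zero-crossing rate, energy, F0); not the patent's exact
    101-dimensional feature set."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=12)      # 12 x T
    zcr = librosa.feature.zero_crossing_rate(y)             # 1 x T
    rms = librosa.feature.rms(y=y)                          # 1 x T
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)           # T,

    feats = []
    for track in [*mfcc, zcr[0], rms[0], f0]:
        # max, min, mean and range of each frame-level track
        feats += [np.max(track), np.min(track), np.mean(track), np.ptp(track)]
    return np.array(feats)
```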
Step 1.3: normalizing the extracted features by the corresponding features of neutral utterances, and then performing feature selection with the SFFS (Sequential Forward Floating Search) method; 56 traditional speech emotion features remain after feature selection.
Step 1.4: training an SVM classifier with the 56-dimensional traditional speech emotion features of the emotion utterances in the training set, to obtain the speech emotion recognition model based on traditional speech emotion features.
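Steps 1.3 and 1.4 could be prototyped roughly as follows; the patent only names SFFS and SVM, so the choice of mlxtend's floating sequential selector and scikit-learn's SVC, as well as the interpretation of "normalized by the corresponding features of neutral utterances" as division by the neutral-utterance mean, are assumptions of this sketch.

```python
import numpy as np
from sklearn.svm import SVC
from mlxtend.feature_selection import SequentialFeatureSelector as SFS

def train_traditional_model(X, y, X_neutral, k_features=56):
    # Normalize each feature by the corresponding neutral-utterance feature
    # (one plausible reading: divide by the neutral mean).
    X_norm = X / (np.mean(X_neutral, axis=0) + 1e-8)

    # Sequential Forward Floating Search (SFFS) wrapped around an SVM
    svm = SVC(kernel="rbf", probability=True)
    sffs = SFS(svm, k_features=k_features, forward=True, floating=True,
               scoring="accuracy", cv=5)
    sffs.fit(X_norm, y)
    selected = list(sffs.k_feature_idx_)

    # Train the final SVM on the selected feature subset
    svm.fit(X_norm[:, selected], y)
    return svm, selected
```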
Step 2: training the speech emotion recognition model based on context speech emotion features.
Step 2.1: extracting context speech emotion features from the emotion utterances in the training set pre-processed in step 1.1, including: context dynamic emotion features, context difference emotion features, context edge dynamic emotion features and context edge difference emotion features, 268 dimensions in total.
Step 2.2: normalizing the context speech emotion features extracted in step 2.1 by the corresponding features of neutral utterances, and then performing feature selection with the SFFS (Sequential Forward Floating Search) method; 91 context speech emotion features remain after feature selection.
Step 2.3: training an SVM (Support Vector Machine) classifier with the 91-dimensional context speech emotion features extracted from the emotion utterances in the training set, to obtain the speech emotion recognition model based on context speech emotion features.
Step 3: identifying the emotional state of the emotion utterance to be recognized.
Step 3.1: pre-processing the continuous emotional speech signal to be recognized, including pre-emphasis, windowing, automatic segmentation, framing and endpoint detection.
Step 3.2: extracting, from the emotional speech signal to be recognized, the 56-dimensional traditional speech emotion features selected in steps 1.2-1.3.
Step 3.3: inputting them into the speech emotion recognition model based on traditional speech emotion features trained in step 1.4 for recognition; the recognition result obtained is denoted TP.
Step 3.4: extracting, from the emotional speech signal to be recognized, the 91-dimensional context speech emotion features selected in step 2.2.
Step 3.5: inputting them into the speech emotion recognition model based on context speech emotion features trained in step 2.3 for recognition; the recognition result obtained is denoted CP.
Step 4: according to the recognition result TP of the model based on traditional speech emotion features and the recognition result CP of the model based on context speech emotion features, fusing the recognition results of the two models with a fusion algorithm, to preliminarily obtain the emotion category of the speech signal to be recognized and the confidence of this result.
Step 5: using the reasoning rules based on emotion context, adjusting the emotional state of the emotion utterance to be analyzed according to the emotional states of the preceding and following utterances in the continuous speech, to obtain the final emotional state of the emotion utterance to be analyzed.
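Putting steps 3-5 together, the recognition flow can be summarized by the following skeleton; every helper name in it is a placeholder standing for an operation described in this specification, not an API defined by the patent.

```python
def infer_emotion_sequence(continuous_speech, sr, trad_model, ctx_model, IM):
    """High-level flow of steps 3-5 (illustrative skeleton only; the helper
    functions are placeholders for the operations described in the text)."""
    utterances = segment_by_energy_and_pause(continuous_speech, sr)   # step 3.1
    prelim = []
    for j, utt in enumerate(utterances):
        TP = trad_model.predict_proba(traditional_features(utt))      # steps 3.2-3.3
        CP = ctx_model.predict_proba(context_features_of(utterances, j))  # steps 3.4-3.5
        label, trust = fuse_with_interaction_matrix(TP, CP, IM, prelim)    # step 4
        prelim.append((label, trust))
    return apply_context_reasoning_rules(prelim)                       # step 5
```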
Correspondingly, the invention also discloses a speech emotion inference system based on emotion context, including:
a training unit, configured to extract context speech emotion features and traditional speech emotion features from adjacent emotion utterances, and to build a context model and a traditional model respectively according to the feature categories;
a recognition unit, configured to divide the continuous speech to be analyzed into a sequence of emotionally relatively independent emotion utterances, extract the context speech emotion features and traditional speech emotion features of each utterance, and then perform emotion recognition on the current utterance with the trained context model and traditional model respectively, to obtain the decision vectors of the current utterance on the two models;
a fusion recognition unit, configured to fuse the decision results of the context model and the traditional model for the current emotion utterance of the continuous speech to be analyzed, to obtain a preliminary recognition result; and an adjustment unit, configured to adjust the emotion category of each utterance with emotion context reasoning rules from the perspective of the whole continuous speech to be analyzed, to obtain the emotion category sequence of the continuous speech to be analyzed.
The present invention is further elaborated below with reference to the accompanying drawings and specific embodiments.
As shown in Fig. 1, the emotion inference system based on emotion context in an embodiment of the invention is mainly divided into four stages: a training stage, a recognition stage, a fusion recognition stage, and an emotion adjustment stage based on emotion context reasoning rules.
1. Training stage
The training stage builds the speech emotion recognition model based on traditional speech emotion features and the speech emotion recognition model based on context speech emotion features, and is divided into three steps:
(1) Emotional speech signal pre-processing.
This step uses traditional speech signal pre-processing methods to pre-process the emotional speech signal, including pre-emphasis, windowing, framing and endpoint detection.
(2) Extraction of traditional speech emotion features and training of the speech emotion recognition model based on traditional speech emotion features.
(2-1) For the current emotion utterance, acoustic and prosodic features of speech are extracted, including MFCC cepstral coefficients, fundamental frequency, duration, intensity, amplitude, voice quality and formants, together with statistical features such as the maximum, minimum and range of each of these features over the emotion utterance. The extraction methods of these features are not part of the present invention and are therefore not described in detail. The specific features are listed in Table 1.
Table 1 Description of the traditional speech emotion features
(2-2) The features extracted in step (2-1) are normalized by the features of neutral emotion, and then SFFS is used to perform feature selection on the 101-dimensional traditional speech emotion features; 56 dimensions remain after feature selection.
(2-3) The selected 56-dimensional traditional speech emotion features are used to train the speech emotion recognition model based on traditional speech emotion features; the recognition model in this embodiment is an SVM.
(3) Extraction of context speech emotion features and training of the speech emotion recognition model based on context speech emotion features.
(3-1) Context speech emotion features are extracted, including context dynamic emotion features, context difference emotion features, context edge dynamic emotion features and context edge difference emotion features.
(3-1-1) Extraction of context dynamic emotion features: from two consecutive emotion utterances, short-time energy, zero-crossing rate, Mel cepstral coefficients (the first 12 coefficients), fundamental frequency, voice quality, silence rate, the first three formant coefficients, and statistical features such as the maximum, minimum and range are extracted, 33 dimensions in total. The specific features are listed in Table 2.
Table 2 Description of the speech emotion dynamic features
(3-1-2) Extraction of context difference emotion features: from the two consecutive emotion utterances, 101-dimensional features are extracted for each utterance, including short-time energy, zero-crossing rate, Mel cepstral coefficients (the first 12 coefficients), fundamental frequency, voice quality, silence rate, the first three formant coefficients, and statistical features such as the maximum, minimum and range of these emotion features. The corresponding emotion features of the preceding utterance are then subtracted from the emotion features of the following emotion utterance to obtain the 101-dimensional context difference emotion features.
(3-1-3) Extraction of context edge dynamic emotion features: from two consecutive emotion utterances, a segment spanning from the last 1/3 voiced section of the preceding utterance to the end of the first 1/3 voiced section of the following utterance is taken, and its short-time energy, zero-crossing rate, Mel cepstral coefficients (the first 12 coefficients), fundamental frequency, voice quality, silence rate, the first three formant coefficients, and statistical features such as the maximum, minimum and range are extracted, 33 dimensions in total.
(3-1-4) Extraction of context edge difference emotion features: from the two consecutive emotion utterances, the last 1/3 voiced section of the preceding utterance and the first 1/3 voiced section of the following utterance are taken, and for each of the two segments the short-time energy, zero-crossing rate, Mel cepstral coefficients (the first 12 coefficients), fundamental frequency, voice quality, silence rate, the first three formant coefficients, and statistical features such as the maximum, minimum and range of these features are extracted, 101 dimensions per segment. The corresponding emotion features of the last 1/3 section of the preceding utterance are then subtracted from the 101-dimensional emotion features of the first 1/3 section of the following utterance, yielding the 101-dimensional context edge difference emotion features.
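A sketch of how these four feature groups could be assembled from two consecutive utterances is shown below; features_101 and features_33 are placeholders for the 101-dimensional and 33-dimensional extractors described above, and the exact segment boundaries are an assumption of this illustration.

```python
import numpy as np

def context_features(prev_utt, next_utt, features_101, features_33):
    """Assemble the four context feature groups (268 dims in total) from two
    consecutive utterances; `features_101` / `features_33` are placeholders."""
    third_prev = max(1, len(prev_utt) // 3)
    third_next = max(1, len(next_utt) // 3)
    prev_tail = prev_utt[-third_prev:]      # last 1/3 of the preceding utterance
    next_head = next_utt[:third_next]       # first 1/3 of the following utterance

    # (3-1-1) context dynamic features: 33-dim statistics over the last 1/3 of
    # the preceding utterance together with the whole following utterance
    dyn = features_33(np.concatenate([prev_tail, next_utt]))

    # (3-1-2) context difference features: following utterance minus preceding part
    diff = features_101(next_utt) - features_101(prev_tail)

    # (3-1-3) context edge dynamic features: 33-dim over the "edge sentence"
    # (last 1/3 of the preceding + first 1/3 of the following utterance)
    edge_dyn = features_33(np.concatenate([prev_tail, next_head]))

    # (3-1-4) context edge difference features: first 1/3 of the following
    # minus last 1/3 of the preceding utterance (101-dim each)
    edge_diff = features_101(next_head) - features_101(prev_tail)

    return np.concatenate([dyn, diff, edge_dyn, edge_diff])
```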
(3-2) The features extracted in steps (3-1-1), (3-1-2), (3-1-3) and (3-1-4) are normalized by the features of neutral emotion, and then SFFS is used to perform feature selection on the 268-dimensional context speech emotion features; 91 dimensions remain after feature selection.
(3-3) The selected 91-dimensional context speech emotion features are used to train the speech emotion recognition model based on context speech emotion features; the recognition model here is also an SVM.
2. Recognition stage
In the recognition stage, the corresponding features are extracted from the emotion utterance to be recognized and input into the models trained in the first stage, and the recognition result of the emotional state of this utterance on each model is computed. This stage is carried out in the following steps.
(1) The continuous emotional speech signal is segmented into utterances using a segmentation method based on the energy envelope and pause intervals.
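A minimal illustration of such energy-envelope / pause-based segmentation follows; the frame size, smoothing window, silence threshold and minimum pause length are assumed values, and a practical implementation would need tuning.

```python
import numpy as np

def segment_by_energy_and_pause(y, sr, min_pause=0.3, thresh_ratio=0.05):
    """Split continuous speech at pauses found in the smoothed short-time
    energy envelope (illustrative thresholds only)."""
    hop, frame = int(0.010 * sr), int(0.025 * sr)
    energy = np.array([np.sum(y[i:i + frame] ** 2)
                       for i in range(0, len(y) - frame, hop)])
    env = np.convolve(energy, np.ones(5) / 5, mode="same")   # smoothed envelope
    voiced = env >= thresh_ratio * env.max()

    # A pause = a run of non-voiced frames longer than min_pause seconds
    min_gap = int(min_pause * sr / hop)
    segments, seg_start, last_voiced = [], None, -min_gap
    for i, v in enumerate(voiced):
        if v:
            if seg_start is None or i - last_voiced > min_gap:
                if seg_start is not None:          # close the previous segment
                    segments.append(y[seg_start * hop:(last_voiced + 1) * hop])
                seg_start = i
            last_voiced = i
    if seg_start is not None:
        segments.append(y[seg_start * hop:(last_voiced + 1) * hop])
    return segments
```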
(2) The segmented emotional speech signal is pre-processed, using the same method as in step (1) of the training stage.
(3) Extraction of traditional speech features from the emotion utterance to be recognized and speech emotion recognition based on traditional speech emotion features.
(3-1) The 56-dimensional traditional speech emotion features (after feature selection) of the current utterance among the emotion utterances to be recognized are extracted, using the same method as in step (2-1) of the training stage.
(3-2) The emotional state of the current utterance among the emotion utterances to be recognized is identified.
The traditional speech emotion features of the current utterance extracted in step (3-1) of this stage are input into the speech emotion recognition model based on traditional speech emotion features trained in step (2-3) of the first stage, and the emotional state embodied by this emotion utterance to be recognized is computed.
(4) The context speech emotion features of the current utterance among the emotion utterances to be recognized are extracted, and the context speech emotion recognition model trained in step (3-3) of the training stage is used to identify the emotional state contained in the current utterance.
(4-1) Extraction of the context speech emotion features of the emotion utterance to be recognized, using the same extraction method as in step (3-1) of the training stage; only the 91 context speech emotion features remaining after feature selection are extracted in this step.
(4-2) The 91-dimensional context speech emotion features of the utterance to be recognized extracted in step (4-1) are input into the speech emotion recognition model based on context speech emotion features trained in step (3-3) of the first stage, and the emotional state contained in this emotion utterance to be recognized is obtained.
3. Fusion recognition stage
According to the emotional state of the emotion utterance to be recognized based on traditional speech emotion features obtained in step (3-2) of the recognition stage, and its emotional state based on context speech emotion features obtained in step (4-2), the recognition results of the two models are fused according to the following fusion method, to preliminarily obtain the final emotional state contained in the emotion utterance to be recognized.
Fusion method:
Let the test sample set be Test = {ts_1, ts_2, ..., ts_n}, let the emotion utterance currently to be recognized be ts_j, and let the final emotion category identified by the model for sample ts_j be denoted PreLabel(ts_j). Let E = {e_1, e_2, ..., e_m} be the target emotion category set with m emotion classes, and let the matrix IM denote the context interaction matrix, as shown in formula (1):
$$IM = \begin{pmatrix} IP_{(1,1)} & \cdots & IP_{(1,j)} & \cdots & IP_{(1,m)} \\ \vdots & & \vdots & & \vdots \\ IP_{(i,1)} & \cdots & IP_{(i,j)} & \cdots & IP_{(i,m)} \\ \vdots & & \vdots & & \vdots \\ IP_{(m,1)} & \cdots & IP_{(m,j)} & \cdots & IP_{(m,m)} \end{pmatrix} \qquad (1)$$
where the vector IM_i = (IP_(i,1), ..., IP_(i,j), ..., IP_(i,m)) is the emotion context interaction vector: when the emotion category of the previous emotion utterance is e_i, it gives the probabilities that the current emotion utterance belongs to each emotional state, and the element IP_(i,j) (also written c_(i,j) below) is the probability that the current emotion utterance belongs to emotion category e_j when the emotion category of the previous emotion utterance is e_i. Let TP denote the probability vector, sorted in descending order of probability, output for the emotion utterance to be recognized by the recognition model based on traditional emotion features, written TP = (tp_1, tp_2, tp_3, ..., tp_i, ..., tp_m), where tp_i is the probability, obtained from the model based on traditional speech emotion features, that the emotion utterance to be recognized belongs to emotion e_i. Let CP denote the probability vector, sorted in descending order of probability, output for the emotion utterance to be recognized by the recognition model based on context emotion features, written CP = (cp_1, cp_2, cp_3, ..., cp_i, ..., cp_m), where cp_i is the probability, obtained from the model based on context speech emotion features, that the emotion utterance to be recognized belongs to emotion e_i. Let TrustLevel(ts_j) ∈ {A_1, A_2, A_3} denote the trust level assigned to each reasoning result; the three grades satisfy A_1 > A_2 > A_3 in trustworthiness. The fusion method is carried out in two steps.
(1) Data preparation.
This part prepares the data for the fusion to be performed. Besides the probability vectors TP and CP, the corresponding emotion context interaction vector IM_i must be selected from the emotion context interaction matrix according to the emotion category of the previous emotion utterance ts_(j-1) of the emotion utterance ts_j currently to be recognized. The emotion context interaction matrix is refined from an emotion-change interaction matrix of two-person dialogues compiled from dialogue scenes in typical Chinese dramas; the interaction matrix is shown in Table 3. This emotion interaction matrix was compiled from 4000 dialogue segments totalling over a hundred hours, covering male-male, male-female and female-female dialogues.
Table 3 Interaction matrix
Table 3 records the emotion interaction rules of a dialogue between two people, A and B. The table is divided into a left part and a right part, each giving the distribution and probabilities of the above six emotions in one person when the emotion category of the other person is determined. The left part gives the emotion probability distribution of A given the emotional state of B; similarly, the right part gives the emotion distribution that B may exhibit given the emotional state of A. To eliminate the difference between the left and right distributions caused by individual differences between people, the distribution probabilities of each emotion pair in the two halves are averaged, yielding the interaction matrix shown in Table 4, which is defined as the context interaction matrix. The probabilities in the context interaction matrix are computed as in formula (2), where AIP_(i,j) is the probability that B shows emotion e_j when A shows emotion e_i, BIP_(i,j) is the probability that A shows emotion e_j when B shows emotion e_i, and IP_(i,j) is the probability that the following utterance expresses emotion e_j when the preceding one expresses emotion e_i.
Table 4 Context interaction matrix
$$IP_{(i,j)} = \frac{AIP_{(i,j)} + BIP_{(i,j)}}{2} \qquad (2)$$
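Assuming the two directional matrices AIP and BIP are available as m-by-m arrays, formula (2) reduces to an element-wise average, e.g.:

```python
import numpy as np

# AIP[i, j]: probability that B shows emotion e_j when A shows emotion e_i
# BIP[i, j]: probability that A shows emotion e_j when B shows emotion e_i
def context_interaction_matrix(AIP, BIP):
    """Element-wise average of the two directional matrices, formula (2)."""
    return (np.asarray(AIP, dtype=float) + np.asarray(BIP, dtype=float)) / 2.0
```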
(2) The data prepared in the previous step are fused by emotion reasoning with the emotion reasoning algorithm based on emotion context. The algorithm is described as follows:
Input: the emotion utterance ts_j to be recognized;
the probability output vector TP of the recognition model based on traditional emotion features;
the probability output vector CP of the recognition model based on context emotion features;
the emotion interaction vector IM_i.
Output: the preliminary emotion category PreLabel(ts_j) of emotion utterance ts_j.
The concrete reasoning process of the algorithm is as follows:
(1) If tp_1 and cp_1 have the same emotion category, the emotion category PreLabel(ts_j) of the sample under test is marked as the category of tp_1 (cp_1), the trust level is A_3, and the algorithm ends;
(2) Otherwise, tp_1 and cp_1 have different emotion categories, and the categories with the second-largest probability are brought in together with the largest ones to make the decision.
(2-1) If tp_2 and cp_2 have the same emotion category, there are the following three cases:
(2-1-1) If IP_(i,1) has the same emotion category as tp_2 (cp_2), the emotion category PreLabel(ts_j) of the sample under test is marked as the category of IP_(i,1), the trust level is A_2, and the algorithm ends;
(2-1-2) If IP_(i,1) has the same emotion category as tp_1 (or cp_1), the emotion category PreLabel(ts_j) of the sample under test is marked as the category of tp_1 (or cp_1), the trust level is A_2, and the algorithm ends;
(2-1-3) Otherwise, the confidences of the elements tp_1 and cp_1 are computed, the emotion category corresponding to the larger confidence is marked as the emotion category PreLabel(ts_j) of the sample under test, the trust level is A_1, and the algorithm ends.
(2-2) If tp_2 and cp_2 have different emotion categories, there are the following two cases:
① If tp_1 has the same emotion category as cp_2 (or tp_2 has the same emotion category as cp_1), the emotion category PreLabel(ts_j) of the sample under test is marked as the category of tp_1 (or cp_1), the trust level is A_2, and the algorithm ends;
② Otherwise, the confidences of the four elements tp_1, tp_2, cp_1 and cp_2 are computed, the emotion category corresponding to the largest confidence is marked as the emotion category PreLabel(ts_j) of the sample under test, the trust level is A_1, and the algorithm ends.
In the algorithm, the confidence of each element in the probability output vectors TP and CP of the two models is denoted conf_(i,j), representing the confidence that emotion utterance ts_i belongs to emotion category e_j; it is computed as in formula (4). This confidence consists of two parts: one part comes from the model's own output vector and is denoted Pconf_(i,j), computed as in formula (3); the other part comes from the context interaction matrix. In the formulas, p_(i,j) is the probability, output by the recognition model, that emotion utterance ts_i belongs to emotion category e_j.
$$Pconf_{(i,j)} = p_{(i,j)} - \frac{1}{m-1}\sum_{\substack{k=1\\ k\neq j}}^{m} p_{(i,k)} \qquad (3)$$
$$conf_{(i,j)} = c_{(i,j)} \cdot Pconf_{(i,j)} = c_{(i,j)}\left(p_{(i,j)} - \frac{1}{m-1}\sum_{\substack{k=1\\ k\neq j}}^{m} p_{(i,k)}\right) \qquad (4)$$
Compared with the previous practice of deciding solely by the emotion category with the maximum probability, the algorithm adds the second-largest-probability emotion category and the context interaction matrix to assist the final decision on the emotion category of the sample under test. Meanwhile, different trust levels are assigned to the different decision branches of each rule; the structure is shown in Fig. 2.
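The reasoning process above can be sketched as follows; the representation of TP and CP as (label, probability) pairs sorted by descending probability, the reading of IP_(i,1) as the class most probable given the previous emotion, and the tie-breaking by Python's tuple ordering are assumptions of this illustration, and the trust levels are assigned exactly as in the textual description above.

```python
def fuse_decisions(TP, CP, IM_prev):
    """Fusion reasoning sketch.
    TP / CP: lists of (label, probability) sorted by descending probability,
    from the traditional model and the context model respectively.
    IM_prev: dict mapping each emotion label to IP(i, label), the interaction
    probability given the previous utterance's emotion e_i.
    Returns (preliminary label, trust level)."""
    m = len(TP)

    def pconf(vec, idx):
        # formula (3): p minus the mean of the other class probabilities
        others = sum(p for k, (_, p) in enumerate(vec) if k != idx)
        return vec[idx][1] - others / (m - 1)

    def conf(vec, idx):
        # formula (4): interaction-matrix element times Pconf
        return IM_prev[vec[idx][0]] * pconf(vec, idx)

    tl1, tl2 = TP[0][0], TP[1][0]
    cl1, cl2 = CP[0][0], CP[1][0]
    im_top = max(IM_prev, key=IM_prev.get)   # class most likely given the context

    if tl1 == cl1:                                    # rule (1): top classes agree
        return tl1, "A3"
    if tl2 == cl2:                                    # rule (2-1): second classes agree
        if im_top == tl2:                             # (2-1-1)
            return tl2, "A2"
        if im_top in (tl1, cl1):                      # (2-1-2)
            return im_top, "A2"
        best = max([(conf(TP, 0), tl1), (conf(CP, 0), cl1)])   # (2-1-3)
        return best[1], "A1"
    if tl1 == cl2 or tl2 == cl1:                      # rule (2-2), case 1
        return (tl1 if tl1 == cl2 else cl1), "A2"
    candidates = [(conf(TP, 0), tl1), (conf(TP, 1), tl2),      # rule (2-2), case 2
                  (conf(CP, 0), cl1), (conf(CP, 1), cl2)]
    return max(candidates)[1], "A1"
```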
4. Emotion adjustment stage based on emotion context reasoning rules
For each emotion utterance to be recognized in the test set Test = {ts_1, ts_2, ..., ts_n}, a preliminary decision result is obtained by the above reasoning algorithm. Based on the facts that emotion rarely changes abruptly and that the different decision branches in the reasoning algorithm are assigned different trust levels, the recognition result of each emotion utterance in the test set Test is given a context-related adjustment according to the emotion context reasoning rules.
In the emotion context reasoning rules, the utterance currently to be adjusted is ts_j (j = 2, 3, ..., n), and the results of ts_(j-1) and ts_(j+1) are used to assist in adjusting the result of ts_j. When the emotion categories of ts_(j-1) and ts_(j+1) differ, no processing is done; otherwise, the emotion category of ts_j is adjusted according to the following three cases, where Label(ts_j) denotes the final emotion category of emotion utterance ts_j.
Rule 1: when the trust level of ts_j is A_1, the recognition result of ts_j itself is considered credible and no correction is made, i.e.:
$$TrustLevel(ts_j) = A_1 \;\Rightarrow\; Label(ts_j) = PreLabel(ts_j)$$
Rule 2: when the trust level of ts_j is A_2, the recognition result of ts_j is considered suspicious and is handled according to the results of ts_(j-1) and ts_(j+1): if neither the trust level of ts_(j-1) nor that of ts_(j+1) is A_3 (i.e. both are A_1 or A_2), the results of ts_(j-1) and ts_(j+1) are considered credible, and the result of ts_j is corrected to the same emotion category; otherwise, no correction is made.
Rule 3: when the trust level of ts_j is A_3, the recognition result of ts_j is considered untrustworthy, and its emotion category is corrected to the emotion category of ts_(j-1) (ts_(j+1)), i.e.:
$$TrustLevel(ts_j) = A_3 \;\Rightarrow\; Label(ts_j) = Label(ts_{j-1})$$
These rules make full use of the contextual information between utterances; starting from the consideration that emotion rarely changes abruptly, the preliminary results of the reasoning algorithm are adjusted again, yielding the final emotion category of each emotion utterance in the test set Test.
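Rules 1-3 can be sketched as a simple pass over the preliminary results; the tuple representation of (label, trust level) is an assumption of this illustration.

```python
def apply_context_reasoning_rules(prelim):
    """Adjust preliminary labels with the neighbouring utterances' results.
    prelim: list of (label, trust) tuples produced by the fusion stage."""
    final = [label for label, _ in prelim]
    for j in range(1, len(prelim) - 1):
        label, trust = prelim[j]
        prev_label, prev_trust = prelim[j - 1]
        next_label, next_trust = prelim[j + 1]
        if prev_label != next_label:        # neighbours disagree: no processing
            continue
        if trust == "A1":                   # rule 1: result trusted, keep it
            continue
        if trust == "A2":                   # rule 2: suspicious, follow neighbours
            if prev_trust != "A3" and next_trust != "A3":
                final[j] = prev_label       # only if both neighbours are themselves trusted
        elif trust == "A3":                 # rule 3: untrusted, take the neighbours' class
            final[j] = prev_label
    return final
```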
The present invention overcomes the limitation of existing speech emotion recognition methods that analyze only isolated emotion sentences. Starting from the characteristic that the emotion changes of adjacent emotion utterances are interrelated, it uses an emotion reasoning algorithm based on emotion context and, by means of the emotion interaction matrix, analyzes and adjusts the emotional state of the emotion utterance to be analyzed, thereby improving the accuracy of continuous speech emotion recognition.
Compared with the prior art, the present invention has the following advantages:
1. Context speech emotion features are successfully extracted between consecutive emotion utterances and used to assist the traditional speech emotion features extracted from single emotion utterances, thereby improving the emotion recognition efficiency of continuous speech;
2. The existing statistically derived emotion interaction matrix is used skilfully to fuse, through emotion reasoning, the emotional state of the utterance to be recognized based on context speech emotion features with its emotional state based on traditional speech emotion features, yielding a preliminary emotion recognition result for the emotion utterance to be recognized;
3. Exploiting the stability of emotion change across consecutive emotion utterances, emotion context reasoning rules are formulated to carry out a context-related adjustment on the whole continuous speech to be recognized.
It should be understood that although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution; this manner of presentation is adopted only for clarity. Those skilled in the art should take the specification as a whole; the technical solutions in the embodiments may also be appropriately combined to form other embodiments understandable to those skilled in the art.
The detailed descriptions listed above are only specific illustrations of feasible embodiments of the present invention and are not intended to limit the scope of protection of the present invention; all equivalent implementations or changes made without departing from the technical spirit of the present invention shall be included within the scope of protection of the present invention.

Claims (9)

1. A speech emotion inference method based on emotion context, characterized in that the method includes:
S1: extracting context speech emotion features and traditional speech emotion features from adjacent emotion utterances, and building a context model and a traditional model respectively according to the feature categories;
S2: dividing the continuous speech to be analyzed into a sequence of emotionally relatively independent emotion utterances, extracting the context speech emotion features and traditional speech emotion features of the emotion utterances, and then performing recognition with the context model and the traditional model respectively, to obtain the decision vectors of the two models for the emotion utterance to be analyzed;
S3: fusing the decision results of the context model and the traditional model for the current emotion utterance of the continuous speech to be analyzed with a fusion method based on an emotion interaction matrix, to obtain a preliminary recognition result; wherein, when the two largest classes of the decision vectors of the emotion utterance to be analyzed obtained with the traditional model and the context model are fused, an existing, statistically derived emotion interaction matrix is introduced and processed to obtain an emotion context interaction matrix, and the context interaction matrix, together with the two decision vectors, performs fusion reasoning on the emotion category of the emotion utterance;
S4: adjusting the emotion category of each utterance with emotion context reasoning rules from the perspective of the whole continuous speech to be analyzed, to obtain the emotion category sequence of the continuous speech to be analyzed.
2. The method according to claim 1, characterized in that step S4 includes:
the emotion context reasoning rules exploit the continuity of human emotional expression and adjust the emotion category of the current emotion utterance according to the emotion categories of the preceding and following adjacent utterances.
3. The method according to claim 1, characterized in that the adjacent emotion utterances in step S1 are two consecutive (preceding and following) emotion utterances, namely the last 1/3 voiced segment of the preceding utterance and the whole of the following utterance.
4. The method according to claim 3, characterized in that the context speech emotion features include: context dynamic emotion features, context difference emotion features, context edge dynamic emotion features and context edge difference emotion features.
5. The method according to claim 4, characterized in that the context dynamic emotion features are the 33-dimensional speech emotion dynamic features, related to rate of change, mean change and covariance among the 101-dimensional traditional speech emotion features, of the last 1/3 voiced segment of the preceding utterance and the whole voiced segment of the following utterance of the adjacent emotion utterances.
6. The method according to claim 4, characterized in that the context difference emotion features are the features obtained by first extracting 101-dimensional traditional speech emotion features from the last 1/3 voiced segment of the preceding utterance and the whole voiced segment of the following utterance of the adjacent emotion utterances, respectively, and then taking the difference of the two.
7. The method according to claim 5, characterized in that the context edge dynamic emotion features are the 33-dimensional speech emotion dynamic features extracted from an edge-adjacent sentence composed of the last 1/3 voiced segment of the preceding utterance and the first 1/3 voiced segment of the following utterance of the adjacent emotion utterances.
8. The method according to claim 7, characterized in that the context edge difference emotion features are the features extracted from the edge-adjacent sentence by the context difference emotion feature extraction method.
9. A speech emotion inference system based on emotion context, characterized in that the system includes:
a training unit, configured to extract context speech emotion features and traditional speech emotion features from adjacent emotion utterances, and to build a context model and a traditional model respectively according to the feature categories;
a recognition unit, configured to divide the continuous speech to be analyzed into a sequence of emotionally relatively independent emotion utterances, extract the context speech emotion features and traditional speech emotion features of the utterances respectively, and then perform emotion recognition on the current utterance with the trained context model and traditional model respectively, to obtain the decision vectors of the current utterance on the two models;
a fusion recognition unit, configured to fuse the decision results of the context model and the traditional model for the current emotion utterance of the continuous speech to be analyzed, to obtain a preliminary recognition result; wherein, when the two largest classes of the decision vectors of the emotion utterance to be analyzed obtained with the traditional model and the context model are fused, an existing, statistically derived emotion interaction matrix is introduced and processed to obtain an emotion context interaction matrix, and the context interaction matrix, together with the two decision vectors, performs fusion reasoning on the emotion category of the emotion utterance; and
an adjustment unit, configured to adjust the emotion category of each utterance with emotion context reasoning rules from the perspective of the whole continuous speech to be analyzed, to obtain the emotion category sequence of the continuous speech to be analyzed.
CN201310401319.0A 2013-09-05 2013-09-05 Speech emotional inference method based on emotion context and system Active CN103810994B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310401319.0A CN103810994B (en) 2013-09-05 2013-09-05 Speech emotional inference method based on emotion context and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310401319.0A CN103810994B (en) 2013-09-05 2013-09-05 Speech emotional inference method based on emotion context and system

Publications (2)

Publication Number Publication Date
CN103810994A CN103810994A (en) 2014-05-21
CN103810994B true CN103810994B (en) 2016-09-14

Family

ID=50707674

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310401319.0A Active CN103810994B (en) 2013-09-05 2013-09-05 Speech emotional inference method based on emotion context and system

Country Status (1)

Country Link
CN (1) CN103810994B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107305773B (en) * 2016-04-15 2021-02-09 美特科技(苏州)有限公司 Voice emotion recognition method
CN106598948B (en) * 2016-12-19 2019-05-03 杭州语忆科技有限公司 Emotion identification method based on shot and long term Memory Neural Networks combination autocoder
CN106991172B (en) * 2017-04-05 2020-04-28 安徽建筑大学 Method for establishing multi-mode emotion interaction database
CN108346436B (en) 2017-08-22 2020-06-23 腾讯科技(深圳)有限公司 Voice emotion detection method and device, computer equipment and storage medium
CN108039181B (en) * 2017-11-02 2021-02-12 北京捷通华声科技股份有限公司 Method and device for analyzing emotion information of sound signal
CN108664469B (en) * 2018-05-07 2021-11-19 首都师范大学 Emotion category determination method and device and server
CN109256150B (en) * 2018-10-12 2021-11-30 北京创景咨询有限公司 Speech emotion recognition system and method based on machine learning
CN110047517A (en) * 2019-04-24 2019-07-23 京东方科技集团股份有限公司 Speech-emotion recognition method, answering method and computer equipment
CN112418254A (en) * 2019-08-20 2021-02-26 北京易真学思教育科技有限公司 Emotion recognition method, device, equipment and storage medium
CN111274807B (en) * 2020-02-03 2022-05-10 华为技术有限公司 Text information processing method and device, computer equipment and readable storage medium
CN111754979A (en) * 2020-07-21 2020-10-09 南京智金科技创新服务中心 Intelligent voice recognition method and device
CN113689886B (en) * 2021-07-13 2023-05-30 北京工业大学 Voice data emotion detection method and device, electronic equipment and storage medium
CN113889150B (en) * 2021-10-15 2023-08-29 北京工业大学 Speech emotion recognition method and device
WO2024010485A1 (en) * 2022-07-07 2024-01-11 Nvidia Corporation Inferring emotion from speech in audio data using deep learning

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102881284A (en) * 2012-09-03 2013-01-16 江苏大学 Unspecific human voice and emotion recognition method and system

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102881284A (en) * 2012-09-03 2013-01-16 江苏大学 Unspecific human voice and emotion recognition method and system

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
An Enhanced Speech Emotion Recognition System Based on Discourse Information; Chun Chen et al.; Lecture Notes in Computer Science; 2006-12-31; vol. 3991; p. 452 *
Extraction and Analysis of Speech Emotion Features Based on Acoustic Context (基于声学上下文的语音情感特征提取与分析); Bai Lijuan et al.; Journal of Chinese Computer Systems (小型微型计算机系统); 2013-06-30; vol. 34, no. 6; pp. 1452-1455 *
Speech Emotion Recognition Based on Feature Space Decomposition and Fusion (基于特征空间分解与融合的语音情感识别); Huang Chengwei et al.; Signal Processing (信号处理); 2010-06-30; vol. 26, no. 6; pp. 835-842 *
Bai Lijuan et al. Extraction and Analysis of Speech Emotion Features Based on Acoustic Context. Journal of Chinese Computer Systems (小型微型计算机系统), 2013, vol. 34, no. 6, pp. 1451-1456. *

Also Published As

Publication number Publication date
CN103810994A (en) 2014-05-21

Similar Documents

Publication Publication Date Title
CN103810994B (en) Speech emotional inference method based on emotion context and system
CN108597541B (en) Speech emotion recognition method and system for enhancing anger and happiness recognition
Xie et al. Speech emotion classification using attention-based LSTM
CN108564942B (en) Voice emotion recognition method and system based on adjustable sensitivity
CN109599129B (en) Voice depression recognition system based on attention mechanism and convolutional neural network
CN107993665B (en) Method for determining role of speaker in multi-person conversation scene, intelligent conference method and system
CN102723078B (en) Emotion speech recognition method based on natural language comprehension
CN106503805A (en) A kind of bimodal based on machine learning everybody talk with sentiment analysis system and method
CN103345922B (en) A kind of large-length voice full-automatic segmentation method
CN103258532B (en) A kind of Chinese speech sensibility recognition methods based on fuzzy support vector machine
CN107220235A (en) Speech recognition error correction method, device and storage medium based on artificial intelligence
CN107393554A (en) In a kind of sound scene classification merge class between standard deviation feature extracting method
CN104008754B (en) Speech emotion recognition method based on semi-supervised feature selection
Semwal et al. Automatic speech emotion detection system using multi-domain acoustic feature selection and classification models
CN106875943A (en) A kind of speech recognition system for big data analysis
CN107221344A (en) A kind of speech emotional moving method
CN105931635A (en) Audio segmentation method and device
Szep et al. Paralinguistic Classification of Mask Wearing by Image Classifiers and Fusion.
CN106297769B (en) A kind of distinctive feature extracting method applied to languages identification
CN115147521A (en) Method for generating character expression animation based on artificial intelligence semantic analysis
Adiba et al. Towards immediate backchannel generation using attention-based early prediction model
CN105070300A (en) Voice emotion characteristic selection method based on speaker standardization change
Gupta On building spoken language understanding systems for low resourced languages
Shu et al. Time-frequency performance study on urban sound classification with convolutional neural network
Lingampeta et al. Human emotion recognition using acoustic features with optimized feature selection and fusion techniques

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant