CN107545905A - Emotion recognition method based on sound characteristics - Google Patents

Emotion recognition method based on sound characteristics

Info

Publication number
CN107545905A
CN107545905A (application CN201710720391.8A)
Authority
CN
China
Prior art keywords
voice signal
emotion recognition
emotion
word
recognition result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710720391.8A
Other languages
Chinese (zh)
Other versions
CN107545905B (en)
Inventor
王超 (Wang Chao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing He Guang Artificial Intelligent Robot Technology Co Ltd
Original Assignee
Beijing He Guang Artificial Intelligent Robot Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing He Guang Artificial Intelligent Robot Technology Co Ltd
Priority to CN201710720391.8A
Publication of CN107545905A
Application granted
Publication of CN107545905B
Status: Expired - Fee Related

Abstract

The invention provides an emotion recognition method based on sound characteristics, including: a sound recording module reads a voice signal; a sound preprocessing module identifies the language of the voice signal and splits the signal into sentences, obtaining a preprocessed, language-tagged voice signal; a sound processing module extracts speech feature parameters from the preprocessed voice signal by a preset method according to its language tag; an emotion processing module obtains an emotion recognition result for each sentence from the language tag and the extracted speech feature parameters, the result being expressed as probabilities; an emotion post-processing module collects the per-sentence emotion recognition results, adjusts them in a preset manner, and obtains the emotion recognition result of the voice signal. The method provided by the invention can improve the accuracy of emotion recognition.

Description

Emotion recognition method based on sound characteristics
Technical field
The present invention relates to the field of emotion recognition, and in particular to an emotion recognition method based on sound characteristics.
Background art
With the development of science and technology, natural language processing, and speech recognition in particular, has been applied in more and more industries, such as mobile phone voice assistants and self-service voice systems. In these services, improving the ability to recognize the emotion in speech is an important way to improve service quality.
Emotion recognition is valuable in other fields as well. Added to a voice communication tool, it can help both parties understand each other's mood in time and so promote communication. In distance education, recognizing the learner's emotion lets the teacher or the system adjust the teaching method and pace, or offer guidance on regulating emotion, when the learner shows anxiety or frustration over difficult or incomprehensible material. In voice navigation, when the speech emotion recognition function detects that the driver is emotionally unstable, the system can issue a reminder or automatically adjust driving parameters to prevent accidents.
Existing emotion recognition methods based on sound characteristics, whether using the vector-separable Mahalanobis distance discriminant method, principal component analysis, neural networks, or hidden Markov models, usually consider only the content of a single utterance. Moreover, because of the differences between language cultures, they tend to perform emotion recognition for a single language only. As a result, the accuracy of speech emotion recognition is not high enough.
Summary of the invention
To solve the above problems, the invention provides an emotion recognition method based on sound characteristics that can improve the accuracy of emotion recognition.
An embodiment of the invention provides an emotion recognition method based on sound characteristics, including:
a sound recording module reads a voice signal;
a sound preprocessing module identifies the language of the voice signal and splits the signal into sentences, obtaining a preprocessed, language-tagged voice signal;
a sound processing module extracts speech feature parameters from the preprocessed voice signal by a preset method according to its language tag;
an emotion processing module obtains an emotion recognition result for each sentence from the language tag of the preprocessed voice signal and the extracted speech feature parameters, the emotion recognition result being expressed as probabilities;
an emotion post-processing module collects the per-sentence emotion recognition results, adjusts them in a preset manner, and obtains the emotion recognition result of the voice signal.
Preferably, the speech features further include:
prosodic features, including rising tone, falling tone, accent, and stress.
Preferably, the per-sentence emotion recognition result is computed using principal component analysis, a Gaussian mixture model, or a hidden Markov model.
Preferably, the per-sentence emotion recognition result is obtained by the following method:
the distance between each sentence and each emotion is computed using the vector-separable Mahalanobis distance discriminant method;
the distance values are converted into probability values by a preset method, such that smaller distances give larger probabilities and all probabilities sum to 1;
these probabilities are taken as the emotion recognition result of the sentence.
Preferably, adjusting the emotion recognition results in a preset manner includes:
computing, according to a first formula, the combined probability of each candidate adjustment of the emotion recognition results, and applying the adjustment with the highest combined probability, the first formula being:
P = K(θ) · α^(n−i) · (1−α)^i
where K(θ) is the probability corresponding to the number of emotions contained in the voice signal, obtained statistically from samples and preset as a monotonically decreasing function; θ is the number of emotions contained in the voice signal; α is the accuracy of per-sentence emotion recognition; n is the number of sentences in the voice signal; and i is the number of sentences whose emotion recognition result is adjusted.
Preferably, the emotion recognition method based on sound characteristics further includes:
a speech-to-text module reads the voice signal and converts it into text;
a text emotion recognition module segments the converted text into words and looks them up in an emotion word database, the emotion word database storing the words corresponding to each emotion;
when the number of words corresponding to a certain emotion exceeds a preset threshold and the number of words corresponding to every other emotion is below the threshold, the emotion of the voice signal is recognized as that emotion.
Preferably, the sound recording module reading the voice signal includes:
the sound recording module reads the voice signal;
the sound recording module checks the length of the voice signal, and when the length exceeds a preset threshold, segments the voice signal so that the length of each segment does not exceed the threshold.
The emotion recognition method based on sound characteristics provided by the invention can improve the accuracy of emotion recognition.
Other features and advantages of the invention will be set forth in the following description, and in part will become apparent from the description or be understood by practicing the invention. The objectives and other advantages of the invention can be realized and obtained by the structures particularly pointed out in the written description, the claims, and the accompanying drawings.
The technical solution of the invention is described in further detail below through the drawings and embodiments.
Brief description of the drawings
The accompanying drawings provide a further understanding of the invention and form a part of the specification; together with the embodiments of the invention, they serve to explain the invention and are not to be construed as limiting it. In the drawings:
Fig. 1 is a flowchart of an emotion recognition method based on sound characteristics in an embodiment of the invention.
Detailed description
The preferred embodiments of the invention are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described here are intended only to illustrate and explain the invention, not to limit it.
An embodiment of the invention provides an emotion recognition method based on sound characteristics, as shown in Fig. 1, including:
a sound recording module reads a voice signal;
a sound preprocessing module identifies the language of the voice signal and splits the signal into sentences, obtaining a preprocessed, language-tagged voice signal;
a sound processing module extracts speech feature parameters from the preprocessed voice signal by a preset method according to its language tag;
an emotion processing module obtains an emotion recognition result for each sentence from the language tag and the extracted speech feature parameters, the result being expressed as probabilities;
an emotion post-processing module collects the per-sentence emotion recognition results, adjusts them in a preset manner, and obtains the emotion recognition result of the voice signal.
By identifying the language of the voice signal in advance, the emotion in the sound can be recognized with sound characteristics specific to that language, which increases the accuracy of emotion recognition of the voice signal. At the same time, by using the emotion post-processing module to treat the emotion of a whole passage of sound as a unit, the accuracy of emotion recognition of the voice signal is further increased.
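To make the data flow concrete, the following is a minimal Python sketch of the pipeline described above. The five callables are hypothetical stand-ins for the patent's sound preprocessing, sound processing, emotion processing, and emotion post-processing modules, not implementations from the patent.

```python
from typing import Callable, Dict, List

def recognize_emotion(
    audio: bytes,
    detect_language: Callable[[bytes], str],
    split_sentences: Callable[[bytes, str], List[bytes]],
    extract_features: Callable[[bytes, str], List[float]],
    classify_sentence: Callable[[List[float], str], Dict[str, float]],
    post_process: Callable[[List[Dict[str, float]]], Dict[str, float]],
) -> Dict[str, float]:
    # Sound preprocessing module: language identification, then sentence split.
    language = detect_language(audio)
    sentences = split_sentences(audio, language)
    # Sound processing + emotion processing modules: per-sentence probability
    # distributions, computed with features chosen according to the language tag.
    per_sentence = [
        classify_sentence(extract_features(s, language), language)
        for s in sentences
    ]
    # Emotion post-processing module: adjust and merge the per-sentence results.
    return post_process(per_sentence)
```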
In one embodiment of the invention, the speech features further include:
prosodic features, including rising tone, falling tone, accent, and stress. Incorporating prosodic features makes the speech features more complete, which makes higher recognition accuracy easier to achieve.
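As one plausible illustration (the patent does not specify an extraction algorithm), the sketch below derives a rising/falling-tone feature from the slope of the pitch contour, assuming librosa's pyin pitch tracker is available.

```python
import numpy as np
import librosa

def tone_direction(path: str) -> str:
    """Classify a sentence's overall pitch movement as rising or falling by the
    sign of a linear fit to its F0 contour - one plausible proxy for the
    rising-tone/falling-tone prosodic features named above, not the patent's method."""
    y, sr = librosa.load(path)
    f0, voiced, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
    )
    t = np.arange(len(f0))[voiced]           # frame indices where pitch was detected
    slope = np.polyfit(t, f0[voiced], 1)[0]  # Hz per frame
    return "rising" if slope > 0 else "falling"
```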
In one embodiment of the invention, the per-sentence emotion recognition result is computed using principal component analysis, a Gaussian mixture model, or a hidden Markov model. These methods directly yield emotion recognition results expressed as probabilities, which makes it convenient for the emotion post-processing module to treat the emotion of a whole passage of sound as a unit, increasing the accuracy of emotion recognition of the voice signal.
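For the Gaussian mixture model option, one standard construction fits one GMM per emotion and normalizes the per-class log-likelihoods into a posterior. The sketch below assumes scikit-learn's GaussianMixture; the number of mixture components is an illustrative choice, not a value from the patent.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_gmms(features_by_emotion: dict) -> dict:
    # One GMM per emotion, fitted on that emotion's (n_samples, n_features) array.
    return {
        emotion: GaussianMixture(n_components=4, random_state=0).fit(X)
        for emotion, X in features_by_emotion.items()
    }

def sentence_posterior(gmms: dict, x: np.ndarray) -> dict:
    # Log-likelihood of the sentence's feature vector under each emotion's GMM,
    # normalized with a softmax so the result is a probability distribution.
    emotions = list(gmms)
    loglik = np.array([gmms[e].score_samples(x[None, :])[0] for e in emotions])
    p = np.exp(loglik - loglik.max())
    p /= p.sum()
    return dict(zip(emotions, p))
```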
In one embodiment of the invention, the per-sentence emotion recognition result is obtained by the following method:
the distance between each sentence and each emotion is computed using the vector-separable Mahalanobis distance discriminant method;
the distance values are converted into probability values by a preset method, such that smaller distances give larger probabilities and all probabilities sum to 1;
these probabilities are taken as the emotion recognition result of the sentence.
This method likewise yields emotion recognition results expressed as probabilities, which makes it convenient for the emotion post-processing module to treat the emotion of a whole passage of sound as a unit, increasing the accuracy of emotion recognition of the voice signal.
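The preset distance-to-probability conversion is not fixed by the patent; one common choice that satisfies both stated constraints (smaller distance gives larger probability, and the probabilities sum to 1) is a softmax over negative distances, sketched below with a per-emotion Mahalanobis distance.

```python
import numpy as np

def mahalanobis(x: np.ndarray, mean: np.ndarray, cov_inv: np.ndarray) -> float:
    # Mahalanobis distance between a sentence's feature vector x and an
    # emotion class described by its mean and inverse covariance matrix.
    d = x - mean
    return float(np.sqrt(d @ cov_inv @ d))

def distances_to_probabilities(distances: dict) -> dict:
    # Softmax over negative distances: smaller distance -> larger probability,
    # and all probabilities sum to 1, as the method requires.
    emotions = list(distances)
    neg = -np.array([distances[e] for e in emotions])
    p = np.exp(neg - neg.max())
    p /= p.sum()
    return dict(zip(emotions, p))
```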
In one embodiment of the invention, adjusting the emotion recognition results in a preset manner includes:
computing, according to a first formula, the combined probability of each candidate adjustment of the emotion recognition results, and applying the adjustment with the highest combined probability, the first formula being:
P = K(θ) · α^(n−i) · (1−α)^i
where K(θ) is the probability corresponding to the number of emotions contained in the voice signal, obtained statistically from samples and preset as a monotonically decreasing function; θ is the number of emotions contained in the voice signal; α is the accuracy of per-sentence emotion recognition; n is the number of sentences in the voice signal; and i is the number of sentences whose emotion recognition result is adjusted.
By weighing the probability that the emotion changes within a passage of sound against the probability that individual sentences are misrecognized, the emotion post-processing module treats the emotion of the whole passage as a unit, further increasing the accuracy of emotion recognition of the voice signal.
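As an illustration, here is a minimal sketch of the post-processing search implied by the first formula. The patent does not specify the candidate set of adjustments or the form of K(θ); here the candidates are "keep the per-sentence labels" or "relabel every sentence to one emotion" (θ = 1), and K(θ) = 1/θ is an assumed monotonically decreasing function.

```python
def combined_probability(K, theta: int, alpha: float, n: int, i: int) -> float:
    # The patent's first formula: P = K(theta) * alpha^(n - i) * (1 - alpha)^i.
    return K(theta) * alpha ** (n - i) * (1 - alpha) ** i

def adjust_labels(labels: list, alpha: float = 0.8) -> list:
    """Keep the per-sentence labels or relabel all sentences to one emotion,
    whichever candidate has the highest combined probability. K and the
    candidate set are illustrative assumptions; the patent fixes neither."""
    K = lambda theta: 1.0 / theta  # assumed monotonically decreasing K(theta)
    n = len(labels)
    best_p = combined_probability(K, len(set(labels)), alpha, n, 0)
    best = labels
    for emotion in set(labels):
        i = sum(1 for lab in labels if lab != emotion)  # sentences that would change
        p = combined_probability(K, 1, alpha, n, i)
        if p > best_p:
            best_p, best = p, [emotion] * n
    return best
```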
In one embodiment of the invention, the emotion recognition method based on sound characteristics further includes:
a speech-to-text module reads the voice signal and converts it into text;
a text emotion recognition module segments the converted text into words and looks them up in an emotion word database, the emotion word database storing the words corresponding to each emotion;
when the number of words corresponding to a certain emotion exceeds a preset threshold and the number of words corresponding to every other emotion is below the threshold, the emotion of the voice signal is recognized as that emotion.
By converting the sound to text, recognition can be made more accurate in cases where the wording carries an obvious emotional tendency.
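A minimal sketch of this text rule follows, assuming a toy emotion word database and pre-segmented words; a real system would plug in a speech recognizer and a word segmenter, and the database entries and threshold here are illustrative only.

```python
from typing import Optional

# Illustrative emotion word database; the patent does not give its contents.
EMOTION_WORDS = {
    "happy": {"great", "wonderful", "glad"},
    "angry": {"furious", "annoyed", "hate"},
}

def text_emotion(words: list, threshold: int = 2) -> Optional[str]:
    """Return an emotion when its word count exceeds the threshold and every
    other emotion's count stays below it, as the rule above describes."""
    counts = {
        emotion: sum(1 for w in words if w in vocab)
        for emotion, vocab in EMOTION_WORDS.items()
    }
    above = [e for e, c in counts.items() if c > threshold]
    others_below = all(c < threshold for e, c in counts.items() if e not in above)
    return above[0] if len(above) == 1 and others_below else None
```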
In one embodiment of the invention, the sound recording module reading the voice signal includes:
the sound recording module reads the voice signal;
the sound recording module checks the length of the voice signal, and when the length exceeds a preset threshold, segments the voice signal so that the length of each segment does not exceed the threshold.
Limiting the length of each passage of sound bounds the amount of computation the emotion post-processing module spends treating the emotion of a passage as a unit, which improves the speed of emotion recognition of the voice signal.
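A minimal sketch of the length check and segmentation on a raw sample array; the 30-second threshold is an assumption for illustration, not a value from the patent.

```python
import numpy as np

def segment_signal(samples: np.ndarray, sr: int, max_seconds: float = 30.0) -> list:
    """Split a recording into chunks no longer than max_seconds (an assumed
    threshold), as the recording module step describes."""
    max_len = int(max_seconds * sr)
    if len(samples) <= max_len:
        return [samples]
    return [samples[i:i + max_len] for i in range(0, len(samples), max_len)]
```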
The emotion recognition method based on sound characteristics provided by the invention can improve the accuracy of emotion recognition.
The invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the invention and their technical equivalents, the invention is intended to encompass them as well.

Claims (7)

1. An emotion recognition method based on sound characteristics, characterized by including:
a sound recording module reads a voice signal;
a sound preprocessing module identifies the language of the voice signal and splits the signal into sentences, obtaining a preprocessed, language-tagged voice signal;
a sound processing module extracts speech feature parameters from the preprocessed voice signal by a preset method according to its language tag;
an emotion processing module obtains an emotion recognition result for each sentence from the language tag of the preprocessed voice signal and the extracted speech feature parameters, the emotion recognition result being expressed as probabilities;
an emotion post-processing module collects the per-sentence emotion recognition results, adjusts them in a preset manner, and obtains the emotion recognition result of the voice signal.
2. The method of claim 1, characterized in that the speech features further include:
prosodic features, including rising tone, falling tone, accent, and stress.
3. The method of claim 1, characterized in that the per-sentence emotion recognition result is computed using principal component analysis, a Gaussian mixture model, or a hidden Markov model.
4. The method of claim 1, characterized in that the per-sentence emotion recognition result is obtained by the following method:
the distance between each sentence and each emotion is computed using the vector-separable Mahalanobis distance discriminant method;
the distance values are converted into probability values by a preset method, such that smaller distances give larger probabilities and all probabilities sum to 1;
these probabilities are taken as the emotion recognition result of the sentence.
5. The method of claim 1, characterized in that adjusting the emotion recognition results in a preset manner includes:
computing, according to a first formula, the combined probability of each candidate adjustment of the emotion recognition results, and applying the adjustment with the highest combined probability, the first formula being:
P = K(θ) · α^(n−i) · (1−α)^i
where K(θ) is the probability corresponding to the number of emotions contained in the voice signal, obtained statistically from samples and preset as a monotonically decreasing function; θ is the number of emotions contained in the voice signal; α is the accuracy of per-sentence emotion recognition; n is the number of sentences in the voice signal; and i is the number of sentences whose emotion recognition result is adjusted.
6. The method of claim 1, characterized by further including:
a speech-to-text module reads the voice signal and converts it into text;
a text emotion recognition module segments the converted text into words and looks them up in an emotion word database, the emotion word database storing the words corresponding to each emotion;
when the number of words corresponding to a certain emotion exceeds a preset threshold and the number of words corresponding to every other emotion is below the threshold, the emotion of the voice signal is recognized as that emotion.
7. The method of claim 1, characterized in that the sound recording module reading the voice signal includes:
the sound recording module reads the voice signal;
the sound recording module checks the length of the voice signal, and when the length exceeds a preset threshold, segments the voice signal so that the length of each segment does not exceed the threshold.
CN201710720391.8A 2017-08-21 2017-08-21 Emotion recognition method based on sound characteristics Expired - Fee Related CN107545905B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710720391.8A CN107545905B (en) 2017-08-21 2017-08-21 Emotion recognition method based on sound characteristics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710720391.8A CN107545905B (en) 2017-08-21 2017-08-21 Emotion recognition method based on sound characteristics

Publications (2)

Publication Number Publication Date
CN107545905A (en) 2018-01-05
CN107545905B (en) 2021-01-05

Family

ID=60958751

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710720391.8A Expired - Fee Related CN107545905B (en) 2017-08-21 2017-08-21 Emotion recognition method based on sound characteristics

Country Status (1)

Country Link
CN (1) CN107545905B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102005010285A1 (en) * 2005-03-01 2006-09-07 Deutsche Telekom Ag Speech recognition involves speech recognizer which uses different speech models for linguistic analysis and an emotion recognizer is also present for determining emotional condition of person
CN102142253A (en) * 2010-01-29 2011-08-03 Fujitsu Limited Voice emotion identification equipment and method
CN104504027A (en) * 2014-12-12 2015-04-08 Beijing Gridsum Technology Co., Ltd. Method and device for automatically selecting webpage content
CN105320960A (en) * 2015-10-14 2016-02-10 Beihang University Voting based classification method for cross-language subjective and objective sentiments
CN106297825A (en) * 2016-07-25 2017-01-04 South China University of Technology Speech emotion recognition method based on ensemble deep belief networks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YU LINGLI et al.: "A survey of emotional feature analysis and recognition in speech signals", Journal of Circuits and Systems *
JIANG XIAOQING et al.: "Prosodic feature analysis and emotion recognition of multilingual emotional speech", Acta Acustica *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108682419A (en) * 2018-03-30 2018-10-19 BOE Technology Group Co., Ltd. Voice control method and device, computer-readable storage medium and device
CN110660412A (en) * 2018-06-28 2020-01-07 TCL Corporation Emotion guiding method and device and terminal equipment
CN112447170A (en) * 2019-08-29 2021-03-05 Beijing SoundAI Technology Co., Ltd. Security method and device based on sound information and electronic equipment

Also Published As

Publication number Publication date
CN107545905B (en) 2021-01-05

Similar Documents

Publication Publication Date Title
CN111048062B (en) Speech synthesis method and apparatus
KR20210082153A (en) Method and system for generating synthesis voice for text via user interface
CN109036377A Speech synthesis method and device
US11676572B2 (en) Instantaneous learning in text-to-speech during dialog
CN110914898A (en) System and method for speech recognition
Liu et al. Mongolian text-to-speech system based on deep neural network
Caponetti et al. Biologically inspired emotion recognition from speech
CN107221344A Speech emotion transfer method
CN107545905A Emotion recognition method based on sound characteristics
CN112509550A (en) Speech synthesis model training method, speech synthesis device and electronic equipment
Meng et al. Synthesizing English emphatic speech for multimodal corrective feedback in computer-aided pronunciation training
Laurinčiukaitė et al. Lithuanian Speech Corpus Liepa for development of human-computer interfaces working in voice recognition and synthesis mode
Sheikhan Generation of suprasegmental information for speech using a recurrent neural network and binary gravitational search algorithm for feature selection
US20140074468A1 (en) System and Method for Automatic Prediction of Speech Suitability for Statistical Modeling
Lee et al. Korean dialect identification based on intonation modeling
Wu et al. Generating emphatic speech with hidden Markov model for expressive speech synthesis
CN115359778A Adversarial and meta-learning method based on a speaker emotional speech synthesis model
Krug et al. Articulatory synthesis for data augmentation in phoneme recognition
CN107886938A Speech processing method and device for virtual-reality-guided hypnosis
Gharavian et al. Combined classification method for prosodic stress recognition in Farsi language
Heba et al. Lexical emphasis detection in spoken French using F-Banks and neural networks
Houidhek et al. Dnn-based speech synthesis for arabic: modelling and evaluation
CN117711444B (en) Interaction method, device, equipment and storage medium based on talent expression
James et al. Exploring prosodic features modelling for secondary emotions needed for empathetic speech synthesis
Wusu-Ansah Emotion recognition from speech: An implementation in MATLAB

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210105

Termination date: 20210821