CN107545905B - Emotion recognition method based on sound characteristics - Google Patents
- Publication number
- CN107545905B (granted publication of application CN201710720391.8A)
- Authority
- CN
- China
- Prior art keywords
- voice
- emotion
- emotion recognition
- voice signal
- recognition result
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Abstract
The invention provides an emotion recognition method based on sound characteristics, which comprises the following steps: a voice recording module reads a voice signal; a voice preprocessing module identifies the language to which the read voice signal belongs and divides the signal into sentences, obtaining a preprocessed voice signal with a language mark; a voice processing module calculates and extracts voice characteristic parameters according to a preset method and the language mark of the preprocessed voice signal; an emotion processing module obtains an emotion recognition result for each sentence, described as a probability, from the language mark of the preprocessed voice signal and the extracted voice characteristic parameters; an emotion post-processing module acquires the emotion recognition result of each sentence and adjusts it in a preset manner to obtain the emotion recognition result of the whole voice signal. The method provided by the invention can improve the accuracy of emotion recognition.
Description
Technical Field
The invention relates to the field of emotion recognition, in particular to an emotion recognition method based on sound characteristics.
Background
With the advancement of science and technology, natural language processing, and speech recognition in particular, has been applied in more and more industries, such as mobile phone voice assistants and self-service voice systems. Among these applications, improving the ability to recognize emotion in speech is an important way to improve service quality.
In other fields the benefits are similar. Adding an emotion recognition function to a voice communication tool can help both parties perceive each other's emotions in time and promote communication. In distance teaching, the learner's mood can be identified: when the learner shows anxiety or dissatisfaction because of problems or unintelligible material, the teacher or system may adjust the manner and pace of teaching, or give mood-adjusting guidance. In voice navigation, when speech emotion recognition detects that the driver is in an emotionally unstable state, the system can give a reminder or automatically adjust driving parameters to prevent accidents.
Existing emotion recognition methods based on voice characteristics, whether they use a vector-segmented Mahalanobis distance discrimination method, principal component analysis, a neural network, or a hidden Markov model, consider only the content of a single sentence of speech. In addition, because of the differences between language cultures, emotion recognition can be performed only on speech in a single language. As a result, the accuracy of speech emotion recognition is not high enough.
Disclosure of Invention
In order to solve the problems, the invention provides a method for recognizing emotion based on voice characteristics, which can improve the accuracy of emotion recognition.
The embodiment of the invention provides an emotion recognition method based on sound characteristics, which comprises the following steps:
the voice recording module reads a voice signal;
the voice preprocessing module identifies the language to which the read voice signal belongs, and divides the read voice signal into sentences to obtain a preprocessed voice signal with a language mark;
the voice processing module calculates and extracts voice characteristic parameters according to a preset method and the language mark of the preprocessed voice signal;
the emotion processing module obtains an emotion recognition result of each sentence according to the language mark of the preprocessed sound signal and the extracted voice characteristic parameters, and the emotion recognition result is described in a probability manner;
the emotion post-processing module acquires an emotion recognition result of each sentence of the voice signal, and adjusts the emotion recognition result according to a preset mode to obtain the emotion recognition result of the voice signal.
Preferably, the voice feature further includes:
prosodic features, including rising tones, falling tones, stressed sounds, and unstressed sounds.
Preferably, the emotion recognition result for each sentence is calculated by using a principal component analysis method, a Gaussian mixture model method, or a hidden Markov model.
Preferably, the emotion recognition result of each sentence is obtained by:
calculating the distance between each sentence and each emotion by using a vector segmentation type Mahalanobis distance discrimination method;
converting the distance values into probability values according to a preset method, such that the smaller the distance, the greater the probability, and the sum of all probabilities is 1;
and taking the probability as the emotion recognition result of each sentence.
Preferably, the adjusting the emotion recognition result according to a preset mode includes:
calculating a comprehensive probability after adjusting the emotion recognition result according to a first formula, and selecting an adjusting scheme with the highest comprehensive probability to adjust the emotion recognition result, wherein the first formula is as follows:
P = K(θ) · α^(n−i) · (1−α)^i
where K(θ), a preset monotonically decreasing function obtained through sample statistics, gives the probability corresponding to the number of emotions contained in the sound signal; θ is the number of emotions contained in the sound signal; α is the accuracy rate of emotion recognition for a single sentence; n is the number of sentences contained in the sound signal; and i is the number of sentences whose emotion recognition results are adjusted.
Preferably, the emotion recognition method based on voice characteristics further includes:
the voice-to-text module reads the voice signal and converts it into text information;
the text emotion recognition module segments the converted text information into words and looks them up in an emotion word database, in which words corresponding to different emotions are stored;
and when the text information contains words corresponding to a certain emotion and the number of words corresponding to each other emotion is below a preset threshold, the emotion of the voice signal is recognized as that emotion.
Preferably, the voice recording module reads voice signals, and includes:
the voice recording module reads a voice signal;
the voice recording module checks the length of the voice signal, and when the length of the voice signal exceeds a preset threshold value, the voice recording module segments the voice signal so that the length of each segment of the voice signal does not exceed the preset threshold value.
The emotion recognition method based on the voice characteristics can improve the accuracy of emotion recognition.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
fig. 1 is a flowchart of an emotion recognition method based on voice characteristics according to an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
The embodiment of the invention provides an emotion recognition method based on sound characteristics, as shown in fig. 1, comprising the following steps:
the voice recording module reads a voice signal;
the voice preprocessing module identifies the language to which the read voice signal belongs, and divides the read voice signal into sentences to obtain a preprocessed voice signal with a language mark;
the voice processing module calculates and extracts voice characteristic parameters according to a preset method and the language mark of the preprocessed voice signal;
the emotion processing module obtains an emotion recognition result of each sentence according to the language mark of the preprocessed sound signal and the extracted voice characteristic parameters, and the emotion recognition result is described in a probability manner;
the emotion post-processing module acquires an emotion recognition result of each sentence of the voice signal, and adjusts the emotion recognition result according to a preset mode to obtain the emotion recognition result of the voice signal.
Identifying in advance the language to which the sound signal belongs allows emotion to be recognized using language-specific sound characteristics, which increases the accuracy of emotion recognition; using the emotion post-processing module to recognize the emotion of a passage of sound as a whole increases that accuracy further.
In an embodiment of the present invention, the voice feature further includes:
prosodic features, including rising tones, falling tones, stressed sounds, and unstressed sounds. Combined with these prosodic characteristics, the voice features are more comprehensive, and higher-accuracy emotion recognition is easier to achieve.
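As an illustrative sketch of extracting one such prosodic cue, the function below labels a sentence's pitch (F0) track as rising, falling, or level by comparing the mean pitch of its final third against its first third. The thirds-based comparison and the 1.05 ratio threshold are assumptions for illustration, not taken from the patent:

```python
import numpy as np

def prosody_label(f0, rise_ratio=1.05):
    # Label a sentence's pitch track: compare mean F0 of the final
    # third against the first third; drop unvoiced (f0 <= 0) frames.
    voiced = np.asarray([v for v in f0 if v > 0], dtype=float)
    third = max(1, len(voiced) // 3)
    head, tail = voiced[:third].mean(), voiced[-third:].mean()
    if tail > head * rise_ratio:
        return "rising"
    if head > tail * rise_ratio:
        return "falling"
    return "level"
```

In practice the F0 track would come from a pitch tracker applied to each preprocessed sentence; here any list of per-frame pitch values in Hz will do.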
In an embodiment of the present invention, the emotion recognition result for each sentence is calculated using a principal component analysis method, a Gaussian mixture model method, or a hidden Markov model. Each of these directly yields an emotion recognition result described as a probability, which makes it convenient for the emotion post-processing module to recognize the emotion of a passage of speech as a whole and thus increases the accuracy of emotion recognition on the sound signal.
In an embodiment of the present invention, the emotion recognition result for each sentence is obtained by:
calculating the distance between each sentence and each emotion by using a vector segmentation type Mahalanobis distance discrimination method;
converting the distance values into probability values according to a preset method, such that the smaller the distance, the greater the probability, and the sum of all probabilities is 1;
and taking the probability as the emotion recognition result of each sentence.
By this method, an emotion recognition result described as a probability is obtained, making it convenient for the emotion post-processing module to recognize the emotion of a passage of speech as a whole and increasing the accuracy of emotion recognition on the sound signal.
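The patent leaves both the per-class statistics and the distance-to-probability conversion as "preset methods". One minimal sketch that satisfies the stated constraints (smaller distance gives greater probability, all probabilities sum to 1) is inverse-distance normalization; the Mahalanobis helper, the per-emotion class statistics, and the `eps` guard are assumptions:

```python
import numpy as np

def mahalanobis(x, mean, cov_inv):
    # Mahalanobis distance from a sentence's feature vector x to one
    # emotion class described by its mean and inverse covariance.
    d = np.asarray(x, dtype=float) - mean
    return float(np.sqrt(d @ cov_inv @ d))

def distances_to_probabilities(distances, eps=1e-9):
    # Smaller distance -> larger probability; probabilities sum to 1.
    inv = {emo: 1.0 / (d + eps) for emo, d in distances.items()}
    total = sum(inv.values())
    return {emo: v / total for emo, v in inv.items()}

# Toy distances from one sentence to three emotion classes
probs = distances_to_probabilities({"happy": 1.0, "neutral": 2.0, "angry": 4.0})
```

Any other monotone conversion (e.g. a softmax over negated distances) would satisfy the same constraints; the patent does not single one out.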
In an embodiment of the present invention, the adjusting the emotion recognition result according to a preset manner includes:
calculating a comprehensive probability after adjusting the emotion recognition result according to a first formula, and selecting an adjusting scheme with the highest comprehensive probability to adjust the emotion recognition result, wherein the first formula is as follows:
P = K(θ) · α^(n−i) · (1−α)^i
where K(θ), a preset monotonically decreasing function obtained through sample statistics, gives the probability corresponding to the number of emotions contained in the sound signal; θ is the number of emotions contained in the sound signal; α is the accuracy rate of emotion recognition for a single sentence; n is the number of sentences contained in the sound signal; and i is the number of sentences whose emotion recognition results are adjusted.
By considering both how likely the emotion is to change within a passage of speech and how likely each per-sentence recognition result is to be wrong, the emotion post-processing module recognizes the emotion of the passage as a whole, further increasing the accuracy of emotion recognition on the sound signal.
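The adjustment step can be sketched as a brute-force search over per-sentence emotion assignments scored by the first formula. The patent specifies neither the search procedure nor the form of K(θ) beyond monotonic decrease, so the enumeration and the example K functions below are assumptions:

```python
from itertools import product

def composite_probability(k_theta, alpha, n, i):
    # The first formula: P = K(theta) * alpha^(n-i) * (1-alpha)^i
    return k_theta * alpha ** (n - i) * (1 - alpha) ** i

def best_adjustment(top_emotions, emotions, alpha, k):
    # top_emotions: per-sentence most likely emotion before adjustment.
    # i counts overridden sentences; theta counts distinct emotions used.
    n = len(top_emotions)
    best, best_p = None, -1.0
    for assign in product(emotions, repeat=n):
        i = sum(a != t for a, t in zip(assign, top_emotions))
        theta = len(set(assign))
        p = composite_probability(k(theta), alpha, n, i)
        if p > best_p:
            best, best_p = assign, p
    return best, best_p
```

With a mild K(θ) the per-sentence results survive unchanged; with a K(θ) that penalizes multiple emotions sharply, the search smooths an outlier sentence toward the dominant emotion, which is the post-processing behavior the patent describes. Note the enumeration is exponential in n, which is one reason for the length check introduced below.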
In an embodiment of the present invention, the emotion recognition method based on voice characteristics further includes:
the voice-to-text module reads the voice signal and converts it into text information;
the text emotion recognition module segments the converted text information into words and looks them up in an emotion word database, in which words corresponding to different emotions are stored;
and when the text information contains words corresponding to a certain emotion and the number of words corresponding to each other emotion is below a preset threshold, the emotion of the voice signal is recognized as that emotion.
By converting the voice into text, more accurate emotion recognition can be achieved when the wording has an obvious emotional tendency.
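A minimal sketch of this text branch follows. The cue-word database, the word lists in it, and the threshold semantics (an emotion is returned only when every other emotion's cue-word count stays below the threshold) are illustrative assumptions:

```python
from collections import Counter

# Hypothetical emotion-word database: emotion -> cue words
EMOTION_WORDS = {
    "happy": {"great", "wonderful", "glad"},
    "angry": {"furious", "annoyed", "hate"},
}

def text_emotion(words, threshold=1):
    # Count cue words per emotion over the segmented text.
    counts = Counter()
    for w in words:
        for emo, cues in EMOTION_WORDS.items():
            if w in cues:
                counts[emo] += 1
    # Return an emotion only if it has cue words and every other
    # emotion's count is below the threshold; otherwise undecided.
    for emo in [e for e, c in counts.items() if c > 0]:
        if all(counts[o] < threshold for o in counts if o != emo):
            return emo
    return None
```

When the lookup is undecided (mixed or absent cue words), the acoustic pipeline's result would presumably stand.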
In one embodiment of the present invention, the sound recording module reads a sound signal, including:
the voice recording module reads a voice signal;
the voice recording module checks the length of the voice signal, and when the length of the voice signal exceeds a preset threshold value, the voice recording module segments the voice signal so that the length of each segment of the voice signal does not exceed the preset threshold value.
Limiting the length of each sound segment bounds the amount of computation needed for the emotion post-processing module to recognize the emotion of a segment as a whole, increasing the speed of emotion recognition on the sound signal.
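The length check above can be sketched as a fixed-size split; treating the signal as a flat list of samples (or frames) and interpreting the preset threshold as a sample count are assumptions:

```python
def segment_signal(samples, max_len):
    # Split the signal into consecutive chunks of at most max_len
    # samples so each chunk's post-processing search stays bounded.
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [samples[i:i + max_len] for i in range(0, len(samples), max_len)]
```

A production implementation would more likely cut at sentence or silence boundaries near the threshold rather than at exact sample counts, but the bounding effect on the per-segment computation is the same.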
The emotion recognition method based on the voice characteristics can improve the accuracy of emotion recognition.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
Claims (6)
1. A method for emotion recognition based on voice characteristics, comprising:
the voice recording module reads a voice signal;
the voice preprocessing module identifies the language to which the read voice signal belongs, and divides the read voice signal into sentences to obtain a preprocessed voice signal with a language mark;
the voice processing module calculates and extracts voice characteristic parameters according to a preset method and the language mark of the preprocessed voice signal;
the emotion processing module obtains an emotion recognition result of each sentence according to the language mark of the preprocessed sound signal and the extracted voice characteristic parameters, and the emotion recognition result is described in a probability manner;
the emotion post-processing module acquires an emotion recognition result of each sentence of the voice signal, and adjusts the emotion recognition result according to a preset mode to obtain the emotion recognition result of the voice signal;
the adjusting of the emotion recognition result according to the preset mode comprises the following steps:
calculating a comprehensive probability after adjusting the emotion recognition result according to a first formula, and selecting an adjusting scheme with the highest comprehensive probability to adjust the emotion recognition result, wherein the first formula is as follows:
P = K(θ) · α^(n−i) · (1−α)^i
where K(θ), a preset monotonically decreasing function obtained through sample statistics, gives the probability corresponding to the number of emotions contained in the sound signal; θ is the number of emotions contained in the sound signal; α is the accuracy rate of emotion recognition for a single sentence; n is the number of sentences contained in the sound signal; and i is the number of sentences whose emotion recognition results are adjusted.
2. The method of claim 1, wherein the speech feature further comprises:
prosodic features, including rising tones, falling tones, stressed sounds, and unstressed sounds.
3. The method of claim 1, wherein the emotion recognition result for each sentence is obtained by calculation using a principal component analysis method, a Gaussian mixture model method, or a hidden Markov model.
4. The method of claim 1, wherein the emotion recognition result for each sentence is obtained by:
calculating the distance between each sentence and each emotion by using a vector segmentation type Mahalanobis distance discrimination method;
converting the distance values into probability values according to a preset method, such that the smaller the distance, the greater the probability, and the sum of all probabilities is 1;
and taking the probability as the emotion recognition result of each sentence.
5. The method of claim 1, further comprising:
the voice-to-text module reads the voice signal and converts it into text information;
the text emotion recognition module segments the converted text information into words and looks them up in an emotion word database, in which words corresponding to different emotions are stored;
and when the text information contains words corresponding to a certain emotion and the number of words corresponding to each other emotion is below a preset threshold, the emotion of the voice signal is recognized as that emotion.
6. The method of claim 1, wherein the sound entry module reads a sound signal comprising:
the voice recording module reads a voice signal;
the voice recording module checks the length of the voice signal, and when the length of the voice signal exceeds a preset threshold value, the voice recording module segments the voice signal so that the length of each segment of the voice signal does not exceed the preset threshold value.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710720391.8A CN107545905B (en) | 2017-08-21 | 2017-08-21 | Emotion recognition method based on sound characteristics |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710720391.8A CN107545905B (en) | 2017-08-21 | 2017-08-21 | Emotion recognition method based on sound characteristics |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107545905A CN107545905A (en) | 2018-01-05 |
CN107545905B true CN107545905B (en) | 2021-01-05 |
Family
ID=60958751
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710720391.8A Expired - Fee Related CN107545905B (en) | 2017-08-21 | 2017-08-21 | Emotion recognition method based on sound characteristics |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107545905B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108682419A (en) * | 2018-03-30 | 2018-10-19 | 京东方科技集团股份有限公司 | Sound control method and equipment, computer readable storage medium and equipment |
CN110660412A (en) * | 2018-06-28 | 2020-01-07 | Tcl集团股份有限公司 | Emotion guiding method and device and terminal equipment |
CN112447170A (en) * | 2019-08-29 | 2021-03-05 | 北京声智科技有限公司 | Security method and device based on sound information and electronic equipment |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE102005010285A1 (en) * | 2005-03-01 | 2006-09-07 | Deutsche Telekom Ag | Speech recognition involves speech recognizer which uses different speech models for linguistic analysis and an emotion recognizer is also present for determining emotional condition of person |
CN102142253A (en) * | 2010-01-29 | 2011-08-03 | 富士通株式会社 | Voice emotion identification equipment and method |
CN104504027A (en) * | 2014-12-12 | 2015-04-08 | 北京国双科技有限公司 | Method and device for automatically selecting webpage content |
CN105320960A (en) * | 2015-10-14 | 2016-02-10 | 北京航空航天大学 | Voting based classification method for cross-language subjective and objective sentiments |
CN106297825A (en) * | 2016-07-25 | 2017-01-04 | 华南理工大学 | A kind of speech-emotion recognition method based on integrated degree of depth belief network |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE102005010285A1 (en) * | 2005-03-01 | 2006-09-07 | Deutsche Telekom Ag | Speech recognition involves speech recognizer which uses different speech models for linguistic analysis and an emotion recognizer is also present for determining emotional condition of person |
CN102142253A (en) * | 2010-01-29 | 2011-08-03 | 富士通株式会社 | Voice emotion identification equipment and method |
CN104504027A (en) * | 2014-12-12 | 2015-04-08 | 北京国双科技有限公司 | Method and device for automatically selecting webpage content |
CN105320960A (en) * | 2015-10-14 | 2016-02-10 | 北京航空航天大学 | Voting based classification method for cross-language subjective and objective sentiments |
CN106297825A (en) * | 2016-07-25 | 2017-01-04 | 华南理工大学 | A kind of speech-emotion recognition method based on integrated degree of depth belief network |
Non-Patent Citations (2)
Title |
---|
Prosodic feature analysis and emotion recognition of multilingual emotional speech; Jiang Xiaoqing et al.; Acta Acustica (声学学报); 2006-05-31; Vol. 31, No. 3; pp. 217-221 *
A survey of emotional feature analysis and recognition in speech signals; Yu Lingli et al.; Journal of Circuits and Systems (电路与系统学报); 2007-08-31; Vol. 12, No. 4; pp. 76-84 *
Also Published As
Publication number | Publication date |
---|---|
CN107545905A (en) | 2018-01-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106503646B (en) | Multi-mode emotion recognition system and method | |
CN107039034B (en) | Rhythm prediction method and system | |
CN106575502B (en) | System and method for providing non-lexical cues in synthesized speech | |
CN102194454B (en) | Equipment and method for detecting key word in continuous speech | |
CN103761975B (en) | Method and device for oral evaluation | |
US20050228664A1 (en) | Refining of segmental boundaries in speech waveforms using contextual-dependent models | |
EP3734595A1 (en) | Methods and systems for providing speech recognition systems based on speech recordings logs | |
CN107545905B (en) | Emotion recognition method based on sound characteristics | |
ATE389225T1 (en) | VOICE RECOGNITION | |
CN109545197B (en) | Voice instruction identification method and device and intelligent terminal | |
CN105261246A (en) | Spoken English error correcting system based on big data mining technology | |
US11810471B2 (en) | Computer implemented method and apparatus for recognition of speech patterns and feedback | |
US11961524B2 (en) | System and method for extracting and displaying speaker information in an ATC transcription | |
CN112818680B (en) | Corpus processing method and device, electronic equipment and computer readable storage medium | |
CN112927679A (en) | Method for adding punctuation marks in voice recognition and voice recognition device | |
Sinclair et al. | A semi-markov model for speech segmentation with an utterance-break prior | |
CN108806691B (en) | Voice recognition method and system | |
Manjunath et al. | Development of consonant-vowel recognition systems for Indian languages: Bengali and Odia | |
CN112767961B (en) | Accent correction method based on cloud computing | |
Ishihara et al. | Automatic transformation of environmental sounds into sound-imitation words based on Japanese syllable structure. | |
CN110992986B (en) | Word syllable stress reading error detection method, device, electronic equipment and storage medium | |
Sreejith et al. | Automatic prosodic labeling and broad class Phonetic Engine for Malayalam | |
CN112863493A (en) | Voice data labeling method and device and electronic equipment | |
CN117275458B (en) | Speech generation method, device and equipment for intelligent customer service and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 2021-01-05; Termination date: 2021-08-21