CN111145785A - Emotion recognition method and device based on voice - Google Patents
- Publication number
- CN111145785A (application No. CN201811285508.5A)
- Authority
- CN
- China
- Prior art keywords
- gaussian mixture
- mixture model
- anger
- voice
- emotion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Hospice & Palliative Care (AREA)
- Psychiatry (AREA)
- General Health & Medical Sciences (AREA)
- Child & Adolescent Psychology (AREA)
- Signal Processing (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Feedback Control In General (AREA)
Abstract
The invention discloses a speech-based emotion recognition method and device. The method comprises the following steps: 1) separately collecting speech data expressing the emotions of happiness, anger, and sadness; 2) applying a PCA algorithm to reduce the dimensionality of each emotional speech data set; 3) performing endpoint detection on the dimensionality-reduced data, extracting three feature parameters (Mel-frequency cepstral coefficients, formants, and zero-crossing rate), fitting Gaussian mixture models to these parameters so as to train one Gaussian mixture model per emotion, and building an emotional speech database consisting of the happiness-, anger-, and sadness-Gaussian mixture models; 4) collecting the speech segment to be recognized. The speech-based emotion recognition method achieves high recognition accuracy.
Description
Technical Field
The invention relates to a method and a device for emotion recognition based on voice.
Background
Emotion is a state that integrates human feelings, thoughts, and behaviors, and it plays an important role in interpersonal communication. It comprises a person's psychological response to external or internal stimuli, together with the physiological response that accompanies it, and it is ever-present in daily work and life. In medical care, if the emotional state of a patient, particularly a patient with an expression disorder, can be determined, care measures can be adapted to that state and the quality of care improved. In product development, if the emotional state of users while using a product can be identified, user experience can be understood, product functions improved, and products better suited to user needs designed. In human-machine interaction systems, interaction becomes friendlier and more natural if the system can recognize the emotional state of the user. The analysis and recognition of emotion is an important interdisciplinary research topic spanning neuroscience, psychology, cognitive science, computer science, and artificial intelligence, and emotion recognition methods are therefore valuable across these fields.
Disclosure of Invention
The invention aims to provide a speech-based emotion recognition method and device with high recognition accuracy.
To solve the above problems, the invention adopts the following technical solution:
a speech-based emotion recognition method includes the following steps:
1) separately collecting speech data expressing the emotions of happiness, anger, and sadness;
2) applying a PCA algorithm to reduce the dimensionality of each of the happiness, anger, and sadness emotional speech data sets;
3) performing endpoint detection on the dimensionality-reduced happiness, anger, and sadness speech data and extracting three feature parameters, namely Mel-frequency cepstral coefficients, formants, and zero-crossing rate; fitting Gaussian mixture models to these parameters so as to train one Gaussian mixture model for each emotion; and building an emotional speech database consisting of the happiness-Gaussian mixture model, the anger-Gaussian mixture model, and the sadness-Gaussian mixture model;
4) collecting the speech segment to be recognized;
5) applying anti-aliasing filtering, analog-to-digital conversion, pre-emphasis preprocessing, and endpoint detection to the collected speech segment; extracting the same three feature parameters; fitting a contrast Gaussian mixture model to them; and matching it against the happiness-, anger-, and sadness-Gaussian mixture models in the emotional speech database;
6) when the overlap rate between the contrast Gaussian mixture model and one of the happiness-, anger-, or sadness-Gaussian mixture models in the database exceeds the set threshold, judging the speech segment to express the same emotion as that model.
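Steps 3) to 6) can be sketched as follows. This is a minimal illustration, assuming scikit-learn's `GaussianMixture` and synthetic stand-in feature frames in place of real MFCC/formant/zero-crossing features; the patent's "overlap rate" comparison is approximated here by picking the model with the highest average log-likelihood, the usual GMM classification rule, rather than the exact matching procedure described above.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

def train_emotion_models(features_by_emotion, n_components=4):
    # One GMM per emotion, fitted on that emotion's feature frames
    models = {}
    for emotion, frames in features_by_emotion.items():
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="diag", random_state=0)
        gmm.fit(frames)                      # frames: (n_frames, n_features)
        models[emotion] = gmm
    return models

def classify(models, frames):
    # Average per-frame log-likelihood under each emotion model
    scores = {e: m.score(frames) for e, m in models.items()}
    return max(scores, key=scores.get)

# Synthetic stand-ins for happiness / anger / sadness feature frames
train = {
    "happiness": rng.normal(0.0, 1.0, (200, 14)),
    "anger":     rng.normal(4.0, 1.0, (200, 14)),
    "sadness":   rng.normal(-4.0, 1.0, (200, 14)),
}
models = train_emotion_models(train)
probe = rng.normal(4.0, 1.0, (50, 14))       # resembles the "anger" cluster
print(classify(models, probe))               # expected: anger
```

In this sketch the database of step 3) is simply the `models` dictionary; a real implementation would persist the fitted parameters.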
Preferably, the threshold is set in the range of 38% to 70%.
Preferably, the emotion recognition method further includes step 7): labeling the judged contrast Gaussian mixture model with the corresponding emotion of happiness, anger, or sadness, and updating the labeled contrast Gaussian mixture model into the emotional speech database.
Preferably, the duration of the speech segment is 2 to 6 s.
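The pre-emphasis preprocessing named in step 5) is conventionally a first-order high-pass filter, y[n] = x[n] - a*x[n-1]. The patent does not specify the coefficient, so the common default a = 0.97 below is an assumption.

```python
import numpy as np

def pre_emphasis(signal, alpha=0.97):
    # y[n] = x[n] - alpha * x[n-1], with the first sample passed through
    signal = np.asarray(signal, dtype=float)
    return np.concatenate(([signal[0]], signal[1:] - alpha * signal[:-1]))

x = np.array([1.0, 1.0, 1.0, 1.0])
print(pre_emphasis(x))   # [1.   0.03 0.03 0.03]
```

The filter boosts high frequencies relative to low ones, which flattens the spectral tilt of speech before feature extraction.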
The invention also provides a speech-based emotion recognition device, which comprises
the sound collection module, used for collecting the speech segment to be recognized;
the audio processing module, used for reducing the dimensionality of the collected happiness, anger, and sadness emotional speech data, and for applying anti-aliasing filtering, analog-to-digital conversion, and pre-emphasis preprocessing to the collected speech segment;
the data processing module, used for performing endpoint detection, extracting the three feature parameters of Mel-frequency cepstral coefficients, formants, and zero-crossing rate, fitting Gaussian mixture models to these parameters, and training one Gaussian mixture model for each of the happiness, anger, and sadness emotional speech sets;
and the database module is used for storing the happiness-Gaussian mixture model, the anger-Gaussian mixture model and the sadness-Gaussian mixture model.
Preferably, the audio processing module comprises an analog-to-digital converter, an audio output device, an anti-aliasing filter and a pre-emphasis circuit.
Preferably, the audio processing module and the database module are both physically connected with the data processing module.
The invention has the beneficial effects that: to meet the needs of speech-based recognition, three standard emotional speech databases are first established as recognition references. For each of the three emotions, the corresponding sound files are collected, feature parameters such as Mel-frequency cepstral coefficients, formants, and zero-crossing rate are extracted, and a Gaussian mixture model is built for each emotion group, so that the speech to be recognized can be compared against each model separately. Building separate models for happiness, anger, and sadness simplifies the process compared with conventional modeling, and the one-by-one comparison effectively improves recognition efficiency and accuracy, solving the technical problems of the complex processing, high implementation difficulty, low accuracy, and low efficiency of current speech emotion recognition.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of a speech-based emotion recognition method of the present invention.
Fig. 2 is a connection block diagram of a speech-based emotion recognition apparatus according to the present invention.
In the figure:
1. a sound collection module; 2. an audio processing module; 3. a data processing module; 4. a database module; 5. an analog-to-digital converter; 6. an audio output device; 7. an anti-aliasing filter; 8. a pre-emphasis circuit.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.
Example 1
As shown in fig. 1, a speech-based emotion recognition method includes the steps of:
1) separately collecting speech data expressing the emotions of happiness, anger, and sadness;
2) applying a PCA algorithm to reduce the dimensionality of each of the happiness, anger, and sadness emotional speech data sets;
3) performing endpoint detection on the dimensionality-reduced happiness, anger, and sadness speech data and extracting three feature parameters, namely Mel-frequency cepstral coefficients, formants, and zero-crossing rate; fitting Gaussian mixture models to these parameters so as to train one Gaussian mixture model for each emotion; and building an emotional speech database consisting of the happiness-Gaussian mixture model, the anger-Gaussian mixture model, and the sadness-Gaussian mixture model;
4) collecting the speech segment to be recognized;
5) applying anti-aliasing filtering, analog-to-digital conversion, pre-emphasis preprocessing, and endpoint detection to the collected speech segment; extracting the same three feature parameters; fitting a contrast Gaussian mixture model to them; and matching it against the happiness-, anger-, and sadness-Gaussian mixture models in the emotional speech database;
6) when the overlap rate between the contrast Gaussian mixture model and one of the happiness-, anger-, or sadness-Gaussian mixture models in the database exceeds the set threshold, judging the speech segment to express the same emotion as that model.
In the present embodiment, the set value of the threshold is 45%.
In this embodiment, the duration of the speech segment is 2 s.
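Of the three feature parameters, the zero-crossing rate is the simplest to compute: the fraction of adjacent sample pairs whose signs differ. A minimal sketch (per-frame windowing omitted, the 8 kHz sampling rate an illustrative assumption) is:

```python
import numpy as np

def zero_crossing_rate(frame):
    # Fraction of adjacent sample pairs with differing sign
    signs = np.sign(frame)
    signs[signs == 0] = 1          # treat exact zeros as positive
    return np.mean(signs[1:] != signs[:-1])

t = np.arange(0, 1.0, 1 / 8000.0)
tone = np.sin(2 * np.pi * 100 * t)     # 100 Hz tone sampled at 8 kHz
zcr = zero_crossing_rate(tone)
# A 100 Hz sine crosses zero ~200 times per second, i.e. ~200/8000 per sample
print(round(zcr, 3))                   # 0.025
```

Higher-energy fricatives and angry speech tend to raise this value, which is why it carries some emotional information alongside the spectral features.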
As shown in fig. 2, the embodiment further provides a speech-based emotion recognition apparatus, which includes
the sound collection module 1, used for collecting the speech segment to be recognized;
the audio processing module 2, used for reducing the dimensionality of the collected happiness, anger, and sadness emotional speech data, and for applying anti-aliasing filtering, analog-to-digital conversion, and pre-emphasis preprocessing to the collected speech segment;
the data processing module 3, used for performing endpoint detection, extracting the three feature parameters of Mel-frequency cepstral coefficients, formants, and zero-crossing rate, fitting Gaussian mixture models to these parameters, and training one Gaussian mixture model for each of the happiness, anger, and sadness emotional speech sets;
and the database module 4 is used for storing a happiness-Gaussian mixture model, an anger-Gaussian mixture model and a sadness-Gaussian mixture model.
In this embodiment, the audio processing module 2 includes an analog-to-digital converter 5, an audio output device 6, an anti-aliasing filter 7, and a pre-emphasis circuit 8.
In this embodiment, the audio processing module 2 and the database module 4 are both physically connected to the data processing module 3.
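A software-level sketch of how the four modules of the apparatus could be wired together is shown below. Every name is an illustrative assumption, the "feature" is a toy short-time energy, and the hardware components (microphone, anti-aliasing filter, ADC) are stubbed with synthetic data.

```python
import numpy as np

def sound_collection():                      # sound collection module (1)
    # Stand-in for a 2 s clip captured at 8 kHz
    return np.random.default_rng(1).normal(size=16000)

def audio_processing(signal, alpha=0.97):    # audio processing module (2)
    # Pre-emphasis stands in for the filtering/conversion chain
    return np.concatenate(([signal[0]], signal[1:] - alpha * signal[:-1]))

def data_processing(signal, frame_len=160):  # data processing module (3)
    n = len(signal) // frame_len
    frames = signal[:n * frame_len].reshape(n, frame_len)
    return np.mean(frames ** 2, axis=1)      # toy feature: short-time energy

database = {}                                # database module (4)

clip = sound_collection()
features = data_processing(audio_processing(clip))
database["probe"] = features
print(features.shape)                        # (100,)
```

The point of the separation is the same as in the patent's block diagram: acquisition, signal conditioning, feature extraction/modeling, and storage each sit behind a narrow interface.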
Example 2
As shown in fig. 1, a speech-based emotion recognition method includes the steps of:
1) separately collecting speech data expressing the emotions of happiness, anger, and sadness;
2) applying a PCA algorithm to reduce the dimensionality of each of the happiness, anger, and sadness emotional speech data sets;
3) performing endpoint detection on the dimensionality-reduced happiness, anger, and sadness speech data and extracting three feature parameters, namely Mel-frequency cepstral coefficients, formants, and zero-crossing rate; fitting Gaussian mixture models to these parameters so as to train one Gaussian mixture model for each emotion; and building an emotional speech database consisting of the happiness-Gaussian mixture model, the anger-Gaussian mixture model, and the sadness-Gaussian mixture model;
4) collecting the speech segment to be recognized;
5) applying anti-aliasing filtering, analog-to-digital conversion, pre-emphasis preprocessing, and endpoint detection to the collected speech segment; extracting the same three feature parameters; fitting a contrast Gaussian mixture model to them; and matching it against the happiness-, anger-, and sadness-Gaussian mixture models in the emotional speech database;
6) when the overlap rate between the contrast Gaussian mixture model and one of the happiness-, anger-, or sadness-Gaussian mixture models in the database exceeds the set threshold, judging the speech segment to express the same emotion as that model.
In the present embodiment, the set value of the threshold is 70%.
In this embodiment, the emotion recognition method further includes step 7): labeling the judged contrast Gaussian mixture model with the corresponding emotion of happiness, anger, or sadness, and updating the labeled contrast Gaussian mixture model into the emotional speech database.
In this embodiment, the duration of the speech segment is 6 s.
As shown in fig. 2, the embodiment further provides a speech-based emotion recognition apparatus, which includes
the sound collection module 1, used for collecting the speech segment to be recognized;
the audio processing module 2, used for reducing the dimensionality of the collected happiness, anger, and sadness emotional speech data, and for applying anti-aliasing filtering, analog-to-digital conversion, and pre-emphasis preprocessing to the collected speech segment;
the data processing module 3, used for performing endpoint detection, extracting the three feature parameters of Mel-frequency cepstral coefficients, formants, and zero-crossing rate, fitting Gaussian mixture models to these parameters, and training one Gaussian mixture model for each of the happiness, anger, and sadness emotional speech sets;
and the database module 4 is used for storing a happiness-Gaussian mixture model, an anger-Gaussian mixture model and a sadness-Gaussian mixture model.
In this embodiment, the audio processing module 2 includes an analog-to-digital converter 5, an audio output device 6, an anti-aliasing filter 7, and a pre-emphasis circuit 8.
In this embodiment, the audio processing module 2 and the database module 4 are both physically connected to the data processing module 3.
Example 3
As shown in fig. 1, a speech-based emotion recognition method includes the steps of:
1) separately collecting speech data expressing the emotions of happiness, anger, and sadness;
2) applying a PCA algorithm to reduce the dimensionality of each of the happiness, anger, and sadness emotional speech data sets;
3) performing endpoint detection on the dimensionality-reduced happiness, anger, and sadness speech data and extracting three feature parameters, namely Mel-frequency cepstral coefficients, formants, and zero-crossing rate; fitting Gaussian mixture models to these parameters so as to train one Gaussian mixture model for each emotion; and building an emotional speech database consisting of the happiness-Gaussian mixture model, the anger-Gaussian mixture model, and the sadness-Gaussian mixture model;
4) collecting the speech segment to be recognized;
5) applying anti-aliasing filtering, analog-to-digital conversion, pre-emphasis preprocessing, and endpoint detection to the collected speech segment; extracting the same three feature parameters; fitting a contrast Gaussian mixture model to them; and matching it against the happiness-, anger-, and sadness-Gaussian mixture models in the emotional speech database;
6) when the overlap rate between the contrast Gaussian mixture model and one of the happiness-, anger-, or sadness-Gaussian mixture models in the database exceeds the set threshold, judging the speech segment to express the same emotion as that model.
In the present embodiment, the set value of the threshold is 60%.
In this embodiment, the emotion recognition method further includes step 7): labeling the judged contrast Gaussian mixture model with the corresponding emotion of happiness, anger, or sadness, and updating the labeled contrast Gaussian mixture model into the emotional speech database.
In this embodiment, the duration of the speech segment is 5 s.
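The endpoint detection used in steps 3) and 5) is not specified further in the patent. A common short-time-energy approach, sketched here with an assumed frame length and threshold, finds the first and last frames whose energy exceeds a threshold and treats everything in between as speech:

```python
import numpy as np

def endpoint_detect(signal, frame_len=160, threshold=0.01):
    # Split into non-overlapping frames and compute mean-square energy
    n = len(signal) // frame_len
    frames = signal[:n * frame_len].reshape(n, frame_len)
    energy = np.mean(frames ** 2, axis=1)
    active = np.flatnonzero(energy > threshold)
    if active.size == 0:
        return None                       # no speech found
    # Sample indices of the first and one-past-last active frame
    return active[0] * frame_len, (active[-1] + 1) * frame_len

# Silence, a burst of signal, then silence again
sig = np.concatenate([np.zeros(800), 0.5 * np.ones(1600), np.zeros(800)])
print(endpoint_detect(sig))   # (800, 2400)
```

Production systems typically combine energy with the zero-crossing rate (already extracted by this method) to keep low-energy fricatives at utterance edges.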
As shown in fig. 2, this embodiment further provides a speech-based emotion recognition apparatus, which includes
the sound collection module 1, used for collecting the speech segment to be recognized;
the audio processing module 2, used for reducing the dimensionality of the collected happiness, anger, and sadness emotional speech data, and for applying anti-aliasing filtering, analog-to-digital conversion, and pre-emphasis preprocessing to the collected speech segment;
the data processing module 3, used for performing endpoint detection, extracting the three feature parameters of Mel-frequency cepstral coefficients, formants, and zero-crossing rate, fitting Gaussian mixture models to these parameters, and training one Gaussian mixture model for each of the happiness, anger, and sadness emotional speech sets;
and the database module 4 is used for storing a happiness-Gaussian mixture model, an anger-Gaussian mixture model and a sadness-Gaussian mixture model.
In this embodiment, the audio processing module 2 includes an analog-to-digital converter 5, an audio output device 6, an anti-aliasing filter 7, and a pre-emphasis circuit 8.
In this embodiment, the audio processing module 2 and the database module 4 are both physically connected to the data processing module 3.
The invention has the beneficial effects that: to meet the needs of speech-based recognition, three standard emotional speech databases are first established as recognition references. For each of the three emotions, the corresponding sound files are collected, feature parameters such as Mel-frequency cepstral coefficients, formants, and zero-crossing rate are extracted, and a Gaussian mixture model is built for each emotion group, so that the speech to be recognized can be compared against each model separately. Building separate models for happiness, anger, and sadness simplifies the process compared with conventional modeling, and the one-by-one comparison effectively improves recognition efficiency and accuracy, solving the technical problems of the complex processing, high implementation difficulty, low accuracy, and low efficiency of current speech emotion recognition.
The above description covers only embodiments of the present invention, but the scope of the invention is not limited thereto; any change or substitution that can be conceived without inventive effort by those skilled in the art should be included within the scope of the present invention.
Claims (7)
1. A speech-based emotion recognition method is characterized by comprising the following steps:
1) separately collecting speech data expressing the emotions of happiness, anger, and sadness;
2) applying a PCA algorithm to reduce the dimensionality of each of the happiness, anger, and sadness emotional speech data sets;
3) performing endpoint detection on the dimensionality-reduced happiness, anger, and sadness speech data and extracting three feature parameters, namely Mel-frequency cepstral coefficients, formants, and zero-crossing rate; fitting Gaussian mixture models to these parameters so as to train one Gaussian mixture model for each emotion; and building an emotional speech database consisting of the happiness-Gaussian mixture model, the anger-Gaussian mixture model, and the sadness-Gaussian mixture model;
4) collecting the speech segment to be recognized;
5) applying anti-aliasing filtering, analog-to-digital conversion, pre-emphasis preprocessing, and endpoint detection to the collected speech segment; extracting the same three feature parameters; fitting a contrast Gaussian mixture model to them; and matching it against the happiness-, anger-, and sadness-Gaussian mixture models in the emotional speech database;
6) when the overlap rate between the contrast Gaussian mixture model and one of the happiness-, anger-, or sadness-Gaussian mixture models in the database exceeds the set threshold, judging the speech segment to express the same emotion as that model.
2. A speech-based emotion recognition method as claimed in claim 1, wherein: the threshold is set in the range of 38% to 70%.
3. A speech-based emotion recognition method as claimed in claim 2, wherein: the method further includes step 7): labeling the judged contrast Gaussian mixture model with the corresponding emotion of happiness, anger, or sadness, and updating the labeled model into the emotional speech database.
4. A speech-based emotion recognition method as claimed in claim 3, wherein: the duration of the speech segment is 2 to 6 s.
5. A speech-based emotion recognition apparatus, characterized in that: comprises that
the sound collection module, used for collecting the speech segment to be recognized;
the audio processing module, used for reducing the dimensionality of the collected happiness, anger, and sadness emotional speech data, and for applying anti-aliasing filtering, analog-to-digital conversion, and pre-emphasis preprocessing to the collected speech segment;
the data processing module, used for performing endpoint detection, extracting the three feature parameters of Mel-frequency cepstral coefficients, formants, and zero-crossing rate, fitting Gaussian mixture models to these parameters, and training one Gaussian mixture model for each of the happiness, anger, and sadness emotional speech sets;
and the database module is used for storing the happiness-Gaussian mixture model, the anger-Gaussian mixture model and the sadness-Gaussian mixture model.
6. A speech based emotion recognition apparatus as claimed in claim 5, wherein: the audio processing module comprises an analog-to-digital converter, an audio output device, an anti-aliasing filter and a pre-emphasis circuit.
7. A speech based emotion recognition apparatus as claimed in claim 6, wherein: and the audio processing module and the database module are both physically connected with the data processing module.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811285508.5A CN111145785A (en) | 2018-11-02 | 2018-11-02 | Emotion recognition method and device based on voice |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811285508.5A CN111145785A (en) | 2018-11-02 | 2018-11-02 | Emotion recognition method and device based on voice |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111145785A true CN111145785A (en) | 2020-05-12 |
Family
ID=70515079
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811285508.5A Pending CN111145785A (en) | 2018-11-02 | 2018-11-02 | Emotion recognition method and device based on voice |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111145785A (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101419800A (en) * | 2008-11-25 | 2009-04-29 | 浙江大学 | Emotional speaker recognition method based on frequency spectrum translation |
CN101937678A (en) * | 2010-07-19 | 2011-01-05 | 东南大学 | Judgment-deniable automatic speech emotion recognition method for fidget |
CN102881284A (en) * | 2012-09-03 | 2013-01-16 | 江苏大学 | Unspecific human voice and emotion recognition method and system |
CN103544963A (en) * | 2013-11-07 | 2014-01-29 | 东南大学 | Voice emotion recognition method based on core semi-supervised discrimination and analysis |
CN103854645A (en) * | 2014-03-05 | 2014-06-11 | 东南大学 | Speech emotion recognition method based on punishment of speaker and independent of speaker |
CN105609116A (en) * | 2015-12-23 | 2016-05-25 | 东南大学 | Speech emotional dimensions region automatic recognition method |
CN105976809A (en) * | 2016-05-25 | 2016-09-28 | 中国地质大学(武汉) | Voice-and-facial-expression-based identification method and system for dual-modal emotion fusion |
CN106570496A (en) * | 2016-11-22 | 2017-04-19 | 上海智臻智能网络科技股份有限公司 | Emotion recognition method and device and intelligent interaction method and device |
CN107305773A (en) * | 2016-04-15 | 2017-10-31 | 美特科技(苏州)有限公司 | Voice mood discrimination method |
CN107393525A (en) * | 2017-07-24 | 2017-11-24 | 湖南大学 | A kind of fusion feature is assessed and the speech-emotion recognition method of multilayer perceptron |
CN107845390A (en) * | 2017-09-21 | 2018-03-27 | 太原理工大学 | A kind of Emotional speech recognition system based on PCNN sound spectrograph Fusion Features |
CN108053840A (en) * | 2017-12-29 | 2018-05-18 | 广州势必可赢网络科技有限公司 | Emotion recognition method and system based on PCA-BP |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20200512 |