CN112488219A - Mood consolation method and system based on GRU and mobile terminal - Google Patents
- Publication number
- CN112488219A (application number CN202011417391.9A)
- Authority
- CN
- China
- Prior art keywords
- emotion
- gru
- mobile terminal
- user
- dimensional
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F18/24 — Pattern recognition; Analysing; Classification techniques
- G06F18/214 — Pattern recognition; Analysing; Design or setup of recognition systems or techniques; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/253 — Pattern recognition; Analysing; Fusion techniques of extracted features
- G06F9/44505 — Arrangements for program control; Program loading or initiating; Configuring for program initiating, e.g. using registry, configuration files
- G06N3/045 — Computing arrangements based on biological models; Neural networks; Combinations of networks
- G06N3/08 — Computing arrangements based on biological models; Neural networks; Learning methods
- G06V40/174 — Image or video recognition or understanding; Human faces; Facial expression recognition
Abstract
A GRU-based emotion comfort method, system and mobile terminal belong to the technical field of intelligent equipment, and comprise the following steps: facial expressions and voice are recorded into the system, and the received data are trained so as to recognize the user's emotion; according to the recognized emotion, the terminal performs a preset comfort action; the mobile terminal records the time, adjustment mode, adjustment duration and other information of each emotion adjustment by the user; the obtained information is fed into a GRU network for relearning; and data reports are generated periodically. The beneficial effects of the invention are as follows: a two-layer bidirectional GRU is adopted to separately simulate the human auditory and visual processing pathways when processing the emotional information of speech and facial expressions; the GRU overcomes the gradient vanishing and explosion problems of RNN modeling, trains faster than an LSTM, and suffers less from overfitting. After an attention mechanism is introduced, the influence weight of important temporal features can be increased and unimportant temporal features suppressed, improving the classification performance of the model.
Description
Technical Field
The invention belongs to the technical field of intelligent equipment, and in particular relates to an emotion comfort method and system based on GRU (Gated Recurrent Unit) voice and image recognition.
Background
With the development of society, people's pace of life keeps accelerating, and everyday feelings of sadness are increasingly held back. Although expression recognition is now applied in many fields, it is rarely applied directly to caring for human emotion, and tragedies caused by psychological factors emerge one after another, especially among students.
People generally carry a mobile terminal when going out, the time spent using mobile terminals is rising year by year, and interacting with the mobile terminal has become a main way for people to express their views and release their emotions.
Disclosure of Invention
In order to solve the above technical problems, the invention provides an emotion comfort method, system and mobile terminal based on GRU (Gated Recurrent Unit). The camera of the mobile terminal takes pictures in real time, the microphone of the mobile terminal records audio, and the collected data are fed into a GRU network for training so as to classify the user's emotion; according to the judgment result, the mobile terminal is controlled to comfort the user with one of several optional methods.
A GRU-based emotional comfort method comprises the following steps:
step 1, inputting facial expressions and voices of a person into a system, and putting collected picture and voice data into a GRU network algorithm for training so as to realize the recognition of the emotion of a user;
step 2, the terminal performs a preset comfort action according to the recognized emotion of the user;
step 3, the mobile terminal records the time of each emotion adjustment of the user, the mode used for adjusting the emotion, the adjustment duration and the evaluation information of the user on the adjustment effect;
step 4, bringing the information fed back in the step 3 into a GRU network for relearning so as to adapt to behavior preferences of different users;
and 5, periodically generating a data report.
Preferably, the emotion recognition in step 1 specifically includes the following steps:
step 11, inputting video and audio;
step 12, preprocessing the audio and extracting 43-dimensional effective features; processing the video to extract 26-dimensional effective features of the video;
step 13, carrying the effective characteristics of the audio and the video into a GRU network for training;
and step 14, carrying out decision layer fusion algorithm to identify the emotion of the user.
Preferably, the video processing method in step 12 includes the following steps:
step 121: extracting image frames, namely extracting one picture every 3 frames;
step 122: extracting the coordinates of 68 facial feature points from the images of step 121 by using the Dlib library;
step 123: on the basis of the coordinates of the 68 feature points, selecting the distances between 26 pairs of points as expression features;
step 124: the 26-dimensional features are fed into the GRU network training and testing.
Preferably, the extraction of the voice effective features in step 12 includes the following steps:
step 125: in the audio preprocessing, the window length is set to 0.025 s and the time interval (frame shift) for extracting speech emotion features to 0.01 s;
step 126: performing feature extraction, in which 43-dimensional feature vectors representing the speech emotion are extracted in total: 13-dimensional MFCC features, 2-dimensional MFCC dynamic difference parameters (the 1st-order and 2nd-order MFCC differences), 26-dimensional Fbank features, and 2-dimensional standard deviations (the MFCC and Fbank standard deviations).
Preferably, the decision layer fusion algorithm in step 14 includes the following steps:
step 141: splicing the 43-dimensional feature vector extracted from the voice and the 26-dimensional feature vector extracted from the video into a 69-dimensional emotion feature vector and standardizing it;
Step 142: the standardized features are sent to a GRU network for training and testing;
step 143: and integrating the voice and facial expression emotion recognition results output by the GRU in a weighting mode.
Preferably, the emotion is classified into 6 types through emotion recognition in step 2, and the mobile terminal will respectively react differently according to different emotion types:
if the emotion is identified as happy, the mobile terminal does not make any reaction;
if the emotion is recognized as surprise, the mobile terminal automatically pops up a web search bar so that the user can search for the thing that surprised them;
if the emotion is recognized as fear, disgust, sadness or anger, the mobile terminal plays light music or a funny video; if after a certain time emotion recognition shows that the user is still fearful, the mobile terminal automatically contacts a preset contact to seek human psychological comfort;
Preferably, in step 4 the mobile terminal records the emotion adjustment mode and the user's manual evaluation information each time, learns through the GRU network, determines in a personalized manner the relationship among each user's emotion adjustment mode, emotion type, adjustment effect and adjustment duration, and stores the related data in the mobile terminal so that they can be recalled and relearned the next time the user uses it.
The GRU-based emotion comfort system realizes the steps of the comfort method.
A GRU based emotional comfort mobile terminal comprising a memory, a processor, a camera, a screen, a speaker, a microphone, a communication device and a GRU based emotional comfort program stored on the memory and executable on the processor, the GRU based emotional comfort program when executed by the processor implementing the steps of the GRU based emotional comfort method as described above.
The invention has the beneficial effects that:
the audio/video emotion recognition in emotion calculation has important application value for deep level cognition in the fields of human-computer interaction and the like, in order to overcome the problem that the recognition accuracy of a single-modal model depends on emotion types, a multi-modal emotion recognition model based on a GRU network is provided, the emotion information of voice and facial expressions is processed by respectively simulating human auditory and visual processing paths by adopting a double-layer direction GRU, and the GRU can overcome the problems of gradient loss and explosion in RNN modeling, is shorter than LSTM training time and has fewer overfitting problems. After the attention mechanism is introduced, the influence weight of important time sequence characteristics can be improved, non-important time sequence characteristics are restrained, and the classification effect of the model is improved. Meanwhile, the problem that the traditional discrete emotion six-classification method cannot measure the degree and has the problems of similar appearance and simultaneous coexistence of multiple emotions is considered.
Drawings
FIG. 1 is an overall flow chart of a GRU-based emotional comfort method of the present invention;
fig. 2 is a schematic flow chart of emotion recognition.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a GRU-based emotion comfort method, which comprises the following steps:
step 1, inputting facial expressions and voices of a person into a system, and putting collected picture and voice data into a GRU network algorithm for training so as to realize the recognition of the emotion of a user;
step 2, the terminal performs a preset comfort action according to the recognized emotion of the user;
step 3, the mobile terminal records the time of each emotion adjustment of the user, the mode used for adjusting the emotion, the adjustment duration and the evaluation information of the user on the adjustment effect;
step 4, bringing the information fed back in the step 3 into a GRU network for relearning so as to adapt to behavior preferences of different users;
and 5, periodically generating a data report.
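As a concrete illustration of steps 3 to 5, the adjustment log and the periodic data report could be sketched as follows. This is only an assumed realization: the field names and the 1-to-5 rating scale are not specified by the patent, which lists only the kinds of information recorded.

```python
from dataclasses import dataclass


@dataclass
class AdjustmentRecord:
    """One emotion-adjustment event as logged in step 3.

    Field names are assumptions; the patent names only the categories of
    information stored (time, mode, duration, user evaluation)."""
    timestamp: float     # when the adjustment happened (epoch seconds)
    emotion: str         # recognized emotion type
    method: str          # comfort action used (light music, funny video, ...)
    duration_s: float    # how long the adjustment lasted
    user_rating: int     # user's evaluation of the effect, e.g. 1-5


def periodic_report(records):
    """Step 5: summarize the logged adjustments into a simple data report,
    here the average user rating per comfort method."""
    by_method = {}
    for r in records:
        by_method.setdefault(r.method, []).append(r.user_rating)
    return {m: sum(v) / len(v) for m, v in by_method.items()}


if __name__ == "__main__":
    recs = [
        AdjustmentRecord(0.0, "sad", "light_music", 180.0, 4),
        AdjustmentRecord(1.0, "fear", "funny_video", 240.0, 3),
        AdjustmentRecord(2.0, "sad", "light_music", 120.0, 5),
    ]
    print(periodic_report(recs))  # {'light_music': 4.5, 'funny_video': 3.0}
```

The same records would also be the feedback data fed back into the network for relearning in step 4.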
Preferably, the emotion recognition in step 1 specifically includes the following steps:
step 11, inputting video and audio;
step 12, preprocessing the audio and extracting 43-dimensional effective features; processing the video to extract 26-dimensional effective features of the video;
step 13, carrying the effective characteristics of the audio and the video into a GRU network for training;
and step 14, carrying out decision layer fusion algorithm to identify the emotion of the user.
Preferably, the video processing method in step 12 includes the following steps:
step 121: extracting image frames, namely extracting one picture every 3 frames;
step 122: extracting the coordinates of 68 facial feature points from the images of step 121 by using the Dlib library;
step 123: on the basis of the coordinates of the 68 feature points, selecting the distances between 26 pairs of points as expression features;
step 124: the 26-dimensional features are fed into the GRU network training and testing.
Preferably, the extraction of the voice effective features in step 12 includes the following steps:
step 125: in the audio preprocessing, the window length is set to 0.025 s and the time interval (frame shift) for extracting speech emotion features to 0.01 s;
step 126: performing feature extraction, in which 43-dimensional feature vectors representing the speech emotion are extracted in total: 13-dimensional MFCC features, 2-dimensional MFCC dynamic difference parameters (the 1st-order and 2nd-order MFCC differences), 26-dimensional Fbank features, and 2-dimensional standard deviations (the MFCC and Fbank standard deviations).
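Assuming frame-level MFCC and Fbank matrices have already been computed with the 25 ms window and 10 ms shift of step 125 (for example with a library such as librosa), the 43-dimensional budget of step 126 (13 + 2 + 26 + 2) could be assembled as below. The pooling choices are assumptions: the patent does not spell out how frame-level features are reduced to the 2-dimensional delta and standard-deviation entries.

```python
import numpy as np


def assemble_utterance_vector(mfcc: np.ndarray, fbank: np.ndarray) -> np.ndarray:
    """One plausible reading of the 43-dim speech-emotion vector of step 126:
    13 mean MFCCs, 2 scalar summaries of the 1st/2nd-order MFCC differences,
    26 mean Fbank energies, and the MFCC/Fbank standard deviations.

    mfcc  : (13, T) frame-level MFCC matrix
    fbank : (26, T) frame-level log filterbank (Fbank) matrix
    """
    d1 = np.diff(mfcc, n=1, axis=1)          # 1st-order MFCC difference
    d2 = np.diff(mfcc, n=2, axis=1)          # 2nd-order MFCC difference
    vec = np.concatenate([
        mfcc.mean(axis=1),                   # 13 dims
        [d1.mean(), d2.mean()],              # 2 dims: delta summaries
        fbank.mean(axis=1),                  # 26 dims
        [mfcc.std(), fbank.std()],           # 2 dims: standard deviations
    ])
    assert vec.shape == (43,)
    return vec


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    v = assemble_utterance_vector(rng.normal(size=(13, 100)),
                                  rng.normal(size=(26, 100)))
    print(v.shape)  # (43,)
```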
Preferably, the decision layer fusion algorithm in step 14 includes the following steps:
step 141: splicing the 43-dimensional feature vector extracted from the voice and the 26-dimensional feature vector extracted from the video into a 69-dimensional emotion feature vector and standardizing it;
Step 142: the standardized features are sent to a GRU network for training and testing;
step 143: and integrating the voice and facial expression emotion recognition results output by the GRU in a weighting mode.
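Steps 141 and 143 can be sketched as follows. The z-score standardization and the equal 0.5/0.5 fusion weight are assumptions: the patent says only that the vector is "standardized" and that the per-modality results are combined "in a weighting mode".

```python
import numpy as np


def standardize(features: np.ndarray, mean, std) -> np.ndarray:
    """Step 141: z-score standardization of the concatenated joint vector
    (43 speech dims + 26 video dims = 69 dims)."""
    return (features - mean) / (std + 1e-8)


def fuse_decisions(p_audio: np.ndarray, p_video: np.ndarray, w_audio=0.5):
    """Step 143: decision-layer fusion as a weighted average of the two
    per-modality emotion posteriors output by the GRUs.
    The 0.5 weight is an assumed value."""
    p = w_audio * p_audio + (1.0 - w_audio) * p_video
    return int(np.argmax(p)), p


if __name__ == "__main__":
    joint = np.concatenate([np.ones(43), np.zeros(26)])   # 69-dim joint vector
    z = standardize(joint, joint.mean(), joint.std())
    label, p = fuse_decisions(np.array([0.1, 0.7, 0.2]),
                              np.array([0.2, 0.5, 0.3]))
    print(label)  # 1
```

In practice the fusion weight could itself be tuned on validation data, favoring whichever modality is more reliable for a given emotion type.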
Preferably, the emotion is classified into 6 types through emotion recognition in step 2, and the mobile terminal will respectively react differently according to different emotion types:
if the emotion is identified as happy, the mobile terminal does not make any reaction;
if the emotion is recognized as surprise, the mobile terminal automatically pops up a web search bar so that the user can search for the thing that surprised them;
if the emotion is recognized as fear, disgust, sadness or anger, the mobile terminal plays light music or a funny video; if after a certain time emotion recognition shows that the user is still fearful, the mobile terminal automatically contacts a preset contact to seek human psychological comfort;
Preferably, in step 4 the mobile terminal records the emotion adjustment mode and the user's manual evaluation information each time, learns through the GRU network, determines in a personalized manner the relationship among each user's emotion adjustment mode, emotion type, adjustment effect and adjustment duration, and stores the related data in the mobile terminal so that they can be recalled and relearned the next time the user uses it.
The multi-modal emotion recognition method provided by the invention mainly relies on a two-layer bidirectional GRU network to train the audio data and the video data separately, with the two layers respectively simulating the human auditory and visual processing pathways to process speech and facial-expression video information. Fig. 2 shows the emotion recognition module of this design. To improve training and testing efficiency, the design extracts only a small number of effective features: the audio channel extracts 43-dimensional effective features and the video channel extracts 26-dimensional effective features, for a total of only 69 dimensions, so that the model can perform real-time, high-performance GRU multi-modal emotion recognition. For feature fusion, the model selects a decision-layer fusion method to produce the final emotion classification.
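The recurrence at the heart of this design, and the attention pooling that up-weights important time steps, can be illustrated with a minimal NumPy sketch. This is didactic code, not the patent's trained two-layer bidirectional network: the weight initialization is random, and scoring attention against the final hidden state is one common, assumed choice.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRUCell:
    """Minimal GRU cell: the update/reset gating that lets the model avoid
    the RNN gradient vanishing/explosion problems noted above."""

    def __init__(self, n_in: int, n_hid: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(n_hid)
        self.Wz = rng.uniform(-s, s, (n_hid, n_in + n_hid))  # update gate
        self.Wr = rng.uniform(-s, s, (n_hid, n_in + n_hid))  # reset gate
        self.Wh = rng.uniform(-s, s, (n_hid, n_in + n_hid))  # candidate state

    def step(self, x: np.ndarray, h: np.ndarray) -> np.ndarray:
        xh = np.concatenate([x, h])
        z = sigmoid(self.Wz @ xh)                            # update gate
        r = sigmoid(self.Wr @ xh)                            # reset gate
        h_tilde = np.tanh(self.Wh @ np.concatenate([x, r * h]))
        return (1 - z) * h + z * h_tilde                     # gated mix


def attention_pool(states: np.ndarray) -> np.ndarray:
    """Soft attention over time steps: important frames get larger weights,
    unimportant ones are suppressed."""
    scores = states @ states[-1]            # similarity to the final state
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ states                 # weighted sum of hidden states


if __name__ == "__main__":
    cell = GRUCell(n_in=69, n_hid=32)       # 69-dim fused input features
    h = np.zeros(32)
    seq = np.random.default_rng(2).normal(size=(10, 69))
    states = [h := cell.step(x, h) for x in seq]
    print(attention_pool(np.array(states)).shape)  # (32,)
```

A production version would stack two such layers, run them in both time directions, and learn the weights by backpropagation (e.g. in a deep-learning framework), but the gating and attention arithmetic is the same.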
The GRU-based emotion comfort system realizes the steps of the comfort method.
A GRU based emotional comfort mobile terminal comprising a memory, a processor, a camera, a screen, a speaker, a microphone, a communication device and a GRU based emotional comfort program stored on the memory and executable on the processor, the GRU based emotional comfort program when executed by the processor implementing the steps of the GRU based emotional comfort method as described above.
Claims (9)
1. A GRU-based emotional comfort method is characterized by comprising the following steps:
step 1, inputting facial expressions and voices of a person into a system, and putting collected picture and voice data into a GRU network algorithm for training so as to realize the recognition of the emotion of a user;
step 2, the terminal performs a preset comfort action according to the recognized emotion of the user;
step 3, the mobile terminal records the time of each emotion adjustment of the user, the mode used for adjusting the emotion, the adjustment duration and the evaluation information of the user on the adjustment effect;
step 4, bringing the information fed back in the step 3 into a GRU network for relearning so as to adapt to behavior preferences of different users;
and 5, periodically generating a data report.
2. The GRU-based emotional comfort method according to claim 1, wherein the emotion recognition in step 1 specifically comprises the following steps:
step 11, inputting video and audio;
step 12, preprocessing the audio and extracting 43-dimensional effective features; processing the video to extract 26-dimensional effective features of the video;
step 13, carrying the effective characteristics of the audio and the video into a GRU network for training;
and step 14, carrying out decision layer fusion algorithm to identify the emotion of the user.
3. A GRU-based emotional comfort method according to claim 2, wherein the processing method of the video in step 12 comprises the following steps:
step 121: extracting image frames, namely extracting one picture every 3 frames;
step 122: extracting the coordinates of 68 facial feature points from the images of step 121 by using the Dlib library;
step 123: on the basis of the coordinates of the 68 feature points, selecting the distances between 26 pairs of points as expression features;
step 124: the 26-dimensional features are fed into the GRU network training and testing.
4. The GRU-based emotional comfort method of claim 2, wherein the extraction of the speech-active features in step 12 comprises the steps of:
step 125: in the audio preprocessing, the window length is set to 0.025 s and the time interval (frame shift) for extracting speech emotion features to 0.01 s;
step 126: performing feature extraction, in which 43-dimensional feature vectors representing the speech emotion are extracted in total: 13-dimensional MFCC features, 2-dimensional MFCC dynamic difference parameters (the 1st-order and 2nd-order MFCC differences), 26-dimensional Fbank features, and 2-dimensional standard deviations (the MFCC and Fbank standard deviations).
5. The GRU-based emotional comfort method of claim 2, wherein the decision layer fusion algorithm in step 14 comprises the following steps:
step 141: splicing the 43-dimensional feature vector extracted from the voice and the 26-dimensional feature vector extracted from the video into a 69-dimensional emotion feature vector and standardizing it;
step 142: the standardized features are sent to a GRU network for training and testing;
step 143: and integrating the voice and facial expression emotion recognition results output by the GRU in a weighting mode.
6. The GRU-based emotion comforting method according to claim 1, wherein the emotions are classified into 6 types through emotion recognition in step 2, and the mobile terminal will respectively react differently according to different emotion types:
if the emotion is identified as happy, the mobile terminal does not make any reaction;
if the emotion is recognized as surprise, the mobile terminal automatically pops up a web search bar so that the user can search for the thing that surprised them;
if the emotion is recognized as fear, disgust, sadness or anger, the mobile terminal plays light music or a funny video; if after a certain time emotion recognition shows that the user is still fearful, the mobile terminal automatically contacts a preset contact to seek human psychological comfort.
7. The GRU-based emotion comfort method of claim 1, wherein in step 4 the mobile terminal records the emotion adjustment mode and the user's manual evaluation information each time, learns through the GRU network, determines in a personalized manner the relationship among each user's emotion adjustment mode, emotion type, adjustment effect and adjustment duration, and stores the related data in the mobile terminal for recall and relearning the next time the user uses it.
8. A GRU-based emotional comfort system, characterized in that it implements the steps of the GRU-based emotional comfort method of any one of claims 1 to 7.
9. A GRU-based emotional comfort mobile terminal, characterized in that the mobile terminal comprises a memory, a processor, a camera, a screen, a speaker, a microphone, a communication device and a GRU-based emotional comfort program stored on the memory and executable on the processor, the program, when executed by the processor, implementing the steps of the GRU-based emotional comfort method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011417391.9A CN112488219A (en) | 2020-12-07 | 2020-12-07 | Mood consolation method and system based on GRU and mobile terminal |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011417391.9A CN112488219A (en) | 2020-12-07 | 2020-12-07 | Mood consolation method and system based on GRU and mobile terminal |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112488219A true CN112488219A (en) | 2021-03-12 |
Family
ID=74939932
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011417391.9A Pending CN112488219A (en) | 2020-12-07 | 2020-12-07 | Mood consolation method and system based on GRU and mobile terminal |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112488219A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130337420A1 (en) * | 2012-06-19 | 2013-12-19 | International Business Machines Corporation | Recognition and Feedback of Facial and Vocal Emotions |
CN109409296A (en) * | 2018-10-30 | 2019-03-01 | 河北工业大学 | The video feeling recognition methods that facial expression recognition and speech emotion recognition are merged |
CN109451356A (en) * | 2018-12-20 | 2019-03-08 | 珠海市微半导体有限公司 | A kind of intelligent mobile robot, automatic order method, device and chip |
CN110110653A (en) * | 2019-04-30 | 2019-08-09 | 上海迥灵信息技术有限公司 | The Emotion identification method, apparatus and storage medium of multiple features fusion |
CN111275085A (en) * | 2020-01-15 | 2020-06-12 | 重庆邮电大学 | Online short video multi-modal emotion recognition method based on attention fusion |
CN111368649A (en) * | 2020-02-17 | 2020-07-03 | 杭州电子科技大学 | Emotion perception method operating in raspberry pie |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108564942B (en) | Voice emotion recognition method and system based on adjustable sensitivity | |
CN108717856B (en) | Speech emotion recognition method based on multi-scale deep convolution cyclic neural network | |
CN111415677B (en) | Method, apparatus, device and medium for generating video | |
CN110570873B (en) | Voiceprint wake-up method and device, computer equipment and storage medium | |
CN104538043A (en) | Real-time emotion reminder for call | |
CN108520741A (en) | A kind of whispering voice restoration methods, device, equipment and readable storage medium storing program for executing | |
CN110516696A (en) | It is a kind of that emotion identification method is merged based on the adaptive weighting bimodal of voice and expression | |
CN110610534B (en) | Automatic mouth shape animation generation method based on Actor-Critic algorithm | |
CN107993665A (en) | Spokesman role determines method, intelligent meeting method and system in multi-conference scene | |
WO2020253128A1 (en) | Voice recognition-based communication service method, apparatus, computer device, and storage medium | |
BRPI0904540B1 (en) | method for animating faces / heads / virtual characters via voice processing | |
CN110310647A (en) | A kind of speech identity feature extractor, classifier training method and relevant device | |
WO2019160100A1 (en) | Nonverbal information generation device, nonverbal information generation model learning device, method, and program | |
KR101738142B1 (en) | System for generating digital life based on emotion and controlling method therefore | |
CN109558935A (en) | Emotion recognition and exchange method and system based on deep learning | |
CN109542389B (en) | Sound effect control method and system for multi-mode story content output | |
WO2019160105A1 (en) | Nonverbal information generation device, nonverbal information generation model learning device, method, and program | |
WO2019160090A1 (en) | Nonverbal information generation device, method, and program | |
CN115187704A (en) | Virtual anchor generation method, device, equipment and storage medium | |
CN113238654A (en) | Multi-modal based reactive response generation | |
CN114254096A (en) | Multi-mode emotion prediction method and system based on interactive robot conversation | |
Gomes et al. | i-vector algorithm with Gaussian Mixture Model for efficient speech emotion recognition | |
KR20190125668A (en) | Apparatus and method for analyzing emotional status of pet | |
CN109961152B (en) | Personalized interaction method and system of virtual idol, terminal equipment and storage medium | |
WO2019160104A1 (en) | Nonverbal information generation device, nonverbal information generation model learning device, method, and program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||