CN109558935A - Emotion recognition and interaction method and system based on deep learning - Google Patents

Emotion recognition and interaction method and system based on deep learning

Info

Publication number
CN109558935A
CN109558935A (application CN201811434491.5A)
Authority
CN
China
Prior art keywords
expression
recognition
user
current
deep learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201811434491.5A
Other languages
Chinese (zh)
Inventor
黄欢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN201811434491.5A
Publication of CN109558935A
Legal status: Withdrawn (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/004 Artificial life, i.e. computing arrangements simulating life
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/174 Facial expression recognition
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00
    • G10L 25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00 specially adapted for particular use
    • G10L 25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00 specially adapted for particular use for comparison or discrimination
    • G10L 25/63 Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state

Abstract

The invention discloses a deep-learning-based emotion recognition and interaction method comprising the following steps: a training set establishment step; a model training step; a data collection step, including an expression information acquisition step and a language information acquisition step; an emotion recognition step; and an interaction step. The present invention combines speech recognition with expression recognition, judging the user's emotion from voice and expression together, so as to infer what the user most likely truly means at the moment and to give the most suitable reply. The invention solves the technical problem that, owing to the current limitations of speech recognition technology, a user's mood is difficult to judge accurately from voice information alone. The invention also discloses a deep-learning-based emotion recognition and interaction system.

Description

Emotion recognition and interaction method and system based on deep learning
Technical field
The present invention relates to artificial intelligence (AI) technology, and in particular to a deep-learning-based emotion recognition and interaction method. The invention further relates to a deep-learning-based emotion recognition and interaction system.
Background art
With the development of society, China is stepping into an aging society, and the elderly account for an ever higher proportion of the total population. Meanwhile, influenced by policy, today's elderly generally have few children, and many elderly people have no children at their side to keep them company. On the other hand, with rapid economic development, young parents face heavy pressure at work; being busy all year round, more and more of their children lack companionship in early childhood.
With the development of artificial intelligence technology, people hope that robots can keep the elderly or children company. However, existing chat robots are dialogue-style robots that can only carry out scripted chats; they cannot respond accurately according to the user's emotion, remain cold machines to the elderly or children, and cannot serve as caring companions.
Summary of the invention
The technical problem to be solved by the present invention is to provide a deep-learning-based emotion recognition and interaction method that can accurately identify the user's emotion and give corresponding feedback.
To solve the above technical problem, the technical solution of the deep-learning-based emotion recognition and interaction method of the present invention comprises the following steps:
Training set establishment step: classifying human facial expressions; collecting face pictures and placing each picture into a different folder according to its expression category, so as to establish a training set;
Model training step: building a neural network and training it with the training set to obtain a neural network model trained by deep learning;
In another embodiment, the neural network includes a convolutional layer, a pooling layer, a fully connected layer, and a softmax expression-type classification layer.
In another embodiment, the neural network includes:
A first convolutional layer for extracting features of the input picture; the activation function is set to the ReLU function; the output of the first convolutional layer enters the first pooling layer;
A first pooling layer, which uses max pooling to further extract facial features from the output of the first convolutional layer; the output of the first pooling layer enters the second convolutional layer;
A second convolutional layer for extracting features from the output of the first pooling layer; the activation function is set to the ReLU function;
A second pooling layer, which takes the output of the second convolutional layer as its input and continues to extract facial features;
A first fully connected layer, which takes the output of the second pooling layer as input and condenses the extracted facial features into a relatively large feature vector;
A second fully connected layer, which takes the output of the first fully connected layer as input and condenses the facial features again into a smaller feature vector;
A softmax expression-type classification layer, which takes the output of the second fully connected layer as input and, through a convolution operation, obtains the confidence of each expression classification result, i.e. the possibility index of each expression category; the possibility indices of all expression categories sum to 100%.
In another embodiment, the training method for the neural network in the model training step is as follows:
Input the collected face pictures into the neural network;
Use Python, the TensorFlow deep learning framework, and the OpenCV library;
Each time, take several pictures from the training set and feed them into the neural network; through layer-by-layer convolution and pooling operations and forward propagation, a classification result is obtained, which is the predicted value;
Feed the classification result into the error function and compare it with the expected value to obtain the error; the degree of recognition is judged by the error;
Then determine the gradient vector through backpropagation;
Finally, adjust each weight according to the gradient vector, tuning so that the error gradually tends toward 0 or the predicted value converges;
Repeat the above process until the maximum number of iterations is reached.
Data collection step: including an expression information acquisition step and a language information acquisition step;
The expression information acquisition step: obtaining the user's current expression signal; inputting the current expression signal into the neural network model, which extracts the picture features of the current expression signal and outputs the possibility index of each expression category; the expression category corresponding to the maximum possibility index is taken as the recognition result of expression recognition;
The language information acquisition step: obtaining the user's current speech signal and extracting the text features of the current speech signal as the recognition result of speech recognition;
In another embodiment, the language information acquisition step includes: calling the Python pyaudio library to collect voice information, record the audio, and save it as a wav voice file; if no voice is collected for N consecutive seconds, the utterance is considered finished and recording stops, where N >= 2; the saved voice is converted into text information using the Baidu speech API.
Emotion recognition step: combining the recognition result of expression recognition with the recognition result of speech recognition to obtain the expression content carrying the user's current emotion, which is the user's current emotion recognition result;
Interaction step: inputting the user's current emotion recognition result into the intelligent question-answering module, which outputs a corresponding answer sentence according to the user's current emotional expression content.
The present invention also provides a deep-learning-based emotion recognition and interaction system, whose technical solution comprises:
A training set establishment module, configured to establish a training set containing multiple face pictures classified according to different expression categories;
A model training module, configured to train the built neural network with the training set to obtain a neural network model trained by deep learning; the neural network model can judge facial expressions in real time and output the possibility index of each expression category;
A data collection module, including an expression information acquisition unit and a language information acquisition unit;
An expression information acquisition unit, configured to obtain the user's current expression signal, input the current expression signal into the neural network model, extract the picture features of the current expression signal, and output the currently most probable expression recognition result;
A language information acquisition unit, configured to obtain the user's current speech signal, extract speech features, perform speech recognition, and output the recognition result of the current speech signal;
An emotion recognition module, configured to synthesize the user's current emotional expression content from the expression recognition result and the speech recognition result;
An intelligent question-answering module, configured to store answer sentences for various situations and scenes;
An interaction module, configured to input the user's current emotional expression content into the intelligent question-answering module, which outputs a corresponding answer sentence according to the user's current emotional expression content.
The technical effects achievable by the present invention are as follows:
The present invention combines speech recognition with expression recognition, judging the user's emotion from voice and expression together, so as to infer what the user most likely truly means at the moment and give the most suitable reply. The invention solves the technical problem that, owing to the current limitations of speech recognition technology, a user's mood is difficult to judge accurately from voice information alone.
Applied to a robot, the present invention can collect the user's expressions while the user is speaking, so as to accurately identify the user's emotion and give corresponding feedback.
The present invention uses a deep learning algorithm and can recognize multiple expression indices of a face, so as to accurately judge the user's most probable current mood.
Brief description of the drawings
Those skilled in the art should understand that the following description merely illustrates the principles of the present invention schematically; these principles can be applied in many ways to realize many different alternative embodiments. The description only shows the general principles of the teachings of the invention and is not intended to limit the inventive concept disclosed herein.
The accompanying drawings, which are incorporated in and form a part of this specification, show embodiments of the present invention and, together with the general description above and the detailed description below, serve to explain the principles of the invention.
The present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments:
Fig. 1 is a flowchart of the deep-learning-based emotion recognition and interaction method of the present invention.
Specific embodiment
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments are described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the described embodiments, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention. Unless otherwise defined, the technical or scientific terms used herein shall have the ordinary meaning understood by persons of ordinary skill in the fields of the present invention. "First", "second", and similar words used herein do not denote any order, quantity, or importance, but are used only to distinguish different components. Words such as "comprising" mean that the element or object preceding the word covers the elements or objects listed after the word and their equivalents, without excluding other elements or objects.
As shown in Fig. 1, the deep-learning-based emotion recognition and interaction method of the present invention comprises the following steps:
1. Training set establishment step
Classify human facial expressions, for example: angry, disgusted, fearful, happy, sad, surprised, and calm. Collect a large number of face pictures, covering people of all age groups at home and abroad. Place each picture into a different folder according to its expression category, so as to establish the training set.
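The patent gives no code; purely as an illustration, the following is a minimal Python sketch of reading such a folder-per-class training set with OpenCV. The folder layout (dataset/<expression>/) and the function name load_training_set are assumptions for this sketch, not part of the disclosure.

```python
import os

import cv2
import numpy as np

# Hypothetical layout: dataset/<expression>/*.jpg, one folder per expression category.
CLASSES = ["angry", "disgusted", "fearful", "happy", "sad", "surprised", "calm"]

def load_training_set(root="dataset", size=48):
    """Read every picture, convert it to a size x size grayscale array, label it by folder."""
    images, labels = [], []
    for label, name in enumerate(CLASSES):
        folder = os.path.join(root, name)
        for fname in os.listdir(folder):
            img = cv2.imread(os.path.join(folder, fname), cv2.IMREAD_GRAYSCALE)
            if img is None:  # skip unreadable files
                continue
            images.append(cv2.resize(img, (size, size)).astype("float32") / 255.0)
            labels.append(label)
    # Shape (n, 48, 48, 1), matching the 48*48*1 grayscale input described below.
    return np.array(images).reshape(-1, size, size, 1), np.array(labels)
```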
2. Model training step
Build a neural network and train it with the training set to obtain a neural network model trained by deep learning.
The neural network built by the present invention includes convolutional layers, pooling layers, fully connected layers, and a softmax expression-type classification layer; the number of layers and the resulting network structure can be adjusted as needed.
As an embodiment of the present invention, the neural network architecture may include:
A first convolutional layer for extracting features of the input picture; the activation function is set to the ReLU function. ReLU sets the output of some neurons to 0, which gives the network sparsity, reduces the interdependence between parameters, and can therefore alleviate overfitting; the Sigmoid function, by contrast, is prone to vanishing gradients during backpropagation. The output of the first convolutional layer enters the first pooling layer;
A first pooling layer, which uses max pooling to further extract facial features from the output of the first convolutional layer; the output of the first pooling layer enters the second convolutional layer;
A second convolutional layer for extracting features from the output of the first pooling layer; the activation function is set to the ReLU function;
A second pooling layer, which takes the output of the second convolutional layer as its input and continues to extract facial features;
A first fully connected layer, which takes the output of the second pooling layer as input and condenses the extracted facial features into a relatively large feature vector;
A second fully connected layer, which takes the output of the first fully connected layer as input and condenses the facial features again into a smaller feature vector;
A softmax expression-type classification layer, which takes the output of the second fully connected layer as input and, through a convolution operation, obtains the confidence of the seven expression classification results, i.e. the possibility index of each of the seven expression categories; for example, at a given moment the person's "happy" index may be 73%, the "surprised" index 5%, and the "calm" index 22%; the seven expression indices sum to 100%.
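For illustration only, a minimal TensorFlow/Keras sketch of the two-convolution, two-pooling, two-fully-connected architecture described above; the filter counts, kernel sizes, and layer widths are assumptions (the patent does not specify them), and the final classification layer is written here as a dense softmax layer:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(num_classes=7):
    """Conv -> MaxPool -> Conv -> MaxPool -> Dense -> Dense -> softmax."""
    return models.Sequential([
        layers.Input(shape=(48, 48, 1)),                          # 48*48*1 grayscale input
        layers.Conv2D(32, 5, padding="same", activation="relu"),  # first convolutional layer, ReLU
        layers.MaxPooling2D(2),                                   # first pooling layer (max pooling)
        layers.Conv2D(64, 5, padding="same", activation="relu"),  # second convolutional layer, ReLU
        layers.MaxPooling2D(2),                                   # second pooling layer
        layers.Flatten(),
        layers.Dense(1024, activation="relu"),                    # first fully connected layer (larger vector)
        layers.Dense(128, activation="relu"),                     # second fully connected layer (smaller vector)
        layers.Dense(num_classes, activation="softmax"),          # expression-type classification layer
    ])
```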
The training method of the present invention for the neural network is as follows:
The collected face pictures are reshaped into 48*48*1 grayscale images and input into the neural network. There are about 30,000 pictures in total, and the training set consists of these 30,000 selected pictures; the batch size is set to 32 and the maximum number of iterations to 10,000 (the maximum number of iterations can be increased as needed).
Python, the TensorFlow deep learning framework, and the OpenCV library are used.
Each time, 32 pictures are taken from the training set and fed into the neural network; through layer-by-layer convolution and pooling operations and forward propagation, a classification result is obtained, which is the predicted value. Convolution and pooling operations are prior art and are not described in detail here.
The classification result (predicted value) is fed into the error function (with a regularization penalty to prevent overfitting) and compared with the expected value (true value) to obtain the error; the degree of recognition is judged by the error (the smaller the penalty value, the better).
The gradient vector is then determined through backpropagation (reverse differentiation, whose ultimate purpose is to minimize the error).
Finally, each weight is adjusted according to the gradient vector, tuning so that the error gradually tends toward 0 or the predicted value converges.
The above process is repeated until the maximum number of iterations is reached.
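A minimal sketch of this training procedure using the two sketches above; the optimizer choice is an assumption, and the explicit regularization penalty mentioned in the text is omitted for brevity. With about 30,000 pictures and batch size 32, 10,000 iterations correspond to roughly 10 to 11 epochs:

```python
# Assumes load_training_set() and build_model() from the earlier sketches.
x_train, y_train = load_training_set()
model = build_model()
model.compile(
    optimizer="adam",                        # weight updates along the gradient vector
    loss="sparse_categorical_crossentropy",  # error between predicted value and true value
    metrics=["accuracy"],
)
model.fit(x_train, y_train, batch_size=32, epochs=11)
model.save("expression_model.h5")            # hypothetical file name, reused below
```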
A neural network model trained by deep learning is thus obtained. Later, when a picture to be detected is input into this neural network model, the model can judge which expression the face shows according to the picture and output the possibility index of each expression; for example, at this moment the person's "happy" index is 73%, the "surprised" index is 5%, and the "calm" index is 22%. The expression category corresponding to the maximum possibility index is chosen as the recognition result of expression recognition.
3. Data collection step
Collect expression information: the user's current expression picture is captured by a camera to obtain the user's current expression signal; the current expression signal is input into the neural network model, which extracts the picture features of the current expression signal and outputs the possibility index of each expression; the expression category corresponding to the maximum possibility index is taken as the recognition result of expression recognition and denoted str1.
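A minimal sketch of this capture-and-classify step, reusing the class list and model file from the sketches above; face detection and cropping are omitted, and the whole frame is simply resized to 48*48:

```python
import cv2
import numpy as np
import tensorflow as tf

CLASSES = ["angry", "disgusted", "fearful", "happy", "sad", "surprised", "calm"]
model = tf.keras.models.load_model("expression_model.h5")

def recognize_expression():
    """Grab one camera frame and return the most probable expression category (str1)."""
    cap = cv2.VideoCapture(0)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError("camera capture failed")
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    face = cv2.resize(gray, (48, 48)).astype("float32") / 255.0
    probs = model.predict(face.reshape(1, 48, 48, 1))[0]  # possibility index per category
    return CLASSES[int(np.argmax(probs))]                 # category with the maximum index
```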
Collect language information: the user's current speech signal is obtained, and the text features of the current speech signal are extracted as the recognition result of speech recognition.
The specific steps include: calling the Python pyaudio library to collect voice information, record the audio, and save it as a wav voice file; if no voice is collected for 2 consecutive seconds, the utterance is considered finished and recording stops (this 2-second value can be changed as the situation requires); the saved voice is converted into text information using the Baidu speech API, and the recognition result of speech recognition is denoted str2.
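A minimal sketch of this recording step with pyaudio; the sample rate, chunk size, and silence threshold are assumptions, and the Baidu speech API call that would turn the wav file into str2 is omitted:

```python
import wave

import numpy as np
import pyaudio

RATE, CHUNK = 16000, 1024
SILENCE_SECS, THRESHOLD = 2, 500  # stop after ~2 s of silence; threshold is an assumption

def record_utterance(path="utterance.wav"):
    """Record from the microphone until about 2 seconds of silence, then save a wav file."""
    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                     input=True, frames_per_buffer=CHUNK)
    frames, silent_chunks = [], 0
    while silent_chunks < SILENCE_SECS * RATE / CHUNK:
        data = stream.read(CHUNK)
        frames.append(data)
        level = np.abs(np.frombuffer(data, dtype=np.int16)).mean()
        silent_chunks = silent_chunks + 1 if level < THRESHOLD else 0
    stream.stop_stream()
    stream.close()
    pa.terminate()
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(pyaudio.get_sample_size(pyaudio.paInt16))
        wf.setframerate(RATE)
        wf.writeframes(b"".join(frames))
    return path  # this file would then be sent to the Baidu speech API to obtain str2
```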
4. Emotion recognition step: the recognition result of expression recognition and the recognition result of speech recognition are combined to obtain the expression content carrying the user's current emotion, which is the user's current emotion recognition result.
The specific steps include: combining the expression recognition result str1 and the speech recognition result str2 into str; str is then the user's current emotional expression content.
For example: expression recognition finds that the person's "happy" index is 73%, the "surprised" index is 5%, and the "calm" index is 22%; the happy index is the largest, so the user is judged to be happy at this moment. The user's words are "I hate you", which shows that what the user actually wants to express is "I say happily that I hate you". Then str1 = happy and str2 = "I hate you", and the input finally passed to the intelligent question-answering module can be str = str1 + str2 = "happy, I hate you". Here str1 can be modified as needed, for instance by adding qualifiers, e.g. str = "I say happily that I hate you"; by analogy with functional notation in mathematics, this can be written as str = f(str1) + str2.
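Expressed as code, the combination str = f(str1) + str2 might look like the following sketch; the wrapper function and the fixed placeholder for str2 are assumptions:

```python
def combine(str1: str, str2: str) -> str:
    """str = f(str1) + str2: prefix the utterance with the recognized expression."""
    return f"{str1}, {str2}"       # e.g. "happy, I hate you"

str1 = recognize_expression()      # from the earlier sketch, e.g. "happy"
str2 = "I hate you"                # placeholder for the Baidu speech API result
str_ = combine(str1, str2)         # input for the intelligent question-answering module
```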
5. Interaction step: the user's current emotion recognition result is input into the intelligent question-answering module, which outputs a corresponding answer sentence according to the user's current emotional expression content.
The specific steps include: after the intelligent question-answering module receives the str text, it extracts the contents of str1 and str2 respectively, and then selects the corresponding answer sentence according to str1 and str2 jointly.
The intelligent question-answering module may be a database storing answer sentences for various situations: for example, answer 1 for "you don't like me" said while "sad", answer 2 for "you don't like me" said while "angry", answer 3 for "you don't like me" said while "calm", and so on; a lookup sketch follows.
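As an illustration of such a lookup, a minimal sketch with an in-memory table; the replies are invented examples, and a real system could back this with a database or corpus as the text suggests:

```python
# Hypothetical answer table keyed by (expression, utterance).
ANSWERS = {
    ("sad", "you don't like me"):   "Of course I like you. Tell me what happened?",
    ("angry", "you don't like me"): "I'm sorry I upset you. I do care about you.",
    ("calm", "you don't like me"):  "What makes you say that? I enjoy our chats.",
    ("happy", "I hate you"):        "Seeing your face, I know you're fooling me!",
}

def answer(str_: str) -> str:
    """Split str back into str1 and str2 and select the matching answer sentence."""
    str1, str2 = str_.split(", ", 1)
    return ANSWERS.get((str1, str2), "Tell me more about how you feel.")
```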
Without expression recognition, a reply derived only from the speech recognition result "I hate you" may be very stiff, or even a misunderstanding. With the present invention, the reply can instead be made according to "Zhang San says happily that I hate you", and the replies in these two cases are different. Furthermore, modules such as the intelligent question-answering module and corpus docking can be designed as needed, making replies more flexible; for example, a joking reply such as "Seeing your face, I know you're fooling me" can be configured instead of merely replying "sorry". Just imagine: an ordinary dialogue robot recognizes only the sound, obtaining expressionless text with no emotion, and cannot know the user's true expression and thought at that moment, so it may give a stiff answer. A simple "I hate you" may be said by the user in a happy tone, in anger, or calmly; an ordinary dialogue robot only knows that the user said "I hate you" but cannot distinguish the true intended meaning at all and cannot obtain the user's emotion, resulting in a very poor user experience and an inability to keep the user company caringly.
The deep-learning-based emotion recognition and interaction system of the present invention comprises:
A training set establishment module, configured to establish a training set;
A model training module, configured to train the built neural network with the training set to obtain a neural network model trained by deep learning; the neural network model can judge facial expressions in real time and output the possibility index of each expression;
A data collection module, including an expression information acquisition unit and a language information acquisition unit;
An expression information acquisition unit, configured to obtain the user's current expression signal, input the current expression signal into the neural network model, extract the picture features of the current expression signal, and output the currently most probable expression recognition result;
A language information acquisition unit, configured to obtain the user's current speech signal, extract speech features, perform speech recognition, and output the recognition result of the current speech signal;
An emotion recognition module, configured to synthesize the user's current emotional expression content from the expression recognition result and the speech recognition result;
An intelligent question-answering module, configured to store answer sentences for various situations and scenes;
An interaction module, configured to input the user's current emotional expression content into the intelligent question-answering module, which outputs a corresponding answer sentence according to the user's current emotional expression content.
The present invention can be applied to family companion chat robots, which can communicate emotionally with users through "seeing (expression recognition) + listening (speech recognition) + speaking (interaction)", so that the robot develops from a toy into a playmate that can recognize the user's emotions.
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. If these modifications fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to encompass these changes and variations.

Claims (6)

1. A deep-learning-based emotion recognition and interaction method, characterized by comprising the following steps:
a training set establishment step: classifying human facial expressions; collecting face pictures and placing each picture into a different folder according to its expression category, so as to establish a training set;
a model training step: building a neural network and training it with the training set to obtain a neural network model trained by deep learning;
a data collection step, including an expression information acquisition step and a language information acquisition step;
the expression information acquisition step: obtaining the user's current expression signal; inputting the current expression signal into the neural network model, which extracts the picture features of the current expression signal and outputs the possibility index of each expression category; taking the expression category corresponding to the maximum possibility index as the recognition result of expression recognition;
the language information acquisition step: obtaining the user's current speech signal and extracting the text features of the current speech signal as the recognition result of speech recognition;
an emotion recognition step: combining the recognition result of expression recognition with the recognition result of speech recognition to obtain the expression content carrying the user's current emotion, which is the user's current emotion recognition result;
an interaction step: inputting the user's current emotion recognition result into an intelligent question-answering module, which outputs a corresponding answer sentence according to the user's current emotional expression content.
2. The deep-learning-based emotion recognition and interaction method according to claim 1, characterized in that the neural network includes a convolutional layer, a pooling layer, a fully connected layer, and a softmax expression-type classification layer.
3. The deep-learning-based emotion recognition and interaction method according to claim 1, characterized in that the neural network includes:
a first convolutional layer for extracting features of the input picture, whose activation function is set to the ReLU function and whose output enters the first pooling layer;
a first pooling layer, which uses max pooling to further extract facial features from the output of the first convolutional layer and whose output enters the second convolutional layer;
a second convolutional layer for extracting features from the output of the first pooling layer, whose activation function is set to the ReLU function;
a second pooling layer, which takes the output of the second convolutional layer as its input and continues to extract facial features;
a first fully connected layer, which takes the output of the second pooling layer as input and condenses the extracted facial features into a relatively large feature vector;
a second fully connected layer, which takes the output of the first fully connected layer as input and condenses the facial features again into a smaller feature vector;
a softmax expression-type classification layer, which takes the output of the second fully connected layer as input and, through a convolution operation, obtains the confidence of each expression classification result, i.e. the possibility index of each expression category, the possibility indices of all expression categories summing to 100%.
4. The deep-learning-based emotion recognition and interaction method according to claim 1, characterized in that the training method for the neural network in the model training step is as follows:
inputting the collected face pictures into the neural network;
using Python, the TensorFlow deep learning framework, and the OpenCV library;
taking several pictures from the training set each time and feeding them into the neural network, where layer-by-layer convolution and pooling operations and forward propagation yield a classification result, which is the predicted value;
feeding the classification result into the error function and comparing it with the expected value to obtain the error, the degree of recognition being judged by the error;
then determining the gradient vector through backpropagation;
finally adjusting each weight according to the gradient vector, tuning so that the error gradually tends toward 0 or the predicted value converges;
repeating the above process until the maximum number of iterations is reached.
5. The deep-learning-based emotion recognition and interaction method according to claim 1, characterized in that the language information acquisition step includes: calling the Python pyaudio library to collect voice information, record the audio, and save it as a wav voice file; if no voice is collected for N consecutive seconds, considering the utterance finished and ending the recording, where N >= 2; and converting the saved voice into text information using the Baidu speech API.
6. A deep-learning-based emotion recognition and interaction system, characterized by comprising:
a training set establishment module, configured to establish a training set containing multiple face pictures classified according to different expression categories;
a model training module, configured to train the built neural network with the training set to obtain a neural network model trained by deep learning, the neural network model being able to judge facial expressions in real time and output the possibility index of each expression category;
a data collection module, including an expression information acquisition unit and a language information acquisition unit;
the expression information acquisition unit, configured to obtain the user's current expression signal, input the current expression signal into the neural network model, extract the picture features of the current expression signal, and output the currently most probable expression recognition result;
the language information acquisition unit, configured to obtain the user's current speech signal, extract speech features, perform speech recognition, and output the recognition result of the current speech signal;
an emotion recognition module, configured to synthesize the user's current emotional expression content from the expression recognition result and the speech recognition result;
an intelligent question-answering module, configured to store answer sentences for various situations and scenes;
an interaction module, configured to input the user's current emotional expression content into the intelligent question-answering module, which outputs a corresponding answer sentence according to the user's current emotional expression content.
CN201811434491.5A 2018-11-28 2018-11-28 Emotion recognition and interaction method and system based on deep learning Withdrawn CN109558935A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811434491.5A CN109558935A (en) 2018-11-28 2018-11-28 Emotion recognition and interaction method and system based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811434491.5A CN109558935A (en) 2018-11-28 2018-11-28 Emotion recognition and interaction method and system based on deep learning

Publications (1)

Publication Number Publication Date
CN109558935A 2019-04-02

Family

ID=65867720

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811434491.5A Withdrawn CN109558935A (en) Emotion recognition and interaction method and system based on deep learning

Country Status (1)

Country Link
CN (1) CN109558935A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104200804A (en) * 2014-09-19 2014-12-10 合肥工业大学 Multi-information coupling emotion recognition method for human-computer interaction
CN106599800A (en) * 2016-11-25 2017-04-26 哈尔滨工程大学 Face micro-expression recognition method based on deep learning
CN106803069A (en) * 2016-12-29 2017-06-06 南京邮电大学 Crowd happiness-level recognition method based on deep learning
CN108227932A (en) * 2018-01-26 2018-06-29 上海智臻智能网络科技股份有限公司 Interaction intention determination method and device, computer equipment and storage medium
CN108805089A (en) * 2018-06-14 2018-11-13 南京云思创智信息科技有限公司 Multi-modal emotion recognition method
CN108805087A (en) * 2018-06-14 2018-11-13 南京云思创智信息科技有限公司 Semantic-temporal fusion association judgment subsystem based on a multi-modal emotion recognition system

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110263653A (en) * 2019-05-23 2019-09-20 广东鼎义互联科技股份有限公司 Scene analysis system and method based on deep learning technology
CN110335617A (en) * 2019-05-24 2019-10-15 国网新疆电力有限公司乌鲁木齐供电公司 Noise analysis method in a substation
CN110363074A (en) * 2019-06-03 2019-10-22 华南理工大学 Personified recognition and interaction method for complex abstract things
CN110378428A (en) * 2019-07-23 2019-10-25 上海思依暄机器人科技股份有限公司 Domestic robot and emotion recognition method and device thereof
CN110851589A (en) * 2019-08-28 2020-02-28 湖北科技学院 Emotion interaction mechanism representation and recognition model establishment method for emoticons and texts
CN110851589B (en) * 2019-08-28 2023-06-23 湖北科技学院 Emotion interaction mechanism representation and recognition model establishment method for emoticons and texts
CN114724222A (en) * 2022-04-14 2022-07-08 浙江康旭科技有限公司 Multi-modal AI digital human emotion analysis method
CN114724222B (en) * 2022-04-14 2024-04-19 康旭科技有限公司 Multi-modal AI digital human emotion analysis method

Similar Documents

Publication Publication Date Title
CN109558935A Emotion recognition and interaction method and system based on deep learning
CN109036465B Speech emotion recognition method
CN108597541B Speech emotion recognition method and system for enhancing anger and happiness recognition
CN108717856B Speech emotion recognition method based on multi-scale deep convolutional recurrent neural networks
CN106448670B Automatic-reply conversational system based on deep learning and reinforcement learning
Zhang et al. Study on CNN in the recognition of emotion in audio and images
CN110675859B Multi-emotion recognition method, system, medium, and apparatus combining speech and text
CN107247702A Text emotion analysis and processing method and system
CN107870994A Man-machine interaction method and system for intelligent robot
CN109241255A Intention recognition method based on deep learning
CN111292765B Bimodal emotion recognition method integrating multiple deep learning models
CN110222163A Intelligent question-answering method and system fusing CNN and bidirectional LSTM
CN109065021A End-to-end dialect identification method based on conditional deep convolutional generative adversarial networks
CN109243494A Child emotion recognition method based on multi-attention-mechanism long short-term memory networks
CN110009025B Semi-supervised additive-noise autoencoder for voice lie detection
CN107039036A High-quality speaker recognition method based on auto-encoding deep belief networks
CN115329779A Multi-person conversation emotion recognition method
CN106815321A Chat method and device based on intelligent chat robots
CN114911932A Heterogeneous-graph-structure multi-speaker emotion analysis method based on topic semantic enhancement
CN112634944A Sound event recognition method
CN115238731A Emotion recognition method based on convolutional recurrent neural networks and multi-head self-attention
CN115393933A Video face emotion recognition method based on frame attention mechanism
Vimal et al. MFCC-based audio classification using machine learning
CN113449661B Adaptive micro-expression recognition method based on attention mechanism
CN104346336A Emotion venting method and system based on machine text banter

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 2019-04-02)