CN112016367A - Emotion recognition system and method and electronic equipment - Google Patents

Emotion recognition system and method and electronic equipment

Info

Publication number
CN112016367A
CN112016367A (application CN201910468800.9A)
Authority
CN
China
Prior art keywords
emotion
voice
expression
user
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910468800.9A
Other languages
Chinese (zh)
Inventor
王晓东
杜威
王宏玉
王海鹏
邹风山
张悦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenyang Siasun Robot and Automation Co Ltd
Original Assignee
Shenyang Siasun Robot and Automation Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenyang Siasun Robot and Automation Co Ltd filed Critical Shenyang Siasun Robot and Automation Co Ltd
Priority to CN201910468800.9A priority Critical patent/CN112016367A/en
Publication of CN112016367A publication Critical patent/CN112016367A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition
    • G06V40/176 Dynamic expression
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state

Abstract

The application relates to an emotion recognition system, an emotion recognition method and electronic equipment. The system comprises a robot and a cloud server. The robot collects image or video data and voice signals of a user, recognizes the image or video data and the voice signals, obtains expression-based and voice-based emotion components of the user respectively, and uploads the expression-based and voice-based emotion components to the cloud server. The cloud server obtains a text-based emotion component of the user from the voice signal, and fuses the expression-based, voice-based and text-based emotion components with a weight calculation method to obtain the final emotion recognition result of the user. Compared with the prior art, the application can analyze the emotion of the user from multiple angles, so that the real emotion of the user is described more accurately.

Description

Emotion recognition system and method and electronic equipment
Technical Field
The application belongs to the technical field of artificial intelligence, and particularly relates to an emotion recognition system, method and electronic equipment.
Background
Emotion is a state that integrates a person's feelings, thoughts and behavior; it covers a person's psychological response to external or internal stimulation, together with the physiological response that accompanies it. Emotion plays a ubiquitous role in people's daily work and life. For example, in medical care, if the emotional state of a patient, particularly a patient with an expression disorder, can be known, different care measures can be taken according to the patient's emotion and the quality of care can be improved. In product development, if the emotional state of the user while using the product can be identified and the user experience understood, the product's functions can be improved and products better suited to user needs can be designed. In various human-machine interaction systems, interaction becomes more friendly and natural if the system can recognize the emotional state of a human. Emotion analysis and recognition is therefore an important interdisciplinary research subject in neuroscience, psychology, cognitive science, computer science, artificial intelligence and related fields.
Currently, general emotion recognition technology usually requires the user to wear additional auxiliary devices, such as glasses or a heart rate sensor, to acquire physiological data for emotion recognition. In human-computer interaction the emotion of a person needs to be recognized, but if it can only be judged with additional auxiliary equipment, the application of such a system or method is greatly limited and practical requirements cannot be met. For example, acquiring physiological signals requires a signal capture device, which may strongly affect the expression of the user's true mood, so the user's current true emotional state may not be captured. Meanwhile, analysis of a single-modality signal cannot reliably recognize the user's real emotional state, owing to the limitations of technical conditions and processing methods. For example, a person's facial expression may change subtly within a few seconds; if the auxiliary device cannot capture those few seconds of change, if processing delays cause the change to be missed, if the algorithm misrecognizes it, or if the user disguises his or her facial expression, emotion recognition becomes inaccurate.
Disclosure of Invention
The application provides an emotion recognition system, an emotion recognition method and an electronic device, which aim to solve, at least to a certain extent, one of the above technical problems in the prior art.
In order to solve the above problems, the present application provides the following technical solutions:
an emotion recognition system comprises a robot and a cloud server;
the robot is used for collecting images or video data and voice signals of a user, identifying the images or video data and the voice signals, respectively acquiring emotion components of the user based on expressions and voices, and uploading the emotion components based on the expressions and the voices to a cloud server;
the cloud server is used for acquiring emotion components of the user based on the text according to the voice signals, and fusing the emotion components based on the expression, the voice and the text based on a weight calculation method to obtain a final emotion recognition result of the user.
The technical scheme adopted by the embodiment of the application further comprises the following: the robot comprises a data acquisition module and an emotion recognition module, and the data acquisition module comprises:
an image acquisition unit: used for acquiring image or video data of a user and transmitting the acquired image or video data to the emotion recognition module;
a voice acquisition unit: used for acquiring voice signals of the user and transmitting the acquired voice signals to the emotion recognition module;
the emotion recognition module includes:
an expression recognition unit: used for extracting effective static expression characteristics or dynamic expression characteristics from the collected image or video data, training an emotion recognition model based on expression by adopting the static expression characteristics or the dynamic expression characteristics, and performing emotion type judgment and emotion intensity calculation through the emotion recognition model based on the expression to obtain an emotion component based on the expression;
a voice recognition unit: used for analyzing and extracting voice characteristic parameters capable of representing emotion change from the collected voice signals, training a voice-based emotion recognition model by adopting the voice characteristic parameters, and performing emotion type judgment and emotion intensity calculation through the voice-based emotion recognition model to obtain an emotion component based on voice.
The technical scheme adopted by the embodiment of the application further comprises the following steps: the emotion component acquisition mode based on the expression specifically comprises the following steps: analyzing the video sequence, and analyzing and retrieving the key frames in the video sequence; intercepting a plurality of sequence frames containing the same or similar expressions, performing related preprocessing operation on the intercepted sequence frames, extracting facial features in the sequence frames, and extracting dynamic expression features and static expression features based on the facial features; when the model is trained, all the dynamic expression features and the static expression features are combined, and then the emotion classification is carried out by using a feature correlation analysis method.
The technical scheme adopted by the embodiment of the application further comprises the following: the emotion component acquisition mode based on voice specifically comprises the following steps: after the voice signal is preprocessed, extracting voice characteristic parameters capable of expressing current sound from the voice signal, analyzing and processing the voice characteristic parameters based on statistics, and then training an emotion recognition model based on voice by using a classification method based on the voice characteristic parameters; and by utilizing the emotion recognition model, selecting a classifier to perform emotion type judgment and emotion intensity calculation by adopting a classification recognition algorithm, and performing combined judgment by using a specific weight to obtain an emotion component based on voice.
The technical scheme adopted by the embodiment of the application further comprises the following steps: the cloud server comprises:
a text recognition module: used for converting a voice signal of the user into text information by calling a voice recognition engine, preprocessing the text information, extracting text characteristic parameters capable of representing emotion change from the preprocessed text information, and judging the text characteristic parameters through a classifier to obtain an emotion component of the text;
a data fusion module: used for fusing the emotion components based on expressions, voice and text by adopting a weight calculation method, calculating a final emotion recognition result and feeding it back to the robot; the fusion method comprises weight-based fusion, statistical-data-based fusion and machine-learning-based fusion, and the weight calculation method comprises static weight setting and dynamic weight setting.
Another technical scheme adopted by the embodiment of the application is as follows: a method of emotion recognition, comprising:
step a: collecting image or video data and voice signals of a user;
step b: recognizing the image or video data and the voice signal, and respectively acquiring emotion components of the user based on expressions and voices;
step c: and acquiring a text-based emotion component of the user according to the voice signal, and fusing the emotion component based on the expression, the voice and the text based on a weight calculation method to obtain a final emotion recognition result of the user.
The technical scheme adopted by the embodiment of the application further comprises the following steps: in the step b, the recognizing the image or video data and the voice signal, and respectively acquiring emotion components of the user based on expressions and voices specifically includes:
step b 1: extracting effective static expression characteristics or dynamic expression characteristics through the collected image or video data, training an emotion recognition model based on the expression by adopting the static expression characteristics or the dynamic expression characteristics, and performing emotion type judgment and emotion intensity calculation through the emotion recognition model based on the expression to obtain an emotion component based on the expression;
step b 2: analyzing and extracting voice characteristic parameters capable of representing emotion change from the acquired voice signals, training a voice-based emotion recognition model by adopting the voice characteristic parameters, and performing emotion type judgment and emotion intensity calculation through the voice-based emotion recognition model to obtain an emotion component based on voice.
The technical scheme adopted by the embodiment of the application further comprises the following steps: in step b1, the expression-based emotion component acquisition mode specifically includes: analyzing the video sequence, and analyzing and retrieving the key frames in the video sequence; intercepting a plurality of sequence frames containing the same or similar expressions, performing related preprocessing operation on the intercepted sequence frames, extracting facial features in the sequence frames, and extracting dynamic expression features and static expression features based on the facial features; when the model is trained, all the dynamic expression features and the static expression features are combined, and then the emotion classification is carried out by using a feature correlation analysis method.
The technical scheme adopted by the embodiment of the application further comprises the following: in step b2, the speech-based emotion component acquisition method specifically includes: after the voice signal is preprocessed, extracting voice characteristic parameters capable of expressing current sound from the voice signal, analyzing and processing the voice characteristic parameters based on statistics, and then training an emotion recognition model based on voice by using a classification method based on the voice characteristic parameters; and by utilizing the emotion recognition model, selecting a classifier to perform emotion type judgment and emotion intensity calculation by adopting a classification recognition algorithm, and performing combined judgment by using a specific weight to obtain an emotion component based on voice.
The technical scheme adopted by the embodiment of the application further comprises the following steps: in the step c, the acquiring of the emotion component based on the text of the user according to the voice signal, and the fusing of the emotion component based on the expression, the voice and the text based on the weight calculation method specifically include:
step c 1: converting a voice signal of a user into text information by calling a voice recognition engine, preprocessing the text information, extracting text characteristic parameters capable of representing emotion change from the preprocessed text information, and judging the text characteristic parameters by a classifier to obtain emotion components of the text;
step c 2: fusing emotion components based on expressions, voice and text by adopting a weight calculation method, calculating a final emotion recognition result, and feeding back the final emotion recognition result to the robot; the fusion method comprises weight-based fusion, statistical data-based fusion and machine learning method-based fusion, and the weight calculation method comprises static weight setting and dynamic weight setting.
The embodiment of the application adopts another technical scheme that: an electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the following operations of the emotion recognition method described above:
step a: collecting image or video data and voice signals of a user;
step b: recognizing the image or video data and the voice signal, and respectively acquiring emotion components of the user based on expressions and voices;
step c: and acquiring a text-based emotion component of the user according to the voice signal, and fusing the emotion component based on the expression, the voice and the text based on a weight calculation method to obtain a final emotion recognition result of the user.
Compared with the prior art, the embodiment of the application has the following advantages: the emotion recognition system, the emotion recognition method and the electronic equipment capture potential, subtle emotional fluctuations of the user during a man-machine conversation, obtain emotion recognition results based on multi-modal information such as expressions, voice and text respectively by combining relevant data processing technologies and classification algorithms, and fuse these results by means of a fusion algorithm and weight calculation to obtain the final emotion recognition result of the user. Compared with the prior art, the application can analyze the emotion of the user from multiple angles, so that the real emotion of the user is described more accurately.
Drawings
Fig. 1 is a schematic structural diagram of an emotion recognition system according to an embodiment of the present application;
FIG. 2 is a flow chart of a method of emotion recognition in an embodiment of the present application;
fig. 3 is a schematic structural diagram of a hardware device of an emotion recognition method provided in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Please refer to fig. 1, which is a schematic structural diagram of an emotion recognition system according to an embodiment of the present application. The emotion recognition system comprises a robot and a cloud server. The robot is used for collecting multi-modal information such as images or video data and voice signals of a user, recognizing the images or video data and the voice signals respectively, acquiring expression-based and voice-based emotion components of the user with the corresponding algorithms, and uploading the expression-based and voice-based emotion components to the cloud server. The cloud server is used for acquiring a text-based emotion component of the user from the voice signal, fusing the expression-based, voice-based and text-based emotion components with a weight calculation method, calculating the final emotion recognition result of the user, and returning the emotion recognition result to the robot.
Specifically, the robot comprises a data acquisition module and an emotion recognition module;
the data acquisition module comprises an image acquisition unit and a voice acquisition unit;
the image acquisition unit is used for acquiring image or video data of a user and transmitting the acquired image or video data to the emotion recognition module; in the embodiment of the application, the image acquisition unit is a camera; when a user approaches the robot, a camera mounted on the robot can detect the state of the user in real time and collect image or video data including the facial expression of the user.
The voice acquisition unit is used for acquiring voice signals of a user and transmitting the acquired voice signals to the emotion recognition module; in the embodiment of the application, the voice acquisition unit is a microphone, and when a user talks with the robot, the microphone on the robot acquires voice signals of the user.
The emotion recognition module is a PAD and comprises an expression recognition unit and a voice recognition unit;
the expression recognition unit is used for extracting effective static expression characteristics or dynamic expression characteristics through the collected image or video data, training an emotion recognition model based on the expression by adopting the static expression characteristics or the dynamic expression characteristics, selecting a classifier to perform emotion type judgment and emotion intensity calculation on the basis of the trained model to obtain emotion components based on the expression, and uploading the emotion components based on the expression to a cloud server in a wireless or wired mode; the emotion component comprises emotion types and emotion intensities corresponding to various emotions; the emotion component identification mode based on the expression specifically comprises the following steps: firstly, analyzing a video sequence, and analyzing and searching key frames in the video sequence; and then intercepting a plurality of sequence frames containing the same or similar expressions, carrying out related preprocessing operation on the intercepted plurality of sequence frames, extracting facial features in the sequence frames, and extracting dynamic expression features and static expression features based on the facial features. When the mapping model is trained, all dynamic expression features and static expression features are combined, and then emotion classification is carried out by using a feature correlation analysis method including principal component analysis and the like, so that correlation among the features is reduced, feature dimensionality is reduced, and high classification accuracy is guaranteed.
The voice recognition unit is used for analyzing and extracting, from the collected voice signals, voice feature parameters capable of representing emotion change, training a voice-based emotion recognition model with the voice feature parameters, judging the voice feature parameters with a classifier to obtain the voice-based emotion component, and uploading the voice-based emotion component to the cloud server; meanwhile, the voice recognition unit also transmits the collected voice signals to the cloud server, where text-based emotion recognition is completed. The voice-based emotion component is recognized as follows: first, the voice signal is preprocessed to remove background noise and other interference; then voice feature parameters capable of expressing the current sound are extracted from the voice signal and analyzed statistically, for example by computing their mean and variance; next, a voice-based emotion recognition model is trained with a classification method based on the extracted voice feature parameters; finally, with the trained emotion recognition model, a classifier is selected to perform emotion type judgment and emotion intensity calculation using a classification recognition algorithm, and a joint judgment is made with specific weights to obtain the voice-based emotion component.
In the embodiment of the present application, the classifiers include, but are not limited to, support vector machines, random forests, hidden Markov models, neural network algorithms, and the like.
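As one hedged illustration of the speech branch, the sketch below computes per-utterance statistics (mean, variance, extremes) of frame-level voice feature parameters and then makes a joint judgment with two of the classifiers listed above, combined with a fixed weight. The frame-level features, the 0.6/0.4 weights and the choice of SVM plus random forest are assumptions for illustration, not the patented configuration.

```python
# Hedged sketch of the voice branch: statistical descriptors plus a weighted
# joint judgment of two classifiers (assumed weights and classifier choice).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

def speech_feature_vector(frame_params):
    """frame_params: (n_frames, n_params) per-frame parameters such as pitch,
    energy or MFCCs. Returns per-utterance statistics (mean, variance, max, min)."""
    return np.hstack([frame_params.mean(axis=0), frame_params.var(axis=0),
                      frame_params.max(axis=0), frame_params.min(axis=0)])

def speech_emotion_component(x, svm: SVC, forest: RandomForestClassifier, w_svm=0.6):
    """Joint judgment with a specific weight; svm must be trained with
    probability=True, and both classifiers on the same label set."""
    probs = (w_svm * svm.predict_proba([x])[0]
             + (1.0 - w_svm) * forest.predict_proba([x])[0])
    idx = int(np.argmax(probs))
    return svm.classes_[idx], float(probs[idx])   # emotion type and its intensity
```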
The cloud server comprises a text recognition module and a data fusion module;
the text recognition module is used for converting a voice signal of a user into text information by calling a related service (a voice recognition engine), analyzing and extracting text characteristic parameters capable of representing emotion change from the text information, and distinguishing the text characteristic parameters by a classifier to obtain emotion components of the text. Specifically, the method for recognizing the emotion component of the text specifically comprises the following steps: firstly, a voice signal of a user is converted into corresponding text information by calling related services, the text information is preprocessed, and related vocabularies such as spoken language or unrelated emotions are removed; and then, sending the preprocessed text information into a classifier, finishing classification of different emotions by using a related classification algorithm, and obtaining the emotion intensity corresponding to each emotion.
The data fusion module is used for receiving the expression-based and voice-based emotion components, combining them with the text-based emotion component obtained by the text recognition module, fusing the three emotion components on the basis of a weight calculation method, calculating the final emotion recognition result and feeding it back to the robot. The fusion methods of the data fusion module include weight-based fusion, statistics-based fusion, machine-learning-based fusion and other fusion methods. The weight calculation method includes static weight setting and dynamic weight setting; when all three emotion components have been obtained, the final emotion recognition result is determined with a static-weight-based method. The static weights are derived from statistical analysis of existing data, that is, historical emotion data are compared with the real emotion of the user, and the weights of the three emotion components are then analyzed. The dynamic weight setting is mainly based on real-time emotional feedback from facial expressions, so that the weights change dynamically.
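A static-weight fusion of the three emotion components could look like the sketch below, where each branch reports an intensity per emotion label and the 0.5/0.3/0.2 weights stand in for values that would in practice come from the statistical analysis of historical data described above.

```python
# Sketch of static-weight fusion of the three emotion components (assumed weights).
from typing import Dict, Tuple

STATIC_WEIGHTS = {"expression": 0.5, "speech": 0.3, "text": 0.2}

def fuse_components(components: Dict[str, Dict[str, float]]) -> Tuple[str, float]:
    """components maps modality -> {emotion label: intensity}."""
    fused: Dict[str, float] = {}
    for modality, scores in components.items():
        weight = STATIC_WEIGHTS[modality]
        for emotion, intensity in scores.items():
            fused[emotion] = fused.get(emotion, 0.0) + weight * intensity
    best = max(fused, key=fused.get)
    return best, fused[best]                       # final emotion and fused score

# Example: expression strongly indicates "happy", speech and text are ambiguous.
result = fuse_components({
    "expression": {"happy": 0.7, "neutral": 0.3},
    "speech":     {"happy": 0.4, "neutral": 0.6},
    "text":       {"happy": 0.5, "neutral": 0.5},
})
```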
Please refer to fig. 2, which is a flowchart of an emotion recognition method according to an embodiment of the present application. The emotion recognition method in the embodiment of the application comprises the following steps:
step 100: the method comprises the following steps of collecting multi-mode information such as image or video data, voice signals and the like of a user in real time through a robot;
in step 100, acquiring image or video data of a user specifically includes: collecting through a camera arranged on the robot; when a user approaches the robot, a camera mounted on the robot can detect the state of the user in real time and collect image or video data including the facial expression of the user.
The collecting of the voice signal of the user specifically comprises: collecting through a microphone arranged on the robot; when the user has a conversation with the robot, the microphone on the robot collects the voice signal of the user.
Step 200: the multi-modal information such as the image or video data and the voice signals is recognized respectively, expression-based and voice-based emotion components of the user are acquired respectively with the corresponding algorithms, the expression-based and voice-based emotion components are uploaded to the cloud server, and the voice signals are uploaded to the cloud server at the same time;
in step 200, obtaining emotion components of the user based on expressions and voices specifically includes: identifying through PAD on the robot; the acquisition mode comprises the following steps:
step 201: extracting effective static expression characteristics or dynamic expression characteristics through collected image or video data, training an emotion recognition model based on expressions by adopting the static expression characteristics or the dynamic expression characteristics, selecting a classifier to perform emotion type judgment and emotion intensity calculation on the basis of the trained model to obtain emotion components based on the expressions, and uploading the emotion components based on the expressions to a cloud server in a wireless or wired mode;
in step 201, the emotion component includes emotion types and emotion intensities corresponding to various emotions; the emotion component identification mode based on the expression specifically comprises the following steps: firstly, analyzing a video sequence, and analyzing and searching key frames in the video sequence; and then intercepting a plurality of sequence frames containing the same or similar expressions, carrying out related preprocessing operation on the intercepted plurality of sequence frames, extracting facial features in the sequence frames, and extracting dynamic expression features and static expression features based on the facial features. When the mapping model is trained, all dynamic expression features and static expression features are combined, and then emotion classification is carried out by using a feature correlation analysis method including principal component analysis and the like, so that correlation among the features is reduced, feature dimensionality is reduced, and high classification accuracy is guaranteed.
Step 202: analyzing and extracting voice characteristic parameters capable of representing emotion change from the acquired voice signals, training a voice-based emotion recognition model by adopting the voice characteristic parameters, judging the voice characteristic parameters by a classifier to obtain a voice-based emotion component, and uploading the voice-based emotion component to a cloud server;
in step 202, the emotion component recognition method based on voice specifically includes: firstly, preprocessing a voice signal to remove background noise, noise and the like; then extracting voice characteristic parameters capable of expressing the current sound from the voice signals, and carrying out analysis processing on the voice characteristic parameters based on statistics, including obtaining the mean value, variance and the like of the voice characteristic parameters; then, training a voice-based emotion recognition model by using a plurality of classification methods based on the extracted voice characteristic parameters; and finally, selecting a classifier by using the trained emotion recognition model, performing emotion type judgment and emotion intensity calculation by adopting different types of classification recognition algorithms, and performing combined judgment by using specific weight to obtain an emotion component based on voice.
Step 300: the cloud server acquires emotion components of the user based on the text according to the voice signals, fuses the emotion components based on the expression, the voice and the text based on a weight calculation method, calculates a final emotion recognition result of the user, and returns the emotion recognition result to the robot;
in step 300, the emotion component fusion method specifically includes:
step 301: converting a voice signal of a user into text information by calling a related service (a voice recognition engine), analyzing and extracting text characteristic parameters capable of representing emotion change from the text information, and judging the text characteristic parameters by a classifier to obtain emotion components based on the text;
in step 301, the method for recognizing emotion components of a text specifically includes: firstly, a voice signal of a user is converted into corresponding text information by calling related services, the text information is preprocessed, and related vocabularies such as spoken language or unrelated emotions are removed; and then, sending the preprocessed text information into a classifier, finishing classification of different emotions by using a related classification algorithm, and obtaining the emotion intensity corresponding to each emotion.
Step 302: combining the emotion component based on the expression, the emotion component based on the voice and the emotion component based on the text, fusing the three emotion components by adopting different weight calculation methods, calculating a final emotion recognition result, and feeding the final emotion recognition result back to the robot;
in step 302, the fusion methods include various fusion methods such as weight-based fusion, statistical data-based fusion, and machine learning-based fusion. The weight calculation method comprises static weight setting and dynamic weight setting, and when the three emotion components are respectively included, final emotion recognition result judgment is completed by using a static weight-based method. The static weight is derived from statistical analysis based on the existing data, namely, the historical emotion data is compared with the real emotion of the user, and then the weights of three emotion components are analyzed. The dynamic emotion weight settings are based primarily on real-time emotional feedback of facial expressions such that their weights change dynamically.
Fig. 3 is a schematic structural diagram of a hardware device of an emotion recognition method provided in an embodiment of the present application. As shown in fig. 3, the device includes one or more processors and memory. Taking a processor as an example, the apparatus may further include: an input system and an output system.
The processor, memory, input system, and output system may be connected by a bus or other means, as exemplified by the bus connection in fig. 3.
The memory, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules. The processor executes various functional applications and data processing of the electronic device, i.e., implements the processing method of the above-described method embodiment, by executing the non-transitory software program, instructions and modules stored in the memory.
The memory may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data and the like. Further, the memory may include high speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and these remote memories may be connected to the processing system over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input system may receive input numeric or character information and generate a signal input. The output system may include a display device such as a display screen.
The one or more modules are stored in the memory and, when executed by the one or more processors, perform the following for any of the above method embodiments:
step a: collecting image or video data and voice signals of a user;
step b: recognizing the image or video data and the voice signal, and respectively acquiring emotion components of the user based on expressions and voices;
step c: and acquiring a text-based emotion component of the user according to the voice signal, and fusing the emotion component based on the expression, the voice and the text based on a weight calculation method to obtain a final emotion recognition result of the user.
The above product can execute the method provided by the embodiments of the application, and has the functional modules and beneficial effects corresponding to the executed method. For technical details not described in this embodiment, reference may be made to the methods provided in the embodiments of the present application.
Embodiments of the present application provide a non-transitory (non-volatile) computer storage medium having stored thereon computer-executable instructions that may perform the following operations:
step a: collecting image or video data and voice signals of a user;
step b: recognizing the image or video data and the voice signal, and respectively acquiring emotion components of the user based on expressions and voices;
step c: and acquiring a text-based emotion component of the user according to the voice signal, and fusing the emotion component based on the expression, the voice and the text based on a weight calculation method to obtain a final emotion recognition result of the user.
Embodiments of the present application provide a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform the following:
step a: collecting image or video data and voice signals of a user;
step b: recognizing the image or video data and the voice signal, and respectively acquiring emotion components of the user based on expressions and voices;
step c: and acquiring a text-based emotion component of the user according to the voice signal, and fusing the emotion component based on the expression, the voice and the text based on a weight calculation method to obtain a final emotion recognition result of the user.
According to the emotion recognition system, the emotion recognition method and the electronic equipment provided above, potential, subtle emotional fluctuations of the user during a man-machine conversation are captured; emotion recognition results based on multi-modal information such as expressions, voice and text are obtained respectively by combining relevant data processing technologies and classification algorithms; and these results are fused by means of a fusion algorithm and weight calculation to obtain the final emotion recognition result of the user. Compared with the prior art, the application can analyze the emotion of the user from multiple angles, so that the real emotion of the user is described more accurately.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (11)

1. An emotion recognition system is characterized by comprising a robot and a cloud server;
the robot is used for collecting images or video data and voice signals of a user, identifying the images or video data and the voice signals, respectively acquiring emotion components of the user based on expressions and voices, and uploading the emotion components based on the expressions and the voices to a cloud server;
the cloud server is used for acquiring emotion components of the user based on the text according to the voice signals, and fusing the emotion components based on the expression, the voice and the text based on a weight calculation method to obtain a final emotion recognition result of the user.
2. The emotion recognition system of claim 1, wherein the robot includes a data acquisition module and an emotion recognition module, the data acquisition module including:
an image acquisition unit: used for acquiring image or video data of a user and transmitting the acquired image or video data to the emotion recognition module;
a voice acquisition unit: used for acquiring voice signals of the user and transmitting the acquired voice signals to the emotion recognition module;
the emotion recognition module includes:
an expression recognition unit: used for extracting effective static expression characteristics or dynamic expression characteristics from the collected image or video data, training an emotion recognition model based on expression by adopting the static expression characteristics or the dynamic expression characteristics, and performing emotion type judgment and emotion intensity calculation through the emotion recognition model based on the expression to obtain an emotion component based on the expression;
a voice recognition unit: used for analyzing and extracting voice characteristic parameters capable of representing emotion change from the collected voice signals, training a voice-based emotion recognition model by adopting the voice characteristic parameters, and performing emotion type judgment and emotion intensity calculation through the voice-based emotion recognition model to obtain an emotion component based on voice.
3. The emotion recognition system of claim 2, wherein the expression-based emotion component acquisition manner is specifically: analyzing the video sequence, and analyzing and retrieving the key frames in the video sequence; intercepting a plurality of sequence frames containing the same or similar expressions, performing related preprocessing operation on the intercepted sequence frames, extracting facial features in the sequence frames, and extracting dynamic expression features and static expression features based on the facial features; when the model is trained, all the dynamic expression features and the static expression features are combined, and then the emotion classification is carried out by using a feature correlation analysis method.
4. The emotion recognition system of claim 2, wherein the speech-based emotion component acquisition mode is specifically: after the voice signal is preprocessed, extracting voice characteristic parameters capable of expressing current sound from the voice signal, analyzing and processing the voice characteristic parameters based on statistics, and then training an emotion recognition model based on voice by using a classification method based on the voice characteristic parameters; and by utilizing the emotion recognition model, selecting a classifier to perform emotion type judgment and emotion intensity calculation by adopting a classification recognition algorithm, and performing combined judgment by using a specific weight to obtain an emotion component based on voice.
5. The emotion recognition system of any one of claims 1 to 4, wherein the cloud server comprises:
a text recognition module: used for converting a voice signal of the user into text information by calling a voice recognition engine, preprocessing the text information, extracting text characteristic parameters capable of representing emotion change from the preprocessed text information, and judging the text characteristic parameters through the classifier to obtain an emotion component of the text;
a data fusion module: used for fusing the emotion components based on expressions, voice and text by adopting a weight calculation method, calculating a final emotion recognition result and feeding the final emotion recognition result back to the robot; the fusion method comprises weight-based fusion, statistical-data-based fusion and machine-learning-method-based fusion, and the weight calculation method comprises static weight setting and dynamic weight setting.
6. A method of emotion recognition, comprising:
step a: collecting image or video data and voice signals of a user;
step b: recognizing the image or video data and the voice signal, and respectively acquiring emotion components of the user based on expressions and voices;
step c: and acquiring a text-based emotion component of the user according to the voice signal, and fusing the emotion component based on the expression, the voice and the text based on a weight calculation method to obtain a final emotion recognition result of the user.
7. The emotion recognition method of claim 6, wherein in step b, the recognizing the image or video data and the voice signal and respectively obtaining emotion components of the user based on expressions and voices specifically comprises:
step b 1: extracting effective static expression characteristics or dynamic expression characteristics through the collected image or video data, training an emotion recognition model based on the expression by adopting the static expression characteristics or the dynamic expression characteristics, and performing emotion type judgment and emotion intensity calculation through the emotion recognition model based on the expression to obtain an emotion component based on the expression;
step b 2: analyzing and extracting voice characteristic parameters capable of representing emotion change from the acquired voice signals, training a voice-based emotion recognition model by adopting the voice characteristic parameters, and performing emotion type judgment and emotion intensity calculation through the voice-based emotion recognition model to obtain an emotion component based on voice.
8. The emotion recognition method of claim 7, wherein in step b1, the expression-based emotion component acquisition mode is specifically: analyzing the video sequence, and analyzing and retrieving the key frames in the video sequence; intercepting a plurality of sequence frames containing the same or similar expressions, performing related preprocessing operation on the intercepted sequence frames, extracting facial features in the sequence frames, and extracting dynamic expression features and static expression features based on the facial features; when the model is trained, all the dynamic expression features and the static expression features are combined, and then the emotion classification is carried out by using a feature correlation analysis method.
9. The emotion recognition method of claim 7, wherein in step b2, the speech-based emotion component acquisition mode is specifically: after the voice signal is preprocessed, extracting voice characteristic parameters capable of expressing current sound from the voice signal, analyzing and processing the voice characteristic parameters based on statistics, and then training an emotion recognition model based on voice by using a classification method based on the voice characteristic parameters; and by utilizing the emotion recognition model, selecting a classifier to perform emotion type judgment and emotion intensity calculation by adopting a classification recognition algorithm, and performing combined judgment by using a specific weight to obtain an emotion component based on voice.
10. The emotion recognition method according to any one of claims 6 to 9, wherein in the step c, the obtaining of the text-based emotion component of the user from the voice signal and the fusing of the emotion components based on expression, voice and text based on the weight calculation method specifically comprises:
step c 1: converting a voice signal of a user into text information by calling a voice recognition engine, preprocessing the text information, extracting text characteristic parameters capable of representing emotion change from the preprocessed text information, and judging the text characteristic parameters by a classifier to obtain emotion components of the text;
step c 2: fusing emotion components based on expressions, voice and text by adopting a weight calculation method, calculating a final emotion recognition result, and feeding back the final emotion recognition result to the robot; the fusion method comprises weight-based fusion, statistical data-based fusion and machine learning method-based fusion, and the weight calculation method comprises static weight setting and dynamic weight setting.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the following operations of the emotion recognition method of any one of claims 6 to 10:
step a: collecting image or video data and voice signals of a user;
step b: recognizing the image or video data and the voice signal, and respectively acquiring emotion components of the user based on expressions and voices;
step c: and acquiring a text-based emotion component of the user according to the voice signal, and fusing the emotion component based on the expression, the voice and the text based on a weight calculation method to obtain a final emotion recognition result of the user.
CN201910468800.9A 2019-05-31 2019-05-31 Emotion recognition system and method and electronic equipment Pending CN112016367A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910468800.9A CN112016367A (en) 2019-05-31 2019-05-31 Emotion recognition system and method and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910468800.9A CN112016367A (en) 2019-05-31 2019-05-31 Emotion recognition system and method and electronic equipment

Publications (1)

Publication Number Publication Date
CN112016367A (en) 2020-12-01

Family

ID=73501936

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910468800.9A Pending CN112016367A (en) 2019-05-31 2019-05-31 Emotion recognition system and method and electronic equipment

Country Status (1)

Country Link
CN (1) CN112016367A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112686048A (en) * 2020-12-23 2021-04-20 沈阳新松机器人自动化股份有限公司 Emotion recognition method and device based on fusion of voice, semantics and facial expressions
CN112700255A (en) * 2020-12-28 2021-04-23 科讯嘉联信息技术有限公司 Multi-mode monitoring service system and method
CN112910761A (en) * 2021-01-29 2021-06-04 北京百度网讯科技有限公司 Instant messaging method, device, equipment, storage medium and program product
CN112927681A (en) * 2021-02-10 2021-06-08 华南师范大学 Artificial intelligence psychological robot and method for recognizing voice from person to person
CN112990301A (en) * 2021-03-10 2021-06-18 深圳市声扬科技有限公司 Emotion data annotation method and device, computer equipment and storage medium
CN113538810A (en) * 2021-07-16 2021-10-22 中国工商银行股份有限公司 Security method, security system and automatic teller machine equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101480668B1 (en) * 2014-03-21 2015-01-26 충남대학교산학협력단 Mobile Terminal Having Emotion Recognition Application using Voice and Method for Controlling thereof
CN107220591A (en) * 2017-04-28 2017-09-29 哈尔滨工业大学深圳研究生院 Multi-modal intelligent mood sensing system
US20180077095A1 (en) * 2015-09-14 2018-03-15 X Development Llc Augmentation of Communications with Emotional Data
KR20180054407A (en) * 2016-11-15 2018-05-24 주식회사 로보러스 Apparatus for recognizing user emotion and method thereof, and robot system using the same

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101480668B1 (en) * 2014-03-21 2015-01-26 충남대학교산학협력단 Mobile Terminal Having Emotion Recognition Application using Voice and Method for Controlling thereof
US20180077095A1 (en) * 2015-09-14 2018-03-15 X Development Llc Augmentation of Communications with Emotional Data
KR20180054407A (en) * 2016-11-15 2018-05-24 주식회사 로보러스 Apparatus for recognizing user emotion and method thereof, and robot system using the same
CN107220591A (en) * 2017-04-28 2017-09-29 哈尔滨工业大学深圳研究生院 Multi-modal intelligent mood sensing system

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112686048A (en) * 2020-12-23 2021-04-20 沈阳新松机器人自动化股份有限公司 Emotion recognition method and device based on fusion of voice, semantics and facial expressions
CN112700255A (en) * 2020-12-28 2021-04-23 科讯嘉联信息技术有限公司 Multi-mode monitoring service system and method
CN112910761A (en) * 2021-01-29 2021-06-04 北京百度网讯科技有限公司 Instant messaging method, device, equipment, storage medium and program product
CN112910761B (en) * 2021-01-29 2023-04-21 北京百度网讯科技有限公司 Instant messaging method, device, equipment, storage medium and program product
CN112927681A (en) * 2021-02-10 2021-06-08 华南师范大学 Artificial intelligence psychological robot and method for recognizing voice from person to person
CN112990301A (en) * 2021-03-10 2021-06-18 深圳市声扬科技有限公司 Emotion data annotation method and device, computer equipment and storage medium
CN113538810A (en) * 2021-07-16 2021-10-22 中国工商银行股份有限公司 Security method, security system and automatic teller machine equipment

Similar Documents

Publication Publication Date Title
US11226673B2 (en) Affective interaction systems, devices, and methods based on affective computing user interface
CN112016367A (en) Emotion recognition system and method and electronic equipment
CN107799126B (en) Voice endpoint detection method and device based on supervised machine learning
US20190188903A1 (en) Method and apparatus for providing virtual companion to a user
CN106997243B (en) Speech scene monitoring method and device based on intelligent robot
JP2017156854A (en) Speech semantic analysis program, apparatus and method for improving comprehension accuracy of context semantic through emotion classification
WO2008069519A1 (en) Gesture/speech integrated recognition system and method
CN106157956A (en) The method and device of speech recognition
KR20100001928A (en) Service apparatus and method based on emotional recognition
CN102298694A (en) Man-machine interaction identification system applied to remote information service
CN109101663A (en) A kind of robot conversational system Internet-based
CN112766173B (en) Multi-mode emotion analysis method and system based on AI deep learning
CN112581015B (en) Consultant quality assessment system and assessment method based on AI (advanced technology attachment) test
JP2018032164A (en) Interview system
CN112768070A (en) Mental health evaluation method and system based on dialogue communication
CN114724224A (en) Multi-mode emotion recognition method for medical care robot
CN111149172B (en) Emotion management method, device and computer-readable storage medium
KR102285482B1 (en) Method and apparatus for providing content based on machine learning analysis of biometric information
CN111339878B (en) Correction type real-time emotion recognition method and system based on eye movement data
CN108628454B (en) Visual interaction method and system based on virtual human
CN111383138A (en) Catering data processing method and device, computer equipment and storage medium
JP2018060374A (en) Information processing device, evaluation system and program
CN106997449A (en) Robot and face identification method with face identification functions
JP2017182261A (en) Information processing apparatus, information processing method, and program
JP2020067562A (en) Device, program and method for determining action taking timing based on video of user's face

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination