CN117809354B - Emotion recognition method, medium and device based on head wearable device perception - Google Patents

Emotion recognition method, medium and device based on head wearable device perception

Info

Publication number
CN117809354B
CN117809354B
Authority
CN
China
Prior art keywords
emotion
data
emotion recognition
network
local
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410223747.7A
Other languages
Chinese (zh)
Other versions
CN117809354A
Inventor
张通
吴梦琪
王锦炫
陈俊龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Guangzhou
South China University of Technology SCUT
Original Assignee
Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Guangzhou
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Guangzhou, South China University of Technology SCUT filed Critical Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Guangzhou
Priority to CN202410223747.7A priority Critical patent/CN117809354B/en
Publication of CN117809354A publication Critical patent/CN117809354A/en
Application granted granted Critical
Publication of CN117809354B publication Critical patent/CN117809354B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of emotion recognition, and particularly provides an emotion recognition method, medium and device based on head wearable device perception. The method comprises the following steps: collecting multi-modal emotion data of a wearer; processing the data with a local fusion emotion recognition network, in which the left-eye input and the right-eye input undergo deep convolution, while the lower-left-face input and the lower-right-face input are each embedded together with their extracted action units through an embedding layer and then fed, together with the facial action coding, into a spatial-domain graph convolution; performing spatial mapping in a multi-layer perceptron and fusing the feature maps after computing spatial attention and channel attention to obtain emotion features; and fusing the emotion features and classifying them to obtain a composite emotion recognition result. By using facial action unit information in the local multi-view facial data to assist emotion perception, the method improves the robustness of the emotion information projected by the wearer's bodily appearance and thereby improves the precision of emotion discrimination.

Description

Emotion recognition method, medium and device based on head wearable device perception
Technical Field
The invention relates to the technical field of emotion recognition, in particular to an emotion recognition method, medium and device based on head wearable device perception.
Background
Mental diseases often affect a patient's daily life, study, work and social activities. Long-term anxiety and depression can hinder the patient's personal development and may even lead to self-injury or injury to others. Screening and monitoring of mental diseases, so that they can be diagnosed and treated in time, is therefore very important for patients with mental diseases.
Existing approaches to diagnosing mental diseases generally rely on questionnaires, or use external equipment to collect biological-signal data or expression data of the person being diagnosed, which are then processed for emotion recognition to obtain a diagnostic result. However, collecting biological-signal or expression data with external equipment usually requires the diagnosed person to stay in a specific detection environment for a short sampling period; the mental state shown in such a specific environment cannot fully represent the mental state in daily life, and the short sampling time limits the amount of data collected, which affects the accuracy of the diagnostic result. Using a head wearable device to collect emotion data and then processing and analysing those data removes the restrictions on detection location and sampling time and is therefore an ideal approach. However, when a head wearable device is used to acquire facial images, the device is so close to the face that a complete facial image is difficult to obtain, and several cameras must cooperate to capture images of the different facial regions. Existing methods can only extract emotion features of the whole (global) face when recognising expression trends and lack the ability to extract and fuse expression emotion features from multiple viewing angles.
Expressions are divided into macro-expressions and micro-expressions. Macro-expressions are facial expressions with obvious intensity and long duration; they are easy to detect and recognise, play a very important role in emotion recognition, and are the emotion data mainly used by existing emotion recognition techniques. However, because people sometimes hide or unconsciously suppress their emotions and can deliberately control and change macro-expressions, recognising emotion from macro-expressions alone is not accurate. Micro-expressions arise spontaneously in an unconscious state, are difficult to disguise or camouflage, and are generally directly related to the true emotion, so adding micro-expression data to emotion analysis is more reliable. However, a micro-expression usually lasts only about 1/25 s to 1/3 s, has low motion intensity, and is difficult to perceive and capture, so it is rarely used in existing emotion recognition techniques.
Disclosure of Invention
In order to overcome the defects and shortcomings of the prior art, the invention aims to provide an emotion recognition method, medium and device based on head wearable device perception. The method can acquire local multi-view face data of the wearer and uses facial action unit information in that local multi-view face data to assist emotion perception, coordinating the consistency and integrity of the emotional states of multiple facial regions and improving the robustness of the emotion information projected by the wearer's bodily appearance, thereby improving the precision of emotion discrimination.
In order to achieve the above purpose, the invention is realized by the following technical scheme: an emotion recognition method based on head wearable device perception, the head wearable device comprising: the device comprises a device body and a multi-mode data acquisition device arranged on the device body; the multimode data acquisition device comprises four first camera modules for respectively acquiring pictures and videos of four visual angles of a wearer; the four views refer to: left eye, right eye, lower left face, lower right face;
The emotion recognition method based on the perception of the head wearable device comprises a wearer emotion recognition method; the emotion recognition method for the wearer comprises the following steps of:
Step X1, collecting multi-mode emotion data of a wearer; the multimode emotion data of the wearer comprise picture data I and video data I which are acquired through four camera modules I; the first picture data and the first video data comprise four view angle data;
Step X2, extracting emotion characteristics of the multimodal emotion data of the wearer:
Processing the first picture data and the first video data with local fusion emotion recognition networks respectively; for the first picture data, the four view-angle data are used directly as the four view-angle inputs of a local fusion emotion recognition network; for the first video data, a start frame and a peak frame are first extracted from each of the four view-angle data and then used as the four view-angle inputs of a local fusion emotion recognition network;
The four view-angle inputs are processed in the local fusion emotion recognition network as follows: the left-eye input and the right-eye input undergo deep convolution to extract local view-angle features; the lower-left-face input and the lower-right-face input are each embedded with their extracted action units through an embedding layer and then fed, together with the Facial Action Coding System (FACS), into a spatial-domain graph convolution to extract local view-angle features; the local view-angle features extracted from the four view-angle inputs are simultaneously input into a multi-layer perceptron for spatial mapping, and feature-map fusion is performed after computing spatial attention and channel attention to obtain the final emotion features;
And step X3, fusing the emotion characteristics obtained in the step X2, and obtaining a composite emotion recognition result through classification.
Preferably, the number of the local fusion emotion recognition networks is four, namely a local fusion emotion recognition network I, a local fusion emotion recognition network II, a local fusion emotion recognition network III and a local fusion emotion recognition network IV;
The local fusion emotion recognition network processes the first picture data to obtain macro expression emotion characteristics of the first picture data; processing the first picture data by the local fusion emotion recognition network II to obtain micro-expression emotion characteristics of the first picture data; processing the video data I by the local fusion emotion recognition network III to obtain macro expression emotion characteristics of the video data I; and processing the video data one by the local fusion emotion recognition network four to obtain micro-expression emotion characteristics of the video data one.
Preferably, each of the four local fusion emotion recognition networks comprises four local feature extraction units, one for each of the four view-angle inputs; the two local feature extraction units for the left-eye input and the right-eye input each comprise a deep convolution network I; the two local feature extraction units for the lower-left-face input and the lower-right-face input are each formed by connecting an embedding layer and a spatial-domain graph convolution network in sequence, the embedding layer being further connected with an action unit extractor and the spatial-domain graph convolution network being further connected with a Facial Action Coding System; the outputs of the four local feature extraction units are simultaneously connected to the multi-layer perceptron and are fused through channel attention and spatial attention;
Local fusion emotion recognition network III and local fusion emotion recognition network IV each additionally include an action amplification network in the two local feature extraction units for the left-eye input and the right-eye input; the left-eye input and the right-eye input each have the smile expression amplified by the action amplification network and are then fed into deep convolution network I to extract local view-angle features.
Preferably, in the step X2, the first picture data and the first video data are preprocessed respectively before being processed by the local fusion emotion recognition network;
Preprocessing the first picture data comprises performing face detection with serially connected preprocessing convolutional neural networks I; this face detection consists of generating candidate boxes, preliminarily screening the candidate boxes, and detecting facial key points: after convolution, activation, pooling and fully connected processing, the confidence, the coordinate offsets and the coordinates of five key points of each candidate box are output, thereby realizing face detection;
Preprocessing the first video data comprises performing face detection with a serially connected preprocessing multi-layer deep convolutional neural network II; this face detection means that the first video data are read frame by frame in video-streaming mode, each frame image is resized by the serially connected preprocessing multi-layer deep convolutional neural network II into an image pyramid, and operations on the pyramid data yield the face box, the key-point coordinates and the face classification, thereby realizing face detection; the preprocessing multi-layer deep convolutional neural network II comprises an image resizing layer, convolutional neural unit I, convolutional neural unit II, max pooling layer I, fully connected layer I, convolutional neural unit III, max pooling layer II and fully connected layer II which are connected in sequence, and a spatial attention layer connected between convolutional neural unit III and max pooling layer II.
Preferably, step X3 means: the emotion features obtained in step X2 are fused with a multi-modal adaptive fusion module: the input of the multi-modal adaptive fusion module is the emotion features X = {X_1, …, X_n}, where X_i is the i-th emotion feature and n is the number of emotion features; feature fusion is performed iteratively with an attention mechanism to finally obtain a fusion feature; the fusion feature is input into a classifier for learning to obtain a composite emotion recognition result; the composite emotion recognition result adopts a composite representation of the emotional state, namely emotion categories and their corresponding proportions.
Preferably, in the head wearable device, the multi-mode data acquisition device further comprises an audio acquisition module; in step X1, the multi-modal emotion data further include audio data; in step X2, emotion feature extraction is also performed on the audio data: the audio data are filtered, smoothed and framed; Mel-frequency cepstral coefficient (MFCC) features are extracted; the MFCC features are assembled into feature vectors and input into an attention-based BiLSTM neural network to extract emotion features;
In the step X1, the multi-modal emotion data further comprises text data; in the step X2, emotion feature extraction is also performed on the text data: and processing the text data by using a word2vec model to obtain a sequence context word vector representation, and extracting emotion characteristics by using an LSTM-based emotion analysis network.
Preferably, the head wearable device, the multi-mode data acquisition device further comprises a second camera module for acquiring pictures and videos of the observed person;
the emotion recognition method based on the perception of the head wearable equipment further comprises an observed person emotion recognition method; the emotion recognition method of the observed person comprises the following steps:
Step Y1, collecting multi-mode emotion data of an observed person; the multi-mode emotion data of the observed person comprises picture data II and video data II which are acquired through a camera module II;
step Y2, extracting emotion characteristics of the multi-mode emotion data of the observed person:
Carrying out emotion feature extraction on picture data II with expression convolutional neural network I, expression deep convolutional neural network II and a graph neural network respectively, obtaining the macro-expression emotion features, micro-expression emotion features and gesture emotion features of picture data II;
extracting emotion characteristics of the video data II by adopting a three-dimensional convolutional neural network III and a convolutional neural network IV based on peak frame optical flow respectively to obtain macro expression emotion characteristics and micro expression emotion characteristics of the video data II;
Emotion feature extraction by convolutional neural network IV based on peak-frame optical flow means: first, video data II are preprocessed by rotation, cropping and face alignment, and the peak frame is extracted with a peak-frame detection algorithm; then the optical flow vectors u, v and the optical strain ε between the start frame and the peak frame of video data II are calculated; after graying, u, v and ε are taken as the three channels of an RGB image and combined into one RGB image; emotion features are then extracted from this RGB image;
The optical flow vectors u, v and the optical strain ε are calculated using the following set of equations:
I(x, y, t) = I(x + dx, y + dy, t + dt)
I_x·dx + I_y·dy + I_t·dt = 0, i.e. I_x·u + I_y·v + I_t = 0, with u = dx/dt, v = dy/dt
ε = ½[∇p + (∇p)ᵀ], where p = (u, v) is the optical flow field
wherein I(x, y, t) represents the light intensity of a pixel point in the start frame; t represents the time dimension; dx, dy represent the horizontal and vertical displacements from the start frame to the peak frame, respectively; dt represents the time taken to move from the start frame to the peak frame; I_x = ∂I/∂x, I_y = ∂I/∂y, I_t = ∂I/∂t denote the partial derivatives of the pixel gray scale along the x, y and t directions, respectively, and are obtained from the image data;
and Y3, fusing the emotion characteristics obtained in the step Y2, and obtaining a composite emotion recognition result through classification.
Preferably, the method for establishing the individual data storage database is further included: constructing an individual ID in an individual data storage database; before the emotion recognition method of the wearer or the emotion recognition method of the observed person starts, the individual ID of the wearer or the observed person is acquired; after the emotion recognition method of the wearer or the observed person is completed, the emotion recognition result obtained by the emotion recognition method of the wearer or the observed person and the time and place are stored in the corresponding individual ID.
A readable storage medium, wherein the storage medium stores a computer program which, when executed by a processor, causes the processor to perform the emotion recognition method based on head wearable device perception.
The computer equipment comprises a processor and a memory for storing a program executable by the processor, wherein the emotion recognition method based on the perception of the head wearable equipment is realized when the processor executes the program stored by the memory.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The invention is based on the head wearable equipment, can collect the data of the wearer in real time, and further realize emotion perception and recognition of the wearer; the method has the advantages that local multi-view expression data of a wearer can be acquired, a local fusion emotion recognition network of macro expression and micro expression is assisted by facial action unit information, and the expression capacity of macro expression and micro expression characteristics is improved;
2. The invention can recognise composite emotional states, which accords with the neurological understanding of emotion as a multidimensional, multi-level experience;
3. The invention supports the extraction and fusion of the emotion characteristics of the multi-modal emotion data, and has the capability of mutually supporting and mutually supplementing the multi-modal emotion data;
4. The invention realizes the collection of the emotion signals of the cross-individuals, can collect and process the emotion signals of the user and a plurality of testees at the same time, and expands the application scene; for example, the method can be used in the scenes of assisting in mental disease inquiry, daily communication, mental disease screening, mental disease patient daily emotion change monitoring and the like; thereby the invention has the capability of improving the auxiliary screening, diagnosis and monitoring efficiency;
5. The method comprises the steps of establishing an individual data storage database from collected data and emotion analysis results; the method is beneficial to mining emotion requirements in home life, assisting in psychological disease diagnosis and recording daily emotion states; carrying out the review and analysis of the emotion state by combining the historical activities and data records of the face and the equipment with real-time data, and establishing the emotion personal portrait of the user so as to know the change of the emotion of the individual in time, environment and experience; and meanwhile, information such as places, background pictures, wearing and facial expressions and the like acquired by the equipment are processed, and factors affecting the emotion state are analyzed.
Drawings
FIG. 1 is a schematic structural view of a head wearable device of the present invention;
FIG. 2 is a schematic flow chart of the emotion recognition method of the wearer of the present invention;
FIG. 3 is a block diagram of a pre-processing multi-layer deep convolutional neural network two of the present invention;
FIG. 4 is a block diagram of a first and a second local fusion emotion recognition networks of the present invention;
FIG. 5 is a block diagram of a third and a fourth local fusion emotion recognition network of the present invention;
FIG. 6 is a block diagram of a multi-modal adaptive fusion module of the present invention;
FIG. 7 is a schematic flow chart of a preferred embodiment of the emotion recognition method of the present invention;
FIG. 8 is a schematic structural view of a two-head wearable device of an embodiment;
fig. 9 is a flowchart of a second embodiment of a method for identifying emotion of an observed person.
Detailed Description
The invention is described in further detail below with reference to the drawings and the detailed description.
Example 1
In the emotion recognition method based on head wearable device perception of this embodiment, the head wearable device comprises a device body 1 and a multi-mode data acquisition device arranged on the device body; in this embodiment the device body 1 is a pair of glasses, as shown in fig. 1; the multi-mode data acquisition device comprises four first camera modules 3, 4, 5 and 6 for respectively acquiring pictures and videos of four view angles of the wearer, the four view angles being: left eye, right eye, lower left face, lower right face; first camera module 3 collects pictures and videos of the wearer's right eye; first camera module 4 collects pictures and videos of the wearer's left eye; first camera module 5 collects pictures and videos of the wearer's lower right face; first camera module 6 collects pictures and videos of the wearer's lower left face.
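As a purely illustrative sketch (not part of the patent), the four first camera modules could be read out as ordinary video devices and tagged with their view angles; the device indices, the OpenCV-based capture and the view mapping below are assumptions.

# Minimal sketch (assumptions: OpenCV-compatible cameras, device indices 0..3).
import cv2

VIEW_BY_DEVICE = {0: "right_eye", 1: "left_eye",
                  2: "lower_right_face", 3: "lower_left_face"}

def grab_multiview_frames():
    frames = {}
    for idx, view in VIEW_BY_DEVICE.items():
        cap = cv2.VideoCapture(idx)
        ok, frame = cap.read()          # one grab per first camera module
        if ok:
            frames[view] = frame        # BGR image for this view angle
        cap.release()
    return frames                       # e.g. {"right_eye": ndarray, ...}

if __name__ == "__main__":
    views = grab_multiview_frames()
    print({v: f.shape for v, f in views.items()})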
The emotion recognition method based on the perception of the head wearable device comprises a wearer emotion recognition method.
The emotion recognition method of the wearer, as shown in fig. 2, comprises the following steps:
Step X1, collecting multi-mode emotion data of a wearer; the multimode emotion data of the wearer comprise picture data I and video data I which are acquired through four camera modules I; the first picture data and the first video data comprise four view angle data;
Step X2, extracting emotion characteristics of the multimodal emotion data of the wearer:
first, it is preferable to perform preprocessing on the first picture data and the first video data, respectively:
Preprocessing the first picture data comprises performing face detection with serially connected preprocessing convolutional neural networks I; this face detection consists of generating candidate boxes, preliminarily screening the candidate boxes, and detecting facial key points: after convolution, activation, pooling and fully connected processing, the confidence, the coordinate offsets and the coordinates of five key points of each candidate box are output, thereby realizing face detection;
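The following minimal sketch illustrates one stage of such a cascaded candidate-box detector in the spirit of the description above; the layer widths, the 48×48 candidate crops and the 0.9 confidence threshold are assumptions, not values taken from the patent.

# One stage of a cascaded face detector: scores candidate boxes and predicts
# box offsets and five facial key points (layer sizes are illustrative).
import torch
import torch.nn as nn

class CandidateHead(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3), nn.PReLU(), nn.MaxPool2d(3, 2),
            nn.Conv2d(32, 64, 3), nn.PReLU(), nn.MaxPool2d(3, 2),
            nn.Conv2d(64, 64, 3), nn.PReLU(),
            nn.Flatten(), nn.Linear(64 * 7 * 7, 256), nn.PReLU(),
        )
        self.confidence = nn.Linear(256, 1)   # face / non-face score per candidate
        self.box_offset = nn.Linear(256, 4)   # coordinate offsets of the candidate box
        self.landmarks = nn.Linear(256, 10)   # five key points, (x, y) each

    def forward(self, crops):                 # crops: (N, 3, 48, 48) candidate boxes
        f = self.features(crops)
        return torch.sigmoid(self.confidence(f)), self.box_offset(f), self.landmarks(f)

def screen_candidates(head, crops, threshold=0.9):
    conf, offsets, pts = head(crops)
    keep = conf.squeeze(1) > threshold        # preliminary screening by confidence
    return crops[keep], offsets[keep], pts[keep]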
Preprocessing the first video data comprises performing face detection with a serially connected preprocessing multi-layer deep convolutional neural network II; this face detection means that the first video data are read frame by frame in video-streaming mode, each frame image is resized by the serially connected preprocessing multi-layer deep convolutional neural network II into an image pyramid, and operations on the pyramid data yield the face box, the key-point coordinates and the face classification, thereby realizing face detection; the preprocessing multi-layer deep convolutional neural network II comprises an image resizing layer, convolutional neural unit I, convolutional neural unit II, max pooling layer I, fully connected layer I, convolutional neural unit III, max pooling layer II and fully connected layer II which are connected in sequence, and a spatial attention layer connected between convolutional neural unit III and max pooling layer II, as shown in fig. 3.
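A possible PyTorch rendering of preprocessing multi-layer deep convolutional neural network II, following the layer order described above, is sketched below; the channel counts, the 64×64 working resolution and the reading of fully connected layer I as a 1×1 convolution (so that convolutional neural unit III still receives a spatial map) are assumptions.

# Sketch of preprocessing network II: resize, conv units, pooling, spatial attention,
# and three output heads (face classification, face box, key points).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):                              # x: (N, C, H, W)
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        att = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * att                                 # re-weight spatial positions

class PreprocessNet2(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv_unit1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.conv_unit2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.pool1 = nn.MaxPool2d(2)
        self.fc1 = nn.Conv2d(32, 64, 1)                # "fully connected layer I", per location
        self.conv_unit3 = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
        self.attn = SpatialAttention()                 # between conv unit III and max pool II
        self.pool2 = nn.MaxPool2d(2)
        self.fc2 = nn.Linear(64 * 16 * 16, 128)
        self.cls_head = nn.Linear(128, 2)              # face / non-face classification
        self.box_head = nn.Linear(128, 4)              # face bounding box
        self.pts_head = nn.Linear(128, 10)             # key-point coordinates

    def forward(self, frame):                          # frame: (N, 3, H, W) video frame
        x = F.interpolate(frame, size=(64, 64))        # image resizing layer
        x = self.pool1(self.conv_unit2(self.conv_unit1(x)))
        x = self.conv_unit3(self.fc1(x))
        x = self.pool2(self.attn(x))
        x = self.fc2(x.flatten(1))
        return self.cls_head(x), self.box_head(x), self.pts_head(x)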
Then, the first picture data and the first video data are processed with local fusion emotion recognition networks respectively; for the first picture data, the four view-angle data are used directly as the four view-angle inputs of a local fusion emotion recognition network; for the first video data, a start frame and a peak frame are extracted from each of the four view-angle data to serve as the four view-angle inputs of a local fusion emotion recognition network.
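The patent does not state how the start frame and peak frame are selected; the sketch below uses a simple frame-difference heuristic as an illustrative assumption: the first frame is taken as the start frame, and the frame differing most from it as the peak frame.

# Illustrative assumption (not specified by the patent): first frame = start frame,
# frame with the largest mean absolute difference from it = peak (apex) frame.
import cv2
import numpy as np

def extract_start_and_peak(video_path):
    cap = cv2.VideoCapture(video_path)
    ok, start = cap.read()
    if not ok:
        raise ValueError("empty video: " + video_path)
    start_gray = cv2.cvtColor(start, cv2.COLOR_BGR2GRAY).astype(np.float32)
    peak, best_diff = start, 0.0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
        diff = np.abs(gray - start_gray).mean()        # mean intensity change vs. start frame
        if diff > best_diff:
            best_diff, peak = diff, frame
    cap.release()
    return start, peak                                 # the two frames used as view-angle inputs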
Specifically, the number of the local fusion emotion recognition networks is four, namely a local fusion emotion recognition network I, a local fusion emotion recognition network II, a local fusion emotion recognition network III and a local fusion emotion recognition network IV.
Processing the first picture data by the local fusion emotion recognition network to obtain macro expression emotion characteristics of the first picture data; and processing the first picture data by the local fusion emotion recognition network II to obtain micro-expression emotion characteristics of the first picture data.
The structure of local fusion emotion recognition network I and local fusion emotion recognition network II is shown in fig. 4; each of them comprises four local feature extraction units, one for each of the four view-angle inputs. The two local feature extraction units for the left-eye input and the right-eye input each comprise a deep convolution network I; the two local feature extraction units for the lower-left-face input and the lower-right-face input are each formed by connecting an embedding layer and a spatial-domain graph convolution network in sequence, the embedding layer being further connected with an action unit extractor and the spatial-domain graph convolution network being further connected with a Facial Action Coding System; the outputs of the four local feature extraction units are simultaneously connected to the multi-layer perceptron and are fused through channel attention and spatial attention.
Processing the video data I by the local fusion emotion recognition network III to obtain macro expression emotion characteristics of the video data I; and processing the video data one by the local fusion emotion recognition network four to obtain micro-expression emotion characteristics of the video data one.
The structure of local fusion emotion recognition network III and local fusion emotion recognition network IV is shown in fig. 5; each of them comprises four local feature extraction units, one for each of the four view-angle inputs. The two local feature extraction units for the left-eye input and the right-eye input each comprise an action amplification network and a deep convolution network I; the left-eye input and the right-eye input each have the smile expression amplified by the action amplification network and are then fed into deep convolution network I to extract local view-angle features. The two local feature extraction units for the lower-left-face input and the lower-right-face input are each formed by connecting an embedding layer and a spatial-domain graph convolution network in sequence, the embedding layer being further connected with an action unit extractor and the spatial-domain graph convolution network being further connected with a Facial Action Coding System; the outputs of the four local feature extraction units are simultaneously connected to the multi-layer perceptron and are fused through channel attention and spatial attention.
The four view-angle inputs are processed in the local fusion emotion recognition network as follows: the left-eye input and the right-eye input undergo deep convolution to extract local view-angle features; the lower-left-face input and the lower-right-face input are each embedded with their extracted action units through an embedding layer and then fed, together with the Facial Action Coding System (FACS), into a spatial-domain graph convolution to extract local view-angle features; FACS (Facial Action Coding System) refers to a coding system that describes facial muscle movements. The local view-angle features extracted from the four view-angle inputs are simultaneously input into a multi-layer perceptron for spatial mapping, and feature-map fusion is performed after computing spatial attention and channel attention to obtain the final emotion features.
The four local fusion emotion recognition networks use convolutional neural networks to extract emotion features and adopt a facial expression recognition network based on privileged action unit information, with AUs (action units, the minimal facial movement units used to describe facial movements and expression characteristics) serving as privileged information to guide emotion recognition; at the same time, AUs are used as auxiliary output labels of the shallow layers of the network to assist the representation of shallow features in the model.
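A condensed sketch of the forward pass of such a local fusion emotion recognition network is given below; only the overall wiring follows the description, while the upstream AU extractor (assumed to supply per-AU intensities for the lower-face crops), the FACS-derived adjacency matrix, the feature sizes and the exact attention and fusion arithmetic are assumptions.

# Condensed sketch of the local fusion network wiring; sizes and fusion details are assumptions.
import torch
import torch.nn as nn

class DepthwiseBranch(nn.Module):                      # eye inputs: deep (depthwise) convolution
    def __init__(self, ch=3, dim=128):
        super().__init__()
        self.dw = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1, groups=ch), nn.Conv2d(ch, 32, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(), nn.Linear(32 * 16, dim))

    def forward(self, x):
        return self.dw(x)

class GraphBranch(nn.Module):                          # lower-face inputs: AU embedding + graph conv
    def __init__(self, n_aus=17, dim=128):
        super().__init__()
        self.embed = nn.Linear(1, dim)                 # embedding layer for extracted AU intensities
        self.gcn = nn.Linear(dim, dim)                 # one spatial-domain graph convolution step

    def forward(self, au_intensity, adjacency):        # au_intensity: (N, n_aus); adjacency from FACS
        h = self.embed(au_intensity.unsqueeze(-1))     # (N, n_aus, dim)
        h = torch.relu(adjacency @ self.gcn(h))        # propagate along FACS-derived AU relations
        return h.mean(dim=1)                           # (N, dim) local view-angle feature

class LocalFusionNet(nn.Module):
    def __init__(self, dim=128, n_classes=7):
        super().__init__()
        self.left_eye, self.right_eye = DepthwiseBranch(dim=dim), DepthwiseBranch(dim=dim)
        self.ll_face, self.lr_face = GraphBranch(dim=dim), GraphBranch(dim=dim)
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())   # shared spatial mapping
        self.channel_att = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())
        self.spatial_att = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())
        self.head = nn.Linear(dim, n_classes)

    def forward(self, le, re, ll_au, lr_au, adjacency):
        views = torch.stack([self.mlp(self.left_eye(le)),
                             self.mlp(self.right_eye(re)),
                             self.mlp(self.ll_face(ll_au, adjacency)),
                             self.mlp(self.lr_face(lr_au, adjacency))], dim=1)   # (N, 4, dim)
        views = views * self.channel_att(views)        # channel attention: re-weight feature channels
        weights = self.spatial_att(views)              # attention over the four view features
        fused = (views * weights).sum(dim=1)           # fused emotion feature for this network
        return fused, self.head(fused)                 # feature (used downstream) + optional logits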
Step X3: the emotion features obtained in step X2 are fused with a multi-modal adaptive fusion module: the structure of the multi-modal adaptive fusion module is shown in fig. 6; its input is the emotion features X = {X_1, …, X_n}, where X_i is the i-th emotion feature and n is the number of emotion features; feature fusion is performed iteratively with an attention mechanism to finally obtain a fusion feature; the fusion feature is input into a classifier for learning to obtain a composite emotion recognition result; the composite emotion recognition result adopts a composite representation of the emotional state, namely emotion categories and their corresponding proportions, for example: happy 75%, sad 1%, surprised 5%, fear 1.5%, disgust 1.5%, angry 1%, neutral 15%.
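A sketch of the multi-modal adaptive fusion module and of the composite readout (emotion categories with proportions) is given below; the learned query vector, the number of attention iterations and the linear classifier are assumptions.

# Sketch of iterative attention fusion over n modality features plus a composite readout.
import torch
import torch.nn as nn

EMOTIONS = ["happy", "sad", "surprised", "fear", "disgust", "angry", "neutral"]

class AdaptiveFusion(nn.Module):
    def __init__(self, dim=128, n_iters=3, n_classes=len(EMOTIONS)):
        super().__init__()
        self.query = nn.Parameter(torch.zeros(dim))    # fusion query, refined iteratively
        self.key = nn.Linear(dim, dim)
        self.n_iters = n_iters
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, features):                  # features: (N, n, dim) = {X_1, ..., X_n}
        q = self.query.expand(features.size(0), -1)
        for _ in range(self.n_iters):             # iterative attention-based feature fusion
            scores = torch.einsum("nd,nkd->nk", q, self.key(features))
            weights = torch.softmax(scores, dim=-1).unsqueeze(-1)
            q = (weights * features).sum(dim=1)   # updated fusion feature
        proportions = torch.softmax(self.classifier(q), dim=-1)
        return q, proportions                     # fused feature, composite emotion proportions

# Example readout of the composite representation (emotion category : proportion):
# fused, p = AdaptiveFusion()(torch.randn(1, 5, 128))
# print({e: round(float(v) * 100, 1) for e, v in zip(EMOTIONS, p[0])})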
In this embodiment, the head wearable device, the multi-modal data collection apparatus preferably further comprises an audio collection module 7, such as a microphone;
the preferable scheme of the emotion recognition method of the wearer is shown in fig. 7:
in the step X1, the multi-modal emotion data further comprises audio data and text data; text data can be obtained through an applet/mobile terminal APP;
In step X2, emotion feature extraction is also performed on the audio data: the audio data are filtered, smoothed and framed; Mel-frequency cepstral coefficient (MFCC) features are extracted; the MFCC features are assembled into feature vectors and input into an attention-based BiLSTM neural network to extract emotion features;
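A sketch of this audio branch, assuming librosa for pre-emphasis filtering and MFCC extraction (which frames the signal internally) and PyTorch for the attention-based BiLSTM, is given below; the sampling rate, hidden sizes and attention form are assumptions.

# Audio branch sketch: pre-emphasis, MFCC feature vectors, attention-based BiLSTM.
import librosa
import torch
import torch.nn as nn

def mfcc_features(wav_path, n_mfcc=40):
    y, sr = librosa.load(wav_path, sr=16000)
    y = librosa.effects.preemphasis(y)                 # simple filtering / smoothing step
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, frames)
    return torch.tensor(mfcc.T, dtype=torch.float32)   # (frames, n_mfcc) feature vectors

class AttentiveBiLSTM(nn.Module):
    def __init__(self, n_mfcc=40, hidden=64, dim=128):
        super().__init__()
        self.lstm = nn.LSTM(n_mfcc, hidden, batch_first=True, bidirectional=True)
        self.att = nn.Linear(2 * hidden, 1)
        self.out = nn.Linear(2 * hidden, dim)

    def forward(self, x):                              # x: (N, frames, n_mfcc)
        h, _ = self.lstm(x)                            # (N, frames, 2*hidden)
        w = torch.softmax(self.att(h), dim=1)          # attention over time frames
        return self.out((w * h).sum(dim=1))            # audio emotion feature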
emotion feature extraction is also performed on the text data: and processing the text data by using a word2vec model to obtain a sequence context word vector representation, and extracting emotion characteristics by using an LSTM-based emotion analysis network.
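A sketch of the text branch, assuming gensim word2vec vectors feeding an LSTM-based emotion analysis network, is shown below; the corpus handling, vector size and hidden size are assumptions.

# Text branch sketch: word2vec context vectors (gensim) + LSTM emotion feature extractor.
import torch
import torch.nn as nn
from gensim.models import Word2Vec

def build_w2v(tokenized_sentences, vector_size=100):
    return Word2Vec(sentences=tokenized_sentences, vector_size=vector_size,
                    window=5, min_count=1)             # sequence-context word vectors

def sentence_tensor(w2v, tokens):
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return torch.tensor(vecs, dtype=torch.float32).unsqueeze(0)   # (1, T, vector_size)

class TextEmotionLSTM(nn.Module):
    def __init__(self, vector_size=100, hidden=64, dim=128):
        super().__init__()
        self.lstm = nn.LSTM(vector_size, hidden, batch_first=True)
        self.out = nn.Linear(hidden, dim)

    def forward(self, x):                              # x: (N, T, vector_size)
        _, (h, _) = self.lstm(x)
        return self.out(h[-1])                         # text emotion feature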
The method for establishing the individual data storage database is preferably further included in the embodiment: constructing an individual ID in an individual data storage database; before the emotion recognition method of the wearer or the emotion recognition method of the observed person starts, the individual ID of the wearer or the observed person is acquired; after the emotion recognition method of the wearer or the observed person is completed, the emotion recognition result obtained by the emotion recognition method of the wearer or the observed person and the time and place are stored in the corresponding individual ID. The location may be obtained by the GPS module 2 of the head wearable device.
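A minimal sketch of such an individual data storage database, assuming an SQLite backend with a hypothetical schema (individual ID, time, place, serialized composite result), is given below.

# Individual data storage sketch: each recognition result is stored under its individual ID
# together with time and place; the table layout is an assumption.
import json
import sqlite3
from datetime import datetime

def init_db(path="emotion_records.db"):
    con = sqlite3.connect(path)
    con.execute("""CREATE TABLE IF NOT EXISTS emotion_record (
                       individual_id TEXT,
                       recorded_at   TEXT,
                       place         TEXT,
                       result_json   TEXT)""")         # composite result, e.g. {"happy": 75, ...}
    return con

def store_result(con, individual_id, place, result):
    con.execute("INSERT INTO emotion_record VALUES (?, ?, ?, ?)",
                (individual_id, datetime.now().isoformat(), place, json.dumps(result)))
    con.commit()

def history(con, individual_id):
    rows = con.execute("SELECT recorded_at, place, result_json FROM emotion_record "
                       "WHERE individual_id = ? ORDER BY recorded_at", (individual_id,))
    return [(t, p, json.loads(r)) for t, p, r in rows]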
The user's personal emotion portrait is established by using database technology to store the data records collected by the device, including individual ID, time, place, emotional state, high-frequency spoken words, picture background information, emotional state results and clothing (colour); data statistics are carried out and, combined with real-time data, the emotional state is reviewed and analysed so as to understand how the individual's emotion changes with time, environment and experience, and the factors affecting the emotional state are analysed.
By establishing an emotional state portrait of the individual in the time dimension, the individual's high-frequency spoken words are counted and visualised with a histogram; for the emotional states at different times, an emotion change curve and a frequency statistics chart of high-frequency emotions are given; a negative-emotion early warning is issued when negative emotions occur frequently, together with corresponding reference suggestions.
By establishing an emotional state portrait of the individual in the spatial dimension, data such as position, picture background, emotional state, clothing and high-frequency words are counted and analysed to identify the factors affecting emotion, including the influence of environment and place; for example, the emotional states in the same environment are counted and a brief statistical report is given.
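As an illustrative sketch of the time-dimension portrait, the records returned by the hypothetical history() helper above could be summarized into an emotion change curve and a simple negative-emotion early warning; the 30% warning threshold and the grouping of negative emotions are assumptions.

# Time-dimension portrait sketch: dominant emotion per record, counts, and an early warning.
import collections

NEGATIVE = {"sad", "fear", "disgust", "angry"}

def emotion_curve(records):
    """records: list of (recorded_at, place, {emotion: percent}) as returned by history()."""
    curve = [(t, max(result, key=result.get)) for t, _, result in records]   # dominant emotion over time
    counts = collections.Counter(label for _, label in curve)
    neg_ratio = sum(counts[e] for e in NEGATIVE) / max(len(curve), 1)
    warning = neg_ratio > 0.30            # issue an early warning when negative emotions dominate
    return curve, counts, warning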
Example two
The difference between the emotion recognition method based on head wearable device perception and the first embodiment is that: in this embodiment, the head wearable device, the multi-mode data acquisition device further includes a second camera module 8 for acquiring pictures and videos of the observed person, as shown in fig. 8.
The emotion recognition method based on the perception of the head wearable equipment further comprises an observed person emotion recognition method; as shown in fig. 9, the observed emotion recognition method includes the steps of:
Step Y1, collecting multi-mode emotion data of an observed person; the multi-mode emotion data of the observed person comprises picture data II and video data II which are acquired through a camera module II 8;
step Y2, extracting emotion characteristics of the multi-mode emotion data of the observed person:
Carrying out emotion feature extraction on picture data II with expression convolutional neural network I, expression deep convolutional neural network II and a graph neural network respectively, obtaining the macro-expression emotion features, micro-expression emotion features and gesture emotion features of picture data II;
extracting emotion characteristics of the video data II by adopting a three-dimensional convolutional neural network III and a convolutional neural network IV based on peak frame optical flow respectively to obtain macro expression emotion characteristics and micro expression emotion characteristics of the video data II;
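A minimal sketch of three-dimensional convolutional neural network III as a small 3D CNN over a clip of video data II is given below (convolutional neural network IV, based on peak-frame optical flow, is detailed next); the clip length, channel counts and pooling schedule are assumptions.

# Sketch of a small 3D CNN for macro-expression features from a video clip.
import torch
import torch.nn as nn

class MacroExpr3DCNN(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(32, dim))

    def forward(self, clip):               # clip: (N, 3, T, H, W), e.g. 16 aligned face frames
        return self.net(clip)              # macro-expression emotion feature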
Emotion feature extraction by convolutional neural network IV based on peak-frame optical flow means: first, video data II are preprocessed by rotation, cropping and face alignment, and the peak frame is extracted with a peak-frame detection algorithm; then the optical flow vectors u, v and the optical strain ε between the start frame and the peak frame of video data II are calculated; after graying, u, v and ε are taken as the three channels of an RGB image and combined into one RGB image; emotion features are then extracted from this RGB image;
The optical flow vectors u, v and the optical strain ε are calculated using the following set of equations:
I(x, y, t) = I(x + dx, y + dy, t + dt)
I_x·dx + I_y·dy + I_t·dt = 0, i.e. I_x·u + I_y·v + I_t = 0, with u = dx/dt, v = dy/dt
ε = ½[∇p + (∇p)ᵀ], where p = (u, v) is the optical flow field
wherein I(x, y, t) represents the light intensity of a pixel point in the start frame; t represents the time dimension; dx, dy represent the horizontal and vertical displacements from the start frame to the peak frame, respectively; dt represents the time taken to move from the start frame to the peak frame; I_x = ∂I/∂x, I_y = ∂I/∂y, I_t = ∂I/∂t denote the partial derivatives of the pixel gray scale along the x, y and t directions, respectively, and are obtained from the image data.
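A sketch of how the optical-flow input of convolutional neural network IV could be assembled with OpenCV is given below; the patent does not name a specific flow algorithm, so the Farnebäck method and its parameters are assumptions, and the strain magnitude follows the symmetric flow-gradient definition given above.

# Peak-frame optical-flow input sketch: u, v and strain magnitude stacked as one 3-channel image.
import cv2
import numpy as np

def flow_strain_image(start_bgr, peak_bgr):
    prev = cv2.cvtColor(start_bgr, cv2.COLOR_BGR2GRAY)     # graying treatment
    curr = cv2.cvtColor(peak_bgr, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    u, v = flow[..., 0], flow[..., 1]
    ux, uy = np.gradient(u)[1], np.gradient(u)[0]
    vx, vy = np.gradient(v)[1], np.gradient(v)[0]
    # optical strain magnitude from the symmetric part of the flow gradient
    strain = np.sqrt(ux**2 + vy**2 + 0.5 * (uy + vx)**2)

    def norm(a):                                            # scale each map to 0..255
        a = a - a.min()
        return (255 * a / (a.max() + 1e-8)).astype(np.uint8)

    return cv2.merge([norm(u), norm(v), norm(strain)])      # 3-channel image fed to the CNN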
In step Y1 and step Y2, the multimodal emotion data of the observed person may further include audio data two; and extracting emotion characteristics of the second audio data.
And Y3, fusing the emotion characteristics obtained in the step Y2, and obtaining a composite emotion recognition result through classification.
Example III
The readable storage medium of this embodiment stores a computer program, which when executed by a processor, causes the processor to perform the emotion recognition method based on perception of a head wearable device described in the first or second embodiment.
Example IV
The computer device of the present embodiment includes a processor and a memory for storing a program executable by the processor, where when the processor executes the program stored in the memory, the emotion recognition method based on perception of the head wearable device of the first embodiment or the second embodiment is implemented.
The above examples are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to the above examples, and any other changes, modifications, substitutions, combinations, and simplifications that do not depart from the spirit and principle of the present invention should be made in the equivalent manner, and the embodiments are included in the protection scope of the present invention.

Claims (8)

1. An emotion recognition method based on head wearable equipment perception is characterized by comprising the following steps of: the head wearable device includes: the device comprises a device body and a multi-mode data acquisition device arranged on the device body; the multimode data acquisition device comprises four first camera modules for respectively acquiring pictures and videos of four visual angles of a wearer; the four views refer to: left eye, right eye, lower left face, lower right face;
The emotion recognition method based on the perception of the head wearable device comprises a wearer emotion recognition method; the emotion recognition method for the wearer comprises the following steps of:
Step X1, collecting multi-mode emotion data of a wearer; the multimode emotion data of the wearer comprise picture data I and video data I which are acquired through four camera modules I; the first picture data and the first video data comprise four view angle data;
Step X2, extracting emotion characteristics of the multimodal emotion data of the wearer:
Processing the first picture data and the first video data with local fusion emotion recognition networks respectively; for the first picture data, the four view-angle data are used directly as the four view-angle inputs of a local fusion emotion recognition network; for the first video data, a start frame and a peak frame are first extracted from each of the four view-angle data and then used as the four view-angle inputs of a local fusion emotion recognition network;
The four view-angle inputs are processed in the local fusion emotion recognition network as follows: the left-eye input and the right-eye input undergo deep convolution to extract local view-angle features; the lower-left-face input and the lower-right-face input are each embedded with their extracted action units through an embedding layer and then fed, together with the Facial Action Coding System (FACS), into a spatial-domain graph convolution to extract local view-angle features; the local view-angle features extracted from the four view-angle inputs are simultaneously input into a multi-layer perceptron for spatial mapping, and feature-map fusion is performed after computing spatial attention and channel attention to obtain the final emotion features;
Step X3, fusing the emotion characteristics obtained in the step X2, and obtaining a composite emotion recognition result through classification;
The local fusion emotion recognition networks are four, namely a local fusion emotion recognition network I, a local fusion emotion recognition network II, a local fusion emotion recognition network III and a local fusion emotion recognition network IV;
The local fusion emotion recognition network processes the first picture data to obtain macro expression emotion characteristics of the first picture data; processing the first picture data by the local fusion emotion recognition network II to obtain micro-expression emotion characteristics of the first picture data; processing the video data I by the local fusion emotion recognition network III to obtain macro expression emotion characteristics of the video data I; processing the first video data by the local fusion emotion recognition network IV to obtain micro-expression emotion characteristics of the first video data;
each of the four local fusion emotion recognition networks comprises four local feature extraction units, one for each of the four view-angle inputs; the two local feature extraction units for the left-eye input and the right-eye input each comprise a deep convolution network I; the two local feature extraction units for the lower-left-face input and the lower-right-face input are each formed by connecting an embedding layer and a spatial-domain graph convolution network in sequence, the embedding layer being further connected with an action unit extractor and the spatial-domain graph convolution network being further connected with a Facial Action Coding System; the outputs of the four local feature extraction units are simultaneously connected to the multi-layer perceptron and are fused through channel attention and spatial attention;
Local fusion emotion recognition network III and local fusion emotion recognition network IV each additionally include an action amplification network in the two local feature extraction units for the left-eye input and the right-eye input; the left-eye input and the right-eye input each have the smile expression amplified by the action amplification network and are then fed into deep convolution network I to extract local view-angle features.
2. The emotion recognition method based on head wearable device perception according to claim 1, characterized in that: in the step X2, the first picture data and the first video data are preprocessed respectively before being processed by adopting a local fusion emotion recognition network;
Preprocessing the first picture data comprises performing face detection with serially connected preprocessing convolutional neural networks I; this face detection consists of generating candidate boxes, preliminarily screening the candidate boxes, and detecting facial key points: after convolution, activation, pooling and fully connected processing, the confidence, the coordinate offsets and the coordinates of five key points of each candidate box are output, thereby realizing face detection;
Preprocessing the first video data comprises performing face detection with a serially connected preprocessing multi-layer deep convolutional neural network II; this face detection means that the first video data are read frame by frame in video-streaming mode, each frame image is resized by the serially connected preprocessing multi-layer deep convolutional neural network II into an image pyramid, and operations on the pyramid data yield the face box, the key-point coordinates and the face classification, thereby realizing face detection; the preprocessing multi-layer deep convolutional neural network II comprises an image resizing layer, convolutional neural unit I, convolutional neural unit II, max pooling layer I, fully connected layer I, convolutional neural unit III, max pooling layer II and fully connected layer II which are connected in sequence, and a spatial attention layer connected between convolutional neural unit III and max pooling layer II.
3. The emotion recognition method based on head wearable device perception according to claim 1, characterized in that: step X3 refers to: fusing the emotion features obtained in step X2 with a multi-modal adaptive fusion module: the input of the multi-modal adaptive fusion module is the emotion features X = {X_1, …, X_n}, where X_i is the i-th emotion feature and n is the number of emotion features; feature fusion is performed iteratively with an attention mechanism to finally obtain a fusion feature; the fusion feature is input into a classifier for learning to obtain a composite emotion recognition result; the composite emotion recognition result adopts a composite representation of the emotional state, namely emotion categories and their corresponding proportions.
4. The emotion recognition method based on head wearable device perception according to claim 1, characterized in that: the multi-mode data acquisition device further comprises an audio acquisition module; in step X1, the multi-modal emotion data further include audio data; in step X2, emotion feature extraction is also performed on the audio data: the audio data are filtered, smoothed and framed; Mel-frequency cepstral coefficient (MFCC) features are extracted; the MFCC features are assembled into feature vectors and input into an attention-based BiLSTM neural network to extract emotion features;
In the step X1, the multi-modal emotion data further comprises text data; in the step X2, emotion feature extraction is also performed on the text data: and processing the text data by using a word2vec model to obtain a sequence context word vector representation, and extracting emotion characteristics by using an LSTM-based emotion analysis network.
5. The emotion recognition method based on head wearable device perception according to claim 1, characterized in that: the multi-mode data acquisition device further comprises a second camera module for acquiring pictures and videos of observed persons;
the emotion recognition method based on the perception of the head wearable equipment further comprises an observed person emotion recognition method; the emotion recognition method of the observed person comprises the following steps:
Step Y1, collecting multi-mode emotion data of an observed person; the multi-mode emotion data of the observed person comprises picture data II and video data II which are acquired through a camera module II;
step Y2, extracting emotion characteristics of the multi-mode emotion data of the observed person:
Carrying out emotion feature extraction on picture data II with expression convolutional neural network I, expression deep convolutional neural network II and a graph neural network respectively, obtaining the macro-expression emotion features, micro-expression emotion features and gesture emotion features of picture data II;
extracting emotion characteristics of the video data II by adopting a three-dimensional convolutional neural network III and a convolutional neural network IV based on peak frame optical flow respectively to obtain macro expression emotion characteristics and micro expression emotion characteristics of the video data II;
Emotion feature extraction by convolutional neural network IV based on peak-frame optical flow means: first, video data II are preprocessed by rotation, cropping and face alignment, and the peak frame is extracted with a peak-frame detection algorithm; then the optical flow vectors u, v and the optical strain ε between the start frame and the peak frame of video data II are calculated; after graying, u, v and ε are taken as the three channels of an RGB image and combined into one RGB image; emotion features are then extracted from this RGB image;
The optical flow vectors u, v and the optical strain ε are calculated using the following set of equations:
I(x, y, t) = I(x + dx, y + dy, t + dt)
I_x·dx + I_y·dy + I_t·dt = 0, i.e. I_x·u + I_y·v + I_t = 0, with u = dx/dt, v = dy/dt
ε = ½[∇p + (∇p)ᵀ], where p = (u, v) is the optical flow field
wherein I(x, y, t) represents the light intensity of a pixel point in the start frame; t represents the time dimension; dx, dy represent the horizontal and vertical displacements from the start frame to the peak frame, respectively; dt represents the time taken to move from the start frame to the peak frame; I_x = ∂I/∂x, I_y = ∂I/∂y, I_t = ∂I/∂t denote the partial derivatives of the pixel gray scale along the x, y and t directions, respectively, and are obtained from the image data;
and Y3, fusing the emotion characteristics obtained in the step Y2, and obtaining a composite emotion recognition result through classification.
6. The emotion recognition method based on head wearable device perception of claim 5, wherein: the method for establishing the individual data storage database is also included: constructing an individual ID in an individual data storage database; before the emotion recognition method of the wearer or the emotion recognition method of the observed person starts, the individual ID of the wearer or the observed person is acquired; after the emotion recognition method of the wearer or the observed person is completed, the emotion recognition result obtained by the emotion recognition method of the wearer or the observed person and the time and place are stored in the corresponding individual ID.
7. A readable storage medium, wherein the storage medium has stored thereon a computer program which, when executed by a processor, causes the processor to perform the emotion recognition method based on head wearable device perception of any of claims 1-6.
8. A computer device comprising a processor and a memory for storing a program executable by the processor, wherein the processor, when executing the program stored in the memory, implements the emotion recognition method of any of claims 1-6 based on head wearable device perception.
CN202410223747.7A 2024-02-29 2024-02-29 Emotion recognition method, medium and device based on head wearable device perception Active CN117809354B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410223747.7A CN117809354B (en) 2024-02-29 2024-02-29 Emotion recognition method, medium and device based on head wearable device perception

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410223747.7A CN117809354B (en) 2024-02-29 2024-02-29 Emotion recognition method, medium and device based on head wearable device perception

Publications (2)

Publication Number Publication Date
CN117809354A CN117809354A (en) 2024-04-02
CN117809354B true CN117809354B (en) 2024-06-21

Family

ID=90422178

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410223747.7A Active CN117809354B (en) 2024-02-29 2024-02-29 Emotion recognition method, medium and device based on head wearable device perception

Country Status (1)

Country Link
CN (1) CN117809354B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109409222A (en) * 2018-09-20 2019-03-01 中国地质大学(武汉) A kind of multi-angle of view facial expression recognizing method based on mobile terminal

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016195474A1 (en) * 2015-05-29 2016-12-08 Charles Vincent Albert Method for analysing comprehensive state of a subject
US10667697B2 (en) * 2015-06-14 2020-06-02 Facense Ltd. Identification of posture-related syncope using head-mounted sensors
CN110313923B (en) * 2019-07-05 2022-08-16 昆山杜克大学 Autism early-stage screening system based on joint attention ability test and audio-video behavior analysis
CN111652159B (en) * 2020-06-05 2023-04-14 山东大学 Micro-expression recognition method and system based on multi-level feature combination
CN112232191B (en) * 2020-10-15 2023-04-18 南京邮电大学 Depression recognition system based on micro-expression analysis
CN113420591B (en) * 2021-05-13 2023-08-22 华东师范大学 Emotion-based OCC-PAD-OCEAN federal cognitive modeling method
CN113469153B (en) * 2021-09-03 2022-01-11 中国科学院自动化研究所 Multi-modal emotion recognition method based on micro-expressions, limb actions and voice
US20230237844A1 (en) * 2022-01-26 2023-07-27 The Regents Of The University Of Michigan Detecting emotional state of a user based on facial appearance and visual perception information
CN115205923A (en) * 2022-05-19 2022-10-18 重庆邮电大学 Micro-expression recognition method based on macro-expression state migration and mixed attention constraint
CN115169507B (en) * 2022-09-08 2023-05-19 华中科技大学 Brain-like multi-mode emotion recognition network, recognition method and emotion robot

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109409222A (en) * 2018-09-20 2019-03-01 中国地质大学(武汉) A kind of multi-angle of view facial expression recognizing method based on mobile terminal

Also Published As

Publication number Publication date
CN117809354A (en) 2024-04-02

Similar Documents

Publication Publication Date Title
CN109934176B (en) Pedestrian recognition system, recognition method, and computer-readable storage medium
US20190188903A1 (en) Method and apparatus for providing virtual companion to a user
CN112766173B (en) Multi-mode emotion analysis method and system based on AI deep learning
CN111666845B (en) Small sample deep learning multi-mode sign language recognition method based on key frame sampling
CN113920568B (en) Face and human body posture emotion recognition method based on video image
CN116825365B (en) Mental health analysis method based on multi-angle micro-expression
CN112016367A (en) Emotion recognition system and method and electronic equipment
CN113378649A (en) Identity, position and action recognition method, system, electronic equipment and storage medium
CN114724224A (en) Multi-mode emotion recognition method for medical care robot
CN116230234A (en) Multi-mode feature consistency psychological health abnormality identification method and system
CN114170537A (en) Multi-mode three-dimensional visual attention prediction method and application thereof
CN111079465A (en) Emotional state comprehensive judgment method based on three-dimensional imaging analysis
CN116092119A (en) Human behavior recognition system based on multidimensional feature fusion and working method thereof
RU2005100267A (en) METHOD AND SYSTEM OF AUTOMATIC VERIFICATION OF THE PRESENCE OF A LIVING FACE OF A HUMAN IN BIOMETRIC SECURITY SYSTEMS
CN113673308A (en) Object identification method, device and electronic system
CN116665281B (en) Key emotion extraction method based on doctor-patient interaction
CN117122324A (en) Practitioner psychological health detection method based on multi-mode emotion data fusion
CN117809354B (en) Emotion recognition method, medium and device based on head wearable device perception
Hou Deep learning-based human emotion detection framework using facial expressions
CN108197593B (en) Multi-size facial expression recognition method and device based on three-point positioning method
Kadhim et al. A face recognition application for Alzheimer’s patients using ESP32-CAM and Raspberry Pi
CN115035438A (en) Emotion analysis method and device and electronic equipment
CN114492579A (en) Emotion recognition method, camera device, emotion recognition device and storage device
KR20220144983A (en) Emotion recognition system using image and electrocardiogram
CN113255535A (en) Depression identification method based on micro-expression analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant