WO2022067524A1 - Automatic emotion recognition method, system, computing device and computer-readable storage medium - Google Patents
Automatic emotion recognition method, system, computing device and computer-readable storage medium
Info
- Publication number
- WO2022067524A1 · PCT/CN2020/118887 · CN2020118887W
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- emotion recognition
- emotion
- automatic
- face
- Prior art date
Links
- 230000008909 emotion recognition Effects 0.000 title claims abstract description 119
- 238000000034 method Methods 0.000 title claims abstract description 54
- 238000012360 testing method Methods 0.000 claims abstract description 10
- 230000008451 emotion Effects 0.000 claims description 50
- 238000013527 convolutional neural network Methods 0.000 claims description 28
- 238000000605 extraction Methods 0.000 claims description 23
- 238000012549 training Methods 0.000 claims description 17
- 238000010606 normalization Methods 0.000 claims description 10
- 238000007781 pre-processing Methods 0.000 claims description 7
- 238000012545 processing Methods 0.000 claims description 7
- 230000002996 emotional effect Effects 0.000 claims description 6
- 210000003128 head Anatomy 0.000 claims description 6
- 210000001747 pupil Anatomy 0.000 claims description 5
- 238000011156 evaluation Methods 0.000 claims description 4
- 230000001815 facial effect Effects 0.000 claims description 3
- 230000004434 saccadic eye movement Effects 0.000 claims description 3
- 238000013519 translation Methods 0.000 claims description 3
- 230000004630 mental health Effects 0.000 abstract description 8
- 230000036651 mood Effects 0.000 abstract description 2
- 238000000537 electroencephalography Methods 0.000 description 19
- 238000011176 pooling Methods 0.000 description 17
- 230000006870 function Effects 0.000 description 14
- 230000037007 arousal Effects 0.000 description 13
- 210000002569 neuron Anatomy 0.000 description 13
- 230000004913 activation Effects 0.000 description 10
- 238000005516 engineering process Methods 0.000 description 7
- 238000004891 communication Methods 0.000 description 6
- 230000000694 effects Effects 0.000 description 6
- 238000010586 diagram Methods 0.000 description 5
- 230000008569 process Effects 0.000 description 4
- 230000009471 action Effects 0.000 description 3
- 238000004590 computer program Methods 0.000 description 3
- 230000008878 coupling Effects 0.000 description 3
- 238000010168 coupling process Methods 0.000 description 3
- 238000005859 coupling reaction Methods 0.000 description 3
- 238000013135 deep learning Methods 0.000 description 3
- 208000019901 Anxiety disease Diseases 0.000 description 2
- 230000036506 anxiety Effects 0.000 description 2
- 230000008901 benefit Effects 0.000 description 2
- 238000013136 deep learning model Methods 0.000 description 2
- 238000013461 design Methods 0.000 description 2
- 238000001514 detection method Methods 0.000 description 2
- 238000011161 development Methods 0.000 description 2
- 230000003993 interaction Effects 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 238000002360 preparation method Methods 0.000 description 2
- 238000005070 sampling Methods 0.000 description 2
- 230000001360 synchronised effect Effects 0.000 description 2
- 206010022998 Irritability Diseases 0.000 description 1
- 230000002411 adverse Effects 0.000 description 1
- 230000004888 barrier function Effects 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 238000007405 data analysis Methods 0.000 description 1
- 238000013480 data collection Methods 0.000 description 1
- 238000003066 decision tree Methods 0.000 description 1
- 238000002474 experimental method Methods 0.000 description 1
- 238000001914 filtration Methods 0.000 description 1
- 230000004927 fusion Effects 0.000 description 1
- 238000007499 fusion processing Methods 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 230000010365 information processing Effects 0.000 description 1
- 230000010354 integration Effects 0.000 description 1
- 239000004973 liquid crystal related substance Substances 0.000 description 1
- 238000003062 neural network model Methods 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 208000020016 psychiatric disease Diseases 0.000 description 1
- 230000009323 psychological health Effects 0.000 description 1
- 230000003068 static effect Effects 0.000 description 1
- 238000012706 support-vector machine Methods 0.000 description 1
Images
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
Definitions
- the present invention relates to the field of computer technology, and in particular, to an automatic emotion recognition method, system, computing device and computer-readable storage medium.
- an automatic emotion recognition method comprising: acquiring emotion recognition data from multiple different data sources; inputting the emotion recognition data into a trained emotion recognition model for recognition, and obtaining an emotion recognition result.
- the emotion recognition data from multiple different data sources include: EEG data, eye data, and face data.
- the face data includes three-dimensional face data.
- the automatic emotion recognition method further includes: performing a preprocessing operation on the EEG data, eye data and face data.
- the preprocessing operations include: extraction of valid values from the EEG data, resetting of invalid values and normalization of the eye data, and normalization of the face data.
- a filter is used to extract the effective value of the EEG data.
- the invalid values of the eye data include eye data collected during eye closure and saccades, and the normalization of the eye data is completed by computing the pupil diameter fluctuation.
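A minimal sketch of how the invalid-value reset and pupil-diameter normalization could be implemented; the function name and the definition of the fluctuation (standard deviation around a per-trial baseline) are assumptions, since the text does not give a formula.

```python
import numpy as np

def normalize_pupil(pupil_diameter, valid_mask, invalid_value=0.0):
    """Reset invalid eye samples and normalize pupil diameter by its fluctuation.

    pupil_diameter : 1-D array of raw pupil diameters for one trial.
    valid_mask     : boolean array, False where the eyes were closed or a
                     saccade was detected (these samples are treated as invalid).
    """
    d = np.asarray(pupil_diameter, dtype=float).copy()
    d[~valid_mask] = invalid_value               # reset invalid values

    valid = d[valid_mask]
    baseline = valid.mean()                      # per-trial baseline (assumption)
    fluctuation = valid.std() + 1e-8             # pupil-diameter fluctuation (assumption)

    d[valid_mask] = (valid - baseline) / fluctuation
    return d

# Example: 5 samples, the 3rd collected during a blink
print(normalize_pupil([3.1, 3.3, 0.0, 3.2, 3.4], np.array([1, 1, 0, 1, 1], dtype=bool)))
```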
- the normalization of the face data includes: setting a reference point, using the reference point as the datum, and correcting the depth value in each frame, so as to perform head translation correction and normalization of the face data.
- the emotion recognition result is classified according to the circumplex model of emotion.
- the emotion recognition model includes a multi-branch deep convolutional neural network; the emotion recognition data of the different data sources each pass through a corresponding deep convolutional neural network for feature extraction and are then fused by a fully convolutional neural network.
- the deep convolutional neural network corresponding to the emotion recognition data of each data source includes multiple feature extraction layers in series, and the deep convolutional neural network outputs the features of the emotion recognition data of the corresponding data source.
- the fully convolutional neural network includes θ fully connected layers and θ-1 Dropout layers connected alternately in series, and the last fully connected layer of the fully convolutional neural network outputs the emotion class with the maximum probability for the input emotion recognition data as the final emotion recognition result, where θ≥1.
- the training of the emotion recognition model includes the following steps: collecting original sample data from multiple different data sources; processing the original sample data collected from the multiple different data sources into emotion recognition training data from multiple different data sources; and setting emotion classification labels on the emotion recognition training data.
- the automatic emotion recognition method further includes: superimposing noise on the original sample data to increase the data volume of emotion recognition training data obtained based on the original sample data.
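A minimal sketch of the noise-superposition augmentation; the additive Gaussian noise model, its scale, and the number of copies are assumptions, since the text only states that noise is superimposed on the original sample data.

```python
import numpy as np

def augment_with_noise(samples, copies=2, noise_scale=0.01, seed=0):
    """Return the original samples plus `copies` noisy replicas of each one.

    samples     : array of shape (num_samples, ...) holding raw EEG/eye/face data.
    noise_scale : standard deviation of the additive Gaussian noise, relative to
                  the per-sample standard deviation (assumption).
    """
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples, dtype=float)
    augmented = [samples]
    for _ in range(copies):
        sigma = noise_scale * samples.std(axis=tuple(range(1, samples.ndim)), keepdims=True)
        augmented.append(samples + rng.normal(0.0, 1.0, samples.shape) * sigma)
    return np.concatenate(augmented, axis=0)

# Example: triple a batch of 100 EEG windows of shape (128, 32)
eeg = np.random.randn(100, 128, 32)
print(augment_with_noise(eeg, copies=2).shape)   # (300, 128, 32)
```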
- the emotion labels are set according to the circumplex model of emotion.
- the collection of the original sample data includes: the subject prepares by closing the eyes for no less than half a minute and opening the eyes for no less than half a minute; the prepared subject completes G video and game trials, completing an emotion questionnaire after each trial; during each video and game trial, EEG is collected and recorded through N head electrode sensors over T channels to generate S_eeg×T EEG data, where S_eeg is the number of samples recorded in a time period; eye data are recorded with an eye data acquisition instrument to create S_eye×E two-dimensional eye data, where S_eye is the number of samples recorded in a time period; and a face data acquisition device captures the face depth through a window of resolution W×H, creating S_face×W×H three-dimensional face data by recording the depth of the face point associated with each pixel of an image frame, where S_face represents the sequence of face-depth sampling frames in a time period, and G, N, T, E, W and H are positive integers.
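To make the data shapes concrete, a small sketch with purely illustrative values for the quantities left open above (S_eeg, T, S_eye, E, S_face, W, H):

```python
import numpy as np

# Illustrative values only; the text leaves S_eeg, T, S_eye, E, S_face, W, H open.
S_eeg, T = 7680, 32             # e.g. 60 s of EEG at 128 Hz over 32 channels
S_eye, E = 3600, 6              # e.g. 60 s of eye samples at 60 Hz, 6 values per sample
S_face, W, H = 1800, 160, 120   # e.g. 60 s of depth frames at 30 fps, 160x120 window

eeg_data  = np.zeros((S_eeg, T))        # S_eeg x T two-dimensional EEG data
eye_data  = np.zeros((S_eye, E))        # S_eye x E two-dimensional eye data
face_data = np.zeros((S_face, W, H))    # S_face x W x H three-dimensional face data
```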
- the labels of the emotion classification are collected through emotion-guided video and game trials, and are obtained from self-evaluation questionnaires completed by the subjects according to their emotion after each trial.
- an automatic emotion recognition system comprising: an emotion recognition data acquisition module for acquiring emotion recognition data from multiple different data sources; an emotion recognition result acquisition module for The emotion recognition data is input into the trained emotion recognition model for recognition, and the emotion recognition result is obtained.
- the present invention also provides a computing device including a memory and a processor, the memory stores a program, and the processor implements the above method when executing the program.
- the present invention also provides a computer-readable storage medium on which a program is stored, and when the program is executed by a processor, the above method is implemented.
- the automatic emotion recognition method of the invention can reliably and automatically recognize the psychological emotion of the subject, is helpful for the evaluation and auxiliary detection of the psychological health status of the subject, and has high scientific value and wide application prospect.
- FIG. 1 is a flowchart of a method according to an embodiment of the present invention.
- FIG. 2 shows the circumplex model of emotion (i.e., the arousal-valence emotion diagram) used in an embodiment of the present invention.
- FIG. 3 is a deep convolutional neural network for emotion recognition according to an embodiment of the present invention.
- FIG. 4 is a branch network FDCNN used for feature extraction on input data in FIG. 3 according to an embodiment of the present invention.
- FIG. 5 is a fully convolutional neural network IDNN used for fusing the output results of each branch network in FIG. 3 according to an embodiment of the present invention.
- FIG. 6 is a system structure diagram of an embodiment of the present invention.
- FIG. 7 is an internal structural diagram of a computing device according to an embodiment of the present invention.
- ordinal terms such as first, second and third may be used herein to describe various devices, elements, components or sections, but these devices, elements, components or sections should not be limited by these terms; the terms are only used to distinguish one from another.
- the first device may also be referred to as the second device without departing from the technical solution of the present invention.
- the terms "and/or", “and/or” are meant to include all combinations of any one or more of the listed items.
- an automatic emotion recognition method disclosed in an embodiment of the present invention includes: acquiring emotion recognition data from multiple different data sources; inputting the emotion recognition data into a trained emotion recognition model for recognition, and obtaining an emotion recognition result.
- the main process is: the collected multimodal data sources, such as electroencephalography (EEG) data and eye and face data, are passed through a multi-branch deep convolutional neural network for emotion recognition, yielding four broad categories of emotion represented by valence-arousal: low-low, low-high, high-low, and high-high. With further refinement, any emotion can be effectively identified, including common emotions such as happiness, anger, and anxiety.
- EEG electroencephalography
- the steps include:
- Human physiological signal data may include, for example, EEG data, eye data, and face data.
- the preprocessing operations may include EEG effective value extraction, eye data invalid value reset and normalization, and face data normalization.
- the deep convolutional network may be a multi-branch deep convolutional neural network, and the collected data passes through the corresponding deep convolutional neural network, and finally passes through the full convolutional neural network for feature fusion and classification.
- Emotion recognition: this includes training sample setup, model training and emotion recognition. First, qualifying samples are used to train the emotion recognition model; recognition is then performed with the trained model until all emotion recognition tasks are completed. The volume of training samples can be increased by superimposing noise on the original sample data.
- For emotion classification, R_1 arousal levels and R_2 valence levels can each be grouped into high and low to form high/low arousal-valence combinations, or R_1×R_2 emotion recognition classes can be used.
- the data acquisition scheme is as follows:
- Subjects prepare (for example, by closing their eyes for t_1 minutes (not less than half a minute) and opening their eyes for t_2 minutes (not less than half a minute)) in order to complete the subsequent video and game trials (for example, G video and game trials, where G is a positive integer).
- SAM Self-Assessment Questionnaire
- EEG data: EEG signals are acquired through N head electrode sensors over T channels. Recording generates S_eeg×T EEG data, where S_eeg is the number of samples recorded in a time period, and N and T are positive integers.
- Eye data: eye data are recorded with an eye data acquisition instrument. The gaze position, i.e., the (x, y) coordinates on the computer screen, and the pupil diameter value are recorded, creating S_eye×E two-dimensional eye data, where S_eye is the number of samples recorded in a time period and E is a positive integer, for example 6.
- Face data: a face data acquisition device captures the face depth through a window of resolution W×H; by recording the depth of the face point associated with each pixel of an image frame, S_face×W×H three-dimensional (3D) face data are created, where S_face represents the sequence of face-depth sampling frames in a time period, and W and H are positive integers.
- EEG data: valid EEG is extracted using filters (e.g., band-pass frequency filters).
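A minimal sketch of such band-pass extraction using SciPy; the sampling rate and the 1-50 Hz pass band are assumptions, since the text only mentions band-pass frequency filters without fixing parameters.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def extract_valid_eeg(eeg, fs=128.0, low=1.0, high=50.0, order=4):
    """Band-pass filter raw EEG (samples x channels) to keep the valid band.

    fs, low and high are assumptions; the text does not fix the cut-off frequencies.
    """
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, np.asarray(eeg, dtype=float), axis=0)

filtered = extract_valid_eeg(np.random.randn(7680, 32))
print(filtered.shape)   # (7680, 32)
```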
- Eye data: invalid samples (i.e., data collected during eye closure, saccades, etc.) are reset to an invalid value, and the pupil diameter values are normalized (normalization of the eye data is completed by computing the pupil diameter fluctuation).
- Face data: a reference point is set and used as the datum; the values in each frame (e.g., depth values) are corrected against it to perform head translation correction, normalization of the face data, and so on.
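A minimal sketch of one way the reference-point correction could be implemented: the depth of a chosen reference pixel is subtracted from each frame to remove head translation along the depth axis, then the frame is scaled. The choice of reference point and the scaling are assumptions, since the text only states that a reference point is set and the values are corrected against it.

```python
import numpy as np

def normalize_face_depth(frames, ref_point):
    """Correct each depth frame against a reference point and normalize it.

    frames    : array of shape (S_face, W, H) with per-pixel face depth values.
    ref_point : (x, y) pixel chosen as the reference point (e.g. the nose tip);
                which point to use is an assumption.
    """
    frames = np.asarray(frames, dtype=float)
    rx, ry = ref_point
    ref_depth = frames[:, rx, ry][:, None, None]   # depth of the reference point per frame
    corrected = frames - ref_depth                  # head-translation correction
    scale = np.abs(corrected).max() + 1e-8
    return corrected / scale                        # normalize to a common range

normalized = normalize_face_depth(np.random.rand(1800, 160, 120), ref_point=(80, 60))
```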
- Multi-level arousal (Arousal) and valence (Valence) based on James Russell's circumplex model of emotion are used as class labels corresponding to emotions, where valence measures the degree of unpleasantness associated with emotion, Arousal is the level of calm to excitement associated with emotion, as shown in Figure 2.
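As a concrete illustration, a minimal sketch of how a SAM valence/arousal rating could be mapped to the four high/low classes used below; the midpoint thresholding and the 9-point scale default are assumptions.

```python
def circumplex_label(valence, arousal, r1=9, r2=9):
    """Map SAM ratings (1..r1 valence, 1..r2 arousal) to one of four classes.

    Classes follow the valence-arousal quadrants of Russell's circumplex model:
    low-low, low-high, high-low, high-high. Splitting at the scale midpoint is
    an assumption; the text only requires a high/low grouping.
    """
    v = "high" if valence > (r1 + 1) / 2 else "low"
    a = "high" if arousal > (r2 + 1) / 2 else "low"
    return f"{v} valence - {a} arousal"

print(circumplex_label(7, 3))   # high valence - low arousal
```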
- This step is mainly to build a regression model that can perform 3D keypoint tracking and emotion estimation at the same time.
- the present invention can be implemented using a variety of deep learning models, including but not limited to the following deep models.
- Convolutional neural network mainly includes three structures: convolutional layer, pooling layer, and fully connected layer.
- the convolutional layer performs feature extraction on the input data X and contains multiple convolution kernels. Assuming the sizes of the input data X and the convolution kernel K are s×t and p×q respectively, the output of the convolutional layer is the feature map O of size (s-p+1)×(t-q+1); each neuron O_ij of the two-dimensional feature map O is then computed from its bias B and the two-dimensional convolution of the input data X with the weights K:
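The formula referred to here (equation (1)) does not survive in this text. A standard reconstruction that is consistent with the stated output size (s-p+1)×(t-q+1), offered as an assumption rather than a quotation of the original, is:

(1)  O_ij = B + \sum_{k=1}^{p} \sum_{l=1}^{q} X_{i+k-1, j+l-1} K_{kl},  for 1 ≤ i ≤ s-p+1, 1 ≤ j ≤ t-q+1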
- the three-dimensional convolution operation is similar to the two-dimensional convolution, except that X and K are three-dimensional.
- the feature map O output by the convolutional layer is rectified by the activation function and then passed to a pooling layer of size m×n for feature selection and information filtering.
- Each element O′_ij of the pooling layer output O′ is obtained by the following formula:
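The formula itself is missing at this point in the English text; reconstructed from the original-language formula (2) later in this document, with P denoting the pooling operation (for example maximum or average, an assumption since the text does not fix it) taken over the m×n window:

(2)  O′_ij = P_{(im-m) < k ≤ im, (jn-n) < l ≤ jn} O_kl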
- where the size of the input feature map O is am×bn, a and b are positive real numbers, and the size of the output O′ is then a×b.
- the 3D pooling operation is similar to the 2D pooling, except that O and O' are 3D data.
- Each neuron in a fully connected layer is fully connected to all neurons in the previous layer.
- the emotion recognition model of the present invention is composed of three deep convolutional neural networks (FDCNN) for feature extraction and one summary DNN (IDNN).
- FDCNN deep convolutional neural networks
- IDNN summary DNN
- FDCNN_1 is responsible for feature extraction from the two-dimensional EEG data. Assuming the combination of one convolutional layer and one pooling layer is one feature extraction unit layer (feature extraction layer for short), FDCNN_1 consists of r (r≥2) unit layers in series.
- The preprocessed EEG data enter the first feature extraction layer: a two-dimensional convolutional layer L_1 composed of C_11 kernel neurons of size p_11×q_11, whose output, after the activation function, enters an m_11×n_11 pooling layer L_2; the output of L_2 is fed into the second feature extraction layer: a two-dimensional convolutional layer L_3 of C_12 kernel neurons of size p_12×q_12, whose output, after the activation function, enters an m_12×n_12 pooling layer L_4; ...; the output of layer L_2r-2 is fed into a two-dimensional convolutional layer L_2r-1 of C_1(r/2) kernel neurons of size p_1(r/2)×q_1(r/2), whose output, after the activation function, enters an m_1(r/2)×n_1(r/2) pooling layer L_2r, whose output is the extracted EEG data feature F_eeg.
- FDCNN_2 is responsible for feature extraction from the two-dimensional eye data. Assuming the combination of one convolutional layer and one pooling layer is one feature extraction unit layer (feature extraction layer for short), FDCNN_2 consists of r (r≥2) unit layers in series.
- The preprocessed eye data enter the first feature extraction layer: a two-dimensional convolutional layer L_1 composed of C_21 kernel neurons of size p_21×q_21, whose output, after the activation function, enters an m_21×n_21 pooling layer L_2; the output of L_2 is fed into the second feature extraction layer: a two-dimensional convolutional layer L_3 of C_22 kernel neurons of size p_22×q_22, whose output, after the activation function, enters an m_22×n_22 pooling layer L_4; ...; the output of layer L_2r-2 is fed into a two-dimensional convolutional layer L_2r-1 of C_2(r/2) kernel neurons of size p_2(r/2)×q_2(r/2), whose output, after the activation function, enters an m_2(r/2)×n_2(r/2) pooling layer L_2r, whose output is the extracted eye data feature F_eye.
- FDCNN_3 is responsible for feature extraction from the three-dimensional face data. Assuming the combination of one convolutional layer and one pooling layer is one feature extraction unit layer (feature extraction layer for short), FDCNN_3 consists of r (r≥2) unit layers in series.
- The preprocessed face data enter the first feature extraction layer: a three-dimensional convolutional layer L_1 composed of C_31 kernel neurons of size p_31×q_31×z_31, whose output, after the activation function, enters an m_31×n_31×l_31 pooling layer L_2; the output of L_2 is fed into the second feature extraction layer: a three-dimensional convolutional layer L_3 of C_32 kernel neurons of size p_32×q_32×z_32, whose output, after the activation function, enters an m_32×n_32×l_32 pooling layer L_4; ...; the output of layer L_2r-2 is fed into a three-dimensional convolutional layer L_2r-1 of C_3(r/2) kernel neurons of size p_3(r/2)×q_3(r/2)×z_3(r/2), whose output, after the activation function, enters an m_3(r/2)×n_3(r/2)×l_3(r/2) pooling layer L_2r, whose output is the extracted face data feature F_face.
- The extracted features F_eeg, F_eye and F_face of the three modalities are concatenated and input into the IDNN. The IDNN consists of θ (θ≥1) fully connected layers (F_1, F_2, ..., F_θ) with f_1, f_2, ..., f_θ neurons respectively, and θ-1 Dropout layers (D_1, D_2, ..., D_θ-1) connected in series. The output of each fully connected layer is followed by an activation function.
- The outputs of the neurons in F_θ give the emotion class with the maximum probability for the input modality data.
- The IDNN can output probabilities for at least four emotion classes (valence-arousal): low-low, low-high, high-low and high-high. The one with the highest probability is taken as the final emotion recognition result. By further subdividing this classification, it can be extended to R_1×R_2 emotion recognition results.
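A minimal PyTorch sketch of this three-branch layout with r = 2 feature extraction layers per branch and θ = 2 fully connected layers in the IDNN; all channel counts, kernel sizes, input shapes and the four-class output are illustrative assumptions, not values fixed by the text.

```python
import torch
import torch.nn as nn

def fdcnn_2d(c1, c2):
    # one "feature extraction layer" = convolution + activation + pooling; r = 2 layers here
    return nn.Sequential(
        nn.Conv2d(1, c1, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(c1, c2, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.AdaptiveAvgPool2d(4), nn.Flatten())

def fdcnn_3d(c1, c2):
    return nn.Sequential(
        nn.Conv3d(1, c1, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
        nn.Conv3d(c1, c2, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
        nn.AdaptiveAvgPool3d(2), nn.Flatten())

class EmotionNet(nn.Module):
    """Three FDCNN branches (EEG, eye, face) fused by an IDNN head (theta = 2)."""
    def __init__(self, num_classes=4):
        super().__init__()
        self.eeg_branch = fdcnn_2d(8, 16)    # produces F_eeg
        self.eye_branch = fdcnn_2d(8, 16)    # produces F_eye
        self.face_branch = fdcnn_3d(8, 16)   # produces F_face
        feat = 16 * 4 * 4 + 16 * 4 * 4 + 16 * 2 * 2 * 2
        self.idnn = nn.Sequential(           # theta fully connected layers, theta-1 Dropout layers
            nn.Linear(feat, 128), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(128, num_classes))     # last layer scores the emotion classes

    def forward(self, eeg, eye, face):
        f = torch.cat([self.eeg_branch(eeg), self.eye_branch(eye), self.face_branch(face)], dim=1)
        return self.idnn(f)                  # softmax / argmax of this gives the predicted emotion

# Illustrative input shapes (batch of 2): EEG 1x128x32, eye 1x64x6, face 1x16x64x64
net = EmotionNet()
scores = net(torch.randn(2, 1, 128, 32), torch.randn(2, 1, 64, 6), torch.randn(2, 1, 16, 64, 64))
print(scores.shape)   # torch.Size([2, 4])
```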
- the automatic emotion recognition method based on the deep learning technology and the multimodal data source disclosed in the embodiment of the present invention has the following main features:
- By constructing a multi-branch deep convolutional neural network, effective feature information is extracted from the raw input signals, emotion-related features are extracted from this information, and the emotion recognition task is realized.
- the end-to-end structure can realize the extraction and classification of emotion-related features at the same time, without the need to manually design complex emotion-related features.
- the advantages of the automatic emotion recognition method based on deep learning technology and multimodal data sources disclosed in the embodiments of the present invention include:
- the reliability is good.
- Multiple data sources enhance the reliability of emotion recognition.
- Combining three types of emotion recognition data sources (EEG data, face data and eye data) for emotion recognition improves the reliability of emotion recognition compared with a single data source signal.
- High accuracy: the multi-branch deep convolutional neural network improves emotion recognition accuracy; compared with support vector machines, decision trees, linearly combined neural network models and the like, it further improves the accuracy of emotion classification.
- the proposed automatic emotion recognition technology based on deep learning and multimodal data sources can accurately identify students' psychological emotions, and its highly reliable and highly accurate emotion recognition can effectively classify students' emotions. It assists the assessment of students' mental health, helps teachers grasp students' emotional trends and adopt more reasonable and personalized educational countermeasures, and helps students relieve psychological pressure and improve their physical and mental health.
- the present invention not only has the advantages of good reliability, high accuracy and many kinds of recognizable emotions, but its data acquisition method is also relatively simple. For example, users can complete data acquisition by watching a video or playing a game without any psychological burden. Products incorporating this method will greatly help students' emotion recognition and therefore have broad promotion prospects. Further promotion of the present invention will help the emotion recognition and auxiliary mental health screening of millions of students in many schools across the country. Therefore, the present invention has high scientific value and broad application prospects.
- an automatic emotion recognition system includes: an emotion recognition data acquisition module for acquiring emotion recognition data from multiple different data sources; an emotion recognition result acquisition module for The emotion recognition data is input into the trained emotion recognition model for recognition, and the emotion recognition result is obtained.
- the methods of the embodiments of the present invention may be implemented in a computing device.
- An exemplary internal structure diagram of a computing device may be shown in FIG. 7 , and the computing device may include a processor, a memory, an external interface, a display, and an input device connected through a system bus.
- the processor is used to provide computing and control capabilities.
- the memory includes non-volatile storage media, internal memory.
- the non-volatile storage medium stores an operating system, an application program, a database, and the like.
- the internal memory provides an environment for the operation of the operating system and programs in the non-volatile storage medium.
- the external interface includes, for example, a network interface for communicating with an external terminal through a network connection.
- the external interface may also include a USB interface and the like.
- the display of the computing device may be a liquid crystal display screen or an electronic ink display screen
- the input device may be a touch layer covering the display screen, or may be, for example, a button, a trackball or a touchpad set on the casing of the computing device, or an external keyboard, trackpad or mouse, etc.
- the program stored in the non-volatile storage medium in the computing device can implement the above method when executed by the processor.
- the non-volatile storage medium may also exist in a separate physical form, such as a USB flash drive; when it is connected to a processor, the program stored on it can be executed to implement the above method.
- the method of the present invention can also be implemented as an APP (application program) in the Apple or Android application market for users to download and run on their respective mobile terminals.
- FIG. 7 is only a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computing devices to which the solution of the present application is applied.
- a specific computing device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
- Nonvolatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
- Volatile memory may include random access memory (RAM) or external cache memory.
- RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.
- the computer described in the present invention is a computing device in a broad sense that can automatically perform numerical calculation and/or information processing according to pre-set or stored instructions, and its hardware may include at least one memory, at least one processor, and at least one communication bus. Wherein, the communication bus is used to realize the connection communication between these elements.
- a processor may include, but is not limited to, a microprocessor.
- Computer hardware may also include Application Specific Integrated Circuit (ASIC), Programmable Gate Array (Field-Programmable Gate Array, FPGA), Digital Signal Processor (DSP), embedded devices, and the like.
- the computer may also include network equipment and/or user equipment.
- the network device includes but is not limited to a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing, where cloud computing is a form of distributed computing: a super virtual computer consisting of a group of loosely coupled computers.
- the computing device may be, but is not limited to, any terminal such as a personal computer, a server, etc., which can perform human-computer interaction with the user through a keyboard, a touchpad, or a voice-activated device.
- the computing device herein may also include a mobile terminal, which may be, but is not limited to, any electronic device that can perform human-computer interaction with the user through a keyboard, a touchpad, or a voice-activated device, for example, a tablet computer, a smart phone, Personal digital assistant (Personal Digital Assistant, PDA), smart wearable devices and other terminals.
- the network where the computing device is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a virtual private network (Virtual Private Network, VPN), and the like.
- the memory is used to store program codes.
- the memory may be a circuit with a storage function that does not have a physical form in an integrated circuit, such as RAM (Random-Access Memory, random access memory), FIFO (First In First Out), and the like.
- the memory can also be a memory with a physical form, such as a memory stick, a TF card (Trans-flash Card), a smart media card (smart media card), a secure digital card (secure digital card), a flash memory card ( flash card) and other storage devices, etc.
- the processor may include one or more microprocessors, digital processors.
- the processor may invoke program code stored in the memory to perform the associated functions.
- the respective modules described in FIG. 6 are program codes stored in the memory and executed by the processor to implement the above method.
- the processor is also called a central processing unit (CPU, Central Processing Unit), which can be a very large-scale integrated circuit, and is a computing core (Core) and a control core (Control Unit).
- the disclosed apparatus may be implemented in other manners.
- the device embodiments described above are only illustrative; for example, the division of the units is only a logical function division, and there may be other division methods in actual implementation; for example, multiple units or elements may be combined or integrated into another system, or some features may be ignored or not implemented.
- the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical or other forms.
- the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
- each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
- the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
- the integrated unit if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium.
- the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention.
- the aforementioned storage medium includes: U disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), mobile hard disk, magnetic disk or optical disk and other media that can store program codes .
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Image Analysis (AREA)
Abstract
An automatic emotion recognition method, system, computing device and computer-readable storage medium. The automatic emotion recognition method comprises: acquiring emotion recognition data from multiple different data sources; and inputting the emotion recognition data into a trained emotion recognition model for recognition to obtain an emotion recognition result. The method can reliably and automatically recognize a subject's psychological emotion, and assists the evaluation and auxiliary detection of the subject's mental health status.
Description
The present invention relates to the field of computer technology, and in particular to an automatic emotion recognition method, system, computing device and computer-readable storage medium.
In today's world, many people, especially students, face numerous mental health problems. Negative emotions such as anxiety, agitation and irritability adversely affect students' study, life, and physical and mental health. According to a survey of 126,000 university students nationwide, about 20.23% have psychological disorders of varying degrees that seriously affect their normal life and study. Statistics show that students who suspend or leave university because of various psychological illnesses already account for about 50% of all suspensions and withdrawals. However, the mental health assessments and other emotion testing systems currently on the market rely only on face-to-face interviews or online questionnaires, whose efficiency and reliability are quite limited and remain to be improved.
Summary of the Invention
According to one aspect of the present invention, an automatic emotion recognition method is provided, comprising: acquiring emotion recognition data from multiple different data sources; and inputting the emotion recognition data into a trained emotion recognition model for recognition to obtain an emotion recognition result.
In an embodiment of the present invention, the emotion recognition data from multiple different data sources include: EEG data, eye data and face data.
In an embodiment of the present invention, the face data include three-dimensional face data.
In an embodiment of the present invention, the automatic emotion recognition method further comprises: performing preprocessing operations on the EEG data, eye data and face data.
In an embodiment of the present invention, the preprocessing operations include: extraction of valid values from the EEG data, resetting of invalid values and normalization of the eye data, and normalization of the face data.
In an embodiment of the present invention, a filter is used to extract the valid values of the EEG data.
In an embodiment of the present invention, the invalid values of the eye data include eye data collected during eye closure and saccades, and the normalization of the eye data is completed by computing the pupil diameter fluctuation.
In an embodiment of the present invention, the normalization of the face data includes: setting a reference point, using the reference point as the datum, and correcting the depth values in each frame, so as to perform head translation correction and normalization of the face data.
In an embodiment of the present invention, the emotion recognition result is classified according to the circumplex model of emotion.
In an embodiment of the present invention, the emotion recognition model includes a multi-branch deep convolutional neural network; the emotion recognition data of the different data sources each pass through a corresponding deep convolutional neural network for feature extraction and are then fused by a fully convolutional neural network.
In an embodiment of the present invention, the deep convolutional neural network corresponding to the emotion recognition data of each data source includes multiple feature extraction layers in series, and the deep convolutional neural network outputs the features of the emotion recognition data of the corresponding data source.
In an embodiment of the present invention, the fully convolutional neural network includes θ fully connected layers and θ-1 Dropout layers connected alternately in series; the last fully connected layer of the fully convolutional neural network outputs the emotion class with the maximum probability for the input emotion recognition data as the final emotion recognition result, where θ≥1.
In an embodiment of the present invention, the training of the emotion recognition model includes the following steps: collecting original sample data from multiple different data sources; processing the original sample data collected from the multiple different data sources into emotion recognition training data from multiple different data sources; and setting emotion classification labels on the emotion recognition training data.
In an embodiment of the present invention, the automatic emotion recognition method further includes: superimposing noise on the original sample data to increase the volume of emotion recognition training data obtained from the original sample data.
In an embodiment of the present invention, the emotion labels are set according to the circumplex model of emotion.
In an embodiment of the present invention, the collection of the original sample data includes: the subject prepares by closing the eyes for no less than half a minute and opening the eyes for no less than half a minute; the prepared subject completes G video and game trials and completes an emotion questionnaire after each trial; during each video and game trial, EEG is collected and recorded through N head electrode sensors over T channels to generate S_eeg×T EEG data, where S_eeg is the number of samples recorded in a time period; eye data are recorded with an eye data acquisition instrument to create S_eye×E two-dimensional eye data, where S_eye is the number of samples recorded in a time period; and a face data acquisition device captures the face depth through a window of resolution W×H, creating S_face×W×H three-dimensional face data by recording the depth of the face point associated with each pixel of an image frame, where S_face represents the sequence of face-depth sampling frames in a time period, and G, N, T, E, W and H are positive integers.
In an embodiment of the present invention, the labels of the emotion classification are collected through emotion-guided video and game trials and are obtained from self-evaluation questionnaires completed by the subjects according to their emotion after each trial.
According to another aspect of the present invention, an automatic emotion recognition system is provided, comprising: an emotion recognition data acquisition module for acquiring emotion recognition data from multiple different data sources; and an emotion recognition result acquisition module for inputting the emotion recognition data into a trained emotion recognition model for recognition to obtain an emotion recognition result.
The present invention also provides a computing device including a memory and a processor; the memory stores a program, and the processor implements the above method when executing the program.
The present invention also provides a computer-readable storage medium on which a program is stored; when the program is executed by a processor, the above method is implemented.
The automatic emotion recognition method of the present invention can reliably and automatically recognize a subject's psychological emotion, assists the evaluation and auxiliary detection of the subject's mental health status, and has high scientific value and broad application prospects.
In order to make the technical problems solved by the present invention, the technical means adopted, and the technical effects achieved clearer, specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings. It should be noted that the drawings described below are only drawings of exemplary embodiments of the present invention; those skilled in the art can obtain drawings of other embodiments from them without creative effort.
FIG. 1 is a flowchart of a method according to an embodiment of the present invention.
FIG. 2 shows the circumplex model of emotion (i.e., the arousal-valence emotion diagram) used in an embodiment of the present invention.
FIG. 3 is a deep convolutional neural network for emotion recognition according to an embodiment of the present invention.
FIG. 4 is a branch network FDCNN in FIG. 3 used for feature extraction from the input data, according to an embodiment of the present invention.
FIG. 5 is the fully convolutional neural network IDNN in FIG. 3 used for fusing the outputs of the branch networks, according to an embodiment of the present invention.
FIG. 6 is a system structure diagram of an embodiment of the present invention.
FIG. 7 is an internal structure diagram of a computing device according to an embodiment of the present invention.
Exemplary embodiments of the present invention will now be described more fully with reference to the accompanying drawings. Although the exemplary embodiments can be implemented in many specific ways, they should not be construed as limiting the invention to the embodiments set forth herein. Rather, these exemplary embodiments are provided to make the disclosure more complete and to convey the inventive concept fully to those skilled in the art. Provided that they conform to the technical concept of the present invention, the structures, properties, effects or other characteristics described in a particular embodiment may be combined in any suitable manner into one or more other embodiments.
In the description of specific embodiments, details of structures, properties, effects or other characteristics are given so that those skilled in the art can fully understand the embodiments. However, it is not excluded that, in particular cases, those skilled in the art may implement the present invention with technical solutions that do not contain the above structures, properties, effects or other characteristics.
The flowcharts in the drawings are only exemplary flow demonstrations and do not imply that all the contents, operations and steps in the flowcharts must be included in the solution of the present invention, nor that they must be executed in the order shown. For example, some operations/steps in the flowchart may be decomposed, and some operations/steps may be merged or partially merged; without departing from the gist of the present invention, the execution order shown in the flowchart may be changed according to the actual situation.
The block diagrams in the drawings generally represent functional entities that do not necessarily correspond to physically independent entities. That is, these functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processing unit devices and/or micro-controller devices.
The same reference numerals in the drawings denote the same or similar elements, components or parts, so repeated descriptions of them may be omitted below. It should also be understood that although ordinal terms such as first, second and third may be used herein to describe various devices, elements, components or parts, these devices, elements, components or parts should not be limited by these terms; the terms are only used to distinguish one from another. For example, a first device may also be referred to as a second device without departing from the essential technical solution of the present invention. In addition, the term "and/or" means all combinations of any one or more of the listed items.
Referring to FIG. 1, the automatic emotion recognition method disclosed in an embodiment of the present invention includes: acquiring emotion recognition data from multiple different data sources; and inputting the emotion recognition data into a trained emotion recognition model for recognition to obtain an emotion recognition result. The main process is: the collected multimodal data sources, such as electroencephalography (EEG) data and eye and face data, are passed through a multi-branch deep convolutional neural network for emotion recognition, yielding four broad emotion classes represented by valence-arousal: low-low, low-high, high-low and high-high. With further refinement, any emotion can be effectively identified, including common emotions such as happiness, anger and anxiety.
Specifically, the steps of the automatic emotion recognition method disclosed in the embodiment of the present invention include:
1) Data preparation
1a) Multimodal human physiological signal data are collected by data acquisition devices. The human physiological signal data may include, for example, EEG data, eye data and face data.
1b) The data are preprocessed so that they meet the input requirements of the emotion recognition model. The preprocessing operations may include extraction of valid EEG values, resetting of invalid values and normalization of the eye data, and normalization of the face data.
1c) Emotion classification labels are set. The emotion labels are set according to the circumplex model of emotion (James Russell's circumplex model), using combinations of multi-level arousal and valence.
2) Model construction: a deep convolutional network for emotion recognition is designed. The deep convolutional network may be a multi-branch deep convolutional neural network; the collected data each pass through their corresponding deep convolutional neural networks and are finally fused and classified by a fully convolutional neural network.
3) Emotion recognition: this includes training sample setup, model training and emotion recognition. First, qualifying samples are used to train the emotion recognition model, and recognition is then performed with the trained model until all emotion recognition tasks are completed. The volume of training samples can be increased by superimposing noise on the original sample data. For emotion classification, R_1 arousal levels and R_2 valence levels can each be grouped into high and low to form high/low arousal-valence combinations, or R_1×R_2 emotion recognition classes can be used.
The above steps are described in detail below.
1. Data preparation
The data acquisition scheme is as follows:
(1) The subject prepares (for example, by closing the eyes for t_1 minutes (no less than half a minute) and opening the eyes for t_2 minutes (no less than half a minute)) in order to complete the subsequent video and game trials (for example, G video and game trials, where G is a positive integer).
(2) The subject watches n_1 different videos with sound, i.e., performs the video trials.
(3) The subject completes n_2 audio-visual games, i.e., performs the game trials.
After each video or game trial, the subject completes a self-assessment questionnaire (SAM) according to his or her emotion during the trial, selecting one level each from the quantized valence (v = 1...R_1) and arousal (a = 1...R_2) levels to indicate the overall valence level and arousal level of the emotion while watching the video or playing the game.
1a. Data acquisition
EEG data: EEG signals are collected through N head electrode sensors over T channels. Recording generates S_eeg×T EEG data, where S_eeg is the number of samples recorded in a time period, and N and T are positive integers.
Eye data: eye data are recorded with an eye data acquisition instrument. The gaze position, i.e., the (x, y) coordinates on the computer screen, and the pupil diameter value are recorded, creating S_eye×E two-dimensional eye data, where S_eye is the number of samples recorded in a time period and E is a positive integer, for example 6.
Face data: a face data acquisition device captures the depth of the face through a window of resolution W×H; by recording the depth of the face point associated with each pixel of an image frame, S_face×W×H three-dimensional (3D) face data are created, where S_face represents the sequence of face-depth sampling frames in a time period, and W and H are positive integers.
1b. Data preprocessing
EEG data: valid EEG is extracted using filters (e.g., band-pass frequency filters).
Eye data: invalid samples (i.e., data collected during eye closure, saccades, etc.) are reset to an invalid value, and the pupil diameter values are normalized (normalization of the eye data is completed by computing the pupil diameter fluctuation).
Face data: a reference point is set and used as the datum; the values in each frame (e.g., depth values) are corrected against it to perform head translation correction, normalization of the face data, and so on.
1c. Data labels
Multi-level arousal and valence based on James Russell's circumplex model of emotion are used as the class labels of the corresponding emotions, where valence measures the degree of unpleasantness associated with the emotion and arousal measures the degree from calm to excitement associated with the emotion, as shown in FIG. 2.
2. Construction of the deep learning model
This step mainly builds a regression model that can simultaneously perform three-dimensional keypoint tracking and emotion estimation. The present invention can be implemented with a variety of deep learning models, including but not limited to the following deep models.
The present invention may use convolutional neural network technology for emotion recognition. A convolutional neural network mainly contains three kinds of structures: convolutional layers, pooling layers and fully connected layers.
The convolutional layer performs feature extraction on the input data X and contains multiple convolution kernels. Assuming the sizes of the input data X and the convolution kernel K are s×t and p×q respectively, the output of the convolutional layer is the feature map O of size (s-p+1)×(t-q+1); each neuron O_ij of the two-dimensional feature map O is then computed from its bias B and the two-dimensional convolution of the input data X with the weights K:
The three-dimensional convolution operation is similar to the two-dimensional convolution, except that X and K are three-dimensional.
The feature map O output by the convolutional layer is rectified by the activation function and passed to a pooling layer of size m×n for feature selection and information filtering. Each element O′_ij of the pooling layer output O′ is obtained by:
(2)  O′_ij = P_{(im-m) < k ≤ im, (jn-n) < l ≤ jn} O_kl
where the size of the input feature map O is am×bn, a and b are positive real numbers, and the size of the output O′ is then a×b.
The three-dimensional pooling operation is similar to the two-dimensional pooling, except that O and O′ are three-dimensional data.
Each neuron in a fully connected layer is fully connected to all neurons of the preceding layer.
The emotion recognition model of the present invention consists of three deep convolutional neural networks (FDCNN) used for feature extraction and one summary DNN (IDNN), with the structure shown in FIG. 3:
(1) FDCNN structure (see FIG. 4)
FDCNN_1 is responsible for feature extraction from the two-dimensional EEG data. Assuming the combination of one convolutional layer and one pooling layer is one feature extraction unit layer (feature extraction layer for short), FDCNN_1 consists of r (r≥2) unit layers in series. The preprocessed EEG data enter the first feature extraction layer: a two-dimensional convolutional layer L_1 composed of C_11 kernel neurons of size p_11×q_11, whose output, after the activation function, enters an m_11×n_11 pooling layer L_2; the output of L_2 is fed into the second feature extraction layer: a two-dimensional convolutional layer L_3 of C_12 kernel neurons of size p_12×q_12, whose output, after the activation function, enters an m_12×n_12 pooling layer L_4; ...; the output of layer L_2r-2 is fed into a two-dimensional convolutional layer L_2r-1 of C_1(r/2) kernel neurons of size p_1(r/2)×q_1(r/2), whose output, after the activation function, enters an m_1(r/2)×n_1(r/2) pooling layer L_2r, whose output is the extracted EEG data feature F_eeg.
FDCNN_2 is responsible for feature extraction from the two-dimensional eye data. Assuming the combination of one convolutional layer and one pooling layer is one feature extraction unit layer (feature extraction layer for short), FDCNN_2 consists of r (r≥2) unit layers in series. The preprocessed eye data enter the first feature extraction layer: a two-dimensional convolutional layer L_1 composed of C_21 kernel neurons of size p_21×q_21, whose output, after the activation function, enters an m_21×n_21 pooling layer L_2; the output of L_2 is fed into the second feature extraction layer: a two-dimensional convolutional layer L_3 of C_22 kernel neurons of size p_22×q_22, whose output, after the activation function, enters an m_22×n_22 pooling layer L_4; ...; the output of layer L_2r-2 is fed into a two-dimensional convolutional layer L_2r-1 of C_2(r/2) kernel neurons of size p_2(r/2)×q_2(r/2), whose output, after the activation function, enters an m_2(r/2)×n_2(r/2) pooling layer L_2r, whose output is the extracted eye data feature F_eye.
FDCNN_3 is responsible for feature extraction from the three-dimensional face data. Assuming the combination of one convolutional layer and one pooling layer is one feature extraction unit layer (feature extraction layer for short), FDCNN_3 consists of r (r≥2) unit layers in series. The preprocessed face data enter the first feature extraction layer: a three-dimensional convolutional layer L_1 composed of C_31 kernel neurons of size p_31×q_31×z_31, whose output, after the activation function, enters an m_31×n_31×l_31 pooling layer L_2; the output of L_2 is fed into the second feature extraction layer: a three-dimensional convolutional layer L_3 of C_32 kernel neurons of size p_32×q_32×z_32, whose output, after the activation function, enters an m_32×n_32×l_32 pooling layer L_4; ...; the output of layer L_2r-2 is fed into a three-dimensional convolutional layer L_2r-1 of C_3(r/2) kernel neurons of size p_3(r/2)×q_3(r/2)×z_3(r/2), whose output, after the activation function, enters an m_3(r/2)×n_3(r/2)×l_3(r/2) pooling layer L_2r, whose output is the extracted face data feature F_face.
(2) IDNN structure (see FIG. 5)
The extracted features F_eeg, F_eye and F_face of the three modalities are concatenated and input into the IDNN model. The IDNN consists of θ (θ≥1) fully connected layers (F_1, F_2, ..., F_θ) with f_1, f_2, ..., f_θ neurons respectively, and θ-1 Dropout layers (D_1, D_2, ..., D_θ-1) connected in series. The output of each fully connected layer is followed by an activation function. The outputs of the neurons in F_θ give the emotion class with the maximum probability for the input modality data.
3. Emotion recognition
S_eeg, S_eye and S_face are collected from X healthy subjects; after preprocessing, noise data are added to enlarge the sample size, giving a final sample volume of Yx. k% of this data volume is randomly selected as test data, used to evaluate the accuracy of the DCNN, i.e., the probability that the class predictions of the CNN on the test data are correct. The remaining (100-k)% of the sample data serve as the training data set for the CNN. The EEG data, eye data and face data in the data set are used as the inputs of FDCNN_1, FDCNN_2 and FDCNN_3 respectively, and the outputs of FDCNN_1, FDCNN_2 and FDCNN_3 together form the input of the IDNN. Model training is performed by repeatedly iterating over all training data in batches; the different input modality data and their corresponding labels are used to compute the CNN outputs and to update the weights and biases.
After the emotion recognition model is trained, the IDNN can output probabilities for at least four emotion classes (valence-arousal): low-low, low-high, high-low and high-high. The one with the highest probability is taken as the final emotion recognition result. By further subdividing this classification, it can be extended to R_1×R_2 emotion recognition results.
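A minimal sketch of the train/test split and batched training described above; the stand-in network, the optimizer, the loss, the batch size and k = 20 are assumptions, not values fixed by the text.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Illustrative tensors standing in for the preprocessed, noise-augmented samples (Yx in total)
Yx, k = 1000, 20                        # hold out k% as test data (assumption: k = 20)
eeg  = torch.randn(Yx, 1, 128, 32)      # EEG windows
eye  = torch.randn(Yx, 1, 64, 6)        # eye-data windows
face = torch.randn(Yx, 1, 8, 32, 32)    # face-depth windows (reduced for the sketch)
labels = torch.randint(0, 4, (Yx,))     # four valence-arousal classes

class TinyEmotionNet(nn.Module):
    """Stand-in for the three-branch FDCNN + IDNN model: flatten, concatenate, classify."""
    def __init__(self, num_classes=4):
        super().__init__()
        feat = 128 * 32 + 64 * 6 + 8 * 32 * 32
        self.idnn = nn.Sequential(nn.Linear(feat, 64), nn.ReLU(), nn.Dropout(0.5),
                                  nn.Linear(64, num_classes))
    def forward(self, e, y, f):
        return self.idnn(torch.cat([e.flatten(1), y.flatten(1), f.flatten(1)], dim=1))

n_test = Yx * k // 100
perm = torch.randperm(Yx)
test_idx, train_idx = perm[:n_test], perm[n_test:]
train_loader = DataLoader(TensorDataset(eeg[train_idx], eye[train_idx], face[train_idx],
                                        labels[train_idx]), batch_size=32, shuffle=True)

net = TinyEmotionNet()
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for epoch in range(5):                  # repeated batch-wise passes over all training data
    for b_eeg, b_eye, b_face, b_y in train_loader:
        optimizer.zero_grad()
        criterion(net(b_eeg, b_eye, b_face), b_y).backward()   # update weights and biases
        optimizer.step()

net.eval()
with torch.no_grad():                   # test accuracy: fraction of correct class predictions
    pred = net(eeg[test_idx], eye[test_idx], face[test_idx]).argmax(dim=1)
    print("test accuracy:", (pred == labels[test_idx]).float().mean().item())
```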
In summary, the main features of the automatic emotion recognition method based on deep learning technology and multimodal data sources disclosed in the embodiments of the present invention include:
1. By building a multi-branch deep convolutional neural network, effective feature information is extracted from the raw input signals, emotion-related features are extracted from this information, and the emotion recognition task is realized.
2. End-to-end training and testing are possible. Compared with traditional emotion recognition techniques, the end-to-end structure can extract and classify emotion-related features at the same time, without manually designing complex emotion-related features.
3. EEG, face and eye feature information is acquired, and the relativity and complementarity of the multimodal features are exploited to achieve more reliable and stable emotion recognition.
4. Using the evaluation criterion of arousal combined with valence, the R_1×R_2 (arbitrary) emotions formed by combining R_1 arousal levels and R_2 valence levels can be effectively recognized, far more than existing emotion recognition methods.
Compared with the prior art, the advantages of the automatic emotion recognition method based on deep learning technology and multimodal data sources disclosed in the embodiments of the present invention include:
First, good reliability. Multiple data sources enhance the reliability of emotion recognition. Combining three types of emotion recognition data sources (EEG data, face data and eye data) for emotion recognition improves the reliability of emotion recognition compared with a single data source signal.
Second, high accuracy. The multi-branch deep convolutional neural network improves emotion recognition accuracy; compared with support vector machines, decision trees, linearly combined neural network models and the like, it further improves the accuracy of emotion classification.
Third, many kinds of recognizable emotions. Almost all human emotions can be represented in the two-dimensional space formed by the arousal and valence dimensions. This method classifies emotions according to James Russell's circumplex model of emotion, enriching the kinds of emotions recognized, and can easily be extended to recognize the R_1×R_2 (arbitrary) emotions formed by combining the R_1 arousal levels and R_2 valence levels of the SAM questionnaire.
As described in the background section, today's students face many mental health problems, and the present invention can be used for student emotion recognition. The proposed automatic emotion recognition technology based on deep learning and multimodal data sources can accurately identify students' psychological emotions; its highly reliable and highly accurate emotion recognition can effectively classify students' emotions. On the one hand, it assists the assessment of students' mental health, helps teachers grasp students' emotional trends and adopt more reasonable and personalized educational countermeasures, and helps students relieve psychological pressure and improve their physical and mental health. On the other hand, analyzing students' emotions can effectively identify emotions during learning, accurately recognize emotions such as happiness and frustration, and thereby assess pleasantness and concentration in class; this can support personalized teaching activities and help teachers adopt more reasonable and personalized educational countermeasures to guide students' learning, providing an important basis for personalized teaching. The present invention not only has the advantages of good reliability, high accuracy and many kinds of recognizable emotions, but its data acquisition method is also relatively simple. For example, users can complete data acquisition by watching a video or playing a game without any psychological burden, so products incorporating this method will greatly help students' emotion recognition and have broad promotion prospects. Further promotion of the present invention will help the emotion recognition and auxiliary mental health screening of millions of students in many schools across the country. Therefore, the present invention has high scientific value and broad application prospects.
Further, referring to FIG. 6, the automatic emotion recognition system of the embodiment of the present invention includes: an emotion recognition data acquisition module for acquiring emotion recognition data from multiple different data sources; and an emotion recognition result acquisition module for inputting the emotion recognition data into a trained emotion recognition model for recognition to obtain an emotion recognition result.
The method of the embodiment of the present invention can be implemented in a computing device. An exemplary internal structure diagram of the computing device may be as shown in FIG. 7; the computing device may include a processor, a memory, an external interface, a display and an input device connected through a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, application programs, databases, etc.; the internal memory provides an environment for running the operating system and programs in the non-volatile storage medium. The external interface includes, for example, a network interface for communicating with external terminals through a network connection, and may also include a USB interface, etc. The display of the computing device may be a liquid crystal display or an electronic ink display; the input device may be a touch layer covering the display, a button, trackball or touchpad on the casing of the computing device, or an external keyboard, touchpad or mouse, etc.
The program stored in the non-volatile storage medium in the computing device can implement the above method when executed by the processor. In addition, the non-volatile storage medium may also exist in a separate physical form, such as a USB flash drive; when it is connected to a processor, the program stored on it can be executed to implement the above method. The method of the present invention can also be implemented as an APP (application program) in the Apple or Android application market for users to download and run on their respective mobile terminals.
Those skilled in the art can understand that the structure shown in FIG. 7 is only a block diagram of part of the structure related to the solution of the present application and does not limit the computing devices to which the solution is applied; a specific computing device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
As described above, those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by instructing the relevant hardware through a computer program; the computer program can be stored in a non-volatile computer-readable storage medium, and when executed, can include the processes of the embodiments of the above methods. Any reference to memory, storage, database or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. To keep the description concise, not all possible combinations of the technical features of the above embodiments are described; however, as long as there is no contradiction in the combination of these technical features, it should be considered within the scope of this specification.
The computer described in the present invention is, in a broad sense, a computing device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions; its hardware may include at least one memory, at least one processor and at least one communication bus, where the communication bus is used to realize connection and communication between these elements. The processor may include, but is not limited to, a microprocessor. The computer hardware may also include an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), embedded devices, and so on. The computer may also include network equipment and/or user equipment. The network equipment includes, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing, where cloud computing is a form of distributed computing: a super virtual computer consisting of a group of loosely coupled computers.
The computing device may be, but is not limited to, any terminal such as a personal computer or a server that can interact with the user through a keyboard, touchpad or voice-control device. The computing device herein may also include a mobile terminal, which may be, but is not limited to, any electronic device that can interact with the user through a keyboard, touchpad or voice-control device, for example, a tablet computer, a smart phone, a personal digital assistant (PDA), a smart wearable device or another terminal. The network where the computing device is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a virtual private network (VPN), etc.
The memory is used to store program code. The memory may be a circuit with a storage function that has no physical form in an integrated circuit, such as RAM (Random-Access Memory) or FIFO (First In First Out). Alternatively, the memory may be a memory with a physical form, such as a memory stick, a TF card (Trans-flash Card), a smart media card, a secure digital card, a flash card or another storage device.
The processor may include one or more microprocessors or digital processors. The processor can call the program code stored in the memory to perform related functions. For example, the modules described in FIG. 6 are program codes stored in the memory and executed by the processor to implement the above method. The processor, also called a central processing unit (CPU), may be a very-large-scale integrated circuit and is the computing core (Core) and control unit (Control Unit).
It should be noted that, for the sake of simple description, the foregoing method embodiments are all expressed as a series of action combinations, but those skilled in the art should know that the present invention is not limited by the described action sequence, because according to the present invention, some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
In the above embodiments, the descriptions of the embodiments each have their own emphasis; for parts not detailed in one embodiment, reference may be made to the relevant descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are only illustrative; the division of the units is only a logical function division, and there may be other division methods in actual implementation; for example, multiple units or elements may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, apparatuses or units, and may be electrical or in other forms.
The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may physically exist alone, or two or more units may be integrated into one unit. The above integrated units may be implemented in the form of hardware or in the form of software functional units.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), removable hard disk, magnetic disk or optical disc.
The above embodiments are only used to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments can still be modified, or some of the technical features can be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.
The above embodiments only express several implementation modes of this application, and their descriptions are relatively specific and detailed, but they should not be construed as limiting the scope of the patent application. It should be pointed out that those of ordinary skill in the art can make several modifications and improvements without departing from the concept of this application, all of which fall within the protection scope of this application. Therefore, the protection scope of this patent application shall be subject to the appended claims.
Claims (20)
- An automatic emotion recognition method, characterized by comprising: acquiring emotion recognition data from multiple different data sources; and inputting the emotion recognition data into a trained emotion recognition model for recognition to obtain an emotion recognition result.
- The automatic emotion recognition method according to claim 1, characterized in that the emotion recognition data from multiple different data sources comprise: EEG data, eye data and face data.
- The automatic emotion recognition method according to claim 2, characterized in that the face data comprise three-dimensional face data.
- The automatic emotion recognition method according to claim 2, characterized by further comprising: performing preprocessing operations on the EEG data, eye data and face data.
- The automatic emotion recognition method according to claim 4, characterized in that the preprocessing operations comprise: extraction of valid values from the EEG data, resetting of invalid values and normalization of the eye data, and normalization of the face data.
- The automatic emotion recognition method according to claim 5, characterized in that a filter is used to extract the valid values of the EEG data.
- The automatic emotion recognition method according to claim 5, characterized in that the invalid values of the eye data comprise eye data collected during eye closure and saccades, and the normalization of the eye data is completed by computing the pupil diameter fluctuation.
- The automatic emotion recognition method according to claim 5, characterized in that the normalization of the face data comprises: setting a reference point, using the reference point as the datum, and correcting the values in each frame, so as to perform head translation correction and normalization of the face data.
- The automatic emotion recognition method according to claim 2, characterized in that the emotion recognition result is classified according to the circumplex model of emotion.
- The automatic emotion recognition method according to claim 1, characterized in that the emotion recognition model comprises a multi-branch deep convolutional neural network; the emotion recognition data of the different data sources each pass through a corresponding deep convolutional neural network for feature extraction and are then fused by a fully convolutional neural network.
- The automatic emotion recognition method according to claim 10, characterized in that the deep convolutional neural network corresponding to the emotion recognition data of each data source comprises multiple feature extraction layers in series, and the deep convolutional neural network outputs the features of the emotion recognition data of the corresponding data source.
- The automatic emotion recognition method according to claim 10, characterized in that the fully convolutional neural network comprises θ fully connected layers and θ-1 Dropout layers connected alternately in series, and the last fully connected layer of the fully convolutional neural network outputs the emotion class with the maximum probability for the input emotion recognition data as the final emotion recognition result, where θ≥1.
- The automatic emotion recognition method according to claim 1, characterized in that the training of the emotion recognition model comprises the following steps: collecting original sample data from multiple different data sources; processing the original sample data collected from the multiple different data sources into emotion recognition training data from multiple different data sources; and setting emotion classification labels on the emotion recognition training data.
- The automatic emotion recognition method according to claim 13, characterized by further comprising: superimposing noise on the original sample data to increase the volume of emotion recognition training data obtained from the original sample data.
- The automatic emotion recognition method according to claim 13, characterized in that the emotion labels are set according to the circumplex model of emotion.
- The automatic emotion recognition method according to claim 13, characterized in that the collection of the original sample data comprises: the subject prepares by closing the eyes for no less than half a minute and opening the eyes for no less than half a minute; the prepared subject completes G video and game trials and completes an emotion questionnaire after each trial; during each video or game trial, EEG is collected and recorded through N head electrode sensors over T channels to generate S_eeg×T EEG data, where S_eeg is the number of samples recorded in a time period; eye data are recorded with an eye data acquisition instrument to create S_eye×E two-dimensional eye data, where S_eye is the number of samples recorded in a time period; and a face data acquisition device captures face data values through a window of resolution W×H, creating S_face×W×H face data by recording the face points or the depth values of the face points associated with each pixel of an image frame, where S_face represents the sequence of face-value sampling frames in a time period, and G, N, T, E, W and H are positive integers.
- The automatic emotion recognition method according to claim 16, characterized in that the labels of the emotion classification are collected through emotion-guided video and game trials and are obtained from self-evaluation questionnaires completed by the subjects according to their emotion after each trial.
- An automatic emotion recognition system, characterized by comprising: an emotion recognition data acquisition module for acquiring emotion recognition data from multiple different data sources; and an emotion recognition result acquisition module for inputting the emotion recognition data into a trained emotion recognition model for recognition to obtain an emotion recognition result.
- A computing device comprising a memory and a processor, the memory storing a program, characterized in that the processor implements the method of any one of claims 1-16 when executing the program.
- A computer-readable storage medium on which a program is stored, characterized in that the program, when executed by a processor, implements the method of any one of claims 1-16.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202080002247.0A CN114787883A (zh) | 2020-09-29 | 2020-09-29 | 自动情绪识别方法、系统、计算设备及计算机可读存储介质 |
PCT/CN2020/118887 WO2022067524A1 (zh) | 2020-09-29 | 2020-09-29 | 自动情绪识别方法、系统、计算设备及计算机可读存储介质 |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2020/118887 WO2022067524A1 (zh) | 2020-09-29 | 2020-09-29 | 自动情绪识别方法、系统、计算设备及计算机可读存储介质 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022067524A1 true WO2022067524A1 (zh) | 2022-04-07 |
Family
ID=80949269
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/118887 WO2022067524A1 (zh) | 2020-09-29 | 2020-09-29 | 自动情绪识别方法、系统、计算设备及计算机可读存储介质 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN114787883A (zh) |
WO (1) | WO2022067524A1 (zh) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114998440A (zh) * | 2022-08-08 | 2022-09-02 | 广东数业智能科技有限公司 | 基于多模态的测评方法、装置、介质及设备 |
CN115099311A (zh) * | 2022-06-06 | 2022-09-23 | 陕西师范大学 | 基于脑电时空频特征和眼动特征的多模态情绪分类方法 |
CN115186146A (zh) * | 2022-09-13 | 2022-10-14 | 北京科技大学 | 一种半结构化访谈与跨模态融合的情绪识别方法及装置 |
CN116369949A (zh) * | 2023-06-06 | 2023-07-04 | 南昌航空大学 | 一种脑电信号分级情绪识别方法、系统、电子设备及介质 |
CN116570289A (zh) * | 2023-07-11 | 2023-08-11 | 北京视友科技有限责任公司 | 一种基于便携式脑电的抑郁状态评估系统 |
CN116825365A (zh) * | 2023-08-30 | 2023-09-29 | 安徽爱学堂教育科技有限公司 | 基于多角度微表情的心理健康分析方法 |
CN117171557A (zh) * | 2023-08-03 | 2023-12-05 | 武汉纺织大学 | 基于脑电信号的自监督情绪识别模型的预训练方法及装置 |
CN117520826A (zh) * | 2024-01-03 | 2024-02-06 | 武汉纺织大学 | 一种基于可穿戴设备的多模态情绪识别方法及系统 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170185827A1 (en) * | 2015-12-24 | 2017-06-29 | Casio Computer Co., Ltd. | Emotion estimation apparatus using facial images of target individual, emotion estimation method, and non-transitory computer readable medium |
CN109199412A (zh) * | 2018-09-28 | 2019-01-15 | 南京工程学院 | 基于眼动数据分析的异常情绪识别方法 |
CN109730701A (zh) * | 2019-01-03 | 2019-05-10 | 中国电子科技集团公司电子科学研究院 | 一种情绪数据的获取方法及装置 |
CN110464366A (zh) * | 2019-07-01 | 2019-11-19 | 华南师范大学 | 一种情绪识别方法、系统及存储介质 |
CN111000556A (zh) * | 2019-11-29 | 2020-04-14 | 上海师范大学 | 一种基于深度模糊森林的情绪识别方法 |
CN111190484A (zh) * | 2019-12-25 | 2020-05-22 | 中国人民解放军军事科学院国防科技创新研究院 | 一种多模态交互系统和方法 |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110163063B (zh) * | 2018-11-28 | 2024-05-28 | 腾讯数码(天津)有限公司 | 表情处理方法、装置、计算机可读存储介质和计算机设备 |
2020
- 2020-09-29 WO PCT/CN2020/118887 patent/WO2022067524A1/zh active Application Filing
- 2020-09-29 CN CN202080002247.0A patent/CN114787883A/zh active Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170185827A1 (en) * | 2015-12-24 | 2017-06-29 | Casio Computer Co., Ltd. | Emotion estimation apparatus using facial images of target individual, emotion estimation method, and non-transitory computer readable medium |
CN109199412A (zh) * | 2018-09-28 | 2019-01-15 | 南京工程学院 | 基于眼动数据分析的异常情绪识别方法 |
CN109730701A (zh) * | 2019-01-03 | 2019-05-10 | 中国电子科技集团公司电子科学研究院 | 一种情绪数据的获取方法及装置 |
CN110464366A (zh) * | 2019-07-01 | 2019-11-19 | 华南师范大学 | 一种情绪识别方法、系统及存储介质 |
CN111000556A (zh) * | 2019-11-29 | 2020-04-14 | 上海师范大学 | 一种基于深度模糊森林的情绪识别方法 |
CN111190484A (zh) * | 2019-12-25 | 2020-05-22 | 中国人民解放军军事科学院国防科技创新研究院 | 一种多模态交互系统和方法 |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115099311B (zh) * | 2022-06-06 | 2024-03-19 | 陕西师范大学 | 基于脑电时空频特征和眼动特征的多模态情绪分类方法 |
CN115099311A (zh) * | 2022-06-06 | 2022-09-23 | 陕西师范大学 | 基于脑电时空频特征和眼动特征的多模态情绪分类方法 |
CN114998440A (zh) * | 2022-08-08 | 2022-09-02 | 广东数业智能科技有限公司 | 基于多模态的测评方法、装置、介质及设备 |
CN115186146A (zh) * | 2022-09-13 | 2022-10-14 | 北京科技大学 | 一种半结构化访谈与跨模态融合的情绪识别方法及装置 |
CN116369949A (zh) * | 2023-06-06 | 2023-07-04 | 南昌航空大学 | 一种脑电信号分级情绪识别方法、系统、电子设备及介质 |
CN116369949B (zh) * | 2023-06-06 | 2023-09-15 | 南昌航空大学 | 一种脑电信号分级情绪识别方法、系统、电子设备及介质 |
CN116570289A (zh) * | 2023-07-11 | 2023-08-11 | 北京视友科技有限责任公司 | 一种基于便携式脑电的抑郁状态评估系统 |
CN117171557B (zh) * | 2023-08-03 | 2024-03-22 | 武汉纺织大学 | 基于脑电信号的自监督情绪识别模型的预训练方法及装置 |
CN117171557A (zh) * | 2023-08-03 | 2023-12-05 | 武汉纺织大学 | 基于脑电信号的自监督情绪识别模型的预训练方法及装置 |
CN116825365A (zh) * | 2023-08-30 | 2023-09-29 | 安徽爱学堂教育科技有限公司 | 基于多角度微表情的心理健康分析方法 |
CN116825365B (zh) * | 2023-08-30 | 2023-11-28 | 安徽爱学堂教育科技有限公司 | 基于多角度微表情的心理健康分析方法 |
CN117520826A (zh) * | 2024-01-03 | 2024-02-06 | 武汉纺织大学 | 一种基于可穿戴设备的多模态情绪识别方法及系统 |
CN117520826B (zh) * | 2024-01-03 | 2024-04-05 | 武汉纺织大学 | 一种基于可穿戴设备的多模态情绪识别方法及系统 |
Also Published As
Publication number | Publication date |
---|---|
CN114787883A (zh) | 2022-07-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022067524A1 (zh) | 自动情绪识别方法、系统、计算设备及计算机可读存储介质 | |
Cimtay et al. | Cross-subject multimodal emotion recognition based on hybrid fusion | |
Zhu et al. | Electrocardiogram generation with a bidirectional LSTM-CNN generative adversarial network | |
Xu et al. | A one-dimensional CNN-LSTM model for epileptic seizure recognition using EEG signal analysis | |
Zhang et al. | Video-based stress detection through deep learning | |
Lilhore et al. | Hybrid CNN-LSTM model with efficient hyperparameter tuning for prediction of Parkinson’s disease | |
Chen et al. | Smg: A micro-gesture dataset towards spontaneous body gestures for emotional stress state analysis | |
Xie et al. | Interpreting depression from question-wise long-term video recording of SDS evaluation | |
Ashraf et al. | On the review of image and video-based depression detection using machine learning | |
Zheng et al. | Detecting Dementia from Face-Related Features with Automated Computational Methods | |
Gomez et al. | Exploring facial expressions and action unit domains for Parkinson detection | |
Creagh et al. | Interpretable deep learning for the remote characterisation of ambulation in multiple sclerosis using smartphones | |
Lopes Silva et al. | Chimerical dataset creation protocol based on Doddington zoo: A biometric application with face, eye, and ECG | |
Ahmed et al. | Applying eye tracking with deep learning techniques for early-stage detection of autism spectrum disorders | |
Li et al. | Automatic classification of ASD children using appearance-based features from videos | |
Yadav et al. | Review of automated depression detection: Social posts, audio and video, open challenges and future direction | |
Bibbo’ et al. | Emotional Health Detection in HAR: New Approach Using Ensemble SNN | |
Lu et al. | Transformer encoder with multiscale deep learning for pain classification using physiological signals | |
ALISAWI et al. | Real-Time Emotion Recognition Using Deep Learning Methods: Systematic Review | |
Prome et al. | Deception detection using ML and DL techniques: A systematic review | |
Ribeiro et al. | Stimming behavior dataset-unifying stereotype behavior dataset in the wild | |
Pinto et al. | A Systematic Review of Facial Expression Detection Methods | |
Anju et al. | Recent survey on Parkinson disease diagnose using deep learning mechanism | |
Pereira et al. | Systematic Review of Emotion Detection with Computer Vision and Deep Learning | |
Liu et al. | Machine to brain: facial expression recognition using brain machine generative adversarial networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20955550; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: pct application non-entry in european phase | Ref document number: 20955550; Country of ref document: EP; Kind code of ref document: A1 |