CN109325457B - Emotion analysis method and system based on multi-channel data and recurrent neural network - Google Patents

Info

Publication number
CN109325457B
Authority
CN
China
Prior art keywords
data
emotion
convolution
preset
neural network
Prior art date
Legal status
Active
Application number
CN201811155546.9A
Other languages
Chinese (zh)
Other versions
CN109325457A (en)
Inventor
孙晓 (Sun Xiao)
洪涛 (Hong Tao)
Current Assignee
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date
Filing date
Publication date
Application filed by Hefei University of Technology filed Critical Hefei University of Technology
Priority to CN201811155546.9A priority Critical patent/CN109325457B/en
Publication of CN109325457A publication Critical patent/CN109325457A/en
Application granted granted Critical
Publication of CN109325457B publication Critical patent/CN109325457B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/70 Multimodal biometrics, e.g. combining information from different biometric modalities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks


Abstract

The invention provides an emotion analysis method and system based on multi-channel data and a recurrent neural network, and relates to the technical field of emotion analysis. The method comprises the following steps: acquiring facial expression pictures, voice data, infrared pulse data and skin resistance data of a person to be analyzed while the person watches a preset video; converting the voice data, the infrared pulse data and the skin resistance data into corresponding spectrograms; inputting the facial expression picture and the spectrograms corresponding to the voice data, the infrared pulse data and the skin resistance data into a preset convolutional neural network model to obtain a corresponding feature array for each, each feature array comprising a first preset number of feature data; and combining the feature arrays into a total feature array and inputting it into an emotion analysis model to obtain the proportion of each type of emotion of the person to be analyzed. The invention can improve the accuracy of emotion analysis.

Description

Emotion analysis method and system based on multi-channel data and recurrent neural network
Technical Field
The invention relates to the technical field of emotion analysis, and in particular to an emotion analysis method and system based on multi-channel data and a recurrent neural network, as well as corresponding computer equipment, a computer-readable storage medium and a computer program.
Background
In the prior art, emotion analysis methods include establishing an emotion computing system based on facial expressions and establishing an emotion computing system based on pulse signals; in either method the data used come from a single channel, so the accuracy of emotion analysis is poor.
Disclosure of Invention
Technical problem to be solved
Aiming at the defects of the prior art, the invention provides an emotion analysis method and system based on multi-channel data and a recurrent neural network, together with computer equipment, a computer-readable storage medium and a computer program, which can improve the accuracy of emotion analysis.
(II) technical scheme
In order to achieve the purpose, the invention is realized by the following technical scheme:
in a first aspect, the invention provides an emotion analysis method based on multichannel data and a recurrent neural network, which comprises the following steps:
acquiring facial expression pictures, voice data, infrared pulse data and skin resistance data of a person to be analyzed in the process of watching a preset video;
respectively converting the voice data, the infrared pulse data and the skin resistance data into corresponding spectrograms;
inputting the facial expression picture, the spectrogram corresponding to the voice data, the spectrogram corresponding to the infrared pulse data and the spectrogram corresponding to the skin resistance data into a preset convolutional neural network model to obtain a corresponding feature array for each; each feature array comprises a first preset number of feature data;
combining the feature arrays to obtain a total feature array, and inputting the total feature array into an emotion analysis model to obtain the proportion of each type of emotion of the person to be analyzed;
the emotion analysis model comprises a pre-trained recurrent neural network, a preset first fully connected layer and a preset activation function. The recurrent neural network comprises a third preset number of long short-term memory (LSTM) network units, the third preset number being the number of feature data in the total feature array; the LSTM units are connected in sequence, the input data of the first LSTM unit is the first feature data in the total feature array, and the input data of each LSTM unit other than the first is the output data of the previous LSTM unit together with the corresponding feature data in the total feature array. The recurrent neural network is used for outputting first emotion data according to the total feature array; the first fully connected layer is used for converting the first emotion data into a second preset number of second emotion data, the second preset number being the number of emotion types; and the activation function is used for determining the proportion of each type of emotion of the person to be analyzed according to the second preset number of second emotion data.
In a second aspect, the invention provides an emotion analysis system based on multichannel data and a recurrent neural network, the system comprising:
the data acquisition unit is used for acquiring facial expression pictures, voice data, infrared pulse data and skin resistance data of a person to be analyzed in the process of watching a preset video;
the data conversion unit is used for respectively converting the voice data, the infrared pulse data and the skin resistance data into corresponding spectrograms;
the characteristic determining unit is used for respectively inputting the facial expression picture, the spectrogram corresponding to the voice data, the spectrogram corresponding to the infrared pulse data and the spectrogram corresponding to the skin resistance data into a preset convolutional neural network model to obtain respective corresponding characteristic arrays; each feature array comprises a first preset number of feature data;
the emotion determining unit is used for combining the feature arrays to obtain a total feature array and inputting the total feature array into an emotion analysis model to obtain the proportion of each type of emotion of the person to be analyzed; the emotion analysis model comprises a pre-trained recurrent neural network, a preset first fully connected layer and a preset activation function; the recurrent neural network comprises a third preset number of long short-term memory (LSTM) network units, the third preset number being the number of feature data in the total feature array; the LSTM units are connected in sequence, the input data of the first LSTM unit is the first feature data in the total feature array, and the input data of each LSTM unit other than the first is the output data of the previous LSTM unit together with the corresponding feature data in the total feature array; the recurrent neural network is used for outputting first emotion data according to the total feature array; the first fully connected layer is used for converting the first emotion data into a second preset number of second emotion data, the second preset number being the number of emotion types; and the activation function is used for determining the proportion of each type of emotion of the person to be analyzed according to the second preset number of second emotion data.
In a third aspect, the present invention provides a computer apparatus comprising:
at least one memory;
and at least one processor, wherein:
the at least one memory is for storing a computer program;
the at least one processor is configured to invoke a computer program stored in the at least one memory to perform the emotion analysis method described above.
In a fourth aspect, the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, may implement the emotion analysis method described above.
In a fifth aspect, the present invention provides a computer program comprising computer-executable instructions that, when executed, cause at least one processor to perform the emotion analysis method described above.
(III) advantageous effects
The embodiments of the invention provide an emotion analysis method, system, computer equipment, computer-readable storage medium and computer program based on multi-channel data and a recurrent neural network. Multi-channel data of the person to be analyzed, namely facial expression pictures, voice data, infrared pulse data and skin resistance data, are collected; a convolutional neural network model extracts features from the multi-channel data; and an emotion analysis model analyzes the proportion of each type of emotion from those features. Because the analysis is based on multi-channel data, it overcomes the problem that single-channel data cannot truly reflect the emotion type, so the accuracy of emotion analysis is improved. Moreover, the infrared pulse data and skin resistance data are physiological signals that are not subject to conscious control, so they reflect the emotion of the person to be analyzed more truthfully.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of an emotion analysis method based on multi-channel data and a recurrent neural network according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a convolutional neural network model according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating the operation of the t-th long short-term memory (LSTM) network unit according to an embodiment of the present invention;
FIG. 4 is a block diagram of an emotion analysis system based on multi-channel data and a recurrent neural network according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In a first aspect, the present invention provides an emotion analysis method based on multichannel data and a recurrent neural network, as shown in fig. 1, the emotion analysis method includes:
s101, acquiring facial expression pictures, voice data, infrared pulse data and skin resistance data of a person to be analyzed in the process of watching a preset video;
it is understood that the preset video may include at least one of sadness, anger, happiness, surprise, fear, and disgust.
It can be understood that the facial expression picture may be a photograph taken of the person to be analyzed while watching the preset video, or a frame selected from a video recording of the person to be analyzed watching the preset video.
In practical application, a voice acquisition device can be arranged on the site where a person to be analyzed watches the preset video, and the voice data is acquired by the voice acquisition device.
In practical applications, an infrared pulse acquisition device and a skin resistance acquisition device can be worn on the body of the person to be analyzed; the infrared pulse acquisition device is then used to collect the infrared pulse data of the person to be analyzed, and the skin resistance acquisition device is used to collect the skin resistance data of the person to be analyzed.
It can be understood that the multi-channel data referred to in the title are the facial expression pictures, the voice data, the infrared pulse data and the skin resistance data.
S102, converting the voice data, the infrared pulse data and the skin resistance data into corresponding spectrograms respectively;
In order to facilitate subsequent processing, the voice data, the infrared pulse data and the skin resistance data are all converted into spectrograms, so that the data of every channel are in picture form.
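A minimal sketch of this conversion, assuming the spectrograms are produced with a short-time Fourier transform; the patent does not name the transform, and the sampling rates, window length and image size below are illustrative assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal

def to_spectrogram_image(x, fs, out_path):
    """Turn a 1-D signal (voice, infrared pulse, or skin resistance) into a
    spectrogram image so that every channel is in picture form."""
    f, t, Sxx = signal.spectrogram(x, fs=fs, nperseg=256, noverlap=128)
    plt.figure(figsize=(2.4, 2.4), dpi=100)
    plt.pcolormesh(t, f, 10 * np.log10(Sxx + 1e-12))  # log-power spectrogram
    plt.axis("off")
    plt.savefig(out_path, bbox_inches="tight", pad_inches=0)
    plt.close()

# Illustrative signals: 10 s of voice at 16 kHz and 10 s of pulse at 100 Hz.
to_spectrogram_image(np.random.randn(16000 * 10), fs=16000, out_path="voice_spectrogram.png")
to_spectrogram_image(np.random.randn(100 * 10), fs=100, out_path="pulse_spectrogram.png")
```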
S103, inputting the facial expression picture, the spectrogram corresponding to the voice data, the spectrogram corresponding to the infrared pulse data and the spectrogram corresponding to the skin resistance data into a preset convolutional neural network model to obtain a corresponding feature array for each; each feature array comprises a first preset number of feature data;
it can be understood that the convolutional neural network model can perform feature extraction on the data of each channel, and further obtain a feature array corresponding to each channel.
In particular, the convolutional neural network model may take a variety of configurations, one of which is described below with reference to fig. 2: the convolutional neural network model comprises five convolution units connected in sequence and a second fully connected layer connected to the output of the fifth convolution unit; each convolution unit comprises a convolution layer and a down-sampling layer connected to the output of the convolution layer; and the second fully connected layer converts the number of output data of the fifth convolution unit into the first preset number.
For example, as shown in fig. 2, the convolution layer 301a in the first convolution unit includes 96 convolution kernels with a size of 11 × 11, the sampling kernel of the down-sampling layer 301b in the first convolution unit has a size of 3 × 3, and the sampling step is 2; as another example, the convolution layer 302a in the second convolution unit includes 128 convolution kernels with a size of 5 × 5, the sampling kernel of the downsampling layer 302b in the second convolution unit has a size of 3 × 3, and the sampling step size is 1; for another example, the convolution layer 303a in the third convolution unit includes 192 convolution kernels having a size of 3 × 3, the sampling kernel of the down-sampling layer 303b in the third convolution unit has a size of 3 × 3, and the sampling step size is 1; for another example, the convolution layer 304a in the fourth convolution unit includes 192 convolution kernels with a size of 3 × 3, the sampling kernel of the down-sampling layer 304b in the fourth convolution unit has a size of 3 × 3, and the sampling step size is 1; for another example, the convolution layer 305a in the fifth convolution unit includes 128 convolution kernels having a size of 3 × 3, the sampling kernel of the downsampling layer 305b in the fifth convolution unit has a size of 3 × 3, and the sampling step size is 1.
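A minimal PyTorch sketch of this five-unit structure, assuming ReLU activations, no padding and a stride of 4 for the first convolution layer (the text fixes only the kernel counts and sizes and the down-sampling parameters); nn.LazyLinear stands in for the second fully connected layer because the flattened size depends on those assumptions.

```python
import torch
import torch.nn as nn

def conv_unit(in_ch, out_ch, k, conv_stride, pool_k=3, pool_stride=1):
    """One convolution unit: a convolution layer followed by a down-sampling layer."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=k, stride=conv_stride),
        nn.ReLU(inplace=True),                       # assumed activation
        nn.MaxPool2d(kernel_size=pool_k, stride=pool_stride),
    )

feature_extractor = nn.Sequential(
    conv_unit(3,   96,  11, conv_stride=4, pool_stride=2),  # unit 1: 96 kernels 11x11, pool 3x3 / 2
    conv_unit(96,  128, 5,  conv_stride=1, pool_stride=1),  # unit 2: 128 kernels 5x5, pool 3x3 / 1
    conv_unit(128, 192, 3,  conv_stride=1, pool_stride=1),  # unit 3: 192 kernels 3x3, pool 3x3 / 1
    conv_unit(192, 192, 3,  conv_stride=1, pool_stride=1),  # unit 4: 192 kernels 3x3, pool 3x3 / 1
    conv_unit(192, 128, 3,  conv_stride=1, pool_stride=1),  # unit 5: 128 kernels 3x3, pool 3x3 / 1
    nn.Flatten(),
    nn.LazyLinear(1000),   # second fully connected layer: converts the flattened output
                           # to the first preset number of feature data (1000 in the example)
)

x = torch.randn(1, 3, 237, 237)        # one 237x237 RGB picture, as in the worked example below
features = feature_extractor(x)        # shape (1, 1000): one feature array per input picture
print(features.shape)
```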
For example, the facial expression picture is a color picture with R, G and B color channels, so it corresponds to a three-dimensional array, for example one of size 6 × 6 × 3, where 3 represents the 3 color channels. Such an array can be understood as a stack of three two-dimensional arrays, so the convolution of the facial expression picture by a convolution layer can be performed on each two-dimensional array separately, and the processed two-dimensional arrays are then stacked to form the convolved three-dimensional array. The down-sampling process is similar.
The following describes a principle of convolution processing for a two-dimensional array:
As shown in Table 1, the input is a 5 × 5 two-dimensional array, and as shown in Table 2 below, the convolution uses the kernel (1, 0, 1; 0, 1, 0; 1, 0, 1). The sub-array formed by rows 1-3 and columns 1-3 of Table 1 is (1, 1, 1; 0, 1, 1; 0, 0, 1). The convolution kernel is multiplied element-wise with the data at the corresponding positions of this sub-array and the products are summed, i.e. 1×1 + 1×0 + 1×1 + 0×0 + 1×1 + 1×0 + 0×1 + 0×0 + 1×1 = 4, which gives the first output value. Proceeding in the same way yields an output matrix of size 3 × 3.
TABLE 1
1 1 1 0 0
0 1 1 1 0
0 0 1 1 1
0 0 1 1 0
0 1 1 0 0
TABLE 2
1 0 1
0 1 0
1 0 1
After one convolution pass, the output matrix has size N × N, where N = (W - F)/S + 1, the input matrix of the convolution has size W × W, the convolution kernel has size F × F, and the stride is S.
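A minimal NumPy sketch of this worked example: it reproduces Tables 1 and 2, applies the element-wise multiply-and-sum described above, and its output size follows N = (W - F)/S + 1.

```python
import numpy as np

def conv2d(x, k, stride=1):
    """Valid 2-D convolution as described in the text: multiply element-wise and sum."""
    W, F = x.shape[0], k.shape[0]
    N = (W - F) // stride + 1
    out = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            patch = x[i*stride:i*stride+F, j*stride:j*stride+F]
            out[i, j] = np.sum(patch * k)
    return out

x = np.array([[1, 1, 1, 0, 0],      # Table 1: 5x5 input
              [0, 1, 1, 1, 0],
              [0, 0, 1, 1, 1],
              [0, 0, 1, 1, 0],
              [0, 1, 1, 0, 0]])
k = np.array([[1, 0, 1],            # Table 2: 3x3 convolution kernel
              [0, 1, 0],
              [1, 0, 1]])

out = conv2d(x, k)                  # 3x3 output, since (5 - 3)/1 + 1 = 3
print(out[0, 0])                    # 4.0, the first output value computed above
print(out)
```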
The following describes a principle of downsampling a two-dimensional array:
The three-dimensional array obtained after the convolution processing is decomposed into three two-dimensional arrays. As shown in Table 3 below, one of the decomposed two-dimensional arrays has size 4 × 4, the down-sampling kernel size is 2 × 2, and the stride is 2. The sub-array formed by rows 1-2 and columns 1-2 of Table 3 is (1, 1; 5, 6), whose maximum is 6. Since the stride is 2, the next window, formed by rows 1-2 and columns 3-4, is (2, 4; 7, 8), whose maximum is 8. Proceeding in the same way yields the two-dimensional array shown in Table 4.
TABLE 3
1 1 2 4
5 6 7 8
3 2 1 0
1 2 3 4
TABLE 4
6 8
3 4
After one down-sampling pass, the output matrix has size len × len, where len = (X - pool_size)/stride + 1, the input matrix of the down-sampling layer has size X × X, the down-sampling kernel has size pool_size × pool_size, and the stride is stride.
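A minimal NumPy sketch of this down-sampling (max-pooling) example: it reproduces Tables 3 and 4 and its output size follows len = (X - pool_size)/stride + 1.

```python
import numpy as np

def max_pool2d(x, pool_size=2, stride=2):
    X = x.shape[0]
    length = (X - pool_size) // stride + 1
    out = np.zeros((length, length))
    for i in range(length):
        for j in range(length):
            window = x[i*stride:i*stride+pool_size, j*stride:j*stride+pool_size]
            out[i, j] = window.max()    # keep the maximum value of each window
    return out

x = np.array([[1, 1, 2, 4],             # Table 3: 4x4 input
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])

print(max_pool2d(x))                     # [[6. 8.]
                                         #  [3. 4.]]  (Table 4), size (4 - 2)/2 + 1 = 2
```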
For example, a facial expression picture of 237 × 237 pixels is input into the convolutional neural network model. In the first convolution unit, the convolution layer contains 96 convolution kernels of size 11 × 11, so the convolution layer produces an array of dimension 55 × 55 × 96; after down-sampling with a 3 × 3 kernel and stride 2, a first three-dimensional array of dimension 27 × 27 × 96 is obtained. The first three-dimensional array is input into the second convolution unit of this structure to obtain a second three-dimensional array; the second three-dimensional array is input into the third convolution unit to obtain a third three-dimensional array; the third into the fourth convolution unit to obtain a fourth three-dimensional array; and the fourth into the fifth convolution unit to obtain a fifth three-dimensional array. The fifth three-dimensional array has size 6 × 6 × 256 and is expanded (flattened) into an array of size 1 × 4096, i.e. 4096 data. Passing this 1 × 4096 array through a fully connected layer with 1000 output values gives 1000 data, i.e. an array of size 1 × 1000. This 1 × 1000 array is the feature array of the 237 × 237 facial expression picture, and it contains 1000 feature data.
Similarly, the spectrogram corresponding to the voice data, the spectrogram corresponding to the infrared pulse data and the spectrogram corresponding to the skin resistance data are respectively input into the convolutional neural network model with the structure, so that a feature array with the size of 1 x 1000 is respectively obtained. That is to say, the facial expression picture, the spectrogram corresponding to the voice data, the spectrogram corresponding to the infrared pulse data and the spectrogram corresponding to the skin resistance data are respectively input into the convolutional neural network model with the structure, so that four feature arrays with the size of 1 × 1000 are obtained.
S104, combining the feature arrays to obtain a total feature array, and inputting the total feature array into an emotion analysis model to obtain the proportion of each type of emotion of the person to be analyzed;
for example, the 4 feature arrays with the size of 1 × 1000 are combined into a total feature array with the size of 1 × 4000, and the total feature array includes 4000 data.
The emotion analysis model comprises a pre-trained recurrent neural network, a preset first fully connected layer and a preset activation function. The recurrent neural network comprises a third preset number of long short-term memory (LSTM) network units, the third preset number being the number of feature data in the total feature array; the LSTM units are connected in sequence, the input data of the first LSTM unit is the first feature data in the total feature array, and the input data of each LSTM unit other than the first is the output data of the previous LSTM unit together with the corresponding feature data in the total feature array. The recurrent neural network is used for outputting first emotion data according to the total feature array; the first fully connected layer is used for converting the first emotion data into a second preset number of second emotion data, the second preset number being the number of emotion types; and the activation function is used for determining the proportion of each type of emotion of the person to be analyzed according to the second preset number of second emotion data.
For example, if the total feature array contains 4000 feature data, the recurrent neural network contains 4000 LSTM units, one LSTM unit for each feature datum.
For example, human emotions may generally include six categories of sadness, anger, happiness, surprise, fear, and disgust. For the analysis of six emotions, the second preset number is 6.
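A minimal PyTorch sketch of this emotion analysis model, under two assumptions not fixed by the text: the LSTM hidden size (128 here) and the choice of the last unit's output as the first emotion data; the number of steps (4000) and of emotion types (6) follow the example, and the activation is taken to be softmax.

```python
import torch
import torch.nn as nn

class EmotionAnalysisModel(nn.Module):
    def __init__(self, hidden_size=128, num_emotions=6):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_emotions)        # first fully connected layer

    def forward(self, total_features):                        # (batch, 4000)
        seq = total_features.unsqueeze(-1)                    # (batch, 4000, 1): one feature per step
        out, _ = self.lstm(seq)                               # chain of LSTM units over the sequence
        first_emotion_data = out[:, -1, :]                    # output of the last LSTM unit (assumption)
        second_emotion_data = self.fc(first_emotion_data)     # 6 values, one per emotion type
        return torch.softmax(second_emotion_data, dim=-1)     # proportion of each emotion

model = EmotionAnalysisModel()
total_feature_array = torch.randn(1, 4000)                    # combined 4 x 1000 feature arrays
proportions = model(total_feature_array)                      # sums to 1 over the 6 emotions
print(proportions)
```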
Referring to fig. 3, each LSTM network unit may comprise an input gate, an output gate and a memory gate; wherein:
The input gate of the t-th LSTM unit is computed as:

f_t = σ(W_f · [h_(t-1), x_t] + b_f)

where f_t is the output data of the input gate of the t-th LSTM unit; h_(t-1) is the output data of the (t-1)-th LSTM unit; x_t is the feature data in the total feature array corresponding to the t-th LSTM unit; W_f and b_f are pre-trained parameters; and σ denotes the function f(x) = 1/[1 + e^(-x)].

The memory gate of the t-th LSTM unit is computed as:

C_t = f_t * C_(t-1) + i_t * g_t

where C_t is the output data of the memory gate of the t-th LSTM unit, C_(t-1) is the output data of the memory gate of the (t-1)-th LSTM unit, i_t = σ(W_i · [h_(t-1), x_t] + b_i), g_t = tanh(W_g · [h_(t-1), x_t] + b_g), and W_i, b_i, W_g, b_g are pre-trained parameters.

The output gate of the t-th LSTM unit is computed as:

h_t = O_t * tanh(C_t)

where h_t is the output data of the output gate of the t-th LSTM unit, O_t = σ(W_o · [h_(t-1), x_t] + b_o), and W_o, b_o are pre-trained parameters.
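A minimal NumPy sketch of one step of such an LSTM unit using the formulas above; the hidden size and the randomly initialized weights are illustrative stand-ins for the pre-trained parameters W and b.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, C_prev, params):
    z = np.concatenate([h_prev, [x_t]])             # [h_(t-1), x_t]
    f_t = sigmoid(params["Wf"] @ z + params["bf"])  # input gate (the forget gate in common usage)
    i_t = sigmoid(params["Wi"] @ z + params["bi"])
    g_t = np.tanh(params["Wg"] @ z + params["bg"])
    C_t = f_t * C_prev + i_t * g_t                  # memory gate output
    o_t = sigmoid(params["Wo"] @ z + params["bo"])
    h_t = o_t * np.tanh(C_t)                        # output gate output
    return h_t, C_t

H = 4                                               # illustrative hidden size
rng = np.random.default_rng(0)
params = {name: rng.standard_normal((H, H + 1)) for name in ("Wf", "Wi", "Wg", "Wo")}
params.update({name: np.zeros(H) for name in ("bf", "bi", "bg", "bo")})

h, C = np.zeros(H), np.zeros(H)
for x_t in [0.3, -1.2, 0.7]:                        # a few feature values from the total feature array
    h, C = lstm_step(x_t, h, C, params)
print(h)
```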
In a specific implementation, the activation function may be:

S_i = e^(V_i) / Σ_(j=1..C) e^(V_j)

where S_i is the proportion of the i-th type of emotion of the person to be analyzed, V_i is the i-th second emotion data output by the first fully connected layer, and C is the second preset number.
It is understood that for six emotions, C takes the value 6. In that case, S_1 represents the proportion of the first type of emotion, S_2 the proportion of the second type, S_3 the third, S_4 the fourth, S_5 the fifth, and S_6 the proportion of the sixth type of emotion.
For example, when the second preset number is 6, the first emotion data output by the recurrent neural network are input into the first fully connected layer, which converts them into 6 second emotion data, each corresponding to one emotion type. After these 6 second emotion data are passed through the activation function, the proportion of each type of emotion is obtained.
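A minimal sketch of this step, assuming the activation function is the standard softmax that matches the S_i, V_i and C notation above.

```python
import numpy as np

def emotion_proportions(v):
    """S_i = e^(V_i) / sum_j e^(V_j) over the C = len(v) emotion types."""
    e = np.exp(v - np.max(v))      # subtract the max for numerical stability
    return e / e.sum()

second_emotion_data = np.array([2.0, 0.5, -1.0, 0.1, 1.2, -0.3])  # illustrative values
S = emotion_proportions(second_emotion_data)
print(S, S.sum())                  # six proportions that sum to 1
```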
In a specific implementation, pre-training the emotion analysis model is the process of learning its parameters W and bias, and specifically comprises:
a. respectively marking the types of emotions generated in the process that a plurality of training objects watch the preset video;
it is understood that the plurality of training subjects are a plurality of subjects.
In practical applications, the emotion type can be labeled by having each training object select the type of emotion they experienced while watching the preset video.
It will be appreciated that the labels can be 0, 1, 2, 3, 4, 5 for the six emotion types, respectively.
b. Respectively acquiring facial expression pictures, voice data, infrared pulse data and skin resistance data of the training objects in the process of watching a preset video;
c. respectively converting the voice data, the infrared pulse data and the skin resistance data of each training object into corresponding spectrogram;
d. respectively inputting the facial expression picture, the spectrogram corresponding to the voice data, the spectrogram corresponding to the infrared pulse data and the spectrogram corresponding to the skin resistance data of each training object into a preset convolutional neural network model to obtain respective corresponding feature arrays;
e. combining the feature arrays corresponding to each training object to obtain the total feature array of that training object;
f. training the emotion analysis model on the total feature arrays of the training objects and the emotion types labeled for them, to obtain the trained emotion analysis model.
In step f, training the emotion analysis model is in fact the process of determining the parameters W and bias in the model.
It can be understood that the emotion type labeled for a training object serves as the target output of the model and the total feature array of that training object serves as the input, so that the parameters W and bias can be determined by training on the total feature arrays and labeled emotion types of the several training objects.
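A minimal sketch of this training step, under the same assumptions as the earlier model sketch (a small LSTM over the total feature array); the Adam optimizer and cross-entropy loss are assumptions, since the text only states that the parameters W and bias are determined from the labeled training data.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=1, hidden_size=128, batch_first=True)
fc = nn.Linear(128, 6)                                # 6 emotion types
optimizer = torch.optim.Adam(list(lstm.parameters()) + list(fc.parameters()), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()                       # applies softmax internally

total_features = torch.randn(8, 4000)                 # total feature arrays of 8 training objects (illustrative)
labels = torch.randint(0, 6, (8,))                    # labeled emotion types 0..5

for epoch in range(10):
    out, _ = lstm(total_features.unsqueeze(-1))       # run the LSTM chain over each total feature array
    logits = fc(out[:, -1, :])                        # second emotion data before the activation
    loss = loss_fn(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```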
It is understood that the steps b to e are similar to the steps S101 to S104, and the explanation, examples and specific embodiments of the related contents can refer to the corresponding parts in the steps S101 to S104.
It should be noted that an array size or dimension written as a × b denotes a two-dimensional array with a rows and b columns, and a size written as a × b × c denotes a three-dimensional array whose length, width and height are a, b and c respectively. Elsewhere, '*' denotes multiplication.
The emotion analysis method provided by the invention collects multi-channel data of the person to be analyzed, namely facial expression pictures, voice data, infrared pulse data and skin resistance data, uses a convolutional neural network model to extract features from the multi-channel data, and uses an emotion analysis model to analyze the proportion of each type of emotion from those features. Because the analysis is based on multi-channel data, it overcomes the problem that single-channel data cannot truly reflect the emotion type, so the accuracy of emotion analysis is improved. Moreover, the infrared pulse data and skin resistance data are physiological signals that are not subject to conscious control, so they reflect the emotion of the person to be analyzed more truthfully.
In a second aspect, the present invention provides an emotion analysis system based on multi-channel data and a recurrent neural network; as shown in fig. 4, the system includes:
the data acquisition unit is used for acquiring facial expression pictures, voice data, infrared pulse data and skin resistance data of a person to be analyzed in the process of watching a preset video;
the data conversion unit is used for respectively converting the voice data, the infrared pulse data and the skin resistance data into corresponding spectrograms;
the characteristic determining unit is used for respectively inputting the facial expression picture, the spectrogram corresponding to the voice data, the spectrogram corresponding to the infrared pulse data and the spectrogram corresponding to the skin resistance data into a preset convolutional neural network model to obtain respective corresponding characteristic arrays; each feature array comprises a first preset number of feature data;
the emotion determining unit is used for combining the feature arrays to obtain a total feature array and inputting the total feature array into an emotion analysis model to obtain the proportion of each type of emotion of the person to be analyzed; the emotion analysis model comprises a pre-trained recurrent neural network, a preset first fully connected layer and a preset activation function; the recurrent neural network comprises a third preset number of long short-term memory (LSTM) network units, the third preset number being the number of feature data in the total feature array; the LSTM units are connected in sequence, the input data of the first LSTM unit is the first feature data in the total feature array, and the input data of each LSTM unit other than the first is the output data of the previous LSTM unit together with the corresponding feature data in the total feature array; the recurrent neural network is used for outputting first emotion data according to the total feature array; the first fully connected layer is used for converting the first emotion data into a second preset number of second emotion data, the second preset number being the number of emotion types; and the activation function is used for determining the proportion of each type of emotion of the person to be analyzed according to the second preset number of second emotion data.
It is understood that the emotion analysis system provided by the second aspect corresponds to the emotion analysis method provided by the first aspect, and the explanation, the example, the detailed description, the beneficial effects, and the like of the content can be referred to the corresponding content in the first aspect.
In a third aspect, the present invention provides a computer apparatus comprising:
at least one memory;
and at least one processor, wherein:
the at least one memory is for storing a computer program;
the at least one processor is configured to invoke a computer program stored in the at least one memory to perform the emotion analysis method provided in the first aspect.
It is to be understood that each unit in the emotion analyzing system provided by the second aspect is a computer program module, and the computer program modules are computer programs stored in the at least one memory.
In a fourth aspect, the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the emotion analysis method provided by the first aspect.
In a fifth aspect, the invention provides a computer program comprising computer-executable instructions that, when executed, cause at least one processor to perform the method of emotion analysis provided in the first aspect.
It is to be understood that the explanation, the detailed description, the examples, the advantages, and the like of the contents of the computer device, the computer-readable storage medium, and the computer program provided in the third to fifth aspects can be referred to the corresponding parts in the first aspect.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (9)

1. An emotion analysis method based on multichannel data and a recurrent neural network is characterized by comprising the following steps:
acquiring facial expression pictures, voice data, infrared pulse data and skin resistance data of a person to be analyzed in the process of watching a preset video;
the preset video comprises at least one type of video of sadness, anger, happiness, surprise, fear and disgust;
respectively converting the voice data, the infrared pulse data and the skin resistance data into corresponding spectrograms;
inputting the facial expression picture, the spectrogram corresponding to the voice data, the spectrogram corresponding to the infrared pulse data and the spectrogram corresponding to the skin resistance data into a preset convolutional neural network model respectively to obtain corresponding feature arrays respectively; each feature array comprises a first preset number of feature data;
combining the feature arrays to obtain a total feature array, and inputting the total feature array into an emotion analysis model to obtain the proportion of each type of emotion of the person to be analyzed;
the emotion analysis model comprises a pre-trained recurrent neural network, a preset first fully connected layer and a preset activation function; the recurrent neural network comprises a third preset number of long short-term memory (LSTM) network units, the third preset number being the number of feature data in the total feature array; the LSTM units are connected in sequence, the input data of the first LSTM unit is the first feature data in the total feature array, and the input data of each LSTM unit other than the first is the output data of the previous LSTM unit together with the corresponding feature data in the total feature array; the recurrent neural network is used for outputting first emotion data according to the total feature array; the first fully connected layer is used for converting the first emotion data into a second preset number of second emotion data, the second preset number being the number of emotion types; the activation function is used for determining the proportion of each type of emotion of the person to be analyzed according to the second preset number of second emotion data;
the training process of the recurrent neural network comprises the following steps:
respectively marking the types of emotions generated in the process that a plurality of training objects watch the preset video;
respectively acquiring facial expression pictures, voice data, infrared pulse data and skin resistance data of the training objects in the process of watching a preset video;
respectively converting the voice data, the infrared pulse data and the skin resistance data of each training object into corresponding spectrogram;
respectively inputting the facial expression picture, the spectrogram corresponding to the voice data, the spectrogram corresponding to the infrared pulse data and the spectrogram corresponding to the skin resistance data of each training object into a preset convolutional neural network model to obtain respective corresponding feature arrays;
combining the characteristic data corresponding to each training object to obtain a total characteristic array of the training object;
and performing cyclic neural network training according to the respective total feature arrays of the plurality of training objects and the emotion types marked on the plurality of training objects to obtain a cyclic neural network.
2. An emotion analysis method as claimed in claim 1, wherein the structure of the convolutional neural network model includes five convolutional units connected in sequence and a second fully-connected layer connected to an output terminal of the fifth convolutional unit; wherein: each convolution unit comprises a convolution layer and a down-sampling layer connected with the output end of the convolution layer; and the second full-link layer is used for converting the number of the output data of the fifth convolution unit into a first preset number.
3. The emotion analyzing method as recited in claim 2,
the convolution layer in the first convolution unit of the five convolution units comprises 96 convolution kernels of 11 × 11, the sampling kernel of the down-sampling layer in the first convolution unit is 3 × 3, and the sampling step is 2; and/or
The convolution layer in the second convolution unit of the five convolution units comprises 128 convolution kernels of 5 × 5, the sampling kernel of the down-sampling layer in the second convolution unit is 3 × 3, and the sampling step is 1; and/or
The convolution layer in the third convolution unit in the five convolution units comprises 192 convolution kernels of 3 × 3, the sampling kernel of the down-sampling layer in the third convolution unit is 3 × 3, and the sampling step is 1; and/or
The convolution layer in the fourth convolution unit of the five convolution units comprises 192 convolution kernels of 3 × 3, the sampling kernel of the down-sampling layer in the fourth convolution unit is 3 × 3, and the sampling step is 1; and/or
The convolution layer in the fifth convolution unit of the five convolution units comprises 128 convolution kernels of 3 × 3, the sampling kernel of the down-sampling layer in the fifth convolution unit is 3 × 3, and the sampling step is 1.
4. An emotion analysis method as claimed in any one of claims 1 to 3, wherein each long short-term memory (LSTM) network unit comprises an input gate, an output gate and a memory gate; wherein:
the input gate of the t-th LSTM unit is computed as:
f_t = σ(W_f · [h_(t-1), x_t] + b_f)
where f_t is the output data of the input gate of the t-th LSTM unit, h_(t-1) is the output data of the (t-1)-th LSTM unit, x_t is the feature data in the total feature array corresponding to the t-th LSTM unit, W_f and b_f are pre-trained parameters, and σ denotes the function f(x) = 1/[1 + e^(-x)];
the memory gate of the t-th LSTM unit is computed as:
C_t = f_t * C_(t-1) + i_t * g_t
where C_t is the output data of the memory gate of the t-th LSTM unit, C_(t-1) is the output data of the memory gate of the (t-1)-th LSTM unit, i_t = σ(W_i · [h_(t-1), x_t] + b_i), g_t = tanh(W_g · [h_(t-1), x_t] + b_g), and W_i, b_i, W_g, b_g are pre-trained parameters;
the output gate of the t-th LSTM unit is computed as:
h_t = O_t * tanh(C_t)
where h_t is the output data of the output gate of the t-th LSTM unit, O_t = σ(W_o · [h_(t-1), x_t] + b_o), and W_o, b_o are pre-trained parameters.
5. An emotion analysis method as claimed in any one of claims 1 to 3, wherein the activation function includes:
S_i = e^(V_i) / Σ_(j=1..C) e^(V_j)
where S_i is the proportion of the i-th type of emotion of the person to be analyzed, V_i is the i-th second emotion data output by the first fully connected layer, and C is the second preset number.
6. An emotion analysis system based on multichannel data and a recurrent neural network, comprising:
the data acquisition unit is used for acquiring facial expression pictures, voice data, infrared pulse data and skin resistance data of a person to be analyzed in the process of watching a preset video;
the preset video comprises at least one type of video of sadness, anger, happiness, surprise, fear and disgust;
the data conversion unit is used for respectively converting the voice data, the infrared pulse data and the skin resistance data into corresponding spectrograms;
the characteristic determining unit is used for respectively inputting the facial expression picture, the spectrogram corresponding to the voice data, the spectrogram corresponding to the infrared pulse data and the spectrogram corresponding to the skin resistance data into a preset convolutional neural network model to obtain respective corresponding characteristic arrays; each feature array comprises a first preset number of feature data;
the emotion determining unit is used for combining the feature arrays to obtain a total feature array and inputting the total feature array into an emotion analysis model to obtain the proportion of each type of emotion of the person to be analyzed; the emotion analysis model comprises a pre-trained recurrent neural network, a preset first fully connected layer and a preset activation function; the recurrent neural network comprises a third preset number of long short-term memory (LSTM) network units, the third preset number being the number of feature data in the total feature array; the LSTM units are connected in sequence, the input data of the first LSTM unit is the first feature data in the total feature array, and the input data of each LSTM unit other than the first is the output data of the previous LSTM unit together with the corresponding feature data in the total feature array; the recurrent neural network is used for outputting first emotion data according to the total feature array; the first fully connected layer is used for converting the first emotion data into a second preset number of second emotion data, the second preset number being the number of emotion types; the activation function is used for determining the proportion of each type of emotion of the person to be analyzed according to the second preset number of second emotion data;
The training process of the recurrent neural network comprises the following steps:
respectively marking the types of emotions generated in the process that a plurality of training objects watch the preset video;
respectively acquiring facial expression pictures, voice data, infrared pulse data and skin resistance data of the training objects in the process of watching a preset video;
respectively converting the voice data, the infrared pulse data and the skin resistance data of each training object into corresponding spectrogram;
respectively inputting the facial expression picture, the spectrogram corresponding to the voice data, the spectrogram corresponding to the infrared pulse data and the spectrogram corresponding to the skin resistance data of each training object into a preset convolutional neural network model to obtain respective corresponding feature arrays;
combining the characteristic data corresponding to each training object to obtain a total characteristic array of the training object;
and performing cyclic neural network training according to the respective total feature arrays of the plurality of training objects and the emotion types marked on the plurality of training objects to obtain a cyclic neural network.
7. A computer device, comprising:
at least one memory;
and at least one processor, wherein:
the at least one memory is for storing a computer program;
the at least one processor is configured to invoke a computer program stored in the at least one memory to perform the emotion analysis method of any one of claims 1 to 5.
8. A computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the emotion analysis method of any one of claims 1 to 5.
9. A computer program comprising computer-executable instructions that, when executed, cause at least one processor to perform the emotion analysis method of any one of claims 1 to 5.
CN201811155546.9A 2018-09-30 2018-09-30 Emotion analysis method and system based on multi-channel data and recurrent neural network Active CN109325457B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811155546.9A CN109325457B (en) 2018-09-30 2018-09-30 Emotion analysis method and system based on multi-channel data and recurrent neural network


Publications (2)

Publication Number Publication Date
CN109325457A CN109325457A (en) 2019-02-12
CN109325457B true CN109325457B (en) 2022-02-18

Family

ID=65266675

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811155546.9A Active CN109325457B (en) 2018-09-30 2018-09-30 Emotion analysis method and system based on multi-channel data and recurrent neural network

Country Status (1)

Country Link
CN (1) CN109325457B (en)

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106228977B (en) * 2016-08-02 2019-07-19 合肥工业大学 Multi-mode fusion song emotion recognition method based on deep learning
CN106384166A (en) * 2016-09-12 2017-02-08 中山大学 Deep learning stock market prediction method combined with financial news
CN107066446B (en) * 2017-04-13 2020-04-10 广东工业大学 Logic rule embedded cyclic neural network text emotion analysis method
CN107799165A (en) * 2017-09-18 2018-03-13 华南理工大学 A kind of psychological assessment method based on virtual reality technology
CN108038414A (en) * 2017-11-02 2018-05-15 平安科技(深圳)有限公司 Character personality analysis method, device and storage medium based on Recognition with Recurrent Neural Network
CN107808146B (en) * 2017-11-17 2020-05-05 北京师范大学 Multi-mode emotion recognition and classification method
CN107894837A (en) * 2017-11-28 2018-04-10 合肥工业大学 Dynamic sentiment analysis model sample processing method and processing device
CN108717856B (en) * 2018-06-16 2022-03-08 台州学院 Speech emotion recognition method based on multi-scale deep convolution cyclic neural network

Also Published As

Publication number Publication date
CN109325457A (en) 2019-02-12

Similar Documents

Publication Publication Date Title
CN110353675B (en) Electroencephalogram signal emotion recognition method and device based on picture generation
CN110515456B (en) Electroencephalogram signal emotion distinguishing method and device based on attention mechanism
CN110969124B (en) Two-dimensional human body posture estimation method and system based on lightweight multi-branch network
CN111582141B (en) Face recognition model training method, face recognition method and device
CN113408508B (en) Transformer-based non-contact heart rate measurement method
CN112686048B (en) Emotion recognition method and device based on fusion of voice, semantics and facial expressions
CN112508110A (en) Deep learning-based electrocardiosignal graph classification method
CN109919085A (en) Health For All Activity recognition method based on light-type convolutional neural networks
CN112818764A (en) Low-resolution image facial expression recognition method based on feature reconstruction model
CN113157094B (en) Electroencephalogram emotion recognition method combining feature migration and graph semi-supervised label propagation
CN113158880A (en) Deep learning-based student classroom behavior identification method
CN112883231B (en) Short video popularity prediction method, system, electronic equipment and storage medium
CN116645721B (en) Sitting posture identification method and system based on deep learning
CN112597824A (en) Behavior recognition method and device, electronic equipment and storage medium
CN111126350A (en) Method and device for generating heart beat classification result
CN109171773B (en) Emotion analysis method and system based on multi-channel data
Taleb et al. Visual representation of online handwriting time series for deep learning Parkinson's disease detection
CN109325457B (en) Emotion analysis method and system based on multi-channel data and recurrent neural network
CN109171774B (en) Personality analysis method and system based on multi-channel data
CN111259759A (en) Cross-database micro-expression recognition method and device based on domain selection migration regression
Hurri Independent component analysis of image data
CN110852271A (en) Micro-expression recognition method based on peak frame and deep forest
CN115547488A (en) Early screening system and method based on VGG convolutional neural network and facial recognition autism
CN114742107A (en) Method for identifying perception signal in information service and related equipment
CN114818822A (en) Electroencephalogram migration emotion recognition method combining semi-supervised regression and icon label propagation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant