CN109325457B - Emotion analysis method and system based on multi-channel data and recurrent neural network - Google Patents

Info

Publication number
CN109325457B
Authority
CN
China
Prior art keywords
data
emotion
convolution
preset
neural network
Prior art date
Legal status
Active
Application number
CN201811155546.9A
Other languages
Chinese (zh)
Other versions
CN109325457A (en)
Inventor
孙晓 (Sun Xiao)
洪涛 (Hong Tao)
Current Assignee
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date
Filing date
Publication date
Application filed by Hefei University of Technology filed Critical Hefei University of Technology
Priority to CN201811155546.9A priority Critical patent/CN109325457B/en
Publication of CN109325457A publication Critical patent/CN109325457A/en
Application granted granted Critical
Publication of CN109325457B publication Critical patent/CN109325457B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/70 Multimodal biometrics, e.g. combining information from different biometric modalities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks


Abstract

The invention provides an emotion analysis method and system based on multi-channel data and a recurrent neural network, and relates to the technical field of emotion analysis. The method comprises the following steps: acquiring facial expression pictures, voice data, infrared pulse data and skin resistance data of a person to be analyzed while the person watches a preset video; converting the voice data, the infrared pulse data and the skin resistance data into corresponding spectrograms; inputting the facial expression picture and the spectrograms corresponding to the voice data, the infrared pulse data and the skin resistance data into a preset convolutional neural network model to obtain a corresponding feature array for each, each feature array comprising a first preset number of feature data; and combining the feature arrays into a total feature array and inputting it into an emotion analysis model to obtain the proportion of each type of emotion of the person to be analyzed. The invention can improve the accuracy of emotion analysis.

Description

Emotion analysis method and system based on multi-channel data and recurrent neural network
Technical Field
The invention relates to the technical field of emotion analysis, and in particular to an emotion analysis method and system based on multi-channel data and a recurrent neural network, as well as corresponding computer equipment, a computer-readable storage medium and a computer program.
Background
In the prior art, emotion analysis methods include establishing an emotion computing system based on facial expressions and establishing an emotion computing system based on pulse signals; in either method the data used come from a single channel, so the accuracy of emotion analysis is poor.
Disclosure of Invention
Technical problem to be solved
Aiming at the defects of the prior art, the invention provides an emotion analysis method and system based on multi-channel data and a recurrent neural network, together with computer equipment, a computer-readable storage medium and a computer program, which can improve the accuracy of emotion analysis.
(II) technical scheme
In order to achieve the purpose, the invention is realized by the following technical scheme:
in a first aspect, the invention provides an emotion analysis method based on multichannel data and a recurrent neural network, which comprises the following steps:
acquiring facial expression pictures, voice data, infrared pulse data and skin resistance data of a person to be analyzed in the process of watching a preset video;
respectively converting the voice data, the infrared pulse data and the skin resistance data into corresponding spectrograms;
inputting the facial expression picture, the spectrogram corresponding to the voice data, the spectrogram corresponding to the infrared pulse data and the spectrogram corresponding to the skin resistance data into a preset convolutional neural network model to obtain a corresponding feature array for each; each feature array comprises a first preset number of feature data;
combining the feature arrays to obtain a total feature array, and inputting the total feature array into an emotion analysis model to obtain the proportion of each type of emotion of the person to be analyzed;
the emotion analysis model comprises a pre-trained recurrent neural network, a preset first fully connected layer and a preset activation function. The recurrent neural network comprises a third preset number of long short-term memory (LSTM) network units, the third preset number being the number of feature data in the total feature array; the LSTM units are connected in sequence, the input data of the first LSTM unit is the first feature data in the total feature array, and the input data of each LSTM unit other than the first is the output data of the previous LSTM unit together with the corresponding feature data in the total feature array. The recurrent neural network is used for outputting first emotion data according to the total feature array; the first fully connected layer is used for converting the first emotion data into a second preset number of second emotion data, the second preset number being the number of emotion types; and the activation function is used for determining the proportion of each type of emotion of the person to be analyzed according to the second preset number of second emotion data.
In a second aspect, the invention provides an emotion analysis system based on multichannel data and a recurrent neural network, the system comprising:
the data acquisition unit is used for acquiring facial expression pictures, voice data, infrared pulse data and skin resistance data of a person to be analyzed in the process of watching a preset video;
the data conversion unit is used for respectively converting the voice data, the infrared pulse data and the skin resistance data into corresponding spectrograms;
the characteristic determining unit is used for respectively inputting the facial expression picture, the spectrogram corresponding to the voice data, the spectrogram corresponding to the infrared pulse data and the spectrogram corresponding to the skin resistance data into a preset convolutional neural network model to obtain respective corresponding characteristic arrays; each feature array comprises a first preset number of feature data;
the emotion determining unit is used for combining the feature arrays to obtain a total feature array and inputting the total feature array into an emotion analysis model to obtain the proportion of each type of emotion of the person to be analyzed; the emotion analysis model comprises a pre-trained recurrent neural network, a preset first fully connected layer and a preset activation function; the recurrent neural network comprises a third preset number of long short-term memory (LSTM) network units, the third preset number being the number of feature data in the total feature array; the LSTM units are connected in sequence, the input data of the first LSTM unit is the first feature data in the total feature array, and the input data of each LSTM unit other than the first is the output data of the previous LSTM unit together with the corresponding feature data in the total feature array; the recurrent neural network is used for outputting first emotion data according to the total feature array; the first fully connected layer is used for converting the first emotion data into a second preset number of second emotion data, the second preset number being the number of emotion types; and the activation function is used for determining the proportion of each type of emotion of the person to be analyzed according to the second preset number of second emotion data.
In a third aspect, the present invention provides a computer apparatus comprising:
at least one memory;
and at least one processor, wherein:
the at least one memory is for storing a computer program;
the at least one processor is configured to invoke a computer program stored in the at least one memory to perform the emotion analysis method described above.
In a fourth aspect, the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, may implement the emotion analysis method described above.
In a fifth aspect, the present invention provides a computer program comprising computer-executable instructions that, when executed, cause at least one processor to perform the emotion analysis method described above.
(III) advantageous effects
The embodiments of the invention provide an emotion analysis method, system, computer equipment, computer-readable storage medium and computer program based on multi-channel data and a recurrent neural network. Multi-channel data of the person to be analyzed, namely facial expression pictures, voice data, infrared pulse data and skin resistance data, are collected; a convolutional neural network model extracts features from the multi-channel data; and an emotion analysis model analyzes the proportion of each type of emotion from those features. Because the analysis is based on multi-channel data, it overcomes the problem that single-channel data cannot truly reflect the emotion type, so the accuracy of emotion analysis is improved. Moreover, the infrared pulse data and skin resistance data are physiological signals that are not subject to conscious control, so they reflect the emotion of the person to be analyzed more truthfully.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of an emotion analysis method based on multi-channel data and a recurrent neural network according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a convolutional neural network model according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating the operation of the t-th long short-term memory (LSTM) network unit according to an embodiment of the present invention;
FIG. 4 is a block diagram of an emotion analysis system based on multi-channel data and a recurrent neural network according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In a first aspect, the present invention provides an emotion analysis method based on multichannel data and a recurrent neural network, as shown in fig. 1, the emotion analysis method includes:
s101, acquiring facial expression pictures, voice data, infrared pulse data and skin resistance data of a person to be analyzed in the process of watching a preset video;
it is understood that the preset video may include at least one of sadness, anger, happiness, surprise, fear, and disgust.
It can be understood that the facial expression picture may be a photograph taken of the person to be analyzed while watching the preset video, or a frame selected from a video recording of the person to be analyzed watching the preset video.
In practical application, a voice acquisition device can be arranged on the site where a person to be analyzed watches the preset video, and the voice data is acquired by the voice acquisition device.
In practical applications, an infrared pulse acquisition device and a skin resistance acquisition device can be worn on the body of the person to be analyzed; the infrared pulse acquisition device is then used to collect the infrared pulse data of the person to be analyzed, and the skin resistance acquisition device is used to collect the skin resistance data of the person to be analyzed.
It can be understood that the multi-channel data referred to in the title are the facial expression pictures, the voice data, the infrared pulse data and the skin resistance data.
S102, converting the voice data, the infrared pulse data and the skin resistance data into corresponding spectrograms respectively;
In order to facilitate subsequent processing, the voice data, the infrared pulse data and the skin resistance data are all converted into spectrograms, so that the data of every channel are in picture form.
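A minimal sketch of this conversion, assuming the spectrograms are produced with a short-time Fourier transform; the patent does not name the transform, and the sampling rates, window length and image size below are illustrative assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal

def to_spectrogram_image(x, fs, out_path):
    """Turn a 1-D signal (voice, infrared pulse, or skin resistance) into a
    spectrogram image so that every channel is in picture form."""
    f, t, Sxx = signal.spectrogram(x, fs=fs, nperseg=256, noverlap=128)
    plt.figure(figsize=(2.4, 2.4), dpi=100)
    plt.pcolormesh(t, f, 10 * np.log10(Sxx + 1e-12))  # log-power spectrogram
    plt.axis("off")
    plt.savefig(out_path, bbox_inches="tight", pad_inches=0)
    plt.close()

# Illustrative signals: 10 s of voice at 16 kHz and 10 s of pulse at 100 Hz.
to_spectrogram_image(np.random.randn(16000 * 10), fs=16000, out_path="voice_spectrogram.png")
to_spectrogram_image(np.random.randn(100 * 10), fs=100, out_path="pulse_spectrogram.png")
```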
S103, inputting the facial expression picture, the spectrogram corresponding to the voice data, the spectrogram corresponding to the infrared pulse data and the spectrogram corresponding to the skin resistance data into a preset convolutional neural network model to obtain a corresponding feature array for each; each feature array comprises a first preset number of feature data;
it can be understood that the convolutional neural network model can perform feature extraction on the data of each channel, and further obtain a feature array corresponding to each channel.
In particular, the convolutional neural network model may take a variety of configurations, one of which is described below with reference to fig. 2: the convolutional neural network model comprises five convolution units connected in sequence and a second fully connected layer connected to the output of the fifth convolution unit; each convolution unit comprises a convolution layer and a down-sampling layer connected to the output of the convolution layer; and the second fully connected layer converts the number of output data of the fifth convolution unit into the first preset number.
For example, as shown in fig. 2, the convolution layer 301a in the first convolution unit includes 96 convolution kernels with a size of 11 × 11, the sampling kernel of the down-sampling layer 301b in the first convolution unit has a size of 3 × 3, and the sampling step is 2; as another example, the convolution layer 302a in the second convolution unit includes 128 convolution kernels with a size of 5 × 5, the sampling kernel of the downsampling layer 302b in the second convolution unit has a size of 3 × 3, and the sampling step size is 1; for another example, the convolution layer 303a in the third convolution unit includes 192 convolution kernels having a size of 3 × 3, the sampling kernel of the down-sampling layer 303b in the third convolution unit has a size of 3 × 3, and the sampling step size is 1; for another example, the convolution layer 304a in the fourth convolution unit includes 192 convolution kernels with a size of 3 × 3, the sampling kernel of the down-sampling layer 304b in the fourth convolution unit has a size of 3 × 3, and the sampling step size is 1; for another example, the convolution layer 305a in the fifth convolution unit includes 128 convolution kernels having a size of 3 × 3, the sampling kernel of the downsampling layer 305b in the fifth convolution unit has a size of 3 × 3, and the sampling step size is 1.
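A minimal PyTorch sketch of this five-unit structure, assuming ReLU activations, no padding and a stride of 4 for the first convolution layer (the text fixes only the kernel counts and sizes and the down-sampling parameters); nn.LazyLinear stands in for the second fully connected layer because the flattened size depends on those assumptions.

```python
import torch
import torch.nn as nn

def conv_unit(in_ch, out_ch, k, conv_stride, pool_k=3, pool_stride=1):
    """One convolution unit: a convolution layer followed by a down-sampling layer."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=k, stride=conv_stride),
        nn.ReLU(inplace=True),                       # assumed activation
        nn.MaxPool2d(kernel_size=pool_k, stride=pool_stride),
    )

feature_extractor = nn.Sequential(
    conv_unit(3,   96,  11, conv_stride=4, pool_stride=2),  # unit 1: 96 kernels 11x11, pool 3x3 / 2
    conv_unit(96,  128, 5,  conv_stride=1, pool_stride=1),  # unit 2: 128 kernels 5x5, pool 3x3 / 1
    conv_unit(128, 192, 3,  conv_stride=1, pool_stride=1),  # unit 3: 192 kernels 3x3, pool 3x3 / 1
    conv_unit(192, 192, 3,  conv_stride=1, pool_stride=1),  # unit 4: 192 kernels 3x3, pool 3x3 / 1
    conv_unit(192, 128, 3,  conv_stride=1, pool_stride=1),  # unit 5: 128 kernels 3x3, pool 3x3 / 1
    nn.Flatten(),
    nn.LazyLinear(1000),   # second fully connected layer: converts the flattened output
                           # to the first preset number of feature data (1000 in the example)
)

x = torch.randn(1, 3, 237, 237)        # one 237x237 RGB picture, as in the worked example below
features = feature_extractor(x)        # shape (1, 1000): one feature array per input picture
print(features.shape)
```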
For example, the facial expression picture is a color picture with R, G and B color channels, so it corresponds to a three-dimensional array, for example one of size 6 × 6 × 3, where 3 represents the 3 color channels. Such an array can be understood as a stack of three two-dimensional arrays, so the convolution of the facial expression picture by a convolution layer can be performed on each two-dimensional array separately, and the processed two-dimensional arrays are then stacked to form the convolved three-dimensional array. The down-sampling process is similar.
The following describes a principle of convolution processing for a two-dimensional array:
As shown in Table 1, the input is a 5 × 5 two-dimensional array, and as shown in Table 2 below, the convolution uses the kernel (1, 0, 1; 0, 1, 0; 1, 0, 1). The sub-array formed by rows 1-3 and columns 1-3 of Table 1 is (1, 1, 1; 0, 1, 1; 0, 0, 1). The convolution kernel is multiplied element-wise with the data at the corresponding positions of this sub-array and the products are summed, i.e. 1×1 + 1×0 + 1×1 + 0×0 + 1×1 + 1×0 + 0×1 + 0×0 + 1×1 = 4, which gives the first output value. Proceeding in the same way yields an output matrix of size 3 × 3.
TABLE 1
1 1 1 0 0
0 1 1 1 0
0 0 1 1 1
0 0 1 1 0
0 1 1 0 0
TABLE 2
1 0 1
0 1 0
1 0 1
After one convolution pass, the output matrix has size N × N, where N = (W - F)/S + 1, the input matrix of the convolution has size W × W, the convolution kernel has size F × F, and the stride is S.
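A minimal NumPy sketch of this worked example: it reproduces Tables 1 and 2, applies the element-wise multiply-and-sum described above, and its output size follows N = (W - F)/S + 1.

```python
import numpy as np

def conv2d(x, k, stride=1):
    """Valid 2-D convolution as described in the text: multiply element-wise and sum."""
    W, F = x.shape[0], k.shape[0]
    N = (W - F) // stride + 1
    out = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            patch = x[i*stride:i*stride+F, j*stride:j*stride+F]
            out[i, j] = np.sum(patch * k)
    return out

x = np.array([[1, 1, 1, 0, 0],      # Table 1: 5x5 input
              [0, 1, 1, 1, 0],
              [0, 0, 1, 1, 1],
              [0, 0, 1, 1, 0],
              [0, 1, 1, 0, 0]])
k = np.array([[1, 0, 1],            # Table 2: 3x3 convolution kernel
              [0, 1, 0],
              [1, 0, 1]])

out = conv2d(x, k)                  # 3x3 output, since (5 - 3)/1 + 1 = 3
print(out[0, 0])                    # 4.0, the first output value computed above
print(out)
```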
The following describes a principle of downsampling a two-dimensional array:
The three-dimensional array obtained after the convolution processing is decomposed into three two-dimensional arrays. As shown in Table 3 below, one of the decomposed two-dimensional arrays has size 4 × 4, the down-sampling kernel size is 2 × 2, and the stride is 2. The sub-array formed by rows 1-2 and columns 1-2 of Table 3 is (1, 1; 5, 6), whose maximum is 6. Since the stride is 2, the next window, formed by rows 1-2 and columns 3-4, is (2, 4; 7, 8), whose maximum is 8. Proceeding in the same way yields the two-dimensional array shown in Table 4.
TABLE 3
1 1 2 4
5 6 7 8
3 2 1 0
1 2 3 4
TABLE 4
6 8
3 4
After one down-sampling pass, the output matrix has size len × len, where len = (X - pool_size)/stride + 1, the input matrix of the down-sampling layer has size X × X, the down-sampling kernel has size pool_size × pool_size, and the stride is stride.
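A minimal NumPy sketch of this down-sampling (max-pooling) example: it reproduces Tables 3 and 4 and its output size follows len = (X - pool_size)/stride + 1.

```python
import numpy as np

def max_pool2d(x, pool_size=2, stride=2):
    X = x.shape[0]
    length = (X - pool_size) // stride + 1
    out = np.zeros((length, length))
    for i in range(length):
        for j in range(length):
            window = x[i*stride:i*stride+pool_size, j*stride:j*stride+pool_size]
            out[i, j] = window.max()    # keep the maximum value of each window
    return out

x = np.array([[1, 1, 2, 4],             # Table 3: 4x4 input
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])

print(max_pool2d(x))                     # [[6. 8.]
                                         #  [3. 4.]]  (Table 4), size (4 - 2)/2 + 1 = 2
```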
For example, a facial expression picture of 237 × 237 pixels is input into the convolutional neural network model. In the first convolution unit, the convolution layer contains 96 convolution kernels of size 11 × 11, so the convolution layer produces an array of dimension 55 × 55 × 96; after down-sampling with a 3 × 3 kernel and stride 2, a first three-dimensional array of dimension 27 × 27 × 96 is obtained. The first three-dimensional array is input into the second convolution unit of this structure to obtain a second three-dimensional array; the second three-dimensional array is input into the third convolution unit to obtain a third three-dimensional array; the third into the fourth convolution unit to obtain a fourth three-dimensional array; and the fourth into the fifth convolution unit to obtain a fifth three-dimensional array. The fifth three-dimensional array has size 6 × 6 × 256 and is expanded (flattened) into an array of size 1 × 4096, i.e. 4096 data. Passing this 1 × 4096 array through a fully connected layer with 1000 output values gives 1000 data, i.e. an array of size 1 × 1000. This 1 × 1000 array is the feature array of the 237 × 237 facial expression picture, and it contains 1000 feature data.
Similarly, the spectrogram corresponding to the voice data, the spectrogram corresponding to the infrared pulse data and the spectrogram corresponding to the skin resistance data are respectively input into the convolutional neural network model with the structure, so that a feature array with the size of 1 x 1000 is respectively obtained. That is to say, the facial expression picture, the spectrogram corresponding to the voice data, the spectrogram corresponding to the infrared pulse data and the spectrogram corresponding to the skin resistance data are respectively input into the convolutional neural network model with the structure, so that four feature arrays with the size of 1 × 1000 are obtained.
S104, combining the feature arrays to obtain a total feature array, and inputting the total feature array into an emotion analysis model to obtain the proportion of each type of emotion of the person to be analyzed;
for example, the 4 feature arrays with the size of 1 × 1000 are combined into a total feature array with the size of 1 × 4000, and the total feature array includes 4000 data.
The emotion analysis model comprises a pre-trained recurrent neural network, a preset first fully connected layer and a preset activation function. The recurrent neural network comprises a third preset number of long short-term memory (LSTM) network units, the third preset number being the number of feature data in the total feature array; the LSTM units are connected in sequence, the input data of the first LSTM unit is the first feature data in the total feature array, and the input data of each LSTM unit other than the first is the output data of the previous LSTM unit together with the corresponding feature data in the total feature array. The recurrent neural network is used for outputting first emotion data according to the total feature array; the first fully connected layer is used for converting the first emotion data into a second preset number of second emotion data, the second preset number being the number of emotion types; and the activation function is used for determining the proportion of each type of emotion of the person to be analyzed according to the second preset number of second emotion data.
For example, if the total feature array contains 4000 feature data, the recurrent neural network contains 4000 LSTM units, one LSTM unit for each feature datum.
For example, human emotions may generally include six categories of sadness, anger, happiness, surprise, fear, and disgust. For the analysis of six emotions, the second preset number is 6.
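A minimal PyTorch sketch of this emotion analysis model, under two assumptions not fixed by the text: the LSTM hidden size (128 here) and the choice of the last unit's output as the first emotion data; the number of steps (4000) and of emotion types (6) follow the example, and the activation is taken to be softmax.

```python
import torch
import torch.nn as nn

class EmotionAnalysisModel(nn.Module):
    def __init__(self, hidden_size=128, num_emotions=6):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_emotions)        # first fully connected layer

    def forward(self, total_features):                        # (batch, 4000)
        seq = total_features.unsqueeze(-1)                    # (batch, 4000, 1): one feature per step
        out, _ = self.lstm(seq)                               # chain of LSTM units over the sequence
        first_emotion_data = out[:, -1, :]                    # output of the last LSTM unit (assumption)
        second_emotion_data = self.fc(first_emotion_data)     # 6 values, one per emotion type
        return torch.softmax(second_emotion_data, dim=-1)     # proportion of each emotion

model = EmotionAnalysisModel()
total_feature_array = torch.randn(1, 4000)                    # combined 4 x 1000 feature arrays
proportions = model(total_feature_array)                      # sums to 1 over the 6 emotions
print(proportions)
```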
Referring to fig. 3, each LSTM network unit may comprise an input gate, an output gate and a memory gate; wherein:
The input gate of the t-th LSTM unit is computed as:

f_t = σ(W_f · [h_(t-1), x_t] + b_f)

where f_t is the output data of the input gate of the t-th LSTM unit; h_(t-1) is the output data of the (t-1)-th LSTM unit; x_t is the feature data in the total feature array corresponding to the t-th LSTM unit; W_f and b_f are pre-trained parameters; and σ denotes the function f(x) = 1/[1 + e^(-x)].

The memory gate of the t-th LSTM unit is computed as:

C_t = f_t * C_(t-1) + i_t * g_t

where C_t is the output data of the memory gate of the t-th LSTM unit, C_(t-1) is the output data of the memory gate of the (t-1)-th LSTM unit, i_t = σ(W_i · [h_(t-1), x_t] + b_i), g_t = tanh(W_g · [h_(t-1), x_t] + b_g), and W_i, b_i, W_g, b_g are pre-trained parameters.

The output gate of the t-th LSTM unit is computed as:

h_t = O_t * tanh(C_t)

where h_t is the output data of the output gate of the t-th LSTM unit, O_t = σ(W_o · [h_(t-1), x_t] + b_o), and W_o, b_o are pre-trained parameters.
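A minimal NumPy sketch of one step of such an LSTM unit using the formulas above; the hidden size and the randomly initialized weights are illustrative stand-ins for the pre-trained parameters W and b.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, C_prev, params):
    z = np.concatenate([h_prev, [x_t]])             # [h_(t-1), x_t]
    f_t = sigmoid(params["Wf"] @ z + params["bf"])  # input gate (the forget gate in common usage)
    i_t = sigmoid(params["Wi"] @ z + params["bi"])
    g_t = np.tanh(params["Wg"] @ z + params["bg"])
    C_t = f_t * C_prev + i_t * g_t                  # memory gate output
    o_t = sigmoid(params["Wo"] @ z + params["bo"])
    h_t = o_t * np.tanh(C_t)                        # output gate output
    return h_t, C_t

H = 4                                               # illustrative hidden size
rng = np.random.default_rng(0)
params = {name: rng.standard_normal((H, H + 1)) for name in ("Wf", "Wi", "Wg", "Wo")}
params.update({name: np.zeros(H) for name in ("bf", "bi", "bg", "bo")})

h, C = np.zeros(H), np.zeros(H)
for x_t in [0.3, -1.2, 0.7]:                        # a few feature values from the total feature array
    h, C = lstm_step(x_t, h, C, params)
print(h)
```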
In a specific implementation, the activation function may be:

S_i = e^(V_i) / Σ_(j=1..C) e^(V_j)

where S_i is the proportion of the i-th type of emotion of the person to be analyzed, V_i is the i-th second emotion data output by the first fully connected layer, and C is the second preset number.
It is understood that for six emotions, C takes the value 6. In that case, S_1 represents the proportion of the first type of emotion, S_2 the proportion of the second type, S_3 the third, S_4 the fourth, S_5 the fifth, and S_6 the proportion of the sixth type of emotion.
For example, when the second preset number is 6, the first emotion data output by the recurrent neural network are input into the first fully connected layer, which converts them into 6 second emotion data, each corresponding to one emotion type. After these 6 second emotion data are passed through the activation function, the proportion of each type of emotion is obtained.
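A minimal sketch of this step, assuming the activation function is the standard softmax that matches the S_i, V_i and C notation above.

```python
import numpy as np

def emotion_proportions(v):
    """S_i = e^(V_i) / sum_j e^(V_j) over the C = len(v) emotion types."""
    e = np.exp(v - np.max(v))      # subtract the max for numerical stability
    return e / e.sum()

second_emotion_data = np.array([2.0, 0.5, -1.0, 0.1, 1.2, -0.3])  # illustrative values
S = emotion_proportions(second_emotion_data)
print(S, S.sum())                  # six proportions that sum to 1
```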
In a specific implementation, pre-training the emotion analysis model is the process of learning its parameters W and bias, and specifically comprises:
a. respectively marking the types of emotions generated in the process that a plurality of training objects watch the preset video;
it is understood that the plurality of training subjects are a plurality of subjects.
In practical applications, the emotion type can be labeled by having each training object select the type of emotion they experienced while watching the preset video.
It will be appreciated that the labels can be 0, 1, 2, 3, 4, 5 for the six emotion types, respectively.
b. Respectively acquiring facial expression pictures, voice data, infrared pulse data and skin resistance data of the training objects in the process of watching a preset video;
c. respectively converting the voice data, the infrared pulse data and the skin resistance data of each training object into corresponding spectrogram;
d. respectively inputting the facial expression picture, the spectrogram corresponding to the voice data, the spectrogram corresponding to the infrared pulse data and the spectrogram corresponding to the skin resistance data of each training object into a preset convolutional neural network model to obtain respective corresponding feature arrays;
e. combining the feature arrays corresponding to each training object to obtain the total feature array of that training object;
f. training the emotion analysis model on the total feature arrays of the training objects and the emotion types labeled for them, to obtain the trained emotion analysis model.
In step f, training the emotion analysis model is in fact the process of determining the parameters W and bias in the model.
It can be understood that the emotion type labeled for a training object serves as the target output of the model and the total feature array of that training object serves as the input, so that the parameters W and bias can be determined by training on the total feature arrays and labeled emotion types of the several training objects.
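A minimal sketch of this training step, under the same assumptions as the earlier model sketch (a small LSTM over the total feature array); the Adam optimizer and cross-entropy loss are assumptions, since the text only states that the parameters W and bias are determined from the labeled training data.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=1, hidden_size=128, batch_first=True)
fc = nn.Linear(128, 6)                                # 6 emotion types
optimizer = torch.optim.Adam(list(lstm.parameters()) + list(fc.parameters()), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()                       # applies softmax internally

total_features = torch.randn(8, 4000)                 # total feature arrays of 8 training objects (illustrative)
labels = torch.randint(0, 6, (8,))                    # labeled emotion types 0..5

for epoch in range(10):
    out, _ = lstm(total_features.unsqueeze(-1))       # run the LSTM chain over each total feature array
    logits = fc(out[:, -1, :])                        # second emotion data before the activation
    loss = loss_fn(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```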
It is understood that the steps b to e are similar to the steps S101 to S104, and the explanation, examples and specific embodiments of the related contents can refer to the corresponding parts in the steps S101 to S104.
It should be noted that an array size or dimension written as a × b denotes a two-dimensional array with a rows and b columns, and a size written as a × b × c denotes a three-dimensional array whose length, width and height are a, b and c respectively. Elsewhere, '*' denotes multiplication.
The emotion analysis method provided by the invention collects multi-channel data of the person to be analyzed, namely facial expression pictures, voice data, infrared pulse data and skin resistance data, uses a convolutional neural network model to extract features from the multi-channel data, and uses an emotion analysis model to analyze the proportion of each type of emotion from those features. Because the analysis is based on multi-channel data, it overcomes the problem that single-channel data cannot truly reflect the emotion type, so the accuracy of emotion analysis is improved. Moreover, the infrared pulse data and skin resistance data are physiological signals that are not subject to conscious control, so they reflect the emotion of the person to be analyzed more truthfully.
In a second aspect, the present invention provides an emotion analysis system based on multi-channel data and a recurrent neural network; as shown in fig. 4, the system includes:
the data acquisition unit is used for acquiring facial expression pictures, voice data, infrared pulse data and skin resistance data of a person to be analyzed in the process of watching a preset video;
the data conversion unit is used for respectively converting the voice data, the infrared pulse data and the skin resistance data into corresponding spectrograms;
the characteristic determining unit is used for respectively inputting the facial expression picture, the spectrogram corresponding to the voice data, the spectrogram corresponding to the infrared pulse data and the spectrogram corresponding to the skin resistance data into a preset convolutional neural network model to obtain respective corresponding characteristic arrays; each feature array comprises a first preset number of feature data;
the emotion determining unit is used for combining the feature arrays to obtain a total feature array and inputting the total feature array into an emotion analysis model to obtain the proportion of each type of emotion of the person to be analyzed; the emotion analysis model comprises a pre-trained recurrent neural network, a preset first fully connected layer and a preset activation function; the recurrent neural network comprises a third preset number of long short-term memory (LSTM) network units, the third preset number being the number of feature data in the total feature array; the LSTM units are connected in sequence, the input data of the first LSTM unit is the first feature data in the total feature array, and the input data of each LSTM unit other than the first is the output data of the previous LSTM unit together with the corresponding feature data in the total feature array; the recurrent neural network is used for outputting first emotion data according to the total feature array; the first fully connected layer is used for converting the first emotion data into a second preset number of second emotion data, the second preset number being the number of emotion types; and the activation function is used for determining the proportion of each type of emotion of the person to be analyzed according to the second preset number of second emotion data.
It is understood that the emotion analysis system provided by the second aspect corresponds to the emotion analysis method provided by the first aspect, and the explanation, the example, the detailed description, the beneficial effects, and the like of the content can be referred to the corresponding content in the first aspect.
In a third aspect, the present invention provides a computer apparatus comprising:
at least one memory;
and at least one processor, wherein:
the at least one memory is for storing a computer program;
the at least one processor is configured to invoke a computer program stored in the at least one memory to perform the emotion analysis method provided in the first aspect.
It is to be understood that each unit in the emotion analyzing system provided by the second aspect is a computer program module, and the computer program modules are computer programs stored in the at least one memory.
In a fourth aspect, the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the emotion analysis method provided by the first aspect.
In a fifth aspect, the invention provides a computer program comprising computer-executable instructions that, when executed, cause at least one processor to perform the method of emotion analysis provided in the first aspect.
It is to be understood that the explanation, the detailed description, the examples, the advantages, and the like of the contents of the computer device, the computer-readable storage medium, and the computer program provided in the third to fifth aspects can be referred to the corresponding parts in the first aspect.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (9)

1. An emotion analysis method based on multichannel data and a recurrent neural network is characterized by comprising the following steps:
acquiring facial expression pictures, voice data, infrared pulse data and skin resistance data of a person to be analyzed in the process of watching a preset video;
the preset video comprises at least one type of video of sadness, anger, happiness, surprise, fear and disgust;
respectively converting the voice data, the infrared pulse data and the skin resistance data into corresponding spectrograms;
inputting the facial expression picture, the spectrogram corresponding to the voice data, the spectrogram corresponding to the infrared pulse data and the spectrogram corresponding to the skin resistance data into a preset convolutional neural network model respectively to obtain corresponding feature arrays respectively; each feature array comprises a first preset number of feature data;
combining the feature arrays to obtain a total feature array, and inputting the total feature array into an emotion analysis model to obtain the proportion of each type of emotion of the person to be analyzed;
the emotion analysis model comprises a pre-trained recurrent neural network, a preset first fully connected layer and a preset activation function; the recurrent neural network comprises a third preset number of long short-term memory (LSTM) network units, the third preset number being the number of feature data in the total feature array; the LSTM units are connected in sequence, the input data of the first LSTM unit is the first feature data in the total feature array, and the input data of each LSTM unit other than the first is the output data of the previous LSTM unit together with the corresponding feature data in the total feature array; the recurrent neural network is used for outputting first emotion data according to the total feature array; the first fully connected layer is used for converting the first emotion data into a second preset number of second emotion data, the second preset number being the number of emotion types; the activation function is used for determining the proportion of each type of emotion of the person to be analyzed according to the second preset number of second emotion data;
the training process of the recurrent neural network comprises the following steps:
respectively marking the types of emotions generated in the process that a plurality of training objects watch the preset video;
respectively acquiring facial expression pictures, voice data, infrared pulse data and skin resistance data of the training objects in the process of watching a preset video;
respectively converting the voice data, the infrared pulse data and the skin resistance data of each training object into corresponding spectrogram;
respectively inputting the facial expression picture, the spectrogram corresponding to the voice data, the spectrogram corresponding to the infrared pulse data and the spectrogram corresponding to the skin resistance data of each training object into a preset convolutional neural network model to obtain respective corresponding feature arrays;
combining the characteristic data corresponding to each training object to obtain a total characteristic array of the training object;
and performing cyclic neural network training according to the respective total feature arrays of the plurality of training objects and the emotion types marked on the plurality of training objects to obtain a cyclic neural network.
2. An emotion analysis method as claimed in claim 1, wherein the structure of the convolutional neural network model includes five convolutional units connected in sequence and a second fully-connected layer connected to an output terminal of the fifth convolutional unit; wherein: each convolution unit comprises a convolution layer and a down-sampling layer connected with the output end of the convolution layer; and the second full-link layer is used for converting the number of the output data of the fifth convolution unit into a first preset number.
3. The emotion analyzing method as recited in claim 2,
the convolution layer in the first convolution unit of the five convolution units comprises 96 convolution kernels of 11 × 11, the sampling kernel of the down-sampling layer in the first convolution unit is 3 × 3, and the sampling step is 2; and/or
The convolution layer in the second convolution unit of the five convolution units comprises 128 convolution kernels of 5 × 5, the sampling kernel of the down-sampling layer in the second convolution unit is 3 × 3, and the sampling step is 1; and/or
The convolution layer in the third convolution unit in the five convolution units comprises 192 convolution kernels of 3 × 3, the sampling kernel of the down-sampling layer in the third convolution unit is 3 × 3, and the sampling step is 1; and/or
The convolution layer in the fourth convolution unit of the five convolution units comprises 192 convolution kernels of 3 × 3, the sampling kernel of the down-sampling layer in the fourth convolution unit is 3 × 3, and the sampling step is 1; and/or
The convolution layer in the fifth convolution unit of the five convolution units comprises 128 convolution kernels of 3 × 3, the sampling kernel of the down-sampling layer in the fifth convolution unit is 3 × 3, and the sampling step is 1.
4. An emotion analysis method as claimed in any one of claims 1 to 3, wherein each long short-term memory (LSTM) network unit comprises an input gate, an output gate and a memory gate; wherein:
the input gate of the t-th LSTM unit is computed as:
f_t = σ(W_f · [h_(t-1), x_t] + b_f)
where f_t is the output data of the input gate of the t-th LSTM unit, h_(t-1) is the output data of the (t-1)-th LSTM unit, x_t is the feature data in the total feature array corresponding to the t-th LSTM unit, W_f and b_f are pre-trained parameters, and σ denotes the function f(x) = 1/[1 + e^(-x)];
the memory gate of the t-th LSTM unit is computed as:
C_t = f_t * C_(t-1) + i_t * g_t
where C_t is the output data of the memory gate of the t-th LSTM unit, C_(t-1) is the output data of the memory gate of the (t-1)-th LSTM unit, i_t = σ(W_i · [h_(t-1), x_t] + b_i), g_t = tanh(W_g · [h_(t-1), x_t] + b_g), and W_i, b_i, W_g, b_g are pre-trained parameters;
the output gate of the t-th LSTM unit is computed as:
h_t = O_t * tanh(C_t)
where h_t is the output data of the output gate of the t-th LSTM unit, O_t = σ(W_o · [h_(t-1), x_t] + b_o), and W_o, b_o are pre-trained parameters.
5. An emotion analysis method as claimed in any one of claims 1 to 3, wherein the activation function includes:
S_i = e^(V_i) / Σ_(j=1..C) e^(V_j)
where S_i is the proportion of the i-th type of emotion of the person to be analyzed, V_i is the i-th second emotion data output by the first fully connected layer, and C is the second preset number.
6. An emotion analysis system based on multichannel data and a recurrent neural network, comprising:
the data acquisition unit is used for acquiring facial expression pictures, voice data, infrared pulse data and skin resistance data of a person to be analyzed in the process of watching a preset video;
the preset video comprises at least one type of video of sadness, anger, happiness, surprise, fear and disgust;
the data conversion unit is used for respectively converting the voice data, the infrared pulse data and the skin resistance data into corresponding spectrograms;
the characteristic determining unit is used for respectively inputting the facial expression picture, the spectrogram corresponding to the voice data, the spectrogram corresponding to the infrared pulse data and the spectrogram corresponding to the skin resistance data into a preset convolutional neural network model to obtain respective corresponding characteristic arrays; each feature array comprises a first preset number of feature data;
the emotion determining unit is used for combining the feature arrays to obtain a total feature array and inputting the total feature array into an emotion analysis model to obtain the proportion of each type of emotion of the person to be analyzed; the emotion analysis model comprises a pre-trained recurrent neural network, a preset first fully connected layer and a preset activation function; the recurrent neural network comprises a third preset number of long short-term memory (LSTM) network units, the third preset number being the number of feature data in the total feature array; the LSTM units are connected in sequence, the input data of the first LSTM unit is the first feature data in the total feature array, and the input data of each LSTM unit other than the first is the output data of the previous LSTM unit together with the corresponding feature data in the total feature array; the recurrent neural network is used for outputting first emotion data according to the total feature array; the first fully connected layer is used for converting the first emotion data into a second preset number of second emotion data, the second preset number being the number of emotion types; the activation function is used for determining the proportion of each type of emotion of the person to be analyzed according to the second preset number of second emotion data;
The training process of the recurrent neural network comprises the following steps:
respectively marking the types of emotions generated in the process that a plurality of training objects watch the preset video;
respectively acquiring facial expression pictures, voice data, infrared pulse data and skin resistance data of the training objects in the process of watching a preset video;
respectively converting the voice data, the infrared pulse data and the skin resistance data of each training object into corresponding spectrogram;
respectively inputting the facial expression picture, the spectrogram corresponding to the voice data, the spectrogram corresponding to the infrared pulse data and the spectrogram corresponding to the skin resistance data of each training object into a preset convolutional neural network model to obtain respective corresponding feature arrays;
combining the characteristic data corresponding to each training object to obtain a total characteristic array of the training object;
and performing cyclic neural network training according to the respective total feature arrays of the plurality of training objects and the emotion types marked on the plurality of training objects to obtain a cyclic neural network.
7. A computer device, comprising:
at least one memory;
and at least one processor, wherein:
the at least one memory is for storing a computer program;
the at least one processor is configured to invoke a computer program stored in the at least one memory to perform the emotion analysis method of any one of claims 1 to 5.
8. A computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the emotion analysis method of any one of claims 1 to 5.
9. A computer program comprising computer-executable instructions that, when executed, cause at least one processor to perform the emotion analysis method of any one of claims 1 to 5.
CN201811155546.9A 2018-09-30 2018-09-30 Emotion analysis method and system based on multi-channel data and recurrent neural network Active CN109325457B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811155546.9A CN109325457B (en) 2018-09-30 2018-09-30 Emotion analysis method and system based on multi-channel data and recurrent neural network


Publications (2)

Publication Number Publication Date
CN109325457A CN109325457A (en) 2019-02-12
CN109325457B true CN109325457B (en) 2022-02-18

Family

ID=65266675

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811155546.9A Active CN109325457B (en) 2018-09-30 2018-09-30 Emotion analysis method and system based on multi-channel data and recurrent neural network

Country Status (1)

Country Link
CN (1) CN109325457B (en)

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106228977B (en) * 2016-08-02 2019-07-19 合肥工业大学 Multi-mode fusion song emotion recognition method based on deep learning
CN106384166A (en) * 2016-09-12 2017-02-08 中山大学 Deep learning stock market prediction method combined with financial news
CN107066446B (en) * 2017-04-13 2020-04-10 广东工业大学 Logic rule embedded cyclic neural network text emotion analysis method
CN107799165A (en) * 2017-09-18 2018-03-13 华南理工大学 A kind of psychological assessment method based on virtual reality technology
CN108038414A (en) * 2017-11-02 2018-05-15 平安科技(深圳)有限公司 Character personality analysis method, device and storage medium based on Recognition with Recurrent Neural Network
CN107808146B (en) * 2017-11-17 2020-05-05 北京师范大学 Multi-mode emotion recognition and classification method
CN107894837A (en) * 2017-11-28 2018-04-10 合肥工业大学 Dynamic sentiment analysis model sample processing method and processing device
CN108717856B (en) * 2018-06-16 2022-03-08 台州学院 Speech emotion recognition method based on multi-scale deep convolution cyclic neural network

Also Published As

Publication number Publication date
CN109325457A (en) 2019-02-12

Similar Documents

Publication Publication Date Title
CN110353675B (en) Electroencephalogram signal emotion recognition method and device based on picture generation
CN110515456B (en) Electroencephalogram signal emotion distinguishing method and device based on attention mechanism
CN110969124B (en) Two-dimensional human body posture estimation method and system based on lightweight multi-branch network
CN111582141B (en) Face recognition model training method, face recognition method and device
CN113408508B (en) Transformer-based non-contact heart rate measurement method
CN112686048B (en) Emotion recognition method and device based on fusion of voice, semantics and facial expressions
CN112508110A (en) Deep learning-based electrocardiosignal graph classification method
CN109919085A (en) Health For All Activity recognition method based on light-type convolutional neural networks
CN112818764A (en) Low-resolution image facial expression recognition method based on feature reconstruction model
CN113157094B (en) Electroencephalogram emotion recognition method combining feature migration and graph semi-supervised label propagation
CN113158880A (en) Deep learning-based student classroom behavior identification method
CN112883231B (en) Short video popularity prediction method, system, electronic equipment and storage medium
CN116645721B (en) Sitting posture identification method and system based on deep learning
CN112597824A (en) Behavior recognition method and device, electronic equipment and storage medium
CN111126350A (en) Method and device for generating heart beat classification result
CN109171773B (en) Emotion analysis method and system based on multi-channel data
Taleb et al. Visual representation of online handwriting time series for deep learning Parkinson's disease detection
CN109325457B (en) Emotion analysis method and system based on multi-channel data and recurrent neural network
CN109171774B (en) Personality analysis method and system based on multi-channel data
CN111259759A (en) Cross-database micro-expression recognition method and device based on domain selection migration regression
Hurri Independent component analysis of image data
CN110852271A (en) Micro-expression recognition method based on peak frame and deep forest
CN115547488A (en) Early screening system and method based on VGG convolutional neural network and facial recognition autism
CN114742107A (en) Method for identifying perception signal in information service and related equipment
CN114818822A (en) Electroencephalogram migration emotion recognition method combining semi-supervised regression and icon label propagation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant