CN111813894A - Natural language emotion recognition method based on deep learning - Google Patents
Natural language emotion recognition method based on deep learning
- Publication number
- CN111813894A (application CN202010613189.7A)
- Authority
- CN
- China
- Prior art keywords
- layer
- text
- model
- picture
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F40/289: Handling natural language data; Natural language analysis; Recognition of textual entities; Phrasal analysis, e.g. finite state techniques or chunking
- G06F16/3344: Information retrieval of unstructured textual data; Querying; Query processing; Query execution using natural language analysis
- G06F16/353: Information retrieval of unstructured textual data; Clustering; Classification into predefined classes
- G06F18/214: Pattern recognition; Analysing; Design or setup of recognition systems or techniques; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/253: Pattern recognition; Analysing; Fusion techniques of extracted features
- G06N3/045: Computing arrangements based on biological models; Neural networks; Architecture, e.g. interconnection topology; Combinations of networks
Abstract
The invention provides a natural language emotion recognition method based on deep learning. A text feature extraction model (doc2vec), a picture feature extraction model (CNNs) and an emotion recognition model (CNN) are first trained on massive public opinion data. New public opinion information is then collected, cleaned and classified; the text information is input into the doc2vec model to extract text features, and the picture data is input into the CNNs model to extract picture emotional features. The text and picture emotional features extracted in the two preceding steps are fused at the feature layer with a kernel-matrix-based fusion algorithm, and the obtained fusion features are input into the emotion recognition model, which outputs the emotion recognition result.
Description
Technical Field
The invention relates to the technical field of natural language processing, in particular to a natural language emotion recognition method based on deep learning.
Background
In recent years, the internet industry in China has developed rapidly. The 45th statistical report of the China Internet Network Information Center (CNNIC) shows that, as of March 2020, China had 904 million internet users and an internet penetration rate of 64.5%, and the user base of most network applications has grown substantially. As an open platform, the internet carries a public opinion environment in which good and bad are mixed: suggestions and opinions beneficial to social development provide convenience for the public and bring positive energy, while false and distorted public opinion can cause serious psychological panic and safety threats to people and the country if it is not guided and supervised in a timely and effective manner.
Network emotion recognition refers to analyzing the viewpoints and emotions published by netizens on the internet by some method and dividing them into different categories according to different standards. Emotion recognition is one of the important research topics in natural language processing and provides essential technical support for government opinion surveys, the formulation of merchant marketing strategies, and the guidance and supervision of internet public opinion. The academic community has proposed a number of algorithms and techniques for network emotion recognition. Most of this research relies on computational means and achieves good recognition accuracy when the corpus is rich, the emotional tendencies are clear and the distribution is balanced, so that network emotion can be recognized, assessed and guided well. However, when faced with the massive and complex public opinion data encountered in practice, traditional methods sometimes suffer from unstable performance and low accuracy, and new technical methods are urgently needed to break through this dilemma.
Moreover, most existing network emotion recognition algorithms operate on the single modality of text, and emotion recognition that combines multimodal text and picture features has not yet received wide attention from scholars at home and abroad. With the arrival of the new media era, netizens tend to express their emotions through both text and images; sometimes the same text paired with different images expresses completely opposite meanings, so ignoring the images and relying on the text alone can make it difficult to identify the true emotion. In addition, most existing text-only emotion recognition algorithms adopt word2vec models, which neglect the influence of word order on sentence- or text-level emotion analysis and easily miss hidden information in the text.
To solve the above problems, an ideal technical solution has long been sought.
Disclosure of Invention
Aiming at the shortcomings of the prior art, the invention provides a natural language emotion recognition model training method based on deep learning and a natural language emotion recognition method based on deep learning, so as to solve the problem of inaccurate emotion recognition caused by ignoring picture emotion and by the limited capability of existing algorithm models, and to effectively improve emotion recognition accuracy on image-text multimodal data.
To achieve this purpose, the invention adopts the following technical scheme: a natural language emotion recognition model training method based on deep learning, which comprises the following steps:
S1, acquiring massive image-text public opinion data and, for each public opinion item containing multiple pictures, selecting the single most emotionally representative picture; preprocessing the image-text public opinion data, including removing stop words, segmenting the text into words, removing meaningless symbols from the text and converting emoticons into corresponding emotional words;
S2, dividing the image-text public opinion data into a training set and a verification set, and dividing the public opinion data in the training set and the verification set into text data and picture data respectively;
S3, establishing a doc2vec model, wherein the training algorithm adopted by the doc2vec model is the Skip-Gram model, the window length window is set to 5, the text feature dimension size is set to 100, the minimum word frequency min_count is set to 3, and the maximum number of training iterations iter is set to 5000;
preprocessing the text data in the training set and the verification set, training a doc2vec model by using the text data in the training set, and verifying the doc2vec model by using the text data in the verification set; outputting a text feature extraction result in each training and verification;
establishing a CNNs model, selecting a VGG-16 model pre-trained on ImageNet as the reference model, modifying the output of the final fully-connected layer and replacing the preceding fully-connected layers; the modified model comprises 13 convolutional layers, 5 pooling layers and 2 fully-connected layers; the weights of the last two fully-connected layers are initialized randomly, the remaining layers use the weights of the pre-trained reference model, and the number of output neurons is set to 3; the network layers before the last convolutional layer are frozen and do not participate in model training, and only the last convolutional layer and the network layers after it participate in training;
preprocessing the picture data in the training set and the verification set, resizing the picture data with the cv2.resize() function so that it matches the input of the CNNs model; training the CNNs model with the picture data in the training set, and verifying the CNNs model with the picture data in the verification set; outputting a picture emotion feature extraction result in each training and verification;
S4, selecting the Gaussian kernel as the kernel function to obtain the kernel matrices of the text emotional characteristics and the picture emotional characteristics respectively, and performing weighted fusion on the kernel matrices, wherein the weighting coefficients of the kernel matrices of the text emotional characteristics and the picture emotional characteristics are 0.75 and 0.25 respectively; performing dimensionality reduction on the fused feature matrix by principal component analysis to obtain the fusion features;
and S5, establishing a CNN network, inputting the obtained fusion characteristics into the CNN network for parameter training, and obtaining an emotion recognition model.
The invention also provides a natural language emotion recognition method based on deep learning, which comprises the following steps:
setting a topic target according to the information requirements of the user, collecting massive and complex public opinion data related to the topic target from Sina Weibo with a social media web crawler, and, for microblogs containing multiple pictures, selecting the single most emotionally representative picture;
preprocessing the public opinion data, including removing stop words, segmenting the text into words, removing meaningless symbols from the text, and converting emoticons into corresponding emotional words;
dividing public opinion data into text data and picture data, and respectively preprocessing the text data and the picture data;
inputting the preprocessed text data into a trained doc2vec model for text feature extraction; inputting the preprocessed picture data into a trained CNNs model, and extracting picture emotional characteristics;
selecting the Gaussian kernel as the kernel function to obtain the kernel matrices of the text emotional characteristics and the picture emotional characteristics respectively, and performing weighted fusion on the kernel matrices, wherein the weighting coefficients of the kernel matrices of the text emotional characteristics and the picture emotional characteristics are 0.75 and 0.25 respectively; reducing the dimensionality of the fused feature matrix by principal component analysis to obtain the fusion features;
inputting the obtained fusion characteristics into a trained emotion recognition model, and outputting an emotion recognition result;
wherein the doc2vec model, the CNNs model and the emotion recognition model are trained by adopting the model training method.
Compared with the prior art, the invention has outstanding substantive features and represents remarkable progress. In particular, the doc2vec model is used to extract text features, which takes into account the influence of word order on sentence- or text-level emotion analysis, makes the text feature vectors more accurate, and better mines the hidden information of the text. The picture emotional features are extracted with a CNNs model; a Gaussian function is selected as the kernel function to obtain the kernel matrices, the kernel matrices of the two modalities are fused by weighted addition to obtain the fusion features, and a CNN network then completes the emotion recognition, improving network emotion recognition accuracy. The algorithms adopted by the individual sub-modules are practical and effective, so the method is highly applicable to the recognition and supervision of internet public opinion.
Drawings
Fig. 1 is a schematic flow chart of the identification method of the present invention.
Detailed Description
The technical solution of the present invention is further described in detail by the following embodiments.
The invention provides a natural language emotion recognition model training method based on deep learning, which comprises the following steps:
s1, acquiring mass image-text public sentiment data, and selecting one of the public sentiment data containing a plurality of pictures with the most emotion representative picture; and preprocessing the image-text public sentiment data, including removing stop words, segmenting the text, removing meaningless symbols in the text and converting the emoticons into corresponding emotional words.
And S2, dividing the image-text public opinion data into a training set and a verification set, and dividing the public opinion data in the training set and the verification set into text data and picture data respectively.
S3, a doc2vec model is established; the training algorithm adopted by the doc2vec model is the Skip-Gram model, the window length window is set to 5, the text feature dimension size is set to 100, the minimum word frequency min_count is set to 3, and the maximum number of training iterations iter is set to 5000.
Preprocessing the text data in the training set and the verification set, training a doc2vec model by using the text data in the training set, and verifying the doc2vec model by using the text data in the verification set; and outputting a text feature extraction result in each training and verification.
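To make steps S1 and S3 concrete, the text-side pipeline could be sketched as follows. This is a minimal illustration only: it assumes jieba for word segmentation and gensim for doc2vec (the patent names neither library), uses illustrative stop-word and emoticon tables, and approximates the stated Skip-Gram training with gensim's PV-DBOW mode plus skip-gram-style word training (dm=0, dbow_words=1). The window, size, min_count and iter values follow the description above.

```python
import re
import jieba  # assumed segmentation library; not specified in the patent
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Illustrative resources; the actual stop-word list and emoticon dictionary are not specified
STOP_WORDS = {"的", "了", "是", "在"}
EMOTICON_MAP = {"[笑]": "开心", "[泪]": "伤心", ":)": "开心", ":(": "伤心"}

def preprocess_text(text):
    """Step S1 on the text side: convert emoticons into emotional words, strip
    meaningless symbols, segment the text into words, and remove stop words."""
    for emoticon, word in EMOTICON_MAP.items():
        text = text.replace(emoticon, word)
    text = re.sub(r"[^\u4e00-\u9fa5A-Za-z0-9]", " ", text)  # keep Chinese characters, letters, digits
    return [t for t in jieba.lcut(text) if t.strip() and t not in STOP_WORDS]

def train_doc2vec(train_texts, val_texts):
    """Step S3: train a doc2vec model on the training texts and extract text features
    for the training and verification sets (inputs are token lists from preprocess_text)."""
    corpus = [TaggedDocument(words=toks, tags=[i]) for i, toks in enumerate(train_texts)]
    model = Doc2Vec(
        dm=0, dbow_words=1,  # PV-DBOW with skip-gram-style word training (assumed equivalent)
        window=5,            # window length
        vector_size=100,     # text feature dimension (size)
        min_count=3,         # minimum word frequency (min_count)
        epochs=5000,         # maximum number of iterations (iter)
    )
    model.build_vocab(corpus)
    model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)
    train_features = [model.dv[i] for i in range(len(train_texts))]
    val_features = [model.infer_vector(toks) for toks in val_texts]
    return model, train_features, val_features
```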
Establishing a CNNs model, selecting a VGG-16 model pre-trained on ImageNet as the reference model, modifying the output of the final fully-connected layer and replacing the preceding fully-connected layers; the modified model comprises 13 convolutional layers, 5 pooling layers and 2 fully-connected layers; the weights of the last two fully-connected layers are initialized randomly, the remaining layers use the weights of the pre-trained reference model, and the number of output neurons is set to 3. The network layers before the last convolutional layer are frozen and do not participate in model training; only the last convolutional layer and the network layers after it participate in training.
Specifically, the CNNs model has the following structure:
the first layer and the second layer are two identical convolutional layers; the input picture is 224 × 224 × 3 pixels, i.e. a three-channel RGB image, the convolution filter size is 3 × 3, the stride is 1, the number of filters is 64, and the activation function is the ReLU function;
the third layer is a pooling layer with a kernel size of 2, a stride of 2 and a padding of 0;
the fourth layer and the fifth layer are two identical convolutional layers; the convolution filter size is 3 × 3, the stride is 1, the number of filters is 128, and the activation function is the ReLU function;
the sixth layer is a pooling layer with a kernel size of 2, a stride of 2 and a padding of 0;
the seventh layer to the ninth layer are three identical convolutional layers; the convolution filter size is 3 × 3, the stride is 1, the number of filters is 256, and the activation function is the ReLU function;
the tenth layer is a pooling layer with a kernel size of 2, a stride of 2 and a padding of 0;
the eleventh layer to the thirteenth layer are three identical convolutional layers; the convolution filter size is 3 × 3, the stride is 1, the number of filters is 512, and the activation function is the ReLU function;
the fourteenth layer is a pooling layer with a kernel size of 2, a stride of 2 and a padding of 0;
the fifteenth layer to the seventeenth layer are three identical convolutional layers; the convolution filter size is 3 × 3, the stride is 1, the number of filters is 512, and the activation function is the ReLU function;
the eighteenth layer is a pooling layer with a kernel size of 2, a stride of 2 and a padding of 0, and outputs a 7 × 7 × 512 feature map;
the nineteenth layer and the twentieth layer are two fully-connected layers whose input is the output of the previous layer and which output the picture emotional characteristics;
the last layer is a Softmax classification layer for generating classification results.
Preprocessing the picture data in the training set and the verification set, resizing the picture data with the cv2.resize() function so that it matches the input of the CNNs model; training the CNNs model with the picture data in the training set, and verifying the CNNs model with the picture data in the verification set; and outputting a picture emotion feature extraction result in each training and verification.
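The structure described above and the cv2.resize() preprocessing could be sketched as follows. TensorFlow/Keras is an assumption (the patent names no framework), and the width of the first replaced fully-connected layer is an illustrative choice; only the frozen VGG-16 backbone, the two replaced fully-connected layers ending in 3 neurons, and the 224 × 224 resizing come from the description.

```python
import cv2
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

def preprocess_picture(path):
    """Resize a picture with cv2.resize() to the 224 x 224 x 3 input expected by VGG-16."""
    img = cv2.imread(path)
    img = cv2.resize(img, (224, 224))
    return img.astype("float32") / 255.0

def build_cnns_model():
    """VGG-16 pre-trained on ImageNet; everything before the last convolutional layer is
    frozen, and the classifier is replaced by two randomly initialised fully-connected
    layers ending in 3 neurons, followed by a Softmax classification layer."""
    base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                       input_shape=(224, 224, 3))
    for layer in base.layers[:-2]:          # keep only block5_conv3 and the final pooling trainable
        layer.trainable = False
    x = layers.Flatten()(base.output)       # flattened 7 x 7 x 512 feature map
    x = layers.Dense(512, activation="relu")(x)            # width of this layer is an assumption
    features = layers.Dense(3, name="picture_emotional_features")(x)
    outputs = layers.Softmax()(features)
    model = models.Model(base.input, outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model
```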
S4, selecting the Gaussian kernel as the kernel function to obtain the kernel matrices of the text emotional characteristics and the picture emotional characteristics respectively, and performing weighted fusion on the kernel matrices; selecting principal component analysis to perform dimensionality reduction on the fused feature matrix to obtain the fusion features; and inputting the obtained fusion features into a CNN network for parameter training to obtain an emotion recognition model.
The kernel matrix weighted addition method is as follows:

K_fusion = α·K_T + β·K_P, with α + β = 1,

where K_T and K_P are the kernel matrices of the text data and the picture data respectively, α is the weighting coefficient of the text emotional characteristics, and β is the weighting coefficient of the kernel matrix of the picture emotional characteristics. In practical applications, the weighting coefficients need to be adjusted dynamically for different databases to obtain the optimal result.
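A sketch of this fusion step, assuming scikit-learn: the Gaussian (RBF) kernel matrices of the two modalities are computed, combined by weighted addition with coefficients α and β, and reduced by principal component analysis. The kernel parameter gamma and the number of retained components are illustrative values not given in the patent.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.decomposition import PCA

def fuse_features(text_feats, pic_feats, alpha=0.75, beta=0.25,
                  gamma=1.0, n_components=64):
    """Feature-layer fusion of step S4: Gaussian kernel matrices of the two modalities
    are weighted, added, and reduced with principal component analysis.
    gamma and n_components are illustrative; the patent does not specify them."""
    k_text = rbf_kernel(np.asarray(text_feats), gamma=gamma)   # kernel matrix K_T
    k_pic = rbf_kernel(np.asarray(pic_feats), gamma=gamma)     # kernel matrix K_P
    k_fused = alpha * k_text + beta * k_pic                    # weighted addition, alpha + beta = 1
    return PCA(n_components=n_components).fit_transform(k_fused)
```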
In this embodiment, the image-text public opinion data comes from internet public opinion events on Sina Weibo. For microblogs containing multiple pictures, only the single most emotionally representative picture is selected, and 2000 groups of texts and pictures were collected. In addition, 2,691,561 training word vectors were obtained from Weibo data captured and cleaned in earlier research in recent years.
Based on the image-text public opinion data, after the text emotional characteristics and the picture emotional characteristics are extracted respectively, a Gaussian function is selected as the kernel function to obtain the kernel matrices, and the kernel matrices of the two modalities are fused by weighted addition under different weight settings, with the weight of the text features denoted α and the weight of the picture features denoted β = 1 − α. The results obtained under the different weight settings are shown in Table 1 below; as can be seen from Table 1, when the text weight is 0.75 and the picture weight is 0.25, the recognition rate is highest, at 93.1%.
| Text feature weight α | 0.85 | 0.75 | 0.65 | 0.5 | 0.35 | 0.25 | 0.15 |
|---|---|---|---|---|---|---|---|
| Picture feature weight β | 0.15 | 0.25 | 0.35 | 0.5 | 0.65 | 0.75 | 0.85 |
| Average recognition rate (%) | 90.3 | 93.1 | 90.6 | 83.5 | 80.1 | 60.2 | 55.8 |

TABLE 1 Recognition results under different text and picture feature weights
Therefore, in this embodiment, the weighting coefficients of the kernel matrices of the text emotion characteristics and the picture emotion characteristics are respectively selected to be 0.75 and 0.25.
S5, establishing a CNN network, inputting the obtained fusion characteristics into the CNN network for parameter training, and obtaining an emotion recognition model;
it should be noted that the learning rate η used when training the CNN network follows a strategy in which η decays exponentially with the number of iterations:

η = η₀·γⁿ, n = ⌊t/T⌋,

where η₀ is the initial learning rate, γ is the decay factor, t is the number of iterations, T is the step length, and n is the largest integer not greater than t/T.
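A sketch of step S5 under these assumptions is shown below: the exact CNN topology, the initial learning rate η₀, the decay factor γ and the step length T are illustrative, and the stepwise exponential decay is applied per epoch here via a Keras callback, whereas the patent describes decay with the iteration number.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def exp_step_decay(eta0=1e-3, gamma=0.95, step=10):
    """eta = eta0 * gamma**n with n = floor(t / step): the learning rate decays
    exponentially in steps. eta0, gamma and step are illustrative values."""
    def schedule(t):
        return eta0 * (gamma ** (t // step))
    return schedule

def build_emotion_cnn(feature_dim):
    """A small 1-D CNN over the fused feature vector; the exact topology is an assumption."""
    model = models.Sequential([
        layers.Input(shape=(feature_dim, 1)),
        layers.Conv1D(64, 3, activation="relu"),
        layers.MaxPooling1D(2),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(3, activation="softmax"),   # positive / negative / neutral
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model

# Usage (per-epoch approximation of the per-iteration decay described above):
# lr_callback = tf.keras.callbacks.LearningRateScheduler(exp_step_decay())
# model = build_emotion_cnn(fused_train.shape[1])
# model.fit(fused_train[..., None], labels, epochs=50, callbacks=[lr_callback])
```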
As shown in fig. 1, the present invention further provides a natural language emotion recognition method based on deep learning, including:
setting a topic target according to the information requirements of the user, collecting massive and complex public opinion data related to the topic target from Sina Weibo with a social media web crawler, and, for microblogs containing multiple pictures, selecting the single most emotionally representative picture;
preprocessing the public opinion data, including removing stop words, segmenting the text into words, removing meaningless symbols from the text, and converting emoticons into corresponding emotional words;
dividing public opinion data into text data and picture data, and respectively preprocessing the text data and the picture data;
inputting the preprocessed text data into a trained doc2vec model for text feature extraction; inputting the preprocessed picture data into a trained CNNs model, and extracting picture emotional characteristics;
selecting the Gaussian kernel as the kernel function to obtain the kernel matrices of the text emotional characteristics and the picture emotional characteristics respectively, and performing weighted fusion on the kernel matrices, wherein the weighting coefficients of the kernel matrices of the text emotional characteristics and the picture emotional characteristics are 0.75 and 0.25 respectively; reducing the dimensionality of the fused feature matrix by principal component analysis to obtain the fusion features;
inputting the obtained fusion characteristics into a trained emotion recognition model, and outputting an emotion recognition result;
wherein the doc2vec model, the CNNs model and the emotion recognition model are trained by adopting the model training method.
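For illustration, the recognition flow of Fig. 1 could be tied together as follows, reusing the hypothetical helpers sketched in the training-method section above (preprocess_text, preprocess_picture, fuse_features); the positive/negative/neutral label order is an assumption.

```python
import numpy as np

LABELS = ["positive", "negative", "neutral"]

def recognize_emotion(texts, picture_paths, doc2vec_model, cnns_model, emotion_model):
    """End-to-end recognition flow: extract text and picture emotional features,
    fuse them at the feature layer, and classify with the emotion recognition model."""
    text_feats = np.array([doc2vec_model.infer_vector(preprocess_text(t)) for t in texts])
    pics = np.stack([preprocess_picture(p) for p in picture_paths])
    pic_feats = cnns_model.predict(pics)                       # picture emotional features
    fused = fuse_features(text_feats, pic_feats, alpha=0.75, beta=0.25)
    probabilities = emotion_model.predict(fused[..., None])    # add a channel axis for the 1-D CNN
    return [LABELS[i] for i in probabilities.argmax(axis=1)]
```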
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention and not to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art will understand that modifications to the specific embodiments of the invention, or equivalent substitutions of some of its technical features, may be made without departing from the spirit of the present invention, and such modifications and substitutions are intended to fall within the scope of the appended claims.
Claims (3)
1. A natural language emotion recognition model training method based on deep learning is characterized by comprising the following steps:
S1, acquiring massive image-text public opinion data and, for each public opinion item containing multiple pictures, selecting the single most emotionally representative picture; preprocessing the image-text public opinion data, including removing stop words, segmenting the text into words, removing meaningless symbols from the text and converting emoticons into corresponding emotional words;
S2, dividing the image-text public opinion data into a training set and a verification set, and dividing the public opinion data in the training set and the verification set into text data and picture data respectively;
S3, establishing a doc2vec model, wherein the training algorithm adopted by the doc2vec model is the Skip-Gram model, the window length window is set to 5, the text feature dimension size is set to 100, the minimum word frequency min_count is set to 3, and the maximum number of training iterations iter is set to 5000;
preprocessing the text data in the training set and the verification set, training a doc2vec model by using the text data in the training set, and verifying the doc2vec model by using the text data in the verification set; outputting a text feature extraction result in each training and verification;
establishing a CNNs model, selecting a VGG-16 model pre-trained on ImageNet as the reference model, modifying the output of the final fully-connected layer and replacing the preceding fully-connected layers; the modified model comprises 13 convolutional layers, 5 pooling layers and 2 fully-connected layers; the weights of the last two fully-connected layers are initialized randomly, the remaining layers use the weights of the pre-trained reference model, and the number of output neurons is set to 3; the network layers before the last convolutional layer are frozen and do not participate in model training, and only the last convolutional layer and the network layers after it participate in training;
preprocessing the picture data in the training set and the verification set, resizing the picture data with the cv2.resize() function so that it matches the input of the CNNs model; training the CNNs model with the picture data in the training set, and verifying the CNNs model with the picture data in the verification set; outputting a picture emotion feature extraction result in each training and verification;
S4, selecting the Gaussian kernel as the kernel function to obtain the kernel matrices of the text emotional characteristics and the picture emotional characteristics respectively, and performing weighted fusion on the kernel matrices, wherein the weighting coefficients of the kernel matrices of the text emotional characteristics and the picture emotional characteristics are 0.75 and 0.25 respectively; performing dimensionality reduction on the fused feature matrix by principal component analysis to obtain the fusion features;
and S5, establishing a CNN network, inputting the obtained fusion characteristics into the CNN network for parameter training, and obtaining an emotion recognition model.
2. The method for training the deep learning-based natural language emotion recognition model according to claim 1, wherein the CNNs model has the following structure:
the first layer and the second layer are two identical convolutional layers; the input picture is 224 × 224 × 3 pixels, i.e. a three-channel RGB image, the convolution filter size is 3 × 3, the stride is 1, the number of filters is 64, and the activation function is the ReLU function;
the third layer is a pooling layer with a kernel size of 2, a stride of 2 and a padding of 0;
the fourth layer and the fifth layer are two identical convolutional layers; the convolution filter size is 3 × 3, the stride is 1, the number of filters is 128, and the activation function is the ReLU function;
the sixth layer is a pooling layer with a kernel size of 2, a stride of 2 and a padding of 0;
the seventh layer to the ninth layer are three identical convolutional layers; the convolution filter size is 3 × 3, the stride is 1, the number of filters is 256, and the activation function is the ReLU function;
the tenth layer is a pooling layer with a kernel size of 2, a stride of 2 and a padding of 0;
the eleventh layer to the thirteenth layer are three identical convolutional layers; the convolution filter size is 3 × 3, the stride is 1, the number of filters is 512, and the activation function is the ReLU function;
the fourteenth layer is a pooling layer with a kernel size of 2, a stride of 2 and a padding of 0;
the fifteenth layer to the seventeenth layer are three identical convolutional layers; the convolution filter size is 3 × 3, the stride is 1, the number of filters is 512, and the activation function is the ReLU function;
the eighteenth layer is a pooling layer with a kernel size of 2, a stride of 2 and a padding of 0, and outputs a 7 × 7 × 512 feature map;
the nineteenth layer and the twentieth layer are two fully-connected layers whose input is the output of the previous layer and which output the picture emotional characteristics;
the last layer is a Softmax classification layer for generating classification results.
3. A natural language emotion recognition method based on deep learning is characterized by comprising the following steps:
setting a topic target according to the information requirements of the user, collecting massive and complex public opinion data related to the topic target from Sina Weibo with a social media web crawler, and, for microblogs containing multiple pictures, selecting the single most emotionally representative picture;
preprocessing the public opinion data, including removing stop words, segmenting the text into words, removing meaningless symbols from the text, and converting emoticons into corresponding emotional words;
dividing public opinion data into text data and picture data, and respectively preprocessing the text data and the picture data;
inputting the preprocessed text data into a trained doc2vec model for text feature extraction; inputting the preprocessed picture data into a trained CNNs model, and extracting picture emotional characteristics;
selecting the Gaussian kernel as the kernel function to obtain the kernel matrices of the text emotional characteristics and the picture emotional characteristics respectively, and performing weighted fusion on the kernel matrices, wherein the weighting coefficients of the kernel matrices of the text emotional characteristics and the picture emotional characteristics are 0.75 and 0.25 respectively; reducing the dimensionality of the fused feature matrix by principal component analysis to obtain the fusion features;
inputting the obtained fusion features into the trained emotion recognition model, and outputting the emotion recognition result, which is classified as positive, negative or neutral;
wherein the doc2vec model, the CNNs model and the emotion recognition model are trained by the training method of claim 1 or 2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010613189.7A CN111813894A (en) | 2020-06-30 | 2020-06-30 | Natural language emotion recognition method based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010613189.7A CN111813894A (en) | 2020-06-30 | 2020-06-30 | Natural language emotion recognition method based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111813894A true CN111813894A (en) | 2020-10-23 |
Family
ID=72856383
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010613189.7A Pending CN111813894A (en) | 2020-06-30 | 2020-06-30 | Natural language emotion recognition method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111813894A (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105740349A (en) * | 2016-01-25 | 2016-07-06 | 重庆邮电大学 | Sentiment classification method capable of combining Doc2vce with convolutional neural network |
CN106250855A (en) * | 2016-08-02 | 2016-12-21 | 南京邮电大学 | A kind of multi-modal emotion identification method based on Multiple Kernel Learning |
Non-Patent Citations (1)
Title |
---|
Fan Tao et al. (范涛等), "Research on emotion recognition of netizens based on deep learning multimodal fusion" (基于深度学习的多模态融合网民情感识别研究), Journal of Information Resources Management (信息资源管理学报) *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112417097A (en) * | 2020-11-19 | 2021-02-26 | 中国电子科技集团公司电子科学研究院 | Multi-modal data feature extraction and association method for public opinion analysis |
CN112488214A (en) * | 2020-12-02 | 2021-03-12 | 浙江大华技术股份有限公司 | Image emotion analysis method and related device |
WO2022116771A1 (en) * | 2020-12-02 | 2022-06-09 | Zhejiang Dahua Technology Co., Ltd. | Method for analyzing emotion shown in image and related devices |
CN113901171A (en) * | 2021-09-06 | 2022-01-07 | 特赞(上海)信息科技有限公司 | Semantic emotion analysis method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20201023 |