CN113435192A - Chinese text sentiment analysis method based on changing neural network channel cardinality


Info

Publication number
CN113435192A
CN113435192A
Authority
CN
China
Prior art keywords
vector
feature
word
chinese text
neural network
Legal status
Pending
Application number
CN202110658901.XA
Other languages
Chinese (zh)
Inventor
王丽亚
陈哲
Current Assignee
Individual
Original Assignee
Individual
Application filed by Individual
Priority to CN202110658901.XA
Publication of CN113435192A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/279: Recognition of textual entities
    • G06F 40/284: Lexical analysis, e.g. tokenisation or collocates
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35: Clustering; Classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/30: Semantic analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/047: Probabilistic or stochastic networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/048: Activation functions
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent

Abstract

The invention provides a Chinese text sentiment analysis method based on changing the neural network channel cardinality, comprising the following steps: Step 1, preprocess a Chinese text data set with a word-embedding system to obtain the word sequence corresponding to the Chinese text information and the word-vector matrix corresponding to that sequence; Step 2, construct a multi-channel model composed of n parallel CNN-BiGRU-Attention networks and perform feature extraction with it to obtain n corresponding local feature vectors; Step 3, splice the n local feature vectors together with a concatenation (Concatenate) function to generate a global feature vector; and Step 4, input the generated global feature vector into a sigmoid classifier for classification to obtain the final sentiment analysis result.

Description

Chinese text sentiment analysis method based on changing neural network channel cardinality
Technical Field
The invention relates to the technical field of natural language processing, and in particular to a Chinese text sentiment analysis method based on changing the neural network channel cardinality.
Background
Text sentiment analysis refers to the process of analyzing, processing, and extracting subjective, emotionally colored text using natural language processing and text-mining techniques. It draws on natural language processing, text mining, information retrieval, information extraction, machine learning, and related fields. With the development of social multimedia, large volumes of comment data carrying emotional tendencies have appeared, so sentiment analysis plays an increasingly important role in understanding people's attitudes and opinions toward events. Deep neural models are widely applied to sentiment analysis; at present the main deep-learning techniques in this field are word embedding, CNNs, RNNs, and attention mechanisms. However, networks such as RNNs suffer from high complexity, and most existing sentiment analysis methods extract text features with single-channel or dual-channel neural network models, whose performance degrades as the number of network layers grows, so the features of the text cannot be fully extracted.
In summary, there is an urgent need for a Chinese text sentiment analysis method that determines the emotional tendency of Chinese text by changing the channel cardinality of the neural network and effectively improves the accuracy of Chinese text sentiment analysis.
Disclosure of Invention
To address the problems and requirements above, this scheme provides a Chinese text sentiment analysis method based on changing the neural network channel cardinality; the technical problems can be solved by the following technical solution.
To achieve this purpose, the invention provides the following technical solution: a Chinese text sentiment analysis method based on changing the neural network channel cardinality, comprising the following steps: Step 1, preprocess a Chinese text data set with a word-embedding system to obtain the word sequence corresponding to the Chinese text information and the word-vector matrix corresponding to that sequence, where each word vector encodes the word and its position in the word sequence;
Step 2, construct a multi-channel model composed of n parallel CNN-BiGRU-Attention networks, and perform feature extraction with it to obtain n corresponding local feature vectors;
Step 3, splice the n local feature vectors together with a concatenation (Concatenate) function to generate a global feature vector;
and Step 4, input the generated global feature vector into a sigmoid classifier for classification to obtain the final sentiment analysis result.
Further, the preprocessing comprises: obtaining a Chinese text data set; segmenting the original text data to obtain a sentence sequence composed of single words; and inputting the sentence sequence into the Skip-gram model of word2vec to train word vectors, thereby obtaining the word-vector matrix S = {x_1, x_2, ..., x_n} corresponding to the sentence sequence, where x_i is the word vector of the i-th word, S ∈ R^(n×d), n is the number of words, and d is the vector dimension.
Further, the feature extraction process of each CNN-BiGRU-Attention network comprises the following steps:
A. inputting the word-vector matrix into a CNN to extract local feature information from the input text: using convolution kernels of 3 different sizes, the CNN divides and combines the word vectors of the matrix over different numbers of words to extract features, yielding 3 corresponding sets of feature values;
B. connecting the 3 sets of feature values into a serialized phrase feature U, inputting U into a BiGRU network for learning, and acquiring the forward and backward context of the phrase features at this level;
C. using a feed-forward attention network to highlight the important information among the local features: assigning corresponding weights to the feature information extracted in step B and screening it to obtain the most important sentiment features.
Further, the CNN comprises a convolutional layer, a pooling layer, and a fully-connected layer. The word-vector matrix S is fed to the 3 filters configured in the convolutional layer to extract local features of the input text, giving the feature vector C = [c_1, c_2, ..., c_m], where c_j is the feature vector produced by one filter, c_j = [c_1, c_2, ..., c_(n-h+1)], h is the size of the convolution kernel, and c_i is a local feature. The pooling layer applies max pooling to down-sample each feature vector c_j to the local optimum M_j, giving a feature vector M of length m, M = [M_1, M_2, ..., M_m], where M_j = max(c_1, c_2, ..., c_(n-h+1)) = max{c_j}. The feature vector M is then input into the fully-connected layer to obtain the vector U.
Further, the pooled outputs of the 3 filters are denoted M_3, M_4, and M_5 respectively, and the concatenation function (Concatenate) joins the pooled vectors M_3, M_4, M_5 into the vector U, where U = {M_3, M_4, M_5}.
Further, the vector U is input into the BiGRU network for learning, and the forward and backward context of the sequential phrase features is extracted to obtain the hidden-state semantic encoding h_t.
Further, assigning corresponding weights to the feature information extracted in step B and screening it to obtain the most important sentiment information comprises: generating the target attention weight v_t, v_t = tanh(h_t), where tanh is the function used to learn the attention scores; then converting the attention weights into probabilities with the softmax function,

p_t = softmax(v_t) = exp(v_t) / Σ_k exp(v_k),

which generates the probability vector p_t; and finally configuring the generated attention weights onto the corresponding hidden-state semantic encodings h_t,

a_t = Σ_t (p_t · h_t),

where a_t is the weighted average of the h_t and p_t is the weight value. The new vector a_t produced by the feed-forward attention mechanism is processed by a dropout layer to obtain the feature vector a_t^r, the feature vector at time t.
Further, the n corresponding feature vectors obtained above are denoted a_t^n; the n feature vectors are spliced together with the concatenation function, and the result is recorded as the feature vector

A = {a_t^1 : a_t^2 : a_t^3 : a_t^4 : a_t^5 : a_t^6 : a_t^7 : a_t^8 : a_t^9 : a_t^10},

where ':' denotes the splice (concatenation) operator.
Further, inputting the generated global feature vector into a sigmoid classifier for classification comprises: adding a Dense layer and feeding the feature vector A into it as the input element, where the Dense layer has 1 unit and the sigmoid activation function, and classifying with the sigmoid function according to the formula

p(y = 1 | x, ω) = h_ω(x) = g(ω^T x) = 1 / (1 + exp(-ω^T x)),

where a sample is {x, y}, y ∈ {0, 1} denotes the positive or negative class, x is the sample feature vector, and ω is the trained model parameter. The model parameter ω is trained with the cross-entropy loss function

L(ω) = -Σ_i [ y_i · log h_ω(x_i) + (1 - y_i) · log(1 - h_ω(x_i)) ],

where y_i is the true class of input x_i and h_ω(x_i) is the predicted probability that input x_i belongs to class 1.
According to the above technical solution, the invention has the following beneficial effects: the emotional tendency of Chinese text can be judged by changing the channel cardinality of the neural network, and the accuracy of Chinese text sentiment analysis can be effectively improved.
In addition to the above objects, features and advantages, preferred embodiments of the present invention will be described in more detail below with reference to the accompanying drawings so that the features and advantages of the present invention can be easily understood.
Drawings
To more clearly illustrate the embodiments of the present invention and the technical solutions of the prior art, the drawings used in describing the embodiments are briefly introduced below; the drawings illustrate only some embodiments of the present invention and do not limit the invention to them.
FIG. 1 is a schematic diagram of the specific steps of the Chinese text sentiment analysis method based on changing the neural network channel cardinality according to the present invention.
Fig. 2 is a schematic diagram of the specific steps of the feature extraction process of each group of CNN-BiGRU-Attention networks in this embodiment.
Fig. 3 is a schematic structural diagram of the multi-channel model in this embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings of specific embodiments of the present invention. Like reference symbols in the various drawings indicate like elements. It should be noted that the described embodiments are only some embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the described embodiments of the invention without any inventive step, are within the scope of protection of the invention.
In this embodiment, the channel cardinality refers to the number of channels in the model: when the neural network model is built, n parallel CNN-BiGRU-Attention networks perform feature extraction after the embedding layer, and their features are spliced before the classifier layer. For example, the 1-CBGA model consists of 1 CNN-BiGRU-Attention network, so its channel cardinality is 1; the 2-CBGA model consists of 2 parallel CNN-BiGRU-Attention networks, so its channel cardinality is recorded as 2. As shown in FIGS. 1 to 3, the method specifically comprises: Step 1, preprocessing a Chinese text data set with a word-embedding system to obtain the word sequence corresponding to the Chinese text information and the word-vector matrix corresponding to that sequence, where each word vector encodes the word and its position in the word sequence. The preprocessing comprises: obtaining a Chinese text data set; segmenting the original text data to obtain a sentence sequence composed of single words; and inputting the sentence sequence into the Skip-gram model of word2vec to train word vectors, thereby obtaining the word-vector matrix S = {x_1, x_2, ..., x_n} corresponding to the sentence sequence, where x_i is the word vector of the i-th word, S ∈ R^(n×d), n is the number of words, and d is the vector dimension.
The channel cardinality influences the model's effectiveness; at the same time, as it grows, the network becomes more complex, the model requires more parameters, and the time cost rises. Owing to the limits of the experimental conditions, this embodiment analyzes only models with channel cardinalities of 1-10, but the method is not restricted to that range. FIG. 3 shows the model diagram for channel cardinalities of 1-10, where n is the model's channel cardinality and n ∈ {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}. When preprocessing the Chinese text data, each sentence is split into individual tokens (single characters in this example) to obtain a sentence list, for example [..., ['散', '热', '非', '常', '棒', '！'], ...] ("the heat dissipation is excellent!"). Word vectors are then built with the Skip-gram model of word2vec: if the context window size is 5, the current word w(t) has the vector V(w(t)), and the 4 surrounding words have the vectors V(w(t+2)), V(w(t+1)), V(w(t-1)), and V(w(t-2)), then the Skip-gram model predicts the surrounding words from the central word, solved via the conditional probability P(V(w(i)) | V(w(t))). After preprocessing, the word-vector matrix is input into the multi-channel model for feature extraction.
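A minimal preprocessing sketch follows, assuming gensim as the word2vec implementation and character-level splitting as in the example above; the toy corpus (the second review string in particular) and all hyper-parameter values are illustrative, not taken from the patent.

```python
import numpy as np
from gensim.models import Word2Vec

corpus = ["散热非常棒！", "电池不耐用。"]      # toy reviews (illustrative)
sentences = [list(text) for text in corpus]    # split each sentence into single characters

# sg=1 selects the Skip-gram model; window=2 covers the 4 neighbours
# w(t-2), w(t-1), w(t+1), w(t+2) described above; vector_size is the dimension d.
w2v = Word2Vec(sentences, sg=1, window=2, vector_size=100, min_count=1)

# Word-vector matrix S = {x_1, ..., x_n} for one sentence, S ∈ R^(n×d)
S = np.stack([w2v.wv[ch] for ch in sentences[0]])
print(S.shape)  # (6, 100): n = 6 characters, d = 100
```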
Step 2, construct a multi-channel model composed of n parallel CNN-BiGRU-Attention networks, and perform feature extraction with it to obtain the n corresponding local feature vectors.
As shown in FIG. 2, the feature extraction process of each CNN-BiGRU-Attention network in this method comprises:
A. inputting the word-vector matrix into a CNN to extract local feature information from the input text: using convolution kernels of 3 different sizes, the CNN divides and combines the word vectors of the matrix over different numbers of words to extract features, yielding 3 corresponding sets of feature values;
B. connecting the 3 sets of feature values into a serialized phrase feature U, inputting U into a BiGRU network for learning, and acquiring the forward and backward context of the phrase features at this level;
C. using a feed-forward attention network to highlight the important information among the local features: assigning corresponding weights to the feature information extracted in step B and screening it to obtain the most important sentiment features.
Specifically, the CNN comprises a convolutional layer, a pooling layer, and a fully-connected layer. The word-vector matrix S is input into the convolutional layer, and the 3 filters of the configured sizes extract local features of the input text, giving the feature vector C = [c_1, c_2, ..., c_m], where c_j is the feature vector produced by one filter, c_j = [c_1, c_2, ..., c_(n-h+1)], m is the number of convolution kernels, h is the size of the convolution kernel, and c_i is a local feature extracted by the convolution operation, c_i = f(ω × x_(i:i+h-1) + b), where f is a nonlinear function, ω is a filter of size h × d, h is the size of the convolution kernel (covering h words), x_(i:i+h-1) is the phrase vector composed of the h words from i to i+h-1, and b is a bias term. For example, after the convolutional layer one filter yields the feature vector c_1 = [c_1, c_2, ..., c_(n-h+1)], and m convolution kernels yield C = [c_1, c_2, ..., c_m]. The pooling layer applies max pooling to down-sample each feature vector c_j to the local optimum M_j, giving a feature vector M of length m, M = [M_1, M_2, ..., M_m], where M_j = max(c_1, c_2, ..., c_(n-h+1)) = max{c_j}; for example, max pooling down-samples the local feature vector c_1 to the local optimum M_1 = max(c_1, c_2, ..., c_(n-h+1)) = max{c_1}. The feature vector M is then input into the fully-connected layer to obtain the vector U. The CNN implemented in this embodiment differs from an ordinary CNN in that it provides 3 filters with convolution kernels of sizes 3, 4, and 5: because the kernel sizes differ, different divisions of a sentence form different phrases, i.e., convolution windows of different sizes extract different feature values, so relatively comprehensive features can be extracted. The pooled outputs of the 3 filters are denoted M_3, M_4, and M_5 respectively, and since the input of the BiGRU must be a serialized structure, the concatenation function (Concatenate) joins the pooled vectors M_3, M_4, M_5 into the vector U, where U = {M_3, M_4, M_5}; this new continuous higher-order window U serves as the input of the BiGRU, which learns the serialized phrase features U to obtain the forward and backward context of the phrase features at this level, i.e., the features of the sentence as a system. The vector U is thus input into the BiGRU network for learning, and the forward and backward context of the sequential phrase features is extracted to obtain the hidden-state semantic encoding h_t.
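The CNN stage can be sketched as follows in Keras (an assumed framework choice; the patent names only the layers). One open detail is how the pooled values form a "serialized" U; here each kernel size contributes one time step, which is one plausible reading rather than the definitive implementation, and the dimensions are illustrative.

```python
import tensorflow as tf
from tensorflow.keras import layers

n_words, d, m = 50, 100, 64          # sentence length, vector dimension, filters per size

S_in = layers.Input(shape=(n_words, d))              # word-vector matrix S
pooled = []
for h in (3, 4, 5):                                  # the 3 convolution kernel sizes
    c = layers.Conv1D(m, h, activation="relu")(S_in) # c_i = f(ω × x_(i:i+h-1) + b)
    M = layers.GlobalMaxPooling1D()(c)               # M_j = max{c_j}
    pooled.append(layers.Reshape((1, m))(M))         # one time step per kernel size
U = layers.Concatenate(axis=1)(pooled)               # U = {M_3, M_4, M_5}, shape (3, m)

cnn_stage = tf.keras.Model(S_in, U, name="cnn_stage")
```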
The BiGRU is a neural network composed of a forward GRU, a backward GRU, and a layer connecting their output states. If the hidden state output by the forward GRU at time t is denoted h_t^f and the hidden state output by the backward GRU is denoted h_t^b, then the hidden state h_t output by the BiGRU is computed as:

h_t^f = GRU(U_t, h_(t-1)^f)
h_t^b = GRU(U_t, h_(t+1)^b)
h_t = w_t · h_t^f + v_t · h_t^b + b_t

where w_t and v_t are weight matrices, GRU(·) is the GRU function, U_t is the GRU input at time t, and b_t is a bias vector.
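In Keras, Bidirectional(GRU(...)) realizes the forward and backward GRUs; a minimal sketch under the same assumptions as above. Note that merge_mode="concat" joins the two hidden states, a common stand-in for the weighted sum h_t = w_t · h_t^f + v_t · h_t^b + b_t given above.

```python
from tensorflow.keras import layers

U_in = layers.Input(shape=(3, 64))    # serialized phrase features U from the CNN stage
h = layers.Bidirectional(layers.GRU(64, return_sequences=True),
                         merge_mode="concat")(U_in)   # h_t for every time step t
```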
A feed-forward attention mechanism is then introduced to obtain the new vector a_t, i.e., the sentence hidden states h_t after weight assignment. Assigning corresponding weights to the feature information extracted in step B and screening it to obtain the most important sentiment information comprises: generating the target attention weight v_t, v_t = tanh(h_t), where tanh is the function used to learn the attention scores; then converting the attention weights into probabilities with the softmax function,

p_t = softmax(v_t) = exp(v_t) / Σ_k exp(v_k),

which generates the probability vector p_t; and finally configuring the generated attention weights onto the corresponding hidden-state semantic encodings h_t,

a_t = Σ_t (p_t · h_t),

where a_t is the weighted average of the h_t and p_t is the weight value. The new vector a_t produced by the feed-forward attention mechanism is processed by a dropout layer to obtain the feature vector a_t^r, the feature vector at time t.
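A sketch of the feed-forward attention step under the formulas above: v_t = tanh(h_t), p_t = softmax(v_t) over the time axis, a weighted sum of the h_t, then dropout. Scoring each h_t with a plain element-wise tanh mirrors the text; the dropout rate is an illustrative choice.

```python
import tensorflow as tf
from tensorflow.keras import layers

class FeedForwardAttention(layers.Layer):
    def call(self, h):                        # h: (batch, T, dim) from the BiGRU
        v = tf.tanh(h)                        # v_t = tanh(h_t)
        p = tf.nn.softmax(v, axis=1)          # probability weights p_t over time
        return tf.reduce_sum(p * h, axis=1)   # weighted average of the h_t

a = FeedForwardAttention()(h)                 # h from the BiGRU sketch above
a_r = layers.Dropout(0.5)(a)                  # feature vector a_t^r
```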
Step 3, the n local feature vectors are spliced together with the concatenation function to generate the global feature vector.
Specifically, the n corresponding feature vectors obtained above are denoted a_t^n; they are spliced together with the concatenation function and recorded as the feature vector

A = {a_t^1 : a_t^2 : a_t^3 : a_t^4 : a_t^5 : a_t^6 : a_t^7 : a_t^8 : a_t^9 : a_t^10},

where ':' denotes the splice (concatenation) operator.
Step 4, input the generated global feature vector into a sigmoid classifier for classification to obtain the final sentiment analysis result. Specifically, inputting the generated global feature vector into the sigmoid classifier comprises: adding a Dense layer and feeding the feature vector A into it as the input element, where the Dense layer has 1 unit and the sigmoid activation function, and classifying with the sigmoid function according to the formula

p(y = 1 | x, ω) = h_ω(x) = g(ω^T x) = 1 / (1 + exp(-ω^T x)),

where a sample is {x, y}, y ∈ {0, 1} denotes the positive or negative class, x is the sample feature vector, and ω is the trained model parameter. The model parameter ω is trained with the cross-entropy loss function

L(ω) = -Σ_i [ y_i · log h_ω(x_i) + (1 - y_i) · log(1 - h_ω(x_i)) ],

where y_i is the true class of input x_i and h_ω(x_i) is the predicted probability that input x_i belongs to class 1.
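Putting the pieces together, an end-to-end assembly sketch for an n-channel model: a shared embedding, n parallel CNN-BiGRU-Attention channels built from the sketches above (reusing the FeedForwardAttention layer), concatenation into the global feature A, and a 1-unit sigmoid Dense layer trained with binary cross-entropy, i.e., the loss L(ω) above. Here n = 2 (the 2-CBGA model); the vocabulary size and dimensions are illustrative.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_channel(x, m=64, units=64):
    """One CNN-BiGRU-Attention channel (see the CNN, BiGRU, and attention sketches)."""
    pooled = []
    for h in (3, 4, 5):
        c = layers.Conv1D(m, h, activation="relu")(x)
        pooled.append(layers.Reshape((1, m))(layers.GlobalMaxPooling1D()(c)))
    U = layers.Concatenate(axis=1)(pooled)
    hs = layers.Bidirectional(layers.GRU(units, return_sequences=True))(U)
    return layers.Dropout(0.5)(FeedForwardAttention()(hs))

n, vocab, n_words, d = 2, 20000, 50, 100       # n is the channel cardinality (2-CBGA)
inp = layers.Input(shape=(n_words,))
emb = layers.Embedding(vocab, d)(inp)          # embedding layer
A = layers.Concatenate()([build_channel(emb) for _ in range(n)])   # global feature A
out = layers.Dense(1, activation="sigmoid")(A) # p(y=1|x,ω)

model = tf.keras.Model(inp, out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```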
In this embodiment, the experimental data for the channel-cardinality experiments come from the Chinese online-shopping review texts provided on the BUPTLdy personal GitHub site; the data set configuration is shown in Table 1.
Table 1 Experimental data set-up

Data set   Training set   Validation set   Test set   Total
Data       16884          2000             2221       21105
Considering the time cost and the experimental software and hardware environment, only network models with channel cardinalities of 1-10 are studied and implemented; the comparative experimental set-up is shown in Table 2.
Table 2 Comparative experimental set-up

Channel cardinality      Model name
1                        1-CBGA
2                        2-CBGA
3                        3-CBGA
n ∈ {4,5,6,7,8,9,10}     n-CBGA
The experiments evaluate the models on the test set with 4 metrics commonly used in NLP: Accuracy, Precision, Recall, and F1, computed as sketched below. The comparison results of the 10 models are shown in Table 3.
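A sketch of computing the four metrics with scikit-learn (an assumed tooling choice); x_test and y_test stand for the held-out split in Table 1, and model is the compiled network from the assembly sketch above.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_prob = model.predict(x_test)                 # sigmoid outputs in [0, 1]
y_pred = (y_prob > 0.5).astype(int).ravel()    # threshold at 0.5 for the class label

print("Accuracy ", accuracy_score(y_test, y_pred))
print("Precision", precision_score(y_test, y_pred))
print("Recall   ", recall_score(y_test, y_pred))
print("F1       ", f1_score(y_test, y_pred))
```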
Table 3 Comparison of models
Model Accuracy Precision Recall F1
1-CBGA 0.9217 0.9059 0.9376 0.9215
2-CBGA 0.9289 0.9298 0.9247 0.9273
3-CBGA 0.9194 0.8950 0.9467 0.9201
4-CBGA 0.9145 0.9610 0.8604 0.9079
5-CBGA 0.9185 0.9416 0.8889 0.9145
6-CBGA 0.9167 0.8944 0.9412 0.9172
7-CBGA 0.9212 0.8981 0.9467 0.9218
8-CBGA 0.9239 0.9220 0.9229 0.9224
9-CBGA 0.9163 0.9035 0.9284 0.9158
10-CBGA 0.9158 0.8881 0.9477 0.9169
The above results demonstrate the feasibility and effectiveness of the Chinese text sentiment analysis method based on changing the neural network channel cardinality proposed in this application.
It should be noted that the described embodiments are only preferred ways of implementing the invention; all obvious modifications that remain within the scope of the invention fall under the present general inventive concept.

Claims (9)

1. A Chinese text sentiment analysis method based on changing the neural network channel cardinality, characterized by comprising the following steps:
Step 1, preprocessing a Chinese text data set with a word-embedding system to obtain the word sequence corresponding to the Chinese text information and the word-vector matrix corresponding to that sequence, where each word vector encodes the word and its position in the word sequence;
Step 2, constructing a multi-channel model composed of n parallel CNN-BiGRU-Attention networks, and performing feature extraction with it to obtain n corresponding local feature vectors;
Step 3, splicing the n local feature vectors together with a concatenation (Concatenate) function to generate a global feature vector;
and Step 4, inputting the generated global feature vector into a sigmoid classifier for classification to obtain the final sentiment analysis result.
2. The Chinese text sentiment analysis method based on changing the neural network channel cardinality of claim 1, wherein the preprocessing comprises: obtaining a Chinese text data set; segmenting the original text data to obtain a sentence sequence composed of single words; and inputting the sentence sequence into the Skip-gram model of word2vec to train word vectors, thereby obtaining the word-vector matrix S = {x_1, x_2, ..., x_n} corresponding to the sentence sequence, where x_i is the word vector of the i-th word, S ∈ R^(n×d), n is the number of words, and d is the vector dimension.
3. The Chinese text sentiment analysis method based on changing the neural network channel cardinality of claim 2, wherein the feature extraction process of each CNN-BiGRU-Attention network comprises:
A. inputting the word-vector matrix into a CNN to extract local feature information from the input text: using convolution kernels of 3 different sizes, the CNN divides and combines the word vectors of the matrix over different numbers of words to extract features, yielding 3 corresponding sets of feature values;
B. connecting the 3 sets of feature values into a serialized phrase feature U, inputting U into a BiGRU network for learning, and acquiring the forward and backward context of the phrase features at this level;
C. using a feed-forward attention network to highlight the important information among the local features: assigning corresponding weights to the feature information extracted in step B and screening it to obtain the most important sentiment features.
4. The Chinese text sentiment analysis method based on changing the neural network channel cardinality of claim 3, wherein the CNN comprises a convolutional layer, a pooling layer, and a fully-connected layer; the word-vector matrix S is fed to the 3 filters configured in the convolutional layer to extract local features of the input text, giving the feature vector C = [c_1, c_2, ..., c_m], where c_j is the feature vector produced by one filter, c_j = [c_1, c_2, ..., c_(n-h+1)], m is the number of convolution kernels, h is the size of the convolution kernel, and c_i is a local feature; the pooling layer applies max pooling to down-sample each feature vector c_j to the local optimum M_j, giving a feature vector M of length m, M = [M_1, M_2, ..., M_m], where M_j = max(c_1, c_2, ..., c_(n-h+1)) = max{c_j}; and the feature vector M is input into the fully-connected layer to obtain the vector U.
5. The Chinese text sentiment analysis method based on changing the neural network channel cardinality of claim 4, wherein the pooled outputs of the 3 filters are denoted M_3, M_4, and M_5 respectively, and the concatenation function (Concatenate) joins the pooled vectors M_3, M_4, M_5 into the vector U, where U = {M_3, M_4, M_5}.
6. The Chinese text sentiment analysis method based on changing the neural network channel cardinality of claim 5, wherein the vector U is input into the BiGRU network for learning, and the forward and backward context of the sequential phrase features is extracted to obtain the hidden-state semantic encoding h_t.
7. The Chinese text sentiment analysis method based on changing the neural network channel cardinality of claim 6, wherein assigning corresponding weights to the feature information extracted in step B and screening it to obtain the most important sentiment information comprises: generating the target attention weight v_t, v_t = tanh(h_t), where tanh is the function used to learn the attention scores; then converting the attention weights into probabilities with the softmax function,

p_t = softmax(v_t) = exp(v_t) / Σ_k exp(v_k),

which generates the probability vector p_t; and finally configuring the generated attention weights onto the corresponding hidden-state semantic encodings h_t,

a_t = Σ_t (p_t · h_t),

where a_t is the weighted average of the h_t and p_t is the weight value; the new vector a_t produced by the feed-forward attention mechanism is processed by a dropout layer to obtain the feature vector a_t^r, the feature vector at time t.
8. The Chinese text sentiment analysis method based on changing the neural network channel cardinality of claim 7, wherein the n corresponding feature vectors, denoted a_t^n, are spliced together with the concatenation function and recorded as the feature vector

A = {a_t^1 : a_t^2 : a_t^3 : a_t^4 : a_t^5 : a_t^6 : a_t^7 : a_t^8 : a_t^9 : a_t^10},

where ':' denotes the splice (concatenation) operator.
9. The Chinese text sentiment analysis method based on changing the neural network channel cardinality of claim 8, wherein inputting the generated global feature vector into a sigmoid classifier for classification comprises: adding a Dense layer and feeding the feature vector A into it as the input element, where the Dense layer has 1 unit and the sigmoid activation function; classifying with the sigmoid function according to the formula

p(y = 1 | x, ω) = h_ω(x) = g(ω^T x) = 1 / (1 + exp(-ω^T x)),

where a sample is {x, y}, y ∈ {0, 1} denotes the positive or negative class, x is the sample feature vector, and ω is the trained model parameter; and training the model parameter ω with the cross-entropy loss function

L(ω) = -Σ_i [ y_i · log h_ω(x_i) + (1 - y_i) · log(1 - h_ω(x_i)) ],

where y_i is the true class of input x_i and h_ω(x_i) is the predicted probability that input x_i belongs to class 1.
CN202110658901.XA 2021-06-15 2021-06-15 Chinese text emotion analysis method based on changing neural network channel cardinality Pending CN113435192A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110658901.XA CN113435192A (en) 2021-06-15 2021-06-15 Chinese text emotion analysis method based on changing neural network channel cardinality

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110658901.XA CN113435192A (en) 2021-06-15 2021-06-15 Chinese text emotion analysis method based on changing neural network channel cardinality

Publications (1)

Publication Number Publication Date
CN113435192A 2021-09-24

Family

ID=77755896

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110658901.XA Pending CN113435192A (en) 2021-06-15 2021-06-15 Chinese text emotion analysis method based on changing neural network channel cardinality

Country Status (1)

Country Link
CN (1) CN113435192A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115935075A (en) * 2023-01-30 2023-04-07 杭州师范大学钱江学院 Social network user depression detection method integrating tweet information and behavior characteristics

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200159778A1 (en) * 2018-06-19 2020-05-21 Priyadarshini Mohanty Methods and systems of operating computerized neural networks for modelling csr-customer relationships
CN111651593A (en) * 2020-05-08 2020-09-11 河南理工大学 Text emotion analysis method based on word vector and word vector mixed model
CN112818123A (en) * 2021-02-08 2021-05-18 河北工程大学 Emotion classification method for text

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200159778A1 (en) * 2018-06-19 2020-05-21 Priyadarshini Mohanty Methods and systems of operating computerized neural networks for modelling csr-customer relationships
CN111651593A (en) * 2020-05-08 2020-09-11 河南理工大学 Text emotion analysis method based on word vector and word vector mixed model
CN112818123A (en) * 2021-02-08 2021-05-18 河北工程大学 Emotion classification method for text

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王丽亚 et al.: "Chinese text sentiment analysis based on a character-level dual-channel composite network" (基于字符级双通道复合网络的中文文本情感分析), 计算机应用研究 (Application Research of Computers), vol. 37, no. 9, pp. 2674-2678 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115935075A (en) * 2023-01-30 2023-04-07 杭州师范大学钱江学院 Social network user depression detection method integrating tweet information and behavior characteristics
CN115935075B (en) * 2023-01-30 2023-08-18 杭州师范大学钱江学院 Social network user depression detection method integrating text information and behavior characteristics

Similar Documents

Publication Publication Date Title
CN111368996B (en) Retraining projection network capable of transmitting natural language representation
CN108446271B (en) Text emotion analysis method of convolutional neural network based on Chinese character component characteristics
CN110245229B (en) Deep learning theme emotion classification method based on data enhancement
Che et al. Punctuation prediction for unsegmented transcript based on word vector
CN111160037B (en) Fine-grained emotion analysis method supporting cross-language migration
CN108829662A (en) A kind of conversation activity recognition methods and system based on condition random field structuring attention network
CN108038205B (en) Viewpoint analysis prototype system for Chinese microblogs
CN109885670A (en) A kind of interaction attention coding sentiment analysis method towards topic text
CN110390017B (en) Target emotion analysis method and system based on attention gating convolutional network
CN107729309A (en) A kind of method and device of the Chinese semantic analysis based on deep learning
CN112364638B (en) Personality identification method based on social text
CN110619044B (en) Emotion analysis method, system, storage medium and equipment
CN110929034A (en) Commodity comment fine-grained emotion classification method based on improved LSTM
CN113435211B (en) Text implicit emotion analysis method combined with external knowledge
CN110472245B (en) Multi-label emotion intensity prediction method based on hierarchical convolutional neural network
CN109446423B (en) System and method for judging sentiment of news and texts
CN111581966A (en) Context feature fusion aspect level emotion classification method and device
CN112434164B (en) Network public opinion analysis method and system taking topic discovery and emotion analysis into consideration
CN112925904B (en) Lightweight text classification method based on Tucker decomposition
CN109614611B (en) Emotion analysis method for fusion generation of non-antagonistic network and convolutional neural network
CN111400494A (en) Sentiment analysis method based on GCN-Attention
CN113761868A (en) Text processing method and device, electronic equipment and readable storage medium
CN113094502A (en) Multi-granularity takeaway user comment sentiment analysis method
CN115687609A (en) Zero sample relation extraction method based on Prompt multi-template fusion
CN110874392A (en) Text network information fusion embedding method based on deep bidirectional attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination