CN113761910A - Comment text fine-grained emotion analysis method integrating emotional characteristics

Info

Publication number: CN113761910A
Application number: CN202110283681.7A
Authority: CN
Inventors: 周少龙; 陈欣洁; 余智华; 冯凯; 李建广
Original assignee: Golaxy Data Technology Co ltd
Current assignee: Golaxy Data Technology Co ltd
Priority date: 2021-03-17
Filing date: 2021-03-17
Publication date: 2021-12-07

Abstract

The invention discloses a fine-grained emotion analysis method for comment text that fuses emotion characteristics, comprising the following steps: S1, preprocessing the existing comment data corpus; S2, constructing a joint vector; and S3, training the ADBC fine-grained emotion model. The advantages are as follows: the invention represents the text vector by fusing emotion labels, emotion words and aspect words, and designs a highly robust fine-grained emotion analysis framework built on the enhanced semantics of the comment text, so that more of the latent emotion information in the text is mined. In addition, a CNN (convolutional neural network) is embedded before the attention mechanism to reinforce the features of the Bi-GRU output layer, thereby improving the accuracy of fine-grained emotion analysis. The method reflects users' real evaluations, so that users can understand the quality of each aspect of a product more directly, and it provides efficient and reliable feedback to consumers and suppliers.

Description

Comment text fine-grained emotion analysis method integrating emotional characteristics

Technical Field

The invention relates to the technical field of natural language text processing, in particular to a comment text fine-grained emotion analysis method integrating emotion characteristics.

Background

With the development and improvement of the Internet and mobile apps, people can conveniently shop online and share and comment on the things they care about. For example, after buying a product, people habitually comment on its various attributes, describing its actual condition and their experience of using it; after a meal at a restaurant, they review the dishes, the service, and so on. Emotion analysis of such comments reveals users' emotional tendency toward a product and their satisfaction with it, and can inform production, sales and purchasing decisions.

Fine-grained sentiment analysis focuses on mining attribute-level information and generally requires comment texts with a strong sentiment tendency or explicit attribute information; constructing a domain sentiment dictionary therefore effectively strengthens information extraction from such texts.

In previous fine-grained sentiment analysis, the sentiment tendency of an attribute is mostly determined by extracting attribute-viewpoint pairs; that is, attributes and viewpoints must appear in pairs, and the accuracy of attribute-level sentiment judgment is low. This approach also tends to miss viewpoints on implicit attributes, for example: (1) catering: "relatively close, very convenient" names no attribute but is an evaluation of "location"; (2) evaluation: "the sound is very noisy" is an evaluation of "sound effect"; (3) online shopping: "damaged" is an evaluation of "integrity of the goods". Conversely, a sentence may contain an evaluative viewpoint without evaluating the product or any of its attributes. A method is therefore lacking that uses emotion-bearing comment sentences to improve the accuracy of fine-grained emotional-tendency judgment.

An effective solution to the problems in the related art has not been proposed yet.

Disclosure of Invention

The invention provides a fine-grained sentiment analysis framework for product comments, together with a fine-grained sentiment analysis method based on Att-Dic-BiGRU-CNN (ADBC). Drawing on the idea of mapping label information into the semantic space, the method uses aspect word information, sentiment word information, the corresponding part-of-speech information and word vectors as the embedding layer. A Bi-GRU network extracts sentiment-semantic features from the context of the comment text; its output is fed to a convolutional neural network and then to an attention mechanism layer that introduces the sentiment labels for weight distribution; finally, a softmax function classifies the text features by label. The method can effectively improve the accuracy of fine-grained emotion classification of products.

In order to achieve the purpose, the invention provides the following technical scheme:

A comment text fine-grained emotion analysis method integrating emotion characteristics comprises the following steps:

S1, preprocessing the existing comment data corpus;

S2, constructing a joint vector;

S3, carrying out ADBC fine-grained emotion model training.

Further, the preprocessing of the comment data corpus in step S1 includes the following steps:

S11, preprocessing the comment corpus by word segmentation, removal of stop words and punctuation, and the like;

S12, applying a randomly initialized vector matrix to the characters, attribute terms, emotion words and the like in the sentence;

S13, constructing an aspect word vector sequence for the attribute aspect words;

S14, constructing an emotion word vector sequence for the emotion words;

S15, constructing a label vector for the emotion labels.

Further, step S2 constructs the joint vector: the vectors from step S1 are combined by splicing the vector sequences with the concat function in TensorFlow, so as to obtain a new fusion vector model.

Further, the ADBC fine-grained emotion model training in step S3 includes the following steps:

S31, training a GRU network layer;

S32, training a CNN network layer;

S33, training an attention network layer.

Compared with the prior art, the invention has the following beneficial effects. By fusing external emotion characteristics, that is, by representing the text vector through the fusion of emotion labels, emotion words and aspect words, the invention provides a highly robust fine-grained emotion analysis framework built on the enhanced semantics of the comment text and mines more of the latent emotion information in the text. A CNN (convolutional neural network) is embedded before the attention mechanism to reinforce the features of the Bi-GRU output layer, thereby improving the accuracy of fine-grained emotion analysis. The method reflects users' real evaluations, so that users can understand each aspect of a product more directly, and it provides efficient and reliable feedback to consumers and suppliers.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a flowchart of a comment text fine-grained emotion analysis method fused with emotion characteristics according to an embodiment of the present invention;

FIG. 2 is a vector diagram of the comment text fine-grained emotion analysis method fused with emotion features according to the embodiment of the invention.

Detailed Description

The invention is further described with reference to the following drawings and detailed description:

Referring to FIG. 1, a comment text fine-grained emotion analysis method fused with emotion features according to an embodiment of the present invention includes the following steps:

S1, preprocessing the existing comment data corpus;

S2, constructing a joint vector;

S3, carrying out ADBC fine-grained emotion model training.

The comment data corpus is preprocessed in step S1 as follows:

S11, preprocessing the comment corpus by word segmentation, removal of stop words and punctuation, and the like;

S12, representing the comment text as word vectors by applying a randomly initialized vector matrix to the characters, attribute terms, emotion words and the like in the sentence:

$$ x_c = [c_1, c_2, \dots, c_n] $$

S13, constructing an aspect word vector sequence for the attribute aspect words, expressed as:

$$ x_w = [w_1, w_2, \dots, w_m] $$

S14, constructing an emotion word vector sequence for the emotion words, expressed as:

$$ x_s = [s_1, s_2, \dots, s_m] $$

S15, constructing a label vector for the emotion labels, expressed as:

$$ x_l = [l_1, l_2, \dots, l_m] $$
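Steps S11 to S15 can be illustrated with a small standard-library sketch. Everything named here is an assumption made for illustration: the stop-word list, the two lexicons, the label names, the embedding dimension, and the use of whitespace splitting in place of a real Chinese word segmenter. The patent does not specify any of these.

```python
import random
import string

# Hypothetical resources; the patent does not list them.
STOP_WORDS = {"the", "is", "a", "and"}
ASPECT_LEXICON = {"location", "sound"}
SENTIMENT_LEXICON = {"close", "noisy"}
LABELS = ["positive", "neutral", "negative"]


def preprocess(text):
    """S11: lower-case, strip ASCII punctuation, split, drop stop words.

    Whitespace splitting stands in for word segmentation here.
    """
    table = str.maketrans("", "", string.punctuation)
    tokens = text.lower().translate(table).split()
    return [t for t in tokens if t not in STOP_WORDS]


class RandomEmbedding:
    """S12: a randomly initialised vector table, filled lazily per item."""

    def __init__(self, dim, seed=0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.table = {}

    def __call__(self, item):
        if item not in self.table:
            self.table[item] = [self.rng.uniform(-0.1, 0.1)
                                for _ in range(self.dim)]
        return self.table[item]

    def sequence(self, items):
        return [self(i) for i in items]


emb = RandomEmbedding(dim=4)
label_emb = RandomEmbedding(dim=4, seed=1)

tokens = preprocess("The location is close, and the sound is noisy.")
x_c = emb.sequence(tokens)                                          # S12
x_w = emb.sequence([t for t in tokens if t in ASPECT_LEXICON])      # S13
x_s = emb.sequence([t for t in tokens if t in SENTIMENT_LEXICON])   # S14
x_l = label_emb.sequence(LABELS)                                    # S15
```

In this sketch the aspect and sentiment sequences reuse the same table as the token sequence, so a word has one vector wherever it appears; whether the patent shares or separates these tables is not stated.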

The joint vector is constructed in step S2 as follows:

The vectors from step S1 are combined by splicing the vector sequences with the concat function in TensorFlow, following the process shown in FIG. 2, to obtain a new fusion vector model, which can be expressed as:

$$ x_t = \mathrm{concat}([x_w, x_s, x_c], 1) $$
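To make the splicing concrete, the following sketch mimics concatenation of 2-D arrays along axis 1 using nested lists. In TensorFlow the corresponding call is `tf.concat(values, axis)`. The tensor shapes are an assumption: the patent does not state them, so the example assumes each input has one row per position and that all inputs have the same number of rows.

```python
def concat(tensors, axis):
    """Concatenate 2-D nested lists along axis 0 (rows) or axis 1 (columns)."""
    if axis == 0:
        return [row for t in tensors for row in t]
    if axis == 1:
        n_rows = len(tensors[0])
        if any(len(t) != n_rows for t in tensors):
            raise ValueError("all inputs must have the same number of rows")
        # Join the i-th row of every input into one longer row.
        return [sum((t[i] for t in tensors), []) for i in range(n_rows)]
    raise ValueError("axis must be 0 or 1")


# Toy inputs with two positions each (values chosen only for readability).
x_w = [[1.0, 2.0], [3.0, 4.0]]
x_s = [[5.0], [6.0]]
x_c = [[7.0, 8.0, 9.0], [10.0, 11.0, 12.0]]

x_t = concat([x_w, x_s, x_c], 1)
```

With axis 1 the feature dimensions add up (2 + 1 + 3 = 6 per row) while the number of rows is unchanged.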

S3, carrying out ADBC fine-grained emotion model training;

S31, training a GRU network layer;

The joint vector obtained in step S2 is taken as the input of the BiGRU network; that is, the sentence is encoded as a sequence of vectors following the order of its words. A weight is learned for the vector at each time step of the hidden layer: when the vector representation of the sentence is computed, different words in the comment text receive different weights, and the sentence representation is obtained as the weighted combination of these vectors. The input and output layers follow the process shown in FIG. 2. Model training is performed using the ADBC model, where the GRU is formulated as follows:

Forget gate (the update gate in standard GRU terminology):

$$ z_t = \sigma(W_z \cdot [h_{t-1}, x_t] + b_z) $$

Output gate (the reset gate in standard GRU terminology):

$$ r_t = \sigma(W_r \cdot [h_{t-1}, x_t] + b_r) $$

Hidden layer representation:

The memory cell represents:
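The formulas for the hidden layer and the memory cell are not legible in this text. As a minimal sketch, assuming the standard GRU formulation (candidate state computed with tanh from the reset-gated previous state, then interpolated with the previous state by z_t), one time step and a bidirectional pass could look as follows. The parameter names `W_h` and `b_h`, the helper `zero_params`, and the interpolation convention are assumptions, not taken from the patent.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, v, b):
    """Return W @ v + b for a nested-list matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def gru_step(h_prev, x_t, p):
    hx = h_prev + x_t                                   # [h_{t-1}, x_t]
    z = [sigmoid(a) for a in affine(p["W_z"], hx, p["b_z"])]
    r = [sigmoid(a) for a in affine(p["W_r"], hx, p["b_r"])]
    # Assumed standard candidate state: tanh(W_h . [r * h_{t-1}, x_t] + b_h)
    rhx = [ri * hi for ri, hi in zip(r, h_prev)] + x_t
    h_cand = [math.tanh(a) for a in affine(p["W_h"], rhx, p["b_h"])]
    # One common convention: h_t = (1 - z) * h_{t-1} + z * h_cand
    return [(1 - zi) * hp + zi * hc for zi, hp, hc in zip(z, h_prev, h_cand)]


def run_gru(xs, p, hidden):
    h, out = [0.0] * hidden, []
    for x in xs:
        h = gru_step(h, x, p)
        out.append(h)
    return out


def bigru(xs, p_fwd, p_bwd, hidden):
    """Concatenate forward and backward hidden states at every position."""
    fwd = run_gru(xs, p_fwd, hidden)
    bwd = run_gru(xs[::-1], p_bwd, hidden)[::-1]
    return [f + b for f, b in zip(fwd, bwd)]


def zero_params(hidden, inp):
    """Hypothetical helper: all-zero weights, useful for checking shapes."""
    def mat():
        return [[0.0] * (hidden + inp) for _ in range(hidden)]
    return {"W_z": mat(), "b_z": [0.0] * hidden,
            "W_r": mat(), "b_r": [0.0] * hidden,
            "W_h": mat(), "b_h": [0.0] * hidden}
```

With all-zero weights both gates equal 0.5 and the candidate state is zero, so a step simply halves the previous hidden state, which gives an easy sanity check.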

S32, training a CNN network layer;

A convolutional neural network can learn to capture a particular feature regardless of its position. The hidden units of the Bi-GRU are therefore fed into the CNN convolutional layer as input, with the sentence represented as:

$$ H = h_0 + h_1 + \dots + h_{n-1} $$

The vector $h_{i:i+j}$ represents the series of word vectors $h_i, h_{i+1}, \dots, h_{i+j}$. Each convolution operation applies a filter $w$ to a window of $m$ words to generate a new feature:

$$ H_i = f(w * h_{i:i+m-1} + b) $$

where $b$ is the bias. The final text vector is represented as:

$$ H^* = [H_0, H_1, \dots, H_{n-m}] $$

S33, training an attention network layer.

An attention mechanism that embeds the emotion labels is added on top of the CNN network. The cosine similarity is calculated between each label vector and each word in the sentence-sequence features output by the CNN network, normalized by the product of the L2 norms of the c-th label embedding and the l-th word embedding, which form a normalization matrix:
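The similarity formula itself is not legible in this text. As a sketch under the assumption that it is the usual cosine similarity (dot product divided by the product of the two L2 norms), the label-by-word similarity matrix can be computed as follows; the small `eps` term is an added safeguard against division by zero and is not part of the patent.

```python
import math


def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))


def cosine_matrix(labels, words, eps=1e-12):
    """Return G with G[c][l] = cos(label c, word l)."""
    G = []
    for c in labels:
        row = []
        for v in words:
            dot = sum(a * b for a, b in zip(c, v))
            row.append(dot / (l2_norm(c) * l2_norm(v) + eps))
        G.append(row)
    return G


label_vecs = [[1.0, 0.0], [0.0, 1.0]]              # two toy label embeddings
word_vecs = [[2.0, 0.0], [1.0, 1.0], [0.0, 3.0]]   # three toy word features
G = cosine_matrix(label_vecs, word_vecs)
```

Each row of `G` corresponds to one label and each column to one word, so a word aligned with a label direction scores close to 1 and an orthogonal word scores close to 0.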

To better capture information between successive words, a nonlinearity is introduced into the similarity, implemented here with a convolution followed by an activation function. Let $g_{l-r:l+r}$ denote the similarities in a window of length $2r+1$ centered on word $l$; the similarity between the central word $l$ and each label is then:

$$ u_l = \mathrm{ReLU}(g_{l-r:l+r} W_1 + b_1) $$

Through max pooling $f$, the label most correlated with word $l$ and the corresponding similarity are obtained:

$$ m_l = f(u_l) $$

After the maximum similarity between each word in the text and the labels has been obtained, $m = [m_1, \dots, m_L]$ is a vector of length $L$; applying softmax yields the normalized attention scores over the text sequence:

$$ \beta = \mathrm{softmax}(m) $$

A weighted sum is then taken over the word vectors of the text; the resulting text representation, in which the word embeddings are weighted by the label-based attention scores, captures the fine-grained emotional tendency of the sentence:
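The chain from windowed similarity to weighted sum can be sketched as below. The weighted-sum formula is not legible in this text, so the form z = sum over l of beta_l times v_l is an assumption, as are the zero padding at the sequence borders and the shapes of `W1` (one weight per window offset) and `b1` (one bias per label).

```python
import math


def softmax(xs):
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def label_attention(G, V, W1, b1, r):
    """G: K x L label-word similarities; V: L word vectors.

    Returns the attention scores beta (length L) and the weighted sum z.
    """
    K, L = len(G), len(G[0])
    m = []
    for l in range(L):
        u_l = []
        for k in range(K):
            s = b1[k]
            for j in range(-r, r + 1):
                if 0 <= l + j < L:                 # zero padding at borders
                    s += G[k][l + j] * W1[j + r]
            u_l.append(max(0.0, s))                # ReLU
        m.append(max(u_l))                         # max pooling over labels
    beta = softmax(m)
    dim = len(V[0])
    z = [sum(beta[l] * V[l][d] for l in range(L)) for d in range(dim)]
    return beta, z


G = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
beta, z = label_attention(G, V, W1=[0.0, 1.0, 0.0], b1=[0.0, 0.0], r=1)
```

In this toy input only the first word is similar to a label, so it receives the largest attention score and contributes most to `z`.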

The attention mechanism is represented by an additive model:

$$ a = \mathrm{softmax}(w(t)\,c(t)) $$

The final sentence vector is represented as:

$$ v = \tanh(h * a) $$

The emotional tendency is then judged: the output $v$ of the attention mechanism is taken as the input of the softmax layer to produce the fine-grained emotion classification prediction vector:

$$ y = \mathrm{softmax}(w * v + b) $$

This yields the final judgment result vector of the sentence. The length of the vector equals the number of sentence classes, its values correspond to the positive, neutral and negative emotions respectively, and the emotion class with the maximum value is the emotional tendency at the fine-grained level.
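The final classification step can be illustrated as follows. The weight matrix, bias and input vector are toy values chosen for the example, and the order of the class names is an assumption that follows the order in which the text lists them.

```python
import math

LABELS = ["positive", "neutral", "negative"]   # assumed class order


def softmax(xs):
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def classify(v, W, b):
    """Return y = softmax(W v + b) and the class with the largest value."""
    logits = [sum(w * x for w, x in zip(row, v)) + bi
              for row, bi in zip(W, b)]
    y = softmax(logits)
    best = max(range(len(y)), key=y.__getitem__)
    return y, LABELS[best]


v = [1.0, -1.0]                                # toy sentence vector
W = [[2.0, 0.0], [0.0, 0.0], [0.0, 2.0]]       # one row per class
b = [0.0, 0.0, 0.0]
y, tendency = classify(v, W, b)
```

Here the logits are 2, 0 and -2, so the first class has the largest probability and is reported as the emotional tendency.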

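The loss value is calculated using a cross-entropy loss function with an L2 penalty:

$$ E = -\sum_i \hat{y}_i \log y_i + \lambda \lVert\theta\rVert^2 $$

where, following the standard definition of cross entropy, $\hat{y}$ denotes the true label distribution, $y$ the predicted distribution output by the softmax layer, $\lambda$ the L2 regularization coefficient and $\theta$ the model parameters.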

The model parameters are optimized with stochastic gradient descent. Training proceeds iteratively over mini-batches of a set size: the loss function is computed on the corpus in each mini-batch, and the network parameters are optimized through back propagation. During gradient descent, dropout and an L2 norm are used to constrain the weight vectors. After multiple iterations, when the accuracy stabilizes, model training is complete. In addition to the cross entropy above, the label vector representations are themselves fed into the classifier, and the cross entropy between the resulting forward-propagation probability distribution and the labels is added to the loss, so that the label vector representations are trained with supervised information.
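As a rough illustration of the training objective, the sketch below computes a mini-batch cross entropy with an L2 penalty and applies one plain gradient step. Averaging over the batch, the names `lam` and `lr`, and the `eps` guard inside the logarithm are assumptions; the gradients are supplied by hand because back propagation is outside the scope of the example.

```python
import math


def cross_entropy(y_true, y_pred, eps=1e-12):
    """-sum_i t_i * log(p_i) for one example (one-hot or soft targets)."""
    return -sum(t * math.log(p + eps) for t, p in zip(y_true, y_pred))


def batch_loss(batch_true, batch_pred, theta, lam):
    """Mean cross entropy over the mini-batch plus lam * ||theta||^2."""
    ce = sum(cross_entropy(t, p)
             for t, p in zip(batch_true, batch_pred)) / len(batch_true)
    l2 = sum(w * w for w in theta)
    return ce + lam * l2


def sgd_step(theta, grads, lr):
    """One stochastic gradient descent update on a flat parameter list."""
    return [w - lr * g for w, g in zip(theta, grads)]


batch_true = [[1.0, 0.0, 0.0]]
batch_pred = [[0.5, 0.25, 0.25]]
theta = [1.0, 2.0]
E = batch_loss(batch_true, batch_pred, theta, lam=0.1)
theta_next = sgd_step(theta, grads=[0.5, -0.5], lr=0.1)
```

For this toy batch the cross entropy is log 2 and the penalty is 0.1 times 5, so the loss is about 1.19.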

With the above scheme, more of the latent emotion information in the text is mined by fusing external emotion characteristics, and a CNN (convolutional neural network) embedded before the attention mechanism reinforces the features of the Bi-GRU output layer, thereby improving the accuracy of fine-grained emotion analysis.

In practical application, the text vector is represented by fusing emotion labels, emotion words and aspect words, and a highly robust fine-grained emotion analysis framework built on the enhanced semantics of the comment text is designed, so that more of the latent emotion information in the text is mined; at the same time, a CNN (convolutional neural network) is embedded before the attention mechanism to reinforce the features of the Bi-GRU output layer, thereby improving the accuracy of fine-grained emotion analysis. The method reflects users' real evaluations, so that users can understand the quality of each aspect of a product more directly, and it provides efficient and reliable feedback to consumers and suppliers.

Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. A comment text fine-grained emotion analysis method fused with emotion characteristics is characterized by comprising the following steps:

S1, preprocessing the existing comment data corpus;

S2, constructing a joint vector;

S3, carrying out ADBC fine-grained emotion model training.

2. The comment text fine-grained emotion analysis method fused with emotion characteristics according to claim 1, wherein the preprocessing of the existing comment data corpus in step S1 comprises the following steps:

S14, constructing an emotion word vector sequence for the emotion words;

S15, constructing a label vector for the emotion labels.

3. The comment text fine-grained emotion analysis method fused with emotion characteristics according to claim 1, wherein step S2 constructs the joint vector by combining the vectors from step S1 and splicing the vector sequences with the concat function in TensorFlow to obtain a new fusion vector model.

4. The method for analyzing the comment text fine-grained emotion fused with emotion characteristics according to claim 1, wherein the step S3 of ADBC fine-grained emotion model training comprises the following steps:

S31, training a GRU network layer;

S32, training a CNN network layer;

S33, training an attention network layer.