CN112100376A - Mutual enhancement conversion network for fine-grained emotion analysis - Google Patents

Mutual enhancement conversion network for fine-grained emotion analysis

Info

Publication number
CN112100376A
Authority
CN
China
Prior art keywords
attribute, word, representation, layer, bidirectional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010951154.4A
Other languages
Chinese (zh)
Other versions
CN112100376B (en)
Inventor
蒋斌
侯静
杨超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University
Priority to CN202010951154.4A
Publication of CN112100376A
Application granted
Publication of CN112100376B
Legal status: Active
Anticipated expiration

Classifications

    • G06F16/35: Information retrieval of unstructured textual data; clustering; classification
    • G06F40/30: Handling natural language data; semantic analysis
    • G06N3/044: Computing arrangements based on biological models; neural networks; recurrent networks, e.g. Hopfield networks
    • G06N3/049: Computing arrangements based on biological models; neural networks; temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs


Abstract

The invention relates to a mutual enhancement conversion network for fine-grained emotion analysis, addressing the fine-grained emotion analysis task of determining the emotion polarity of each specific attribute in a given sentence. First, the attribute enhancement module in the network refines attribute representation learning with semantic features extracted from the sentence, endowing the attribute with richer information. Second, the network uses a hierarchical structure to iteratively enhance the representations of the attribute and the context, achieving more accurate emotion prediction. The invention is effective for fine-grained emotion analysis tasks and performs well on both single-attribute and multi-attribute sentences.

Description

Mutual enhancement conversion network for fine-grained emotion analysis
Technical Field
The invention relates to a mutual enhancement conversion network for fine-grained emotion analysis, and belongs to the technical field of fine-grained emotion analysis tasks.
Background
The fine-grained emotion analysis task comprises two subtasks: attribute extraction and attribute emotion classification. The invention assumes that the attributes are known and focuses only on attribute emotion classification. In this task, multiple attributes may appear in one sentence, and the opinion words of the other attributes become noise when predicting the emotion of the current attribute. Therefore, efficiently modeling the semantic relationship between a given attribute and the words of the sentence is an important challenge.
Traditional methods rely mainly on hand-crafted features, and this style of representation has almost reached its performance ceiling. With the development of deep learning, and especially the introduction of the attention mechanism, this problem has been substantially alleviated, and many neural attention models have been proposed. In these works, the model typically first obtains a representation of the attribute, and then applies an attention mechanism to extract the context features relevant to the given attribute for emotion prediction. The attention mechanism, however, has drawbacks. When a sentence contains multiple attributes with different emotional tendencies, the opinion modifiers of the other attributes are noise for the current attribute, and attention mechanisms have difficulty distinguishing the opinion modifiers of the different attributes, which directly affects the final prediction. For example, in the sentence "I like coming back to Mac OS but this laptop is lacking in speaker quality compared to my $400 old HP laptop", the attention mechanism should, for the attribute "Mac OS", focus on the opinion word "like", which has a positive emotional tendency. In practice, attention mechanisms often also attend to unrelated opinion words, such as "lacking", which has a negative emotional tendency and interferes with the emotion prediction for "Mac OS". Researchers have therefore proposed work to remedy these drawbacks of the attention mechanism, but most of it designs complex neural networks to improve the representation learning of the context; little work has focused on improving the representation learning of the attributes.
Disclosure of Invention
The invention provides a mutual enhancement conversion network for fine-grained emotion analysis, which aims to determine the emotion polarity of each specific attribute in a given sentence.
The invention comprises a BERT layer, a bidirectional enhancement conversion layer and a convolutional feature extractor, connected in sequence;
the BERT layer generates a word representation of the sequence using pre-trained BERT;
the bidirectional enhancement conversion layer comprises a bidirectional LSTM layer, an attribute enhancement module and a group of word conversion units, wherein the bidirectional LSTM layer is respectively connected with the attribute enhancement module and the word conversion units;
the bidirectional LSTM layer is used for capturing long dependency relationship and position information between texts, and the encoding result has two directions, one is an attribute enhancement module, and the other is a word conversion unit;
the attribute enhancement module receives the attribute representation and the average of the bidirectional LSTM layer coding result, and finally outputs an enhanced attribute representation which is input into the word conversion unit;
the attribute enhancement module utilizes the extracted context characteristics to enhance the attributes;
the word conversion unit receives the encoding result from the bidirectional LSTM layer and the enhanced attribute representation from the attribute enhancement module;
the convolutional feature extractor receives attribute information using a GCAE network to control the transfer of the emotional features of the sentence, which further enhances the link between the attributes and the context, and furthermore, introduces relative position information to better extract the emotional features.
The process of reasoning and training through the mutual enhancement conversion network for fine-grained emotion analysis is as follows:
Step 1, the BERT layer, which uses pre-trained BERT to generate word representations of the sequence. Assuming a sentence contains m words and an attribute contains n words, the BERT layer yields a vector representation of the sentence $h^{(0)} = \{h_1^{(0)}, h_2^{(0)}, \ldots, h_m^{(0)}\} \in \mathbb{R}^{m \times d}$ and a vector representation of the attribute $A = \{a_1, a_2, \ldots, a_n\} \in \mathbb{R}^{n \times d}$, where d is the dimension of the BERT output layer.
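The following is a minimal sketch of Step 1, assuming a PyTorch/HuggingFace implementation; the patent names no framework, and the model name, example inputs and variable names are illustrative assumptions:

```python
# Step 1 (sketch, assumptions noted above): obtain BERT word vectors for the
# sentence (m words) and the attribute (n words).
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

sentence = "I like coming back to Mac OS"   # example input (assumed)
attribute = "Mac OS"

sent_inputs = tokenizer(sentence, return_tensors="pt")
attr_inputs = tokenizer(attribute, return_tensors="pt")

with torch.no_grad():
    h0 = bert(**sent_inputs).last_hidden_state  # (1, m, d), d = 768
    a = bert(**attr_inputs).last_hidden_state   # (1, n, d)
```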
Step 2, the bidirectional enhancement conversion layer. Each bidirectional enhancement conversion layer comprises three parts: a bidirectional LSTM layer, an attribute enhancement module and a group of word conversion units. The bidirectional LSTM layer first generates contextualized word representations from its input. The attribute enhancement module then uses these word representations to enhance the attribute representation. Finally, the word conversion units generate attribute-specific word representations based on the contextualized word representations and the enhanced attribute representation.
S21, learning the context dependencies of the text through the bidirectional LSTM layer. As shown in fig. 1, the bidirectional enhancement conversion layer is repeated multiple times in a hierarchical structure. The input of the bidirectional LSTM in the lowest bidirectional enhancement conversion layer is the contextual representation output by the BERT layer; the input of the bidirectional LSTM in each subsequent bidirectional enhancement conversion layer comes from the output of the word conversion units in the previous layer.
The word representations output by the bidirectional LSTM can be written as $h^{(1)} = \{h_1^{(1)}, h_2^{(1)}, \ldots, h_m^{(1)}\} \in \mathbb{R}^{m \times 2d_h}$. The forward LSTM outputs a set of hidden state vectors $\overrightarrow{h} = \{\overrightarrow{h}_1, \ldots, \overrightarrow{h}_m\} \in \mathbb{R}^{m \times d_h}$, where $d_h$ is the number of hidden units. Similarly, the backward LSTM outputs a set of hidden state vectors $\overleftarrow{h} = \{\overleftarrow{h}_1, \ldots, \overleftarrow{h}_m\} \in \mathbb{R}^{m \times d_h}$. Finally, the word representation output by the bidirectional LSTM is obtained by concatenating the two hidden state lists: $h_i^{(1)} = [\overrightarrow{h}_i ; \overleftarrow{h}_i] \in \mathbb{R}^{2d_h}$.
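A minimal sketch of S21 under the same assumed PyTorch setting; the hidden size d_h is an assumed value:

```python
# S21 (sketch): a bidirectional LSTM over the word vectors; forward and
# backward hidden states are concatenated, giving 2*d_h features per word.
import torch
import torch.nn as nn

d, d_h = 768, 150                    # d from BERT; d_h assumed
bilstm = nn.LSTM(input_size=d, hidden_size=d_h,
                 batch_first=True, bidirectional=True)

h0 = torch.randn(1, 12, d)           # (batch, m, d) word vectors from Step 1
h1, _ = bilstm(h0)                   # (batch, m, 2*d_h)
```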
S22, the attribute enhancement module. Before the first attribute enhancement operation, an initial attribute representation is obtained. Specifically, the attribute vectors output by BERT, $A = \{a_1, a_2, \ldots, a_n\} \in \mathbb{R}^{n \times d}$, are first input into another bidirectional LSTM, and average pooling is then applied to the resulting hidden state vectors, finally yielding the initial attribute representation $t^{(0)} \in \mathbb{R}^{2d_h}$.
Taking the lowest bidirectional enhancement conversion layer as an example: after the initial attribute representation is obtained, a vector $\bar{h}^{(1)} \in \mathbb{R}^{2d_h}$ is computed from the contextualized word vectors $h^{(1)}$ output by the bidirectional LSTM through an average pooling layer; this is referred to as the context vector. The context vector is then fused into the initial attribute representation using a basic feature fusion method (point-wise addition), which can be expressed as $t^{(1)} = t^{(0)} + \bar{h}^{(1)}$. This is the enhancement operation that acts on the attribute. Continuing in this way, the final attribute representation is $t^{(L)}$.
This expands as follows:
$t^{(L)} = t^{(0)} + \sum_{i=1}^{L} \bar{h}^{(i)}$ (1)
where $\bar{h}^{(i)}$, $i \in [1, L]$, denotes the context vector in the i-th bidirectional enhancement conversion layer. According to equation (1), the attribute is enhanced by a different context vector in each of the L bidirectional enhancement conversion layers. The attribute vector $t^{(i)}$ has two destinations: one is the word conversion units in the same bidirectional enhancement conversion layer, and the other is the attribute enhancement module in the next bidirectional enhancement conversion layer.
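S22 and equation (1) amount to the small per-layer update sketched below; this is a sketch under the notation chosen above, not the patent's verbatim code:

```python
# S22 (sketch): enhance the attribute vector with the average-pooled context
# vector of the current layer, i.e. Eq. (1) applied one layer at a time.
import torch

def enhance_attribute(t_prev: torch.Tensor, h_layer: torch.Tensor) -> torch.Tensor:
    """t_prev: (2*d_h,) attribute vector; h_layer: (m, 2*d_h) word vectors."""
    context = h_layer.mean(dim=0)    # average pooling over the sentence
    return t_prev + context          # point-wise additive fusion
```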
S23, the word conversion unit, which uses the same structure as the CPT module in the TNet model. The unit takes the attribute vector $t^{(l)}$ and the word representation $h_i^{(l)}$ as input, where $h_i^{(l)}$ is the i-th word-level representation output by the bidirectional LSTM layer and $t^{(l)}$ is the enhanced attribute vector. Specifically, $h_i^{(l)}$ and $t^{(l)}$ are first input into a fully connected layer to obtain the i-th attribute-specific word representation $\tilde{h}_i^{(l)}$:
$\tilde{h}_i^{(l)} = g(W^{\tau}[h_i^{(l)} : t^{(l)}] + b^{\tau})$ (2)
where $g(\ast)$ is a non-linear activation function, ":" denotes the vector concatenation operation, and $W^{\tau}$ and $b^{\tau}$ are the weight matrix and bias, respectively. An information protection mechanism ensures that the context-dependent information captured by the bidirectional LSTM layer is not lost. This information protection mechanism enhances the transfer and use of features and can be expressed as:
$\hat{h}_i^{(l)} = \tilde{h}_i^{(l)} + h_i^{(l)}$ (3)
where $\hat{h}_i^{(l)}$ is the output of the word conversion unit.
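A sketch of the S23 word conversion unit, assuming tanh for the unspecified non-linearity g and a single linear layer as in equation (2):

```python
# S23 (sketch): concatenate each word vector with the attribute vector, apply
# a fully connected layer (Eq. (2)), then add the input back (Eq. (3)).
import torch
import torch.nn as nn

class WordConversionUnit(nn.Module):
    def __init__(self, dim):                 # dim = 2 * d_h
        super().__init__()
        self.fc = nn.Linear(2 * dim, dim)

    def forward(self, h, t):
        """h: (m, dim) word vectors; t: (dim,) enhanced attribute vector."""
        t_tiled = t.unsqueeze(0).expand(h.size(0), -1)
        h_tilde = torch.tanh(self.fc(torch.cat([h, t_tiled], dim=-1)))  # Eq. (2); tanh assumed for g
        return h + h_tilde                   # Eq. (3): information protection
```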
Step 3, the convolutional feature extractor. A variable $p_i$ is introduced to measure the relative position of the i-th context word with respect to the current attribute words; $p_i$ is calculated as follows:
$p_i = \begin{cases} 1 - (k+n-i)/C, & i < k+n \\ 1 - (i-k)/C, & k+n \le i \le m \\ 0, & i > m \end{cases}$ (4)
where k is the index of the first word of the attribute, C is a pre-specified constant, and n is the length of the attribute phrase. When a sentence is padded, the index i may be larger than the actual sentence length m. Then $p_i$ is multiplied as a weight with the word representation output by the i-th word conversion unit of the L-th bidirectional enhancement conversion layer:
$x_i = p_i \cdot \hat{h}_i^{(L)}$ (5)
Here $x_i$ is a word representation that incorporates position information.
Then the sentence representation with position information, $X = \{x_1, x_2, \ldots, x_m\}$, and the final attribute vector $t^{(L)}$ are input into a gated convolutional network to generate a feature map c:
$a_i = \mathrm{relu}(W_a X_{i:i+k-1} + V_a t^{(L)} + b_a)$ (6)
$s_i = \tanh(W_s X_{i:i+k-1} + b_s)$ (7)
$c_i = s_i \times a_i$ (8)
where k is the convolution kernel size; $W_a$, $V_a$, $b_a$, $W_s$ and $b_s$ are all learnable parameters; $c_i$ is one element of the feature map c; $s_i$ is the computed emotional feature and $a_i$ the computed attribute feature; × denotes element-wise multiplication. The sentence representation z is then obtained from s convolution kernels by applying max pooling:
$z = \{\max(c^{1}), \ldots, \max(c^{s})\}$ (9)
where max takes the maximum element of each feature map. Finally, z is input into a fully connected layer for the final emotion prediction:
$\hat{y} = \mathrm{softmax}(W_f z + b_f)$ (10)
where softmax is the normalized exponential function and $W_f$ and $b_f$ are learnable parameters.
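A sketch of the gated convolutional extractor of equations (6)-(10), assuming a GCAE-style PyTorch module; the kernel count, kernel size and class count are assumed values:

```python
# Eq. (6)-(10) (sketch): attribute-gated convolution, max pooling over the
# feature maps, and a softmax classifier.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedConvExtractor(nn.Module):
    def __init__(self, dim, n_kernels=100, k=3, n_classes=3):
        super().__init__()
        self.conv_a = nn.Conv1d(dim, n_kernels, k)  # attribute gate path
        self.conv_s = nn.Conv1d(dim, n_kernels, k)  # emotional feature path
        self.v_a = nn.Linear(dim, n_kernels)
        self.fc = nn.Linear(n_kernels, n_classes)

    def forward(self, X, t):
        """X: (batch, m, dim) position-weighted words; t: (batch, dim)."""
        Xc = X.transpose(1, 2)                                   # (batch, dim, m)
        a = F.relu(self.conv_a(Xc) + self.v_a(t).unsqueeze(-1))  # Eq. (6)
        s = torch.tanh(self.conv_s(Xc))                          # Eq. (7)
        c = s * a                                                # Eq. (8)
        z = F.max_pool1d(c, c.size(-1)).squeeze(-1)              # Eq. (9)
        return F.softmax(self.fc(z), dim=-1)                     # Eq. (10)
```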
Step 4, the mutual enhancement conversion network for fine-grained emotion analysis described herein can be trained end to end within a supervised learning framework to optimize all parameters Θ. The cross entropy with an $L_2$ regularization term is used as the loss function, defined as:
$\mathcal{L}(\Theta) = -\sum_{i=1}^{O} y_i \log \hat{y}_i + \lambda \lVert\Theta\rVert_2^2$ (11)
where $y_i$ is the true probability that the given sentence is labeled with each emotion, $\hat{y}_i$ is the estimated probability that the given sentence is labeled with each emotion, O is the number of emotion polarity classes, and λ is the coefficient of the $L_2$ regularization term.
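Equation (11) corresponds to the following sketch; the regularization coefficient is an assumed value:

```python
# Eq. (11) (sketch): cross entropy over the O polarity classes plus an L2
# penalty on all parameters.
import torch

def loss_fn(y_hat, y, params, lam=1e-5):
    """y_hat: (batch, O) predicted probabilities; y: (batch, O) one-hot
    labels; params: model parameters; lam: L2 coefficient (assumed)."""
    ce = -(y * torch.log(y_hat + 1e-12)).sum(dim=-1).mean()
    l2 = sum((p ** 2).sum() for p in params)
    return ce + lam * l2
```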
The invention has the advantages of improving attribute representation learning and realizing iterative, interactive learning between the attribute and the context. First, the attribute enhancement module in the network refines attribute representation learning with semantic features extracted from the sentence, endowing the attribute with richer information. Second, the network uses a hierarchical structure to iteratively enhance the representations of the attribute and the context, achieving more accurate emotion prediction.
Drawings
FIG. 1 is an overall architecture of a mutually enhanced conversion network for fine-grained sentiment analysis.
Fig. 2 is a structural diagram of the first bidirectional enhanced conversion module.
Fig. 3 is a block diagram of a word conversion unit.
Detailed Description
In the following, preferred embodiments of the present invention will be further explained with reference to fig. 1 to 3, wherein the dashed arrows in fig. 1 represent the conversion of attributes, and the solid arrows represent the conversion of sentences.
The invention comprises a BERT layer, a bidirectional enhanced conversion layer and a convolution characteristic extractor which are connected in sequence;
the BERT layer generates a word representation of the sequence using pre-trained BERT; BERT is an english abbreviation expressed by the bidirectional coder of the transform model, which is a common abbreviation of those skilled in the art.
The bidirectional enhancement conversion layer comprises a bidirectional LSTM layer, an attribute enhancement module and a group of word conversion units, with the bidirectional LSTM layer connected to the attribute enhancement module and to the word conversion units. A hierarchical structure is adopted to realize iterative, interactive learning between the attribute and the context, each computing layer being one bidirectional enhancement conversion component. Attribute information is injected into the extraction of emotional features through GCAE; GCAE is the English abbreviation of the gated convolutional network with aspect embedding, a common shorthand for those skilled in the art. This further strengthens the relation between the attribute and the context, and relative position information is introduced to better extract emotional features. Compared with the prior art, the feature extractor is replaced from a CNN by GCAE; CNN is the English abbreviation of convolutional neural network, a common shorthand for those skilled in the art.
The bidirectional LSTM layer is used to capture long-range dependencies and position information in the text; its encoding result flows in two directions, one to the attribute enhancement module and the other to the word conversion units;
the attribute enhancement module receives the attribute representation and the average of the bidirectional LSTM layer coding result, and finally outputs an enhanced attribute representation which is input into the word conversion unit;
the attribute enhancement module utilizes the extracted context characteristics to enhance the attributes;
the word conversion unit receives the encoding result from the bi-directional LSTM layer and the enhanced attribute representation from the attribute enhancement module.
The convolutional feature extractor uses a GCAE network, which receives the attribute information to control the flow of the sentence's emotional features; this further strengthens the link between the attribute and the context, and relative position information is additionally introduced to better extract the emotional features.
The process of reasoning and training of the invention is as follows:
Step 1, the BERT layer, which uses pre-trained BERT to generate word representations of the sequence. Assuming a sentence contains m words and an attribute contains n words, the BERT layer yields a vector representation of the sentence $h^{(0)} = \{h_1^{(0)}, h_2^{(0)}, \ldots, h_m^{(0)}\} \in \mathbb{R}^{m \times d}$ and a vector representation of the attribute $A = \{a_1, a_2, \ldots, a_n\} \in \mathbb{R}^{n \times d}$, where d is the dimension of the BERT output layer.
Step 2, the bidirectional enhancement conversion layer. Each bidirectional enhancement conversion layer comprises three parts: a bidirectional LSTM layer, an attribute enhancement module and a group of word conversion units. The bidirectional LSTM layer first generates contextualized word representations from its input. The attribute enhancement module then uses these word representations to enhance the attribute representation. Finally, the word conversion units generate attribute-specific word representations based on the contextualized word representations and the enhanced attribute representation.
S21, learning the context dependencies of the text through the bidirectional LSTM layer. As shown in fig. 1, the bidirectional enhancement conversion layer is repeated multiple times in a hierarchical structure. The input of the bidirectional LSTM in the lowest bidirectional enhancement conversion layer is the contextual representation output by the BERT layer; the input of the bidirectional LSTM in each subsequent bidirectional enhancement conversion layer comes from the output of the word conversion units in the previous layer.
The word representations output by the bidirectional LSTM can be written as $h^{(1)} = \{h_1^{(1)}, h_2^{(1)}, \ldots, h_m^{(1)}\} \in \mathbb{R}^{m \times 2d_h}$. The forward LSTM outputs a set of hidden state vectors $\overrightarrow{h} = \{\overrightarrow{h}_1, \ldots, \overrightarrow{h}_m\} \in \mathbb{R}^{m \times d_h}$, where $d_h$ is the number of hidden units. Similarly, the backward LSTM outputs a set of hidden state vectors $\overleftarrow{h} = \{\overleftarrow{h}_1, \ldots, \overleftarrow{h}_m\} \in \mathbb{R}^{m \times d_h}$. Finally, the word representation output by the bidirectional LSTM is obtained by concatenating the two hidden state lists: $h_i^{(1)} = [\overrightarrow{h}_i ; \overleftarrow{h}_i] \in \mathbb{R}^{2d_h}$.
S22, the attribute enhancement module. Before the first attribute enhancement operation, an initial attribute representation is obtained. Specifically, the attribute vectors output by BERT, $A = \{a_1, a_2, \ldots, a_n\} \in \mathbb{R}^{n \times d}$, are first input into another bidirectional LSTM, and average pooling is then applied to the resulting hidden state vectors, finally yielding the initial attribute representation $t^{(0)} \in \mathbb{R}^{2d_h}$.
Taking the lowest bidirectional enhancement conversion layer as an example: after the initial attribute representation is obtained, a vector $\bar{h}^{(1)} \in \mathbb{R}^{2d_h}$ is computed from the contextualized word vectors $h^{(1)}$ output by the bidirectional LSTM through an average pooling layer; this is referred to as the context vector. The context vector is then fused into the initial attribute representation using a basic feature fusion method (point-wise addition), which can be expressed as $t^{(1)} = t^{(0)} + \bar{h}^{(1)}$. This is the enhancement operation that acts on the attribute. Continuing in this way, the final attribute representation is $t^{(L)}$.
This expands as follows:
$t^{(L)} = t^{(0)} + \sum_{i=1}^{L} \bar{h}^{(i)}$ (1)
where $\bar{h}^{(i)}$, $i \in [1, L]$, denotes the context vector in the i-th bidirectional enhancement conversion layer. According to equation (1), the attribute is enhanced by a different context vector in each of the L bidirectional enhancement conversion layers. The attribute vector $t^{(i)}$ has two destinations: one is the word conversion units in the same bidirectional enhancement conversion layer, and the other is the attribute enhancement module in the next bidirectional enhancement conversion layer.
S23, the word conversion unit, which uses the same structure as the CPT module in the TNet model. TNet is the English abbreviation of the target-oriented transformation network and CPT of its context-preserving transformation module; both are common shorthand for those skilled in the art. The unit takes the attribute vector $t^{(l)}$ and the word representation $h_i^{(l)}$ as input, where $h_i^{(l)}$ is the i-th word-level representation output by the bidirectional LSTM layer (LSTM is the English abbreviation of the long short-term memory network, a common shorthand for those skilled in the art) and $t^{(l)}$ is the enhanced attribute vector. Specifically, $h_i^{(l)}$ and $t^{(l)}$ are first input into a fully connected layer to obtain the i-th attribute-specific word representation $\tilde{h}_i^{(l)}$:
$\tilde{h}_i^{(l)} = g(W^{\tau}[h_i^{(l)} : t^{(l)}] + b^{\tau})$ (2)
where $g(\ast)$ is a non-linear activation function, ":" denotes the vector concatenation operation, and $W^{\tau}$ and $b^{\tau}$ are the weight matrix and bias, respectively. An information protection mechanism ensures that the context-dependent information captured by the bidirectional LSTM layer is not lost. This information protection mechanism enhances the transfer and use of features and can be expressed as:
$\hat{h}_i^{(l)} = \tilde{h}_i^{(l)} + h_i^{(l)}$ (3)
where $\hat{h}_i^{(l)}$ is the output of the word conversion unit.
Step 3, the convolutional feature extractor. A variable $p_i$ is introduced to measure the relative position of the i-th context word with respect to the current attribute words; $p_i$ is calculated as follows:
$p_i = \begin{cases} 1 - (k+n-i)/C, & i < k+n \\ 1 - (i-k)/C, & k+n \le i \le m \\ 0, & i > m \end{cases}$ (4)
where k is the index of the first word of the attribute, C is a pre-specified constant, and n is the length of the attribute phrase. When a sentence is padded, the index i may be larger than the actual sentence length m. Then $p_i$ is multiplied as a weight with the word representation output by the i-th word conversion unit of the L-th bidirectional enhancement conversion layer:
$x_i = p_i \cdot \hat{h}_i^{(L)}$ (5)
Here $x_i$ is a word representation that incorporates position information.
The sentence representation with position information, $X = \{x_1, x_2, \ldots, x_m\}$, and the final attribute vector $t^{(L)}$ are then input into a gated convolutional network to generate a feature map c:
$a_i = \mathrm{relu}(W_a X_{i:i+k-1} + V_a t^{(L)} + b_a)$ (6)
$s_i = \tanh(W_s X_{i:i+k-1} + b_s)$ (7)
$c_i = s_i \times a_i$ (8)
where k is the convolution kernel size; $W_a$, $V_a$, $b_a$, $W_s$ and $b_s$ are all learnable parameters; $c_i$ is one element of the feature map c; $s_i$ is the computed emotional feature and $a_i$ the computed attribute feature; × denotes element-wise multiplication. The sentence representation z is then obtained from s convolution kernels by applying max pooling:
$z = \{\max(c^{1}), \ldots, \max(c^{s})\}$ (9)
where max takes the maximum element of each feature map. Finally, z is input into a fully connected layer for the final emotion prediction:
$\hat{y} = \mathrm{softmax}(W_f z + b_f)$ (10)
where softmax is the normalized exponential function and $W_f$ and $b_f$ are learnable parameters.
Step 4, the mutual enhancement conversion network for fine-grained emotion analysis described herein can be trained end to end within a supervised learning framework to optimize all parameters Θ. The cross entropy with an $L_2$ regularization term is used as the loss function, defined as:
$\mathcal{L}(\Theta) = -\sum_{i=1}^{O} y_i \log \hat{y}_i + \lambda \lVert\Theta\rVert_2^2$ (11)
where $y_i$ is the true probability that the given sentence is labeled with each emotion, $\hat{y}_i$ is the estimated probability that the given sentence is labeled with each emotion, O is the number of emotion polarity classes, and λ is the coefficient of the $L_2$ regularization term.

Claims (2)

1. A mutual enhancement conversion network for fine-grained emotion analysis, characterized in that:
it comprises a BERT layer, a bidirectional enhancement conversion layer and a convolutional feature extractor, connected in sequence;
the BERT layer generates a word representation of the sequence using pre-trained BERT;
the bidirectional enhancement conversion layer comprises a bidirectional LSTM layer, an attribute enhancement module and a group of word conversion units, wherein the bidirectional LSTM layer is respectively connected with the attribute enhancement module and the word conversion units;
the bidirectional LSTM layer is used for capturing long dependency relationship and position information between texts, and the encoding result has two directions, one is an attribute enhancement module, and the other is a word conversion unit;
the attribute enhancement module receives the attribute representation and the average of the bidirectional LSTM layer coding result, and finally outputs an enhanced attribute representation which is input into the word conversion unit;
the attribute enhancement module utilizes the extracted context characteristics to enhance the attributes;
the word conversion unit receives the encoding result from the bidirectional LSTM layer and the enhanced attribute representation from the attribute enhancement module;
the convolutional feature extractor receives attribute information using a GCAE network to control the transfer of the emotional features of the sentence, which further enhances the link between the attributes and the context, and furthermore, introduces relative position information to better extract the emotional features.
2. The mutual enhancement conversion network for fine-grained emotion analysis according to claim 1, characterized in that the reasoning and training process comprises the following steps:
step 1, the BERT layer, which uses pre-trained BERT to generate word representations of the sequence; assuming a sentence contains m words and an attribute contains n words, the BERT layer yields a vector representation of the sentence $h^{(0)} = \{h_1^{(0)}, h_2^{(0)}, \ldots, h_m^{(0)}\} \in \mathbb{R}^{m \times d}$ and a vector representation of the attribute $A = \{a_1, a_2, \ldots, a_n\} \in \mathbb{R}^{n \times d}$, where d is the dimension of the BERT output layer;
step 2, the bidirectional enhancement conversion layer, wherein the bidirectional LSTM layer generates contextualized word representations from its input; the attribute enhancement module then further enhances the attribute representation with these word representations; finally, the word conversion units generate attribute-specific word representations based on the contextualized word representations and the enhanced attribute representation;
S21, learning the context dependencies of the text through the bidirectional LSTM layer; the bidirectional enhancement conversion layer is repeated multiple times in a hierarchical structure, and the input of the bidirectional LSTM in the lowest bidirectional enhancement conversion layer is the contextual representation output by the BERT layer;
the input of the bidirectional LSTM in each subsequent bidirectional enhancement conversion layer comes from the output of the word conversion units in the previous layer;
the word representations output by the bidirectional LSTM can be written as $h^{(1)} = \{h_1^{(1)}, \ldots, h_m^{(1)}\} \in \mathbb{R}^{m \times 2d_h}$; the forward LSTM outputs a set of hidden state vectors $\overrightarrow{h} = \{\overrightarrow{h}_1, \ldots, \overrightarrow{h}_m\} \in \mathbb{R}^{m \times d_h}$, where $d_h$ is the number of hidden units; the backward LSTM likewise outputs a set of hidden state vectors $\overleftarrow{h} = \{\overleftarrow{h}_1, \ldots, \overleftarrow{h}_m\} \in \mathbb{R}^{m \times d_h}$; concatenating the two hidden state lists yields the word representation output by the bidirectional LSTM, $h_i^{(1)} = [\overrightarrow{h}_i ; \overleftarrow{h}_i] \in \mathbb{R}^{2d_h}$;
S22, the attribute enhancement module; before the first attribute enhancement operation, an initial attribute representation is obtained; first, the attribute vectors output by BERT, $A = \{a_1, a_2, \ldots, a_n\} \in \mathbb{R}^{n \times d}$, are input into another bidirectional LSTM, and average pooling is then applied to the resulting hidden state vectors to obtain the initial attribute representation $t^{(0)} \in \mathbb{R}^{2d_h}$;
after the initial attribute representation is obtained, a vector $\bar{h}^{(1)} \in \mathbb{R}^{2d_h}$, referred to as the context vector, is computed from the contextualized word vectors $h^{(1)}$ output by the bidirectional LSTM through an average pooling layer; the context vector is then fused into the initial attribute representation by point-wise additive feature fusion, giving the enhancement operation on the attribute, expressed as $t^{(1)} = t^{(0)} + \bar{h}^{(1)}$;
the final attribute representation obtained is $t^{(L)}$, which expands as follows:
$t^{(L)} = t^{(0)} + \sum_{i=1}^{L} \bar{h}^{(i)}$ (1)
where $\bar{h}^{(i)}$, $i \in [1, L]$, denotes the context vector in the i-th bidirectional enhancement conversion layer;
according to formula (1), the attribute is enhanced by different context vectors in the multiple bidirectional enhancement conversion layers;
the attribute vector $t^{(i)}$ has two destinations: one is the word conversion units in the same bidirectional enhancement conversion layer, and the other is the attribute enhancement module in the next bidirectional enhancement conversion layer;
S23, the word conversion unit, which takes the attribute vector $t^{(l)}$ and the word representation $h_i^{(l)}$ as input, where $h_i^{(l)}$ is the i-th word-level representation output by the bidirectional LSTM layer and $t^{(l)}$ is the enhanced attribute vector;
first, $h_i^{(l)}$ and $t^{(l)}$ are input into a fully connected layer to obtain the i-th attribute-specific word representation $\tilde{h}_i^{(l)}$:
$\tilde{h}_i^{(l)} = g(W^{\tau}[h_i^{(l)} : t^{(l)}] + b^{\tau})$ (2)
where $g(\ast)$ is a non-linear activation function, ":" denotes the vector concatenation operation, and $W^{\tau}$ and $b^{\tau}$ are the weight matrix and bias, respectively; an information protection mechanism is used to ensure that the context-dependent information captured by the bidirectional LSTM layer is not lost; this information protection mechanism enhances the transfer and use of features and is expressed as:
$\hat{h}_i^{(l)} = \tilde{h}_i^{(l)} + h_i^{(l)}$ (3)
where $\hat{h}_i^{(l)}$ is the output of the word conversion unit;
step 3, the convolutional feature extractor; a variable $p_i$ is introduced to measure the relative position of the i-th context word with respect to the current attribute words; $p_i$ is calculated as follows:
$p_i = \begin{cases} 1 - (k+n-i)/C, & i < k+n \\ 1 - (i-k)/C, & k+n \le i \le m \\ 0, & i > m \end{cases}$ (4)
where k is the index of the first word of the attribute, C is a pre-specified constant, and n is the length of the attribute phrase; when a sentence is padded, the index i may be larger than the actual sentence length m;
$p_i$ is multiplied as a weight with the word representation output by the i-th word conversion unit of the L-th bidirectional enhancement conversion layer:
$x_i = p_i \cdot \hat{h}_i^{(L)}$ (5)
here $x_i$ is a word representation that incorporates position information;
then the sentence representation with position information, $X = \{x_1, x_2, \ldots, x_m\}$, and the final attribute vector $t^{(L)}$ are input into a gated convolutional network to generate a feature map c:
$a_i = \mathrm{relu}(W_a X_{i:i+k-1} + V_a t^{(L)} + b_a)$ (6)
$s_i = \tanh(W_s X_{i:i+k-1} + b_s)$ (7)
$c_i = s_i \times a_i$ (8)
where k is the convolution kernel size; $W_a$, $V_a$, $b_a$, $W_s$ and $b_s$ are all learnable parameters, and × denotes element-wise multiplication; $c_i$ is one element of the feature map c, $s_i$ is the computed emotional feature, and $a_i$ is the computed attribute feature;
the sentence representation z is obtained from s convolution kernels by applying max pooling:
$z = \{\max(c^{1}), \ldots, \max(c^{s})\}$ (9)
where max takes the maximum element of each feature map; finally, z is input into a fully connected layer for the final emotion prediction:
$\hat{y} = \mathrm{softmax}(W_f z + b_f)$ (10)
where softmax is the normalized exponential function and $W_f$ and $b_f$ are learnable parameters;
step 4, the mutual enhancement conversion network for fine-grained emotion analysis is trained end to end within a supervised learning framework to optimize all parameters Θ, with the cross entropy with an $L_2$ regularization term as the loss function, defined as:
$\mathcal{L}(\Theta) = -\sum_{i=1}^{O} y_i \log \hat{y}_i + \lambda \lVert\Theta\rVert_2^2$ (11)
where $y_i$ is the true probability that the given sentence is labeled with each emotion, $\hat{y}_i$ is the estimated probability that the given sentence is labeled with each emotion, O is the number of emotion polarity classes, and λ is the coefficient of the $L_2$ regularization term.
CN202010951154.4A 2020-09-11 2020-09-11 Mutual enhancement conversion method for fine-grained emotion analysis Active CN112100376B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010951154.4A CN112100376B (en) 2020-09-11 2020-09-11 Mutual enhancement conversion method for fine-grained emotion analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010951154.4A CN112100376B (en) 2020-09-11 2020-09-11 Mutual enhancement conversion method for fine-grained emotion analysis

Publications (2)

Publication Number Publication Date
CN112100376A true CN112100376A (en) 2020-12-18
CN112100376B CN112100376B (en) 2022-02-08

Family

ID=73752087

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010951154.4A Active CN112100376B (en) 2020-09-11 2020-09-11 Mutual enhancement conversion method for fine-grained emotion analysis

Country Status (1)

Country Link
CN (1) CN112100376B (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170046797A1 (en) * 2003-04-07 2017-02-16 10Tales, Inc. Method, system and software for associating attributes within digital media presentations
CN110489554A (en) * 2019-08-15 2019-11-22 昆明理工大学 Property level sensibility classification method based on the mutual attention network model of location aware
CN111414476A (en) * 2020-03-06 2020-07-14 哈尔滨工业大学 Attribute-level emotion analysis method based on multi-task learning

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113705197A (en) * 2021-08-30 2021-11-26 北京工业大学 Fine-grained emotion analysis method based on position enhancement
CN113705197B (en) * 2021-08-30 2024-04-02 北京工业大学 Fine granularity emotion analysis method based on position enhancement
CN118013962A (en) * 2024-04-09 2024-05-10 华东交通大学 Chinese chapter connective word recognition method based on two-way sequence generation

Also Published As

Publication number Publication date
CN112100376B (en) 2022-02-08

Similar Documents

Publication Publication Date Title
CN110119467B (en) Project recommendation method, device, equipment and storage medium based on session
CN110598779B (en) Abstract description generation method and device, computer equipment and storage medium
CN109934261B (en) Knowledge-driven parameter propagation model and few-sample learning method thereof
CN109582956B (en) Text representation method and device applied to sentence embedding
CN112633010B (en) Aspect-level emotion analysis method and system based on multi-head attention and graph convolution network
CN110046248B (en) Model training method for text analysis, text classification method and device
CN109947912A (en) A kind of model method based on paragraph internal reasoning and combined problem answer matches
CN109919221B (en) Image description method based on bidirectional double-attention machine
CN111897957B (en) Capsule neural network integrating multi-scale feature attention and text classification method
CN112527966B (en) Network text emotion analysis method based on Bi-GRU neural network and self-attention mechanism
CN112100376B (en) Mutual enhancement conversion method for fine-grained emotion analysis
CN110851491A (en) Network link prediction method based on multiple semantic influences of multiple neighbor nodes
CN113673535B (en) Image description generation method of multi-modal feature fusion network
CN113626589A (en) Multi-label text classification method based on mixed attention mechanism
CN112131886A (en) Method for analyzing aspect level emotion of text
CN115408517A (en) Knowledge injection-based multi-modal irony recognition method of double-attention network
CN114676332A (en) Network API recommendation method facing developers
CN114510576A (en) Entity relationship extraction method based on BERT and BiGRU fusion attention mechanism
CN110321565B (en) Real-time text emotion analysis method, device and equipment based on deep learning
CN115422388B (en) Visual dialogue method and system
CN116150334A (en) Chinese co-emotion sentence training method and system based on UniLM model and Copy mechanism
CN113705197B (en) Fine granularity emotion analysis method based on position enhancement
US11941508B2 (en) Dialog system with adaptive recurrent hopping and dual context encoding
CN115169348A (en) Event extraction method based on hybrid neural network
CN114610862A (en) Conversation recommendation method for enhancing context sequence of graph

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant