CN114896969A

CN114896969A - Method for extracting aspect words based on deep learning

Info

Publication number: CN114896969A
Application number: CN202210514804.8A
Authority: CN
Inventors: 杨鹏; 张朋辉; 戈妍妍
Original assignee: Nanjing Youhui Xin'an Technology Co ltd
Current assignee: Nanjing Youhui Xin'an Technology Co ltd
Priority date: 2022-05-12
Filing date: 2022-05-12
Publication date: 2022-08-12

Abstract

The invention discloses a method for extracting facet words based on deep learning, which comprises the following steps of constructing a facet word extraction data set; embedding sentence features in the data set into a semantic space; using a multi-feature encoder to encode sentence features; encoding a sentence context using a bidirectional LSTM-based context encoding layer; extracting global semantic information of the sentence by using a global semantic information extraction layer based on a multi-head self-attention mechanism, and capturing semantic relation between the aspect words and the context; a sequence decoding layer based on the conditional random field decodes the vector learned by the model, extraction of the aspect words in the sentence is completed through sequence labeling, and the method can be used for extraction of the aspect words of the social media text. The method is based on fully learning the multi-feature of the sentence, completes the capture of the context semantic information of the word in the aspect by combining a multi-head self-attention mechanism, can still show good effect in a complex scene, and has the characteristics of high precision and strong robustness.

Description

Method for extracting aspect words based on deep learning

Technical Field

The invention relates to a method for extracting facet words based on deep learning, which can be used for extracting facet words of social media texts and belongs to the technical field of internet and natural language processing.

Background

With the continuous development of the internet, more and more netizens are used to express views and attitudes towards news events using social media (e.g., microblog, Twitter, etc.). The social media platforms gradually become sensors for the development of real world events, and network public sentiment plays more and more important roles in reflecting the civil ideas, refracting the reality and the like. Meanwhile, various bad opinions are also filled in the network, but the network space is not the other than the law, and the network public opinion is supervised by the network public opinion analysis technology, so that the network environment with harmonious health is created, and the government departments can know the civilian, properly process social public opinion events. The emotion analysis technology is an important component of public opinion analysis technology, and the quality of emotion analysis directly determines the quality analysis of public opinion analysis. The existing emotion analysis technology is document level emotion analysis and sentence level emotion analysis, and cannot meet the requirements of a public sentiment analysis system on social media content emotion details, so that the introduction of an aspect level emotion analysis technology is needed. However, the aspect word extraction is a prerequisite for the aspect level emotion analysis, and the high-quality aspect words have significance for the aspect level emotion analysis.

In recent years, many scholars have conducted intensive studies on the facet extraction technology. The methods of investigation include two types: supervised learning based methods and unsupervised learning based methods. Researchers of the supervised learning-based method for extracting the aspect words regard the aspect word extraction as a sequence labeling task, and the common research methods include a graph-based method, a semantic analysis-based method and a statistical-based method. Although the method improves the precision of the aspect word extraction to a certain extent, the method depends heavily on high-quality data labeled manually, the cost problem is caused by the manually labeled data, and the model is difficult to migrate to a new field. The method based on unsupervised learning can solve the problems of the method to a certain extent. But the unsupervised learning-based method does not fully consider the capture of word sequence information and neglects the extraction of character-level features, which leads to incomplete model-extracted aspect words.

The invention provides a method for extracting aspect words based on deep learning, which aims at solving the problem that the semantic features of sentences are not fully learned in the existing aspect word extraction research. Firstly, initially coding a sentence by using a multi-feature coding layer; then, the initial code is sent into a context coding layer of a bidirectional LSTM-based model to learn the context information of the sentence; secondly, sending the learning result of the previous layer into a global context information extraction layer based on a multi-head self-attention mechanism to learn semantic association between the aspect words; and finally, finishing sequence decoding through a sequence decoding layer based on the conditional random field to obtain an aspect word extraction result. The method improves the robustness of the aspect word extraction model and improves the extraction accuracy.

Disclosure of Invention

Aiming at the problems and the defects in the prior art, the invention provides the method for extracting the aspect words based on the deep learning.

In order to achieve the purpose, the technical scheme of the invention is as follows: a method for extracting aspect words based on deep learning covers the whole process of extracting aspect words, mainly comprises the processes of multi-feature coding, context coding, global context information extraction, word sequence decoding and the like, and can effectively extract aspect words from comment texts, so that the precision of the task is improved. The method mainly comprises three steps as follows:

step 1, constructing a facet word extraction data set. SemEval2014 resultatant dataset and Laptop dataset are collected firstly, then ACL14 Twitter public dataset is collected, and finally, the datasets are divided into training sets and verification sets according to the ratio of 8:2 and are respectively used for training and verifying the aspect extraction model.

And 2, training an aspect word extraction model. Firstly, initially coding a sentence by using a multi-feature coding layer; then, the initial code is sent into a context coding layer of a bidirectional LSTM-based model to learn the context information of the sentence; secondly, sending the learning result of the previous layer into a global context information extraction layer based on a multi-head self-attention mechanism to learn semantic association between the aspect words; and finally, finishing sequence decoding through a sequence decoding layer based on the conditional random field to obtain an aspect word extraction result. In the training stage, the loss function of the model compares the predicted value with the true value of the model and calculates the loss value, and the model parameters are updated through back propagation, so that the model parameters become better. In addition, each time one round of training is carried out, the data of the verification set is sent to the model for verification;

step 2, training an aspect word extraction model, wherein the implementation process of the step is divided into 4 sub-steps:

and a substep 2-1, performing initial coding on the sentence by using a multi-feature coding layer, wherein the specific process is as follows:

definition E _w Representing the result of the overall embedding of three types of information, E _T 、E _s And E _P Representing word embedding, fragment embedding and position embedding respectively, information embedding in Roberta is expressed as:

E _w ＝E _T +E _S +E _P (1)

then, the multi-layer Transformer encoder encodes the embedded result, and the input of the first-layer encoder is defined as H ₀ Then there is H ₀ ＝E _w Then the process of encoding is represented as:

H _i ＝Transformer(H _i-1 ),i∈[1,L] (2)

wherein H _i Represents the result of i-th layer transform encoding, and L represents the total number of layers of the transform of the Roberta-base encoder.

Next, the characters corresponding to each word are encoded, and it is assumed that the character sequence after padding is C ═ C ₁ ,c ₂ ,...,c _n N represents the number of characters. Suppose Emb _c For an embedded matrix of characters, the embedding process of the characters can be expressed as:

E _c ＝Emb _c ·C (3)

in the character encoding stage, a bidirectional long-short term memory network is used as a character encoder, and the encoding process can be expressed as follows:

wherein the content of the first and second substances,

representing the forward hidden state output of the bi-directional LSTM,

representing a backward hidden state output, H, of a bi-directional LSTM _C Representing the final output of the bi-directional LSTM,

representing the join operation of the vectors.

And finally, fusing four different granularity characteristics of words, positions, segments and characters, wherein the process is represented as follows:

wherein H _CW Vector representation representing four features of fused words, positions, segments and characters, H _L Representing the output of the last layer of the transform of the Roberta-base, H _C Representing the final output of the bi-directional LSTM,

representing the join operation of the vectors.

And a substep 2-2, obtaining vector representation fusing four different granularity characteristics from a multi-characteristic coding layer, and carrying out context coding on sentences in a context coding layer based on a bidirectional long-short term memory network, wherein the process is as follows:

the context coding process based on the long-short term memory network can be expressed as follows:

wherein the content of the first and second substances,

represents the output of the bi-directional LSTM forward concealment layer,

representing the output of a bi-directional LSTM backward hidden layer, H _ctx Representing the final output of the bi-directional LSTM,

representing the join operation of the vectors.

Input gate of LSTM cell i _t And an output gate o _t And forget door f _t The calculation process of (a) can be expressed as:

output of LSTM cell

And

the calculation method of (a) can be expressed as:

wherein W represents a weight matrix, b represents a bias value, sigmoid and tanh represent activation functions, and x represents matrix multiplication.

And a substep 2-3, extracting the global context information of the sentence by using a global context information extraction layer based on a multi-head self-attention mechanism, wherein the specific process is as follows:

first, by converting the input vector in a linear layer, the calculation process can be expressed as:

wherein the content of the first and second substances,

to input the feature vectors obtained by the linear layer calculation,

and

respectively weight matrix and bias value.

Then, the feature vectors are combined with three weight matrices W ^Q 、W ^K And W ^V Are multiplied respectively to obtain q _i 、k _j And v _j The calculation process can be expressed as:

then, q is _i Is transferred to

And k is _j Multiplying to obtain an attention score, and dividing the attention score by the attention score

Finally, the weight matrix w can be obtained through the standardization of the softmax function _ij The calculation process can be expressed as:

after that, v is _i And a weight w _ij Multiplying and then adding up the output vector from the attention layer

The calculation process can be expressed as:

wherein an indicates a matrix multiplication. The k output from the attention head is

The vector join process of the multi-headed self-attention mechanism can be expressed as:

wherein concat represents a vector join operation, and K outputs from the attention head are joined to obtain a hidden state vector H' _att 。

Finally, H' _att Obtaining the final output of the multi-head attention mechanism through the operation of the linear layer

A matrix of weights is represented by a matrix of weights,

representing the bias value, the calculation process is represented as:

substeps 2-4 of using a conditional random field based sequence decoding layer as a sequenceAnd the column decoder finishes the extraction of the aspect words in the sentence through sequence marking. Let X be input to the sequence decoding layer as X ═ X ₁ ,x ₂ ,...,x _m Y ═ Y for the tag sequence ₁ ,y ₂ ,...,y _m Then the predicted calculation process can be expressed as:

P(Y|X)＝softmax(s(X,Y)) (27)

where s (X, Y) represents the score of the tag prediction, A represents a randomly initialized matrix,

for representing adjacent labels y _i And y _i+1 The correlation of (c). H represents the output of the upper layer and,

denotes the y th _i+1 The score of each tag. P (Y | X) represents the conditional probability of Y occurring under the condition of X, softmax being the activation function.

Finally, using Viterbi algorithm to calculate the label sequence with highest score, and using it as final prediction result

The calculation process can be expressed as:

the loss function of the model can be expressed as:

where ln represents a natural logarithm, and P (Y | X) represents a conditional probability of Y occurring under the condition of X.

And 3, testing the model by using the test set. And (3) for the text to be processed, firstly, sending the text to be processed into the model obtained through the training process in the step (2), carrying out multi-feature coding, context coding, global context information extraction and the like on the sentence by using the model, and finally, completing the extraction of the aspect words by using a sequence decoder.

Compared with the prior art, the invention has the following beneficial effects:

the method fully learns the characteristics of the data set, encodes the initial characteristics of the sentence through the multi-characteristic encoding layer, excavates the deep information of the sentence through the context encoding layer, and finally learns the association between the face words through the global context information extraction layer, so that the accuracy of extracting the face words by the model is further improved, and the model has strong robustness. The method can ensure the integrity of the extracted aspect words and lay a good foundation for the aspect level emotion classification.

Drawings

FIG. 1 is a flow chart of a method of an embodiment of the present invention;

FIG. 2 is a general framework diagram of a method of an embodiment of the invention;

FIG. 3 is a diagram of the internal structure of a context coding layer based on a long-term and short-term memory network;

FIG. 4 is a detailed diagram of the global context information extraction layer based on a multi-headed self-attention mechanism.

Detailed Description

The following examples are included to further illustrate the invention and to provide a better understanding and appreciation for the invention.

Example 1: referring to fig. 1 to 4, a method for extracting an aspect word based on deep learning includes the following steps:

definition E _w Representing the result of the overall embedding of the three types of information, E _T 、E _s And E _P Representing word embedding, fragment embedding and position embedding respectively, information embedding in Roberta is expressed as:

E _w ＝E _T +E _S +E _P (1)

H _i ＝Transformer(H _i-1 ),i∈[1,L] (2)

E _c ＝Emb _c ·C (3)

wherein the content of the first and second substances,

representing the forward hidden state output of the bi-directional LSTM,

representing the join operation of the vectors.

representing the join operation of the vectors.

And a substep 2-2, obtaining vector representation fusing four different granularity characteristics from a multi-characteristic coding layer, and carrying out context coding on sentences in a context coding layer based on a bidirectional long-term and short-term memory network, wherein the process is as follows:

wherein the content of the first and second substances,

represents the output of the bi-directional LSTM forward hidden layer,

representing the join operation of the vectors.

output of LSTM cell

And

the calculation method of (a) can be expressed as:

wherein, the first and the second end of the pipe are connected with each other,

to input the feature vectors obtained by the linear layer calculation,

and

respectively weight matrix and bias value.

Then, the feature vectors are combined with three weight matrices W ^Q 、W ^K And W ^V Are multiplied respectively to obtain q _i 、k _j And v _j The calculation process can representComprises the following steps:

then, q is added _i Is transferred to

after that, v is adjusted _i And a weight w _ij Multiplying and then adding up the output vector from the attention layer

The calculation process can be expressed as:

Vector connection of multi-head self-attention mechanismThe process can be represented as:

A matrix of weights is represented by a matrix of weights,

representing the bias value, the calculation process is represented as:

and a substep 2-4, using a sequence decoding layer based on the conditional random field as a sequence decoder, and completing the extraction of the aspect words in the sentence through sequence marking. Let X be input to the sequence decoding layer as X ═ X ₁ ,x ₂ ,...,x _m Y ═ Y for the tag sequence ₁ ,y ₂ ,...,y _m }, the calculation process of prediction can be expressed as:

P(Y|X)＝softmax(s(X,Y)) (27)

where s (X, Y) represents the score of the label prediction, A represents a randomly initialized matrix,

The calculation process can be expressed as:

the loss function of the model can be expressed as:

And 3, for the text to be processed, firstly sending the text to be processed into the model obtained through the training process in the step 2, carrying out multi-feature coding, context coding, global context information extraction and the like on the sentence by using the model, and finally obtaining the result of extracting the aspect words by using sequence decoding.

In summary, the invention firstly uses the multi-feature coding layer to initially code the sentence, then the context coding layer learns the context information of the sentence, then the global context information extraction layer learns the associated information between the face words, and finally the sequence decoding layer is used to complete the extraction of the face words.

It should be noted that the above-mentioned embodiments illustrate rather than limit the scope of the invention, and that those skilled in the art will be able to modify the invention in its various equivalent forms after reading the present invention, all within the scope of the appended claims.

Claims

1. A method for extracting aspect words based on deep learning is characterized by comprising the following steps:

step 1, constructing a facet word extraction data set,

step 2, training an aspect word extraction model,

and 3, testing the data set.

2. The method for extracting the aspect words based on the deep learning of claim 1, wherein in the step 1, an aspect word extraction data set is constructed, specifically, a SemEval2014 retaurant data set and a Laptop data set are collected firstly, then an ACL14 Twitter public data set is collected, and finally the data sets are divided into a training set and a verification set according to a ratio of 8:2 and are respectively used for training and verifying an aspect word extraction model.

3. The method for extracting the aspect words based on the deep learning of claim 1, wherein the step 2 comprises, firstly, sending the sentences of the aspect words to be extracted into a multi-feature coding layer of the model to obtain an initial coding result of the sentences; then, the initial coding result of the sentence is sent into a context coding layer based on the bidirectional LSTM to carry out context coding of the sentence; then, extracting global contextual features of the sentence by a global contextual information extraction layer based on a multi-head self-attention mechanism, and capturing semantic association among the aspect words; and finally, decoding is completed through a sequence decoding layer based on the conditional random field to obtain the result of extracting the aspect words, in the training stage, the loss function of the model compares the predicted value with the true value of the model, the loss value is calculated, and the model parameters are updated through back propagation to enable the model parameters to become better.

4. The method for extracting the aspect words based on the deep learning of claim 3, wherein in the step 2, the aspect word extraction model is trained, and the implementation process of the step is divided into 4 sub-steps:

E _w ＝E _T +E _S +E _P (1)

H _i ＝Transformer(H _i-1 ),i∈[1,L] (2)

wherein H _i Represents the result of the i-th layer transform coding, and L represents the total number of layers of the transform of the Roberta-base encoder;

then, the characters corresponding to each word are encoded, and the filled character sequence is set as C ═ C ₁ ,c ₂ ,...,c _n N denotes the number of characters, Emb _c For an embedding matrix of characters, the embedding process of the characters is expressed as:

E _c ＝Emb _c ·C (3)

wherein the content of the first and second substances,

representing the forward hidden state output of the bi-directional LSTM,

a join operation representing a vector;

a join operation representing a vector;

wherein the content of the first and second substances,

represents the output of the bi-directional LSTM forward hidden layer,

a join operation that represents a vector is performed,

input gate of LSTM cell i _t And an output gate o _t And forget door f _t The calculation processes of (a) are respectively expressed as:

output of LSTM cell

And

the calculation method of (a) is expressed as:

wherein, W represents a weight matrix, b represents a bias value, sigmoid and tanh represent activation functions, and x represents matrix multiplication;

wherein the content of the first and second substances,

to input the feature vectors obtained by the linear layer calculation,

and

respectively a weight matrix and an offset value;

then, the feature vectors are combined with three weight matrices W ^Q 、W ^K And W ^V Are multiplied respectively to obtain q _i 、k _j And v _j The calculation process is expressed as:

then, q is added _i Is transferred to

The calculation process can be expressed as:

wherein the output of the k-th self attention head is

wherein concat represents a vector join operation, and K outputs from the attention head are joined to obtain a hidden state vector H' _att ；

A matrix of weights is represented by a matrix of weights,

representing the bias value, the calculation process is represented as:

and a substep 2-4 of using a sequence decoding layer based on the conditional random field as a sequence decoder, completing the extraction of aspect words in the sentence through sequence marking, and assuming that the input of the sequence decoding layer is X ═ { X ═ X ₁ ,x ₂ ,...,x _m Y ═ Y for the tag sequence ₁ ,y ₂ ,...,y _m And then the predicted calculation process is represented as:

P(Y|X)＝softmax(s(X,Y)) (27)

for representing adjacent labels y _i And y _i+1 H represents the output of the upper layer,

denotes the y th _i+1 The score of each label, P (Y | X) represents the conditional probability of Y occurring under the condition of X, softmax is an activation function, and finally, a Viterbi algorithm is used for calculating the label sequence with the highest score to serve as a final prediction result

The calculation process is expressed as:

the loss function of the model can be expressed as:

5. The method for extracting the aspect words based on the deep learning of claim 1, wherein in step 3, the model is tested by using a test set, specifically, for the text to be processed, the text to be processed is firstly sent to the model obtained through the training process in step 2, the model performs the steps of multi-feature coding, context coding, global context information extraction and the like on the sentence, and finally, the extraction of the aspect words is completed by using a sequence decoder.