CN114564953A - Emotion target extraction model based on multiple word embedding fusion and attention mechanism - Google Patents

Emotion target extraction model based on multiple word embedding fusion and attention mechanism

Info

Publication number: CN114564953A
Authority: CN (China)
Prior art keywords: model, embedding, word, vector, word embedding
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202210185044.0A
Other languages: Chinese (zh)
Inventors: 况丽娟 (Kuang Lijuan), 戴宪华 (Dai Xianhua)
Current and Original Assignee: Sun Yat-sen University (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Application filed by Sun Yat-sen University
Priority/filing date: 2022-02-28 (the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed)
Publication date: 2022-05-31

Classifications

    • G06F40/268 Handling natural language data; Natural language analysis; Morphological analysis
    • G06F16/3347 Information retrieval; Querying; Query execution using vector based model
    • G06F40/295 Natural language analysis; Recognition of textual entities; Named entity recognition
    • G06F40/30 Handling natural language data; Semantic analysis
    • G06N3/044 Neural networks; Architecture; Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Neural networks; Architecture; Combinations of networks
    • G06N3/08 Neural networks; Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to ME-ATT-CRF, an emotion target extraction model based on the fusion of multiple word embeddings and an attention mechanism. The model fuses three types of word embedding: general-purpose embedding, domain-specific embedding, and character-level embedding. Considering that word shape reflects part of speech to a certain extent and therefore influences the labeling result, character-level convolution is used to learn a feature representation rich in the morphological information of words and to extract character-level features. The model achieves good results without using any additional supervision. In addition, a self-attention mechanism is introduced into the hidden layer of the model, so that the model automatically learns the associations and weights between different words in the input text, fully understands the context semantics, and pays more attention to the target words to be extracted. Experimental validation and comparison were performed on four data sets, and the results show that the proposed model outperforms the baseline model LSTM-CRF in precision, recall, and F1 score.

Description

Emotion target extraction model based on multiple word embedding fusion and attention mechanism
Technical Field
The invention relates to the field of text emotion analysis in natural language processing, and in particular to an emotion target extraction model based on the fusion of multiple word embeddings and an attention mechanism.
Background
Target extraction, one of the subtasks of fine-grained sentiment analysis, aims to extract explicit opinion target words from user comments. In product reviews, the target is some attribute of the product. For example, in the comment "the notebook has a large memory and is expensive," the emotion targets to be extracted are "memory" and "price"; attribute words that carry no emotion need not be extracted, which helps to better understand the sentence structure. Many researchers formulate the extraction task as a sequence labeling task. Sequence labeling is one of the important tasks in natural language processing: it covers named entity recognition, part-of-speech tagging, and the like, identifies entities with specific meanings in sentences, and can be approached with dictionary-based, statistical, and neural-network methods. For example, traditional sequence models such as conditional random fields and long short-term memory networks have been applied to the target extraction task with some success. In addition, dependency-syntax-based research uses syntactic analysis to learn the relationship between target words and emotion words in a sentence, but it places demands on parsing quality: it suits sentences with simple structure, performs poorly on unstructured text, and some deep learning models encode task-irrelevant information when handling the extraction problem.
This study finds that BiLSTM-CRF, the mainstream deep learning model for the target extraction task, has two major disadvantages. First, the input layer usually adopts word2vec or GloVe word vectors. Word vectors pre-trained on large-scale corpora lack knowledge of the specific data set's domain, and the same word may carry different meanings across data sets. In addition, general-purpose word vectors cannot encode character-level information: when the training text contains rare words absent from the dictionary, they cannot be mapped to effective word vectors. As the most basic layer, word embedding strongly influences subsequent feature extraction, so its representation capability should be strengthened; higher-quality word embedding allows the subsequent network layers to be simpler and more efficient. Second, the model mainly uses an LSTM to extract deep text features, but the LSTM hidden state at each step depends strongly on the previous step: when the input sentence is long, the current hidden state has difficulty capturing information from the beginning of the input, information which may be critical. Outputting the hidden-layer vectors directly, without a weight function to adaptively re-weight the different hidden states, fails to exploit the information that deserves the most attention. In general, many current algorithms struggle to fully mine the associations between words, leading to problems such as incomplete target-word extraction and the extraction of entities that carry no emotion.
For the first problem, to strengthen the representation of polysemous words, GloVe general-purpose word embedding alone is not enough: in-domain word embedding and character-level word embedding must also be added, fusing multiple sources of information into a high-quality word vector that boosts downstream task performance. For the second problem, an RNN suffers from forgetting: when a review is too long, the last state cannot remember the whole sentence or effectively use whole-sentence information, so the model can identify entities but may also pick up unnecessary target words without emotional coloring. An attention mechanism, by contrast, directly captures the relation between any two words regardless of distance, which helps extract the target words more closely related to the emotion words. Combining multiple word embeddings with an attention mechanism lets the model complete the task more intelligently and efficiently.
Disclosure of Invention
In view of the above problems and technical requirements, a simple and effective emotion target extraction model, ME-ATT-CRF, is proposed. Three types of word embedding are fused: general-purpose embedding, domain-specific embedding, and, considering that word shape reflects part of speech to a certain extent, character-level embedding, where character-level convolution learns the character representation of words and extracts character-level features. The model achieves good results without additional supervision. In addition, a self-attention mechanism is introduced into the hidden layer, so that the model automatically learns the weights and associations of different words in the text and fully understands the context semantics, thereby paying more attention to the target words to be extracted. Experimental validation and comparison were performed on four data sets, and the results show that the proposed model outperforms the baseline model LSTM-CRF in precision, recall, and F1 score.
The technical scheme of the invention is as follows:
a sentiment target extraction model ME-ATT-CRF based on multiple word embedding fusion and attention mechanism is disclosed, which is characterized in that on the basis of a reference model LSTM-CRF, in order to obtain higher-quality text vector representation, universal word embedding, domain word embedding and character level embedding are fused, and the association between learning words in the attention mechanism is added to strengthen the semantic learning. The method comprises the following steps:
four public data sets were downloaded at the SemEval challenge web site, each data set being as per 3: 1: the proportion of 1 is divided into a training set, a verification set and a test set.
A word embedding table is downloaded, and the data are mapped into word vectors. The glove.840B.300d.zip file is downloaded from the GloVe official site; the file is about 5 GB in size, contains 2.2 million words, and each word vector is 300-dimensional. For each word in the data set, the corresponding index and word vector are found in the GloVe dictionary and used as the general-purpose embedding. The domain word embedding representation is pre-trained on a small in-domain corpus, where the domain is exactly the domain to which the training and test data belong. For the domain-specific embedding, FastText can be used to pre-train on the Laptop and Restaurant corpora; FastText splits a word into substrings to obtain character-level (subword) embeddings and can also be used for text classification problems. The Laptop corpus comes from all notebook reviews in the Amazon review data set, and the Restaurant corpus comes from the Yelp review data set. These two types of word embedding are downloaded and imported directly, without training.
The character-level embedding is obtained as follows: the thirty-seven characters of the character table, including the twenty-six English letters and the ten digits, are one-hot encoded, and an additional all-zero vector represents characters not in the table. The one-hot vectors are processed by a one-dimensional convolutional neural network consisting of 1 convolutional layer and 1 fully connected layer, with Dropout added to prevent over-fitting of the model; the network outputs a 100-dimensional character-level vector.
The GloVe word vector, the domain-specific vector, and the character-level vector are concatenated, completing the word embedding fusion. This alleviates the ambiguity of the same word across different data sets, adds external knowledge, mitigates the out-of-vocabulary (OOV) problem of the dictionary, and greatly helps subsequent feature extraction.
The model is then constructed. Semantic features are extracted mainly by a bidirectional ReGU; the ReGU consists of two gates and also maintains a cell state, combining the advantages of LSTM and GRU with high computation speed and accuracy.
The ReGU controls the updating of the cell state and the hidden state through a forget gate unit and a residual gate unit.
A self-attention mechanism is applied to the hidden-layer vectors. The purpose is to learn context associations better, regardless of the distance between words, while giving higher weight to more important information, improving the accuracy of the final target extraction; parallel computation over the weight matrices also improves computational efficiency.
During the self-attention computation, three vectors, Query, Key, and Value, are derived for each word by multiplying its representation with three different weight matrices, yielding the corresponding representations Q, K, and V. Scaled dot-product attention, normalized by softmax, is computed h times in parallel; the vectors from the h heads are concatenated, and a final linear transformation produces the multi-head attention output.
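For reference, this description matches the standard scaled dot-product and multi-head attention formulation (Vaswani et al., 2017); assuming the model follows it, with d_k the key dimension and the W terms learned projection matrices:

```latex
\mathrm{Attention}(Q,K,V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
\qquad
\mathrm{head}_i = \mathrm{Attention}(QW_i^{Q},\, KW_i^{K},\, VW_i^{V})
\qquad
\mathrm{MultiHead}(Q,K,V) = \mathrm{Concat}(\mathrm{head}_1,\dots,\mathrm{head}_h)\,W^{O}
```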
The final layer of the model is a CRF output layer; this probabilistic model improves the rationality and consistency of the final output label sequence. Maximum-likelihood learning is used during training, and decoding at test time uses the saved parameters together with dynamic programming (the Viterbi algorithm).
Dropout with probability 0.5 is used during model training to prevent overfitting. The learning rate is set to 0.00001 and the batch size to 32.
The optimal model parameters are determined on the validation set; after 20 iterations, once the loss function no longer decreases, the model is saved and its performance is evaluated on the test set.
Model evaluation mainly compares F1 scores; the F1 score is the harmonic mean of precision and recall.
Compared with the classical extraction model LSTM-CRF, the F1 score of the proposed model improves by 2.37%, 2.78%, 4.01%, and 2.34% on the four data sets respectively, a considerable performance gain.
The beneficial technical effects of the invention are as follows:
1. The application discloses ME-ATT-CRF, an emotion target extraction model based on the fusion of multiple word embeddings and an attention mechanism. On the basis of the reference model LSTM-CRF, and in order to obtain a higher-quality text vector representation, it fuses general-purpose word embedding, domain word embedding, and character-level embedding, and adds an attention mechanism to learn the associations between words and strengthen semantic learning, thereby improving the precision, recall, and F1 score of text emotion target extraction. Experiments on multiple public data sets show that the proposed model outperforms mainstream algorithms.
2. When training on different data sets, certain words carry meanings inconsistent with their general-purpose senses, and some words fall outside the word-vector table and cannot be represented by word vectors. The method therefore fuses domain-specific knowledge on top of general-purpose word embedding: the general GloVe word vectors are fused with domain word vectors trained on the Restaurant and Laptop data sets to resolve the polysemy of specific words. Character-level word vectors are also fused in; character-level features are extracted by a convolutional n-gram sliding window, which alleviates the OOV problem of the word-vector table and further improves the model's word representation capability.
3. In order to enhance the feature extraction capability and high-level semantic representation of the text, an attention mechanism is introduced over the context-fused hidden-layer representation extracted by the LSTM, acquiring the information in the text that deserves more attention so that target attribute words are extracted more accurately. The attention mechanism sets attention matrices Q, K, and V over the hidden vector of each word, computes the different weights of the context with respect to the emotion target, and normalizes the weighted sums; the resulting attention representation is then input to the decoding layer for decoding. Experiments in this text demonstrate that introducing the attention mechanism better extracts the information the text focuses on.
Drawings
FIG. 1 is a block diagram of the model ME-ATT-CRF in the present application.
FIG. 2 is a schematic illustration of the reference model LSTM-CRF in the present application.
FIG. 3 is a schematic diagram of the ReGU feature extractor in the present application.
FIG. 4 is a schematic diagram of the character-level convolution in the present application.
FIG. 5 is a schematic diagram of the self-attention mechanism in the present application.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
The application discloses ME-ATT-CRF, an emotion target extraction model based on the fusion of multiple word embeddings and an attention mechanism. On the basis of the reference model LSTM-CRF, and in order to obtain a higher-quality text vector representation, it fuses general-purpose word embedding, domain word embedding, and character-level embedding, and adds an attention mechanism to learn the associations between words and strengthen semantic learning, thereby improving the precision and recall of text emotion target extraction. Experiments on four public data sets show that the proposed model outperforms mainstream algorithms.
The method disclosed by the invention is an emotion target extraction method for text comments. The main structure of the model is shown in FIG. 1 and the benchmark model in FIG. 2; the method combines three word embeddings with a self-attention mechanism to extract targets, referring mainly to FIG. 3 and FIG. 5, with the character-level convolution shown in FIG. 4.
The first step: four public data sets are downloaded from the SemEval challenge web site, and each data set is divided into a training set, a validation set, and a test set in the proportion 3:1:1, as the sketch below illustrates.
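A minimal splitting sketch under stated assumptions (the in-memory sample list and fixed random seed are illustrative; the SemEval files themselves ship as XML and would need parsing first):

```python
import random

def split_3_1_1(samples, seed=42):
    """Shuffle and split a list of samples into train/validation/test
    sets in the 3:1:1 proportion described above."""
    rng = random.Random(seed)
    samples = samples[:]
    rng.shuffle(samples)
    n = len(samples)
    n_train = n * 3 // 5
    n_val = n // 5
    train = samples[:n_train]
    val = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]
    return train, val, test
```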
The second step: a word embedding table is downloaded, and the data are mapped into word vectors. The glove.840B.300d.zip file is downloaded from the GloVe official site; the file is about 5 GB in size, contains 2.2 million words, and each word vector is 300-dimensional. For each word in the data set, the corresponding index and word vector are found in the GloVe dictionary and used as the general-purpose embedding. The domain word embedding is pre-trained on a small in-domain corpus, where the domain is exactly the domain to which the training and test data belong. For the domain-specific embedding, FastText can be used to pre-train on the Laptop and Restaurant corpora; FastText splits a word into substrings to obtain character-level (subword) embeddings and can also be used for text classification problems. The Laptop corpus comes from all notebook reviews in the Amazon review data set, and the Restaurant corpus comes from the Yelp review data set. These two types of word embedding are downloaded and imported directly, without training. A loading sketch follows.
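A minimal loading sketch under stated assumptions (the unzipped plain-text GloVe file name, the use of Gensim's FastText implementation for the domain pre-training, and the tokenized-corpus format are not specified by this description):

```python
import numpy as np
from gensim.models import FastText

def load_glove(path="glove.840B.300d.txt"):
    """Load GloVe vectors into a {word: 300-d numpy array} dictionary."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word, vec = parts[0], np.asarray(parts[1:], dtype=np.float32)
            if vec.shape[0] == 300:  # skip malformed lines
                vectors[word] = vec
    return vectors

def train_domain_embeddings(sentences, dim=100):
    """Pre-train domain embeddings with FastText on an in-domain corpus
    (e.g. Amazon laptop reviews or Yelp restaurant reviews), where
    `sentences` is a list of token lists."""
    model = FastText(sentences=sentences, vector_size=dim, window=5,
                     min_count=2, epochs=10)
    return model.wv
```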
The third step: the character level embedding and acquiring process mainly comprises the steps of carrying out one-hot coding on the following thirty-seven characters, including twenty-six English letters and ten numbers, adding an all-zero vector to represent the characters which are not in a character table, processing the one-dimensional convolution neural network to obtain the one-hot coded vector, wherein the one-dimensional convolution neural network comprises 1 convolution layer and 1 all-connected layer, adding a Dropout layer to prevent over-fitting of a model, and outputting a 100-dimensional character level vector through the network.
The fourth step: the GloVe word vector, the specific field vector and the character level vector are spliced, so that word embedding fusion is completed, and subsequent feature extraction is facilitated.
The fifth step: the model feature extraction module is constructed, semantic features are extracted mainly through bidirectional ReGU, the ReGU is composed of two gates, the intermediate state is recorded by using the cell state, the advantages of LSTM and GRU are included, and the calculation speed and the accuracy are high.
The ReGU extraction feature process refers to FIG. 3.
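The text gives no equations for the ReGU, only that it has a forget gate, a residual gate, and a cell state; the following PyTorch cell is therefore one plausible reading of that description, not the authors' definition:

```python
import torch
import torch.nn as nn

class ReGUCell(nn.Module):
    """A plausible ReGU cell: a forget gate updates the cell state and a
    residual gate mixes the transformed state with a shortcut from the
    input. The exact equations are assumptions."""
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.forget = nn.Linear(input_dim + hidden_dim, hidden_dim)
        self.cand = nn.Linear(input_dim, hidden_dim)
        self.resid = nn.Linear(input_dim + hidden_dim, hidden_dim)
        self.shortcut = nn.Linear(input_dim, hidden_dim)  # match dims for the residual path

    def forward(self, x, h_prev, c_prev):
        z = torch.cat([x, h_prev], dim=-1)
        f = torch.sigmoid(self.forget(z))                     # forget gate
        c = f * c_prev + (1 - f) * torch.tanh(self.cand(x))   # cell-state update
        r = torch.sigmoid(self.resid(z))                      # residual gate
        h = r * torch.tanh(c) + (1 - r) * self.shortcut(x)    # gated residual output
        return h, c
```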
And a sixth step: a self-attention mechanism is introduced to hidden layer vector, the purpose is to better learn context association without considering distance length, meanwhile, more important information can be given higher weight, the accuracy of final target extraction is improved, and the calculation efficiency is also improved through weight matrix parallel calculation.
The self-attention mechanism calculation flow refers to fig. 5.
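A minimal sketch of this step using PyTorch's built-in multi-head attention, which internally performs the Q/K/V projections, the h parallel attention heads, and the concatenation described above (the hidden size and head count here are illustrative assumptions):

```python
import torch
import torch.nn as nn

hidden_dim, n_heads = 200, 4  # illustrative sizes, not specified in the text
attn = nn.MultiheadAttention(hidden_dim, n_heads, batch_first=True)

# H: (batch, seq_len, hidden_dim) hidden states from the bidirectional ReGU
H = torch.randn(2, 30, hidden_dim)
# Self-attention: Q, K, V are all projections of the same hidden states
context, weights = attn(H, H, H)  # context: (batch, seq_len, hidden_dim)
```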
The seventh step: the CRF codes an output layer, and the probability model can improve the rationality and the consistency of the final output label. And maximum likelihood probability learning parameters are adopted during training, and stored parameters and a dynamic programming idea are used for decoding during testing.
Dropout with a probability of 0.5 is used in the model training process to prevent overfitting of the model. The learning rate was 0.00001 and the batch size was 32.
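A sketch of the CRF layer and the stated training configuration, using the third-party pytorch-crf package as one possible implementation (the text names no library; the tag count and BIO scheme are assumptions):

```python
import torch
from torchcrf import CRF  # third-party pytorch-crf package

num_tags = 3  # e.g. B/I/O tags for target spans (assumed scheme)
crf = CRF(num_tags, batch_first=True)

emissions = torch.randn(32, 30, num_tags)  # per-tag scores from the attention layer
tags = torch.randint(num_tags, (32, 30))   # gold label sequences
loss = -crf(emissions, tags)               # negative log-likelihood for training
best_paths = crf.decode(emissions)         # Viterbi decoding at test time

# Learning rate 0.00001 as stated; in the full model the optimizer would
# cover all parameters, not only the CRF's
optimizer = torch.optim.Adam(crf.parameters(), lr=1e-5)
```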
Eighth step: and (4) determining the optimal parameters of the model on a verification set, and after 20 iterations, saving the model when the loss function is not reduced any more and evaluating the effect of the model on a test set.
The ninth step: model evaluation compares mainly the F1 scores, which can evaluate the average of accuracy and recall.
The tenth step: the proposed model achieved an increase in F1 score over the four datasets of 2.37%, 2.78%, 4.01% and 2.34% respectively relative to the most classical extraction model LSTM-CRF.
What has been described above is only a preferred embodiment of the present application, and the present invention is not limited to the above embodiment. It is to be understood that other modifications and variations directly derivable or suggested by those skilled in the art without departing from the spirit and concept of the present invention are considered to fall within the scope of the present invention.

Claims (5)

1. An emotion target extraction model ME-ATT-CRF based on the fusion of multiple word embeddings and an attention mechanism, characterized in that, on the basis of the reference model LSTM-CRF and in order to obtain a higher-quality text vector representation, general-purpose word embedding, domain word embedding, and character-level embedding are fused, and an attention mechanism is added to learn the associations between words and strengthen semantic learning, the method comprising the following steps:
four public data sets are downloaded from the SemEval challenge web site, and each data set is divided into a training set, a validation set, and a test set in the proportion 3:1:1;
a word embedding table is downloaded and the data are mapped into word vectors: the glove.840B.300d.zip file is downloaded from the GloVe official site, and for each word in the data set the corresponding index and word vector are found in the GloVe dictionary and used as the general-purpose embedding; the domain word embedding is pre-trained on a small in-domain corpus, where the domain is exactly the domain to which the training and test data belong; for the domain-specific embedding, FastText can be used to pre-train on the Laptop and Restaurant corpora, splitting words into substrings to obtain character-level (subword) embeddings, a method also applicable to text classification problems; these two types of word embedding are downloaded and imported directly, without training;
the character-level embedding is obtained by one-hot encoding the thirty-seven characters of the character table, including the twenty-six English letters and the ten digits, with an additional all-zero vector representing characters not in the table; a one-dimensional convolutional neural network consisting of 1 convolutional layer and 1 fully connected layer processes the one-hot vectors, a Dropout layer is added to prevent over-fitting of the model, and the network outputs a 100-dimensional character-level vector; the GloVe word vector, the domain-specific vector, and the character-level vector are concatenated, completing the word embedding fusion.
2. The method of claim 1, wherein the model extracts semantic features mainly with a bidirectional ReGU, which consists of two gates and a cell state, combining the advantages of LSTM and GRU with fast computation and high accuracy; and wherein a self-attention mechanism is applied to the hidden-layer vectors to learn context associations regardless of distance, give higher weight to more important information, improve the accuracy of the final target extraction, and improve computational efficiency through parallel computation over the weight matrices.
3. The method of claim 1 or 2, wherein Dropout with probability 0.5 is used during model training to prevent overfitting; the learning rate is 0.00001 and the batch size is 32; and model evaluation mainly compares F1 scores, the harmonic mean of precision and recall.
4. The method further comprises the following step: the model improvement modules are experimentally validated on the four data sets published by SemEval, their performance is compared against the mainstream model algorithms and the baseline algorithm, and the experimental results are presented graphically and visually.
5. The method of claim 4, wherein the F1 scores of the proposed model on the four data sets obtain improvements of 2.37%, 2.78%, 4.01%, and 2.34%, respectively, over the classical extraction model LSTM-CRF.
Application CN202210185044.0A, filed 2022-02-28: Emotion target extraction model based on multiple word embedding fusion and attention mechanism. Status: Pending. Published as CN114564953A (en).

Priority Applications (1)

Application Number: CN202210185044.0A; Priority Date: 2022-02-28; Filing Date: 2022-02-28; Title: Emotion target extraction model based on multiple word embedding fusion and attention mechanism

Publications (1)

Publication Number: CN114564953A; Publication Date: 2022-05-31

Family

Family ID: 81716507

Family Applications (1)

Application Number: CN202210185044.0A; Status: Pending; Publication: CN114564953A (en); Priority/Filing Date: 2022-02-28; Title: Emotion target extraction model based on multiple word embedding fusion and attention mechanism

Country Status (1)

Country: CN; Link: CN114564953A (en)

Cited By (3)

* Cited by examiner, † Cited by third party

CN116090450A *: priority 2022-11-28, published 2023-05-09, Honor Device Co., Ltd. (荣耀终端有限公司), Text processing method and computing device
CN115830402A *: priority 2023-02-21, published 2023-03-21, East China Jiaotong University (华东交通大学), Fine-grained image recognition classification model training method, device and equipment
CN115830402B *: priority 2023-02-21, published 2023-09-12, East China Jiaotong University (华东交通大学), Fine-grained image recognition classification model training method, device and equipment


Legal Events

Code: PB01; Event: Publication