CN109635109B - Sentence classification method based on LSTM and combined with part-of-speech and multi-attention mechanism - Google Patents
Sentence classification method based on LSTM and combined with part-of-speech and multi-attention mechanism
- Publication number
- CN109635109B CN201811430542.7A CN201811430542A
- Authority
- CN
- China
- Prior art keywords
- speech
- layer
- attention
- sentence
- semantic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24133—Distances to prototypes
- G06F18/24137—Distances to cluster centroïds
- G06F18/2414—Smoothing the distance, e.g. radial basis function networks [RBFN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/253—Grammatical analysis; Style critique
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The invention discloses a sentence classification method based on LSTM combined with part of speech and a multi-attention mechanism, which comprises the following steps: in the input layer, converting each sentence into two continuous, dense matrices, a semantic word vector matrix and a part-of-speech word vector matrix; in a shared bidirectional LSTM layer, learning the context information of the words or parts of speech in the sentence, and outputting the concatenated learning results of each step; in the self-attention layer, adopting a self-attention mechanism and a dot-product function to learn important local features at each position in the sentence from the semantic word vector sequence and the part-of-speech word vector sequence respectively, obtaining the corresponding semantic attention vector and part-of-speech attention vector, and constraining the two vectors through the KL (Kullback-Leibler) distance; in the merging layer, performing weighted summation over the output sequence of the bidirectional LSTM layer using the obtained semantic attention vector and part-of-speech attention vector to obtain the semantic representation and part-of-speech representation of the sentence, and from these the final semantic representation of the sentence; and finally, performing prediction and classification output through an MLP output layer.
Description
Technical Field
The invention relates to the field of natural language processing, and in particular to a sentence classification method based on LSTM combined with part-of-speech information and a multi-attention mechanism.
Background
Sentence classification has long been a research hotspot in the field of Natural Language Processing (NLP). In recent years, with the wide application of deep learning in NLP, many scholars have successively proposed various sentence classification methods based on the Long Short-Term Memory (LSTM) model, achieving better results than traditional machine learning methods on many sentence classification corpora such as Stanford Twitter Sentiment (STS), the Stanford Sentiment Treebank in its binary (SSTb2) and five-class (SSTb5) versions, TREC, and IMDB. Compared with the Convolutional Neural Network (CNN), the LSTM can better describe the context information and long-term dependencies of text sequence data, and effectively avoids the gradient vanishing or gradient explosion problems of the traditional Recurrent Neural Network (RNN) model, so the LSTM is widely applied to sentence classification tasks.
Currently, various LSTM-based sentence classification models mainly use word vectors trained on large-scale corpora to convert the words in a sentence into distributed representations. Existing research has shown that word vectors trained on large-scale corpora contain comprehensive grammatical and semantic information and can greatly improve sentence classification performance. Commonly used pre-trained word vectors are mainly obtained with the CBOW or Skip-gram models of word2vec, the GloVe algorithm, or the FastText algorithm. When training word vectors, these models or algorithms rely mainly on word co-occurrence information within a window (or globally) and do not encode the part-of-speech information of the words themselves. Therefore, the trained word vectors only contain content-level information and do not reflect the parts of speech of words. In a general text classification task (such as news text classification), feature words, which are mainly nouns or verbs, have an important indicative effect on the classification result; for example, "a typhoon will enter the southeast coastal area of China" or "China will continue tax relief for small and medium-sized enterprises". In text sentiment classification tasks, the opinion words or sentiment words indicating positive or negative emotional tendency, mainly verbs or adjectives, are more important; for example, "I like this movie" or "this movie looks really good". Related studies have also shown that adjectives are the main carriers of opinion and emotion. Therefore, introducing part-of-speech information can enrich the feature representation of a sentence and thereby help improve sentence classification. In recent years, some scholars have introduced the attention mechanism from the image domain into NLP and achieved a series of state-of-the-art results on many subtasks, such as machine translation, text summarization, relation extraction, reading comprehension, and textual entailment. The attention mechanism enables a model to better weigh the different influences of the elements in the input source on the target result, and reduces the loss of detail information caused by long sentences. Some researchers have further proposed the Self-attention mechanism, also called Intra-attention, whose main idea is to use the position information of each element in a sentence to compute a corresponding attention vector and characterize the sentence. Currently, the combination of LSTM with attention (or self-attention) mechanisms has become the core of many models. However, these studies mainly address attention at the content level and do not consider the part-of-speech information of the words.
Disclosure of Invention
Aiming at the deficiencies of the prior art, the invention provides a sentence classification method based on LSTM combined with part of speech and a multi-attention mechanism. The method can fully exploit the advantage that a large-scale corpus provides accurate grammatical and semantic information, while also introducing the part-of-speech information of the sentence to make up for the lack of part-of-speech information in pre-trained word vectors, thereby better describing the grammatical and semantic characteristics of the sentence.
The purpose of the invention can be realized by the following technical scheme:
a sentence classification method based on LSTM and combined with part of speech and multi-attention mechanism is based on the following five-layer neural network model, the first layer to the fifth layer are respectively an input layer, a shared bidirectional LSTM layer, a self-attention layer, a merging layer and an MLP output layer, and the method specifically comprises the following steps:
after preprocessing the sentences in the input layer, giving mathematical representations of each word in a sentence and of its part of speech by using a pre-trained word vector table and a matrix generated by uniformly distributed random initialization respectively, thereby converting each sentence into a semantic word vector matrix and a part-of-speech word vector matrix;
in the shared bidirectional LSTM layer, learning the context information of the words or parts of speech in the sentence through two LSTM layers running in opposite directions, and outputting the concatenated learning results of each step;
in the self-attention layer, adopting a self-attention mechanism and a dot-product function to learn important local features at each position in the sentence from the semantic word vector sequence and the part-of-speech word vector sequence respectively, obtaining the corresponding semantic attention vector and part-of-speech attention vector, and constraining the two vectors through the KL (Kullback-Leibler) distance so that their distributions over the positions in the sentence remain as consistent as possible;
in the merging layer, performing weighted summation over the output sequence of the bidirectional LSTM layer using the semantic attention vector and part-of-speech attention vector obtained from the self-attention layer to obtain the semantic representation and part-of-speech representation of the sentence, and then obtaining the final sentence semantic representation by comparing weighted averaging, concatenation, summation, and max operations;
and finally, performing prediction and classification output through an MLP output layer comprising a fully-connected hidden layer and a fully-connected softmax layer.
Further, the preprocessing of the sentences in the input layer includes word segmentation, illegal-character filtering, and length-completion operations on the sentences.
Furthermore, the number of neurons of the fully-connected hidden layer in the MLP output layer is derived from the square root of the product of the number of input-layer nodes and the number of MLP output-layer nodes, and the number of neurons of the fully-connected softmax layer is the number of categories of the corresponding classification system.
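For example, under this rule and with purely hypothetical sizes, a 600-node input feeding a 6-category output layer would give a hidden layer of about √(600 × 6) = 60 neurons.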
Further, during the training of the five-layer neural network model, the semantic word vectors are kept unchanged, while the part-of-speech word vectors are adjusted using the backpropagation algorithm.
Further, to ensure that the KL distance is as small as possible, the KL distance is added to the loss function and serves as one of the objectives of neural network model optimization.
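Written out, with L_CE denoting the classification cross-entropy loss and λ a weighting coefficient (both symbols introduced here only for illustration, since the patent does not name them), the combined objective takes the form:

L = L_CE + λ · KL(a_sem ‖ a_pos)

where a_sem and a_pos denote the semantic attention vector and the part-of-speech attention vector.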
Compared with the prior art, the invention has the following advantages and beneficial effects:
the sentence classification method based on LSTM and combined with the part of speech and the multi-attention mechanism provided by the invention can fully utilize the advantage that a large-scale corpus can provide more accurate grammar and semantic information, and can introduce the part of speech information of the sentence to further make up the defect that the pre-training word vector lacks the part of speech information, thereby better describing the characteristics of the sentence in the aspects of grammar and semantics. The method also comprehensively utilizes the advantages of the LSTM in the aspect of learning context information of words and parts of speech in the sentence and the advantages of an attention mechanism in the aspect of learning important local features of the sentence, the provided classification model has the advantages of high accuracy, strong universality and the like, and good effects are achieved in some famous public corpora including a 20Newsgroup corpus, an IMDB corpus, a Movie Review, a TREC, a Stanford Sentment Treebank (SSTB) and the like.
Drawings
Fig. 1 is a general structure diagram of a five-layer neural network model in the embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
Embodiment:
the embodiment provides a sentence classification method based on LSTM and combined with part of speech and multi-attention mechanism, which mainly adopts the following steps that on one hand, a pre-training word vector is utilized to give semantic word vector representation of words in a sentence, on the other hand, a part of speech tagging tool is utilized to tag the words in the sentence, and in combination with a simplified part of speech tag set (mainly comprising nouns, verbs, adjectives, adverbs, ending tags UNK and the like), the part of speech is converted into a serial number form, and then mapping and learning are carried out through an embedding layer; secondly, respectively learning the context information of the semantic word vector and the part-of-speech word vector by utilizing a shared bidirectional LSTM, and outputting the forward learning result and the reverse learning result of each time step after being connected in series and combined, thereby respectively obtaining the context relationship of the words and the parts-of-speech; on the basis, a self-attention layer is utilized to learn position information in sentences respectively aiming at semantic word vector sequences and part-of-speech word vector sequences output by an LSTM layer, corresponding attention vectors are constructed, and KL distances are utilized to constrain the attention vectors, so that when the attention weight of the semantic word vectors at a certain position is high, the attention weight of the part-of-speech word vectors is also high, and useful semantic and part-of-speech characteristics for sentence classification are captured better; then, a user-defined merging layer is used for taking two attention vectors obtained from the attention layer and the output of the LSTM as input, weighted averaging is carried out respectively, then summing is carried out to obtain the representation of the sentence in the aspects of semantics and part of speech, and results are merged (various different modes such as weighted smoothing, series connection, summing and maximum value solving are adopted respectively) to obtain the final semantic representation of the sentence; finally, a multi-layer perceptron MLP comprising a fully-connected hidden layer and a softmax output layer is used for prediction and classification output. In the learning process of the model, the pre-training word vectors are kept unchanged, and the part-of-speech word vectors are adjusted by using a back propagation algorithm in the model training process.
The method is based on the following five-layer neural network model, whose structure is shown in Fig. 1. The first to fifth layers are an input layer, a shared bidirectional LSTM layer, a self-attention layer, a merging layer, and an MLP output layer respectively; some key parameters of the model are shown in Table 1:
TABLE 1
The first layer of the model first preprocesses the sentences, mainly including punctuation filtering, abbreviation expansion, and extra-space removal, then determines a sentence-length threshold from the length distribution and its mean square error, and pads sentences to that length. Next, on the one hand, a pre-trained word vector table is used to represent the semantic vector of each word in the sentence; on the other hand, NLTK is used to tag the part of speech of each word, tags of the same type are merged and simplified, and the parts of speech are converted into serial-number form. The part-of-speech vectors are then randomly initialized as word vectors of a specified dimension using a uniform distribution over the interval (-0.25, 0.25), and are learned and adjusted through an embedding layer during model training. For each sentence, the input layer finally yields a corresponding semantic word vector matrix and part-of-speech word vector matrix. During model training, the semantic word vectors are kept unchanged while the part-of-speech word vectors are learned.
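The following is a minimal sketch of this input-layer processing in Python with NLTK and NumPy. The simplified tag mapping, the 300/50 vector dimensions, the padding length, and the helper names are illustrative assumptions rather than values fixed by the patent:

```python
import numpy as np
import nltk  # assumes the averaged-perceptron tagger data has been downloaded

# Simplified part-of-speech tag set: nouns, verbs, adjectives, adverbs, UNK.
POS2ID = {"NOUN": 0, "VERB": 1, "ADJ": 2, "ADV": 3, "UNK": 4}

def coarse_tag(penn_tag):
    # Merge Penn Treebank tags of the same type into the simplified set.
    if penn_tag.startswith("NN"): return POS2ID["NOUN"]
    if penn_tag.startswith("VB"): return POS2ID["VERB"]
    if penn_tag.startswith("JJ"): return POS2ID["ADJ"]
    if penn_tag.startswith("RB"): return POS2ID["ADV"]
    return POS2ID["UNK"]

def encode_sentence(tokens, word_vectors, max_len=40, sem_dim=300):
    tokens = tokens[:max_len]
    # Semantic matrix from the pre-trained word vector table (kept frozen).
    sem = np.stack([word_vectors.get(w, np.zeros(sem_dim)) for w in tokens])
    # Part-of-speech serial numbers via NLTK tagging.
    pos_ids = [coarse_tag(tag) for _, tag in nltk.pos_tag(tokens)]
    # Length padding up to the sentence-length threshold.
    pad = max_len - len(tokens)
    sem = np.pad(sem, ((0, pad), (0, 0)))
    pos_ids += [POS2ID["UNK"]] * pad
    return sem, np.array(pos_ids)

# The part-of-speech embedding table is initialized uniformly in (-0.25, 0.25)
# and, unlike the semantic vectors, is adjusted during training.
pos_embedding = np.random.uniform(-0.25, 0.25, size=(len(POS2ID), 50))
```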
The second layer of the model comprises a shared bidirectional LSTM network. For the semantic word vector matrix and the part-of-speech word vector matrix obtained from the input layer, each bidirectional LSTM learns the context information of the sentence using one forward LSTM and one backward LSTM, and the learning results of each step are concatenated and output, finally yielding a vector containing semantic and context information and a vector containing part-of-speech and context information respectively.
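A minimal PyTorch sketch of this layer follows. The hidden size is a hypothetical value, and since the text does not fully specify whether the two streams literally share one set of LSTM weights, they are kept separate here:

```python
import torch
import torch.nn as nn

class BiLSTMEncoder(nn.Module):
    # One bidirectional LSTM per stream; at each time step the forward and
    # backward outputs are concatenated, as described for the second layer.
    def __init__(self, sem_dim=300, pos_dim=50, hidden=150):
        super().__init__()
        self.sem_lstm = nn.LSTM(sem_dim, hidden, bidirectional=True, batch_first=True)
        self.pos_lstm = nn.LSTM(pos_dim, hidden, bidirectional=True, batch_first=True)

    def forward(self, sem, pos):
        # sem: (batch, len, sem_dim); pos: (batch, len, pos_dim)
        sem_ctx, _ = self.sem_lstm(sem)  # (batch, len, 2 * hidden)
        pos_ctx, _ = self.pos_lstm(pos)  # (batch, len, 2 * hidden)
        return sem_ctx, pos_ctx
```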
The third layer of the model comprises a self-attention layer, which adopts a self-attention mechanism and a dot-product function to learn important local features at each position in the sentence from the semantic word vector sequence and the part-of-speech word vector sequence respectively, obtaining the corresponding semantic attention vector and part-of-speech attention vector, which are constrained through the KL distance. To keep the KL distance as small as possible, it is added to the loss function and serves as one of the objectives of model optimization.
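A sketch of the dot-product self-attention and the KL constraint is given below; the learned query vector is an assumption, since the patent only states that a self-attention mechanism with a dot-product function is used:

```python
import torch
import torch.nn.functional as F

def dot_product_attention(ctx, query):
    # ctx: (batch, len, d) BiLSTM outputs; query: (d,) learned parameter.
    # Dot-product scoring followed by softmax yields one weight per position.
    scores = ctx @ query              # (batch, len)
    return F.softmax(scores, dim=-1)  # attention vector over positions

def kl_distance(att_sem, att_pos, eps=1e-8):
    # KL (Kullback-Leibler) distance between the two attention distributions;
    # adding it to the loss keeps them as consistent as possible.
    return (att_sem * ((att_sem + eps) / (att_pos + eps)).log()).sum(-1).mean()
```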
The fourth layer of the model comprises a custom merging layer, which mainly uses the semantic attention vector and part-of-speech attention vector obtained from the self-attention layer to perform weighted summation over the output sequence of the LSTM layer, obtaining the semantic representation and part-of-speech representation of the sentence, and then merges them to obtain the final semantic representation of the sentence. In experiments, several merging modes, namely weighted averaging, concatenation, summation, and taking the maximum, were compared, and analysis of the results shows that weighted averaging and concatenation perform better than simple summation or taking the maximum.
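The merging layer can be sketched as follows, with the four compared modes selectable by a flag; the function name and signature are illustrative:

```python
import torch

def merge(sem_ctx, pos_ctx, att_sem, att_pos, mode="concat"):
    # Weighted summation of the BiLSTM outputs by the attention vectors
    # gives the semantic and part-of-speech representations of the sentence.
    r_sem = (att_sem.unsqueeze(-1) * sem_ctx).sum(dim=1)  # (batch, 2 * hidden)
    r_pos = (att_pos.unsqueeze(-1) * pos_ctx).sum(dim=1)
    if mode == "mean":    # weighted average of the two representations
        return 0.5 * (r_sem + r_pos)
    if mode == "concat":  # concatenation, reported to work well
        return torch.cat([r_sem, r_pos], dim=-1)
    if mode == "sum":
        return r_sem + r_pos
    if mode == "max":     # element-wise maximum
        return torch.max(r_sem, r_pos)
    raise ValueError(mode)
```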
The fifth layer of the model is a fully-connected hidden layer plus a softmax layer for multi-class logistic regression, which predicts and outputs the category of a sentence using categorical cross-entropy and the RMSProp optimizer based on stochastic gradient descent. Throughout model training, the part-of-speech word vectors in the input layer are adjusted via backpropagation, and the loss function together with the KL distance is optimized.
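A sketch of the output layer and the joint objective, assuming hypothetical layer sizes and a weighting coefficient lambda_kl that the patent does not specify:

```python
import torch
import torch.nn as nn

class MLPOutput(nn.Module):
    # Fully-connected hidden layer followed by a layer producing class logits;
    # the softmax is folded into the cross-entropy loss below.
    def __init__(self, in_dim=600, hidden=60, n_classes=5):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, n_classes))

    def forward(self, x):
        return self.net(x)

head = MLPOutput()
optimizer = torch.optim.RMSprop(head.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()  # categorical cross-entropy
lambda_kl = 0.1  # hypothetical weight for the KL term

# One training step, given a merged sentence representation and the two
# attention vectors from the previous layers:
# loss = criterion(head(sentence_repr), labels) \
#        + lambda_kl * kl_distance(att_sem, att_pos)
# loss.backward(); optimizer.step()
```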
The above describes only the preferred embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any substitution or modification of the technical solution and inventive concept of the present invention made by a person skilled in the art within the scope disclosed by the present invention, together with its equivalents, falls within the protection scope of the present invention.
Claims (5)
1. A sentence classification method based on LSTM combined with part of speech and a multi-attention mechanism, characterized in that the method is based on a five-layer neural network model, whose first to fifth layers are an input layer, a shared bidirectional LSTM layer, a self-attention layer, a merging layer, and an MLP output layer respectively, and that the method specifically comprises the following steps:
after preprocessing the sentences in the input layer, giving mathematical representations of each word in a sentence and of its part of speech by using a pre-trained word vector table and a matrix generated by uniformly distributed random initialization respectively, thereby converting each sentence into a semantic word vector matrix and a part-of-speech word vector matrix;
in the shared bidirectional LSTM layer, learning the context information of the words or parts of speech in the sentence through two LSTM layers running in opposite directions, and outputting the concatenated learning results of each step;
in the self-attention layer, adopting a self-attention mechanism and a dot-product function to learn important local features at each position in the sentence from the semantic word vector sequence and the part-of-speech word vector sequence respectively, obtaining the corresponding semantic attention vector and part-of-speech attention vector, and constraining the two vectors through the KL (Kullback-Leibler) distance so that their distributions over the positions in the sentence remain as consistent as possible;
in the merging layer, performing weighted summation over the output sequence of the bidirectional LSTM layer using the semantic attention vector and part-of-speech attention vector obtained from the self-attention layer to obtain the semantic representation and part-of-speech representation of the sentence, and then obtaining the final sentence semantic representation by comparing weighted averaging, concatenation, summation, and max operations;
and finally, performing prediction and classification output through an MLP output layer comprising a fully-connected hidden layer and a fully-connected softmax layer.
2. The sentence classification method based on LSTM combined with part of speech and a multi-attention mechanism of claim 1, characterized in that: the preprocessing of the sentences in the input layer comprises word segmentation, illegal-character filtering, and length-completion operations on the sentences.
3. The sentence classification method based on LSTM combined with part of speech and a multi-attention mechanism of claim 1, characterized in that: the number of neurons of the fully-connected hidden layer in the MLP output layer is derived from the square root of the product of the number of input-layer nodes and the number of MLP output-layer nodes, and the number of neurons of the fully-connected softmax layer is the number of categories of the corresponding classification system.
4. The sentence classification method based on LSTM combined with part of speech and a multi-attention mechanism of claim 1, characterized in that: during the training of the five-layer neural network model, the semantic word vectors are kept unchanged, while the part-of-speech word vectors are adjusted using the backpropagation algorithm.
5. The sentence classification method based on LSTM combined with part of speech and a multi-attention mechanism of claim 1, characterized in that: in order to keep the KL distance as small as possible, the KL distance is added to the loss function and serves as one of the objectives of neural network model optimization.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811430542.7A CN109635109B (en) | 2018-11-28 | 2018-11-28 | Sentence classification method based on LSTM and combined with part-of-speech and multi-attention mechanism |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811430542.7A CN109635109B (en) | 2018-11-28 | 2018-11-28 | Sentence classification method based on LSTM and combined with part-of-speech and multi-attention mechanism |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109635109A CN109635109A (en) | 2019-04-16 |
CN109635109B true CN109635109B (en) | 2022-12-16 |
Family
ID=66069692
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811430542.7A Active CN109635109B (en) | 2018-11-28 | 2018-11-28 | Sentence classification method based on LSTM and combined with part-of-speech and multi-attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109635109B (en) |
Families Citing this family (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110532378B (en) * | 2019-05-13 | 2021-10-26 | 南京大学 | Short text aspect extraction method based on topic model |
CN110147452B (en) * | 2019-05-17 | 2022-03-01 | 北京理工大学 | Coarse grain emotion analysis method based on hierarchy BERT neural network |
CN110347831A (en) * | 2019-06-28 | 2019-10-18 | 西安理工大学 | Based on the sensibility classification method from attention mechanism |
CN110457682B (en) * | 2019-07-11 | 2022-08-09 | 新华三大数据技术有限公司 | Part-of-speech tagging method for electronic medical record, model training method and related device |
CN110569499B (en) * | 2019-07-18 | 2021-10-08 | 中国科学院信息工程研究所 | Generating type dialog system coding method and coder based on multi-mode word vectors |
CN110427627B (en) * | 2019-08-02 | 2023-04-28 | 北京百度网讯科技有限公司 | Task processing method and device based on semantic representation model |
CN110795563A (en) * | 2019-10-31 | 2020-02-14 | 支付宝(杭州)信息技术有限公司 | Text classification model training method, event detection method and corresponding devices |
CN110781306B (en) * | 2019-10-31 | 2022-06-28 | 山东师范大学 | English text aspect layer emotion classification method and system |
CN110941700B (en) * | 2019-11-22 | 2022-08-09 | 福州大学 | Multi-task joint learning-based argument mining system and working method thereof |
CN110929033A (en) * | 2019-11-26 | 2020-03-27 | 深圳市信联征信有限公司 | Long text classification method and device, computer equipment and storage medium |
CN111339772B (en) * | 2020-03-16 | 2023-11-14 | 大连外国语大学 | Russian text emotion analysis method, electronic device and storage medium |
CN111709230B (en) * | 2020-04-30 | 2023-04-07 | 昆明理工大学 | Short text automatic summarization method based on part-of-speech soft template attention mechanism |
CN111581351B (en) * | 2020-04-30 | 2023-05-02 | 识因智能科技(北京)有限公司 | Dynamic element embedding method based on multi-head self-attention mechanism |
CN111914085B (en) * | 2020-06-18 | 2024-04-23 | 华南理工大学 | Text fine granularity emotion classification method, system, device and storage medium |
CN111737467B (en) * | 2020-06-22 | 2023-05-23 | 华南师范大学 | Object-level emotion classification method based on segmented convolutional neural network |
US20220019741A1 (en) * | 2020-07-16 | 2022-01-20 | Optum Technology, Inc. | An unsupervised approach to assignment of pre-defined labels to text documents |
CN112084336A (en) * | 2020-09-09 | 2020-12-15 | 浙江综合交通大数据中心有限公司 | Entity extraction and event classification method and device for expressway emergency |
CN112163429B (en) * | 2020-09-27 | 2023-08-29 | 华南理工大学 | Sentence correlation obtaining method, system and medium combining cyclic network and BERT |
CN112287689B (en) * | 2020-10-27 | 2022-06-24 | 山东省计算中心(国家超级计算济南中心) | Judicial second-examination case situation auxiliary analysis method and system |
CN112487796B (en) * | 2020-11-27 | 2022-02-18 | 北京智谱华章科技有限公司 | Method and device for sequence labeling and electronic equipment |
CN112417890B (en) * | 2020-11-29 | 2023-11-24 | 中国科学院电子学研究所苏州研究院 | Fine granularity entity classification method based on diversified semantic attention model |
CN112651225B (en) * | 2020-12-29 | 2022-06-14 | 昆明理工大学 | Multi-item selection machine reading understanding method based on multi-stage maximum attention |
CN113268565B (en) * | 2021-04-27 | 2022-03-25 | 山东大学 | Method and device for quickly generating word vector based on concept text |
CN113535948B (en) * | 2021-06-02 | 2022-08-16 | 中国人民解放军海军工程大学 | LSTM-Attention text classification method introducing essential point information |
US11941357B2 (en) | 2021-06-23 | 2024-03-26 | Optum Technology, Inc. | Machine learning techniques for word-based text similarity determinations |
CN114547287B (en) * | 2021-11-18 | 2023-04-07 | 电子科技大学 | Generation type text abstract method |
CN114048319B (en) * | 2021-11-29 | 2024-04-23 | 中国平安人寿保险股份有限公司 | Humor text classification method, device, equipment and medium based on attention mechanism |
CN114579707B (en) * | 2022-03-07 | 2023-07-28 | 桂林旅游学院 | Aspect-level emotion analysis method based on BERT neural network and multi-semantic learning |
CN114492420B (en) * | 2022-04-02 | 2022-07-29 | 北京中科闻歌科技股份有限公司 | Text classification method, device and equipment and computer readable storage medium |
CN115906863B (en) * | 2022-10-25 | 2023-09-12 | 华南师范大学 | Emotion analysis method, device, equipment and storage medium based on contrast learning |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107590138A (en) * | 2017-08-18 | 2018-01-16 | 浙江大学 | A neural machine translation method based on a part-of-speech attention mechanism |
CN108446275A (en) * | 2018-03-21 | 2018-08-24 | 北京理工大学 | Long-text emotional orientation analysis method based on attention double-layer LSTM |
CN108549658A (en) * | 2018-03-12 | 2018-09-18 | 浙江大学 | A deep learning video question answering method and system based on an attention mechanism over the syntactic analysis tree |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11222253B2 (en) * | 2016-11-03 | 2022-01-11 | Salesforce.Com, Inc. | Deep neural network model for processing data through multiple linguistic task hierarchies |
US11205103B2 (en) * | 2016-12-09 | 2021-12-21 | The Research Foundation for the State University | Semisupervised autoencoder for sentiment analysis |
US10733380B2 (en) * | 2017-05-15 | 2020-08-04 | Thomson Reuters Enterprise Center Gmbh | Neural paraphrase generator |
2018
- 2018-11-28: CN application CN201811430542.7A filed; granted as patent CN109635109B (en), legal status Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107590138A (en) * | 2017-08-18 | 2018-01-16 | 浙江大学 | A neural machine translation method based on a part-of-speech attention mechanism |
CN108549658A (en) * | 2018-03-12 | 2018-09-18 | 浙江大学 | A deep learning video question answering method and system based on an attention mechanism over the syntactic analysis tree |
CN108446275A (en) * | 2018-03-21 | 2018-08-24 | 北京理工大学 | Long-text emotional orientation analysis method based on attention double-layer LSTM |
Non-Patent Citations (1)
Title |
---|
A Structured Self-Attentive Sentence Embedding; Zhouhan Lin et al.; arXiv; 2017-05-09; pp. 1-15 *
Also Published As
Publication number | Publication date |
---|---|
CN109635109A (en) | 2019-04-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109635109B (en) | Sentence classification method based on LSTM and combined with part-of-speech and multi-attention mechanism | |
CN108984745B (en) | Neural network text classification method fusing multiple knowledge maps | |
CN107133213B (en) | Method and system for automatically extracting text abstract based on algorithm | |
CN110765775B (en) | Self-adaptive method for named entity recognition field fusing semantics and label differences | |
CN113128229B (en) | Chinese entity relation joint extraction method | |
CN111783462A (en) | Chinese named entity recognition model and method based on dual neural network fusion | |
CN110263325B (en) | Chinese word segmentation system | |
CN111401061A (en) | Method for identifying news opinion involved in case based on BERT and BiLSTM-Attention | |
CN113312452B (en) | Chapter-level text continuity classification method based on multi-task learning | |
CN113392209B (en) | Text clustering method based on artificial intelligence, related equipment and storage medium | |
CN109299211B (en) | Automatic text generation method based on Char-RNN model | |
CN113673254B (en) | Knowledge distillation position detection method based on similarity maintenance | |
CN112163089B (en) | High-technology text classification method and system integrating named entity recognition | |
CN113255320A (en) | Entity relation extraction method and device based on syntax tree and graph attention machine mechanism | |
CN110874411A (en) | Cross-domain emotion classification system based on attention mechanism fusion | |
CN114881042B (en) | Chinese emotion analysis method based on graph-convolution network fusion of syntactic dependency and part of speech | |
CN112232053A (en) | Text similarity calculation system, method and storage medium based on multi-keyword pair matching | |
Chen et al. | Deep neural networks for multi-class sentiment classification | |
CN114462420A (en) | False news detection method based on feature fusion model | |
CN111914553A (en) | Financial information negative subject judgment method based on machine learning | |
Han et al. | An attention-based neural framework for uncertainty identification on social media texts | |
Verma et al. | Semantic similarity between short paragraphs using Deep Learning | |
Liu et al. | Research on advertising content recognition based on convolutional neural network and recurrent neural network | |
CN115169349A (en) | Chinese electronic resume named entity recognition method based on ALBERT | |
SiChen | A neural network based text classification with attention mechanism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||