CN109635109A - Sentence classification method based on LSTM combined with part-of-speech and multi-attention mechanisms - Google Patents
Sentence classification method based on LSTM combined with part-of-speech and multi-attention mechanisms Download PDF Info
- Publication number
- CN109635109A CN109635109A CN201811430542.7A CN201811430542A CN109635109A CN 109635109 A CN109635109 A CN 109635109A CN 201811430542 A CN201811430542 A CN 201811430542A CN 109635109 A CN109635109 A CN 109635109A
- Authority
- CN
- China
- Prior art keywords
- speech
- sentence
- layer
- attention
- lstm
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24133—Distances to prototypes
- G06F18/24137—Distances to cluster centroïds
- G06F18/2414—Smoothing the distance, e.g. radial basis function networks [RBFN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/253—Grammatical analysis; Style critique
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The invention discloses a sentence classification method based on LSTM combined with part-of-speech and multi-attention mechanisms, comprising the steps of: in the input layer, converting each sentence into two continuous, dense matrices, a semantic word-vector matrix and a part-of-speech word-vector matrix; in a shared bidirectional LSTM layer, learning the contextual information of the words or parts of speech in the sentence, and concatenating the learning results of each step before output; in the self-attention layer, using a self-attention mechanism with a dot-product function to learn the important local features at each position of the sentence from the semantic word-vector sequence and the part-of-speech word-vector sequence respectively, obtaining the corresponding semantic attention vector and part-of-speech attention vector, and constraining them by a KL distance; in the merge layer, using the obtained semantic and part-of-speech attention vectors to compute weighted sums of the output sequence of the bidirectional LSTM layer, obtaining the semantic and part-of-speech representations of the sentence and deriving the final sentence representation; and finally performing prediction and classification output through an MLP output layer.
Description
Technical field
The present invention relates to the field of natural language processing, and in particular to a sentence classification method based on LSTM combined with part-of-speech and multi-attention mechanisms.
Background technique
Sentence classification has long been a research hotspot in the field of natural language processing (NLP). In recent years, with the wide application of deep learning in NLP, many researchers have proposed sentence classification methods based on the Long Short-Term Memory model (LSTM), and have achieved results superior to traditional machine learning methods on many sentence classification corpora, such as Stanford Twitter Sentiment (STS), the binary (SSTb2) and five-way (SSTb5) versions of the Stanford Sentiment Treebank, TREC, and IMDB. Compared with convolutional neural networks (CNN), LSTM better captures the contextual information and long-term dependencies of text sequence data, and effectively avoids the vanishing and exploding gradient problems of the traditional RNN (Recurrent Neural Network) model; it is therefore widely used in sentence classification tasks.
In current LSTM-based sentence classification models, the words in a sentence are mainly converted into distributed representations using word vectors trained on large-scale corpora. Existing research has shown that word vectors trained on large-scale corpora contain rich syntactic and semantic information and can greatly improve sentence classification. Commonly used pre-trained word vectors are mainly obtained with the CBOW or Skip-gram models of word2vec, the GloVe algorithm, or the FastText algorithm. When training word vectors, these models and algorithms rely mainly on word co-occurrence information within a window (or globally) and do not use the part-of-speech information of the words themselves. The resulting word vectors therefore contain only content-level information and do not reflect the parts of speech of the words. In general text classification tasks (such as news classification), feature words play an important indicative role in the result, and these feature words are mainly nouns or verbs, for example "a typhoon will reach the southeast coast of China" or "China will continue to cut taxes for small and medium-sized enterprises". In text sentiment classification tasks, the opinion or emotion words that indicate a positive or negative tendency are even more important, and these words are mainly verbs or adjectives, for example "I like this film" or "this film is well worth seeing". Related studies also show that adjectives are the main carriers of opinion and emotion. Introducing part-of-speech information can therefore enrich the feature representation of a sentence and help improve the effect of sentence classification.
In recent years, some researchers have introduced the attention mechanism from image processing into NLP and obtained a series of state-of-the-art results in many subtasks, such as machine translation, text summarization, relation extraction, reading comprehension, and textual entailment. The attention mechanism enables a model to weigh the different influences of each element of the input on the target result, and reduces the loss of detail that occurs when sentences are long. Some researchers have also proposed the self-attention mechanism, also called intra-attention, whose main idea is to use the positional information of each element in a sentence to compute a corresponding attention vector that characterizes the sentence. Combining LSTM with attention (or self-attention) has become the core of many models. However, this research focuses mainly on content-level attention and likewise does not take the part-of-speech information of words into account.
Summary of the invention
The purpose of the present invention is to address the above shortcomings of the prior art by providing a sentence classification method based on LSTM combined with part-of-speech and multi-attention mechanisms. The method makes full use of the accurate syntactic and semantic information that large-scale corpora can provide, while introducing the part-of-speech information of the sentence to compensate for the lack of part-of-speech information in pre-trained word vectors, thereby better characterizing the sentence in terms of both syntax and semantics.
The purpose of the present invention can be achieved through the following technical solutions:
A sentence classification method based on LSTM combined with part-of-speech and multi-attention mechanisms, the method being based on the following five-layer neural network model, whose first to fifth layers are, respectively, an input layer, a shared bidirectional LSTM layer, a self-attention layer, a merge layer, and an MLP output layer, the method specifically comprising the following steps:
after preprocessing the sentence in the input layer, a pre-trained word-vector table and a matrix generated by uniform random initialization are used to provide a mathematical representation of each word and its part of speech, so that each sentence is converted into a semantic word-vector matrix and a part-of-speech word-vector matrix;
in the shared bidirectional LSTM layer, LSTM layers running in two opposite directions learn the contextual information of the words or parts of speech in the sentence, and the learning results of each step are concatenated before being output;
in the self-attention layer, a self-attention mechanism with a dot-product function is used to learn the important local features at each position of the sentence from the semantic word-vector sequence and the part-of-speech word-vector sequence respectively, obtaining the corresponding semantic attention vector and part-of-speech attention vector; the two vectors are constrained by a KL distance, the aim being to keep their distributions over the positions of the sentence as consistent as possible;
in the merge layer, the semantic attention vector and part-of-speech attention vector obtained from the self-attention layer are used to compute weighted sums of the output sequence of the bidirectional LSTM layer, yielding the semantic and part-of-speech representations of the sentence; the final sentence representation is then obtained by comparing several combination strategies: weighted average, concatenation, summation, and maximum;
finally, prediction and classification output are performed through an MLP output layer comprising a fully connected hidden layer and a fully connected softmax layer.
Further, the preprocessing of the sentence in the input layer comprises word segmentation, filtering of illegal characters, and length padding.
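Of these operations, length padding is the most mechanical. The sketch below assumes the threshold is the mean sentence length plus one standard deviation (the embodiment only says the threshold is derived from the length distribution and standard deviation, so this exact formula is an assumption):

```python
import numpy as np

def length_threshold(sentences):
    """Assumed rule: mean length plus one standard deviation, rounded up."""
    lengths = np.array([len(s) for s in sentences])
    return int(np.ceil(lengths.mean() + lengths.std()))

def pad(sentence, max_len, pad_token="<pad>"):
    """Pad short sentences and truncate long ones to a common length."""
    return (sentence + [pad_token] * max_len)[:max_len]

corpus = [["i", "like", "this", "film"],
          ["very", "good"],
          ["the", "typhoon", "will", "reach", "the", "coast"]]
T = length_threshold(corpus)
padded = [pad(s, T) for s in corpus]
print(T, [len(s) for s in padded])  # 6 [6, 6, 6]
```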
Further, the number of neurons in the fully connected hidden layer of the MLP output layer is the square root of the product of the number of input-layer nodes and the number of MLP output-layer nodes, and the number of neurons in the fully connected softmax layer is the number of categories in the corresponding classification scheme.
Further, during the training of the five-layer neural network model, the semantic word vectors remain unchanged, while the part-of-speech word vectors are adjusted using the back-propagation algorithm.
Further, to keep the KL distance as small as possible, the KL distance is added to the loss function as one of the targets of neural network model optimization.
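The joint objective described here can be written as the classification loss plus the KL term. The weighting factor `lam` below is an assumption, since the document does not state how the two terms are balanced:

```python
import numpy as np

def cross_entropy(probs, y, eps=1e-12):
    """Negative log-likelihood of the true class y."""
    return -float(np.log(probs[y] + eps))

def kl(p, q, eps=1e-12):
    """KL distance between the two attention distributions."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def total_loss(probs, y, a_sem, a_pos, lam=1.0):
    """Classification loss plus the KL term, so that minimizing the loss
    also pulls the two attention distributions together."""
    return cross_entropy(probs, y) + lam * kl(a_sem, a_pos)

probs = np.array([0.7, 0.2, 0.1])
a = np.array([0.5, 0.3, 0.2])
b = np.array([0.2, 0.3, 0.5])
print(total_loss(probs, 0, a, a) == cross_entropy(probs, 0))    # True: KL(a, a) = 0
print(total_loss(probs, 0, a, b) > total_loss(probs, 0, a, a))  # True: mismatch is penalized
```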
Compared with the prior art, the present invention has the following advantages and beneficial effects:
The sentence classification method based on LSTM combined with part-of-speech and multi-attention mechanisms provided by the invention makes full use of the accurate syntactic and semantic information that large-scale corpora can provide, while introducing the part-of-speech information of the sentence to compensate for the lack of part-of-speech information in pre-trained word vectors, thereby better characterizing the sentence in terms of both syntax and semantics. The method also exploits the strength of LSTM in learning the contextual information of words and parts of speech in a sentence, and the strength of the attention mechanism in learning the important local features of a sentence. The proposed classification model has the advantages of high accuracy and wide applicability, and achieves good results on several well-known open corpora, including the 20Newsgroup corpus, the IMDB corpus, Movie Review, TREC, and the Stanford Sentiment Treebank (SSTb).
Detailed description of the invention
Fig. 1 is the overall structure diagram of the five-layer neural network model in the embodiment of the present invention.
Specific embodiment
The present invention will now be described in further detail with reference to the embodiments and the accompanying drawings, but the embodiments of the present invention are not limited thereto.
Embodiment:
This embodiment provides a sentence classification method based on LSTM combined with part-of-speech and multi-attention mechanisms. The main idea is as follows. On the one hand, pre-trained word vectors provide semantic word-vector representations of the words in a sentence; on the other hand, a part-of-speech tagging tool labels the words in the sentence, and a simplified part-of-speech tag set (mainly comprising noun, verb, adjective, adverb, and a closing tag UNK) converts each part of speech into a serial number, which is then mapped and learned through an embedding layer. Next, a shared bidirectional LSTM learns the contextual information of the semantic word vectors and the part-of-speech word vectors, and the forward and backward learning results of each time step are concatenated before output, yielding the contextual relations of the words and of the parts of speech respectively. On this basis, a self-attention layer learns the positional information in the sentence from the semantic word-vector sequence and the part-of-speech word-vector sequence output by the LSTM layer and constructs the corresponding attention vectors, which are constrained by a KL distance; the aim is to ensure that when the attention weight of a semantic word vector at some position is high, the attention weight of the part-of-speech word vector at that position is also high, so as to better capture the semantic and part-of-speech features useful for sentence classification. Then, a custom merge layer takes the two attention vectors obtained from the self-attention layer together with the LSTM output as input, computes weighted sums to obtain the representations of the sentence in terms of semantics and parts of speech, and combines the results (by weighted average, concatenation, summation, or maximum) to obtain the final semantic representation of the sentence. Finally, a multilayer perceptron (MLP) comprising a fully connected hidden layer and a softmax output layer performs prediction and classification output. During the learning process of the model, the pre-trained word vectors remain unchanged, while the part-of-speech word vectors are adjusted during model training using the back-propagation algorithm.
The method is based on the following five-layer neural network model, whose structure is shown in Fig. 1. The first to fifth layers are, respectively, the input layer, the shared bidirectional LSTM layer, the self-attention layer, the merge layer, and the MLP output layer. Some key parameters of the model are shown in Table 1:
Table 1
The first layer of the model first preprocesses the sentence, which mainly includes punctuation filtering, abbreviation completion, and removal of extra spaces; a length threshold is then determined from the sentence length distribution and its standard deviation, and the sentences are padded to that length. Next, a pre-trained word-vector table provides the semantic vector representation of each word in the sentence, while NLTK provides the part-of-speech tag of each word; parts of speech of the same type are merged under the simplified tag set and converted into serial numbers, after which each part of speech is randomly initialized to a word vector of the specified dimension using a uniform distribution over the interval [-0.25, 0.25], and this vector is learned and adjusted through the embedding layer during model training. For each sentence, the input layer finally produces the corresponding semantic word-vector matrix and part-of-speech word-vector matrix. During model training, the semantic word vectors remain unchanged, while the part-of-speech word vectors are learned.
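As an illustration of this input layer, the sketch below builds the two matrices for one sentence from a frozen semantic table and a uniformly initialized part-of-speech table. The vocabulary, tag set, and dimensions are invented for the example, and random numbers stand in for a real pre-trained table:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4
vocab = {"i": 0, "like": 1, "this": 2, "film": 3}
pos_tags = {"NOUN": 0, "VERB": 1, "ADJ": 2, "ADV": 3, "UNK": 4}

# Pre-trained semantic vectors (frozen during training in the patent);
# random values here stand in for a real word2vec/GloVe table.
semantic_table = rng.standard_normal((len(vocab), dim))
# Part-of-speech vectors: uniform over [-0.25, 0.25], learned later.
pos_table = rng.uniform(-0.25, 0.25, (len(pos_tags), dim))

def encode(words, tags):
    """Turn one tagged sentence into its two input matrices."""
    sem = np.stack([semantic_table[vocab[w]] for w in words])
    pos = np.stack([pos_table[pos_tags[t]] for t in tags])
    return sem, pos

sem_mat, pos_mat = encode(["i", "like", "this", "film"],
                          ["NOUN", "VERB", "ADJ", "NOUN"])
print(sem_mat.shape, pos_mat.shape)  # (4, 4) (4, 4)
```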
The second layer of the model contains a shared bidirectional LSTM network. For the semantic word-vector matrix and part-of-speech word-vector matrix obtained from the input layer, each bidirectional LSTM uses one forward and one backward LSTM to learn the contextual information, and concatenates the learning results of each step for output, finally yielding one vector containing semantic and contextual information and one vector containing part-of-speech and contextual information.
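The shared bidirectional LSTM can be sketched in plain numpy as follows. The weights are random and untrained, the sizes are illustrative, and the per-step concatenation of forward and backward states matches the series connection described above:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_run(xs, W, U, b, h_dim):
    """Run a single LSTM over a sequence; gates stacked as [i, f, o, g]."""
    h = np.zeros(h_dim)
    c = np.zeros(h_dim)
    outs = []
    for x in xs:
        z = W @ x + U @ h + b
        i = sigmoid(z[:h_dim])              # input gate
        f = sigmoid(z[h_dim:2 * h_dim])     # forget gate
        o = sigmoid(z[2 * h_dim:3 * h_dim]) # output gate
        g = np.tanh(z[3 * h_dim:])          # candidate cell state
        c = f * c + i * g
        h = o * np.tanh(c)
        outs.append(h)
    return np.stack(outs)

def bilstm(xs, params_f, params_b, h_dim):
    """Concatenate forward and backward hidden states at every step."""
    fwd = lstm_run(xs, *params_f, h_dim)
    bwd = lstm_run(xs[::-1], *params_b, h_dim)[::-1]
    return np.concatenate([fwd, bwd], axis=1)  # (seq_len, 2*h_dim)

rng = np.random.default_rng(1)
x_dim, h_dim, seq_len = 4, 3, 5
make = lambda: (rng.standard_normal((4 * h_dim, x_dim)) * 0.1,
                rng.standard_normal((4 * h_dim, h_dim)) * 0.1,
                np.zeros(4 * h_dim))
xs = rng.standard_normal((seq_len, x_dim))
H = bilstm(xs, make(), make(), h_dim)
print(H.shape)  # (5, 6)
```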
The third layer of the model is a self-attention layer, which uses a self-attention mechanism with a dot-product function to learn the important local features at each position of the sentence from the semantic word-vector sequence and the part-of-speech word-vector sequence respectively, obtaining the corresponding semantic attention vector and part-of-speech attention vector, which are constrained by a KL distance. To keep the KL distance as small as possible, we add the KL distance to the loss function as one of the targets of model optimization.
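A minimal numpy sketch of this layer under assumed shapes: each position of the bidirectional LSTM output is scored against a query vector with a dot product, the scores are softmax-normalized into one attention distribution per input view, and the KL distance between the two distributions is then available as the constraint. The query vectors here are random stand-ins for learned parameters:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def dot_product_attention(H, q):
    """Dot-product scores of each position of H against a query q,
    normalized into an attention distribution over positions."""
    return softmax(H @ q)

def kl(p, q, eps=1e-12):
    """KL distance between two attention distributions."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

rng = np.random.default_rng(2)
seq_len, d = 5, 6
H_sem = rng.standard_normal((seq_len, d))  # LSTM output, semantic view
H_pos = rng.standard_normal((seq_len, d))  # LSTM output, part-of-speech view
q_sem = rng.standard_normal(d)             # stand-in for a learned query
q_pos = rng.standard_normal(d)

a_sem = dot_product_attention(H_sem, q_sem)  # semantic attention vector
a_pos = dot_product_attention(H_pos, q_pos)  # part-of-speech attention vector
print(a_sem.shape)                # (5,): one weight per position
print(kl(a_sem, a_pos))           # pushed toward 0 during training
```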
The fourth layer of the model is a custom merge layer, which mainly uses the semantic attention vector and part-of-speech attention vector obtained from the self-attention layer to compute weighted sums of the output sequence of the LSTM layer, obtaining the semantic and part-of-speech representations of the sentence, which are then combined into the final sentence representation. During the experiments, we comprehensively compared several combination strategies, including weighted average, concatenation, summation, and maximum, and analyzed the results; we finally found that weighted average and concatenation perform better than simple summation or taking the maximum.
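The weighted summation and the four combination strategies compared in this paragraph can be sketched as follows (numpy, illustrative shapes; the `weighted_avg` mode uses equal weights as a placeholder for learned ones):

```python
import numpy as np

def merge(H_sem, H_pos, a_sem, a_pos, mode="concat"):
    """Weight each LSTM output sequence by its attention vector, then
    combine the two sentence representations with the chosen strategy."""
    r_sem = a_sem @ H_sem  # attention-weighted sum over positions -> (d,)
    r_pos = a_pos @ H_pos
    if mode == "concat":
        return np.concatenate([r_sem, r_pos])
    if mode == "sum":
        return r_sem + r_pos
    if mode == "max":
        return np.maximum(r_sem, r_pos)
    if mode == "weighted_avg":  # equal weights as a stand-in for learned ones
        return 0.5 * r_sem + 0.5 * r_pos
    raise ValueError(mode)

rng = np.random.default_rng(3)
H_sem = rng.standard_normal((5, 6))
H_pos = rng.standard_normal((5, 6))
a_sem, a_pos = np.full((2, 5), 0.2)  # uniform attention just for the demo
print(merge(H_sem, H_pos, a_sem, a_pos).shape,         # (12,) concat
      merge(H_sem, H_pos, a_sem, a_pos, "sum").shape)  # (6,)
```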
The fifth layer of the model consists of a fully connected hidden layer and a softmax layer for multi-class logistic regression; multi-class cross entropy and the RMSProp optimizer based on stochastic gradient descent are used to predict and output the class of the sentence. During the training of the whole model, the part-of-speech word vectors in the input layer are adjusted through back-propagation, and the loss function and the KL distance are optimized simultaneously.
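A sketch of this output layer under assumed sizes: a fully connected tanh hidden layer followed by a softmax layer. Only the forward pass is shown; the RMSProp updates and the cross-entropy training loss are omitted:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def mlp_output(s, W1, b1, W2, b2):
    """Fully connected hidden layer (tanh) followed by a softmax layer."""
    h = np.tanh(W1 @ s + b1)
    return softmax(W2 @ h + b2)

rng = np.random.default_rng(4)
n_in, n_hidden, n_classes = 12, 8, 5  # illustrative sizes
W1 = rng.standard_normal((n_hidden, n_in)) * 0.1
b1 = np.zeros(n_hidden)
W2 = rng.standard_normal((n_classes, n_hidden)) * 0.1
b2 = np.zeros(n_classes)

probs = mlp_output(rng.standard_normal(n_in), W1, b1, W2, b2)
pred = int(np.argmax(probs))  # predicted class index
print(probs.shape)            # (5,): one probability per class
```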
The above are only preferred embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or change made by a person skilled in the art within the scope disclosed by the present invention, in accordance with the technical solution and inventive concept of the present invention, falls within the scope of protection of the present invention.
Claims (5)
1. A sentence classification method based on LSTM combined with part-of-speech and multi-attention mechanisms, characterized in that the method is based on the following five-layer neural network model, whose first to fifth layers are, respectively, an input layer, a shared bidirectional LSTM layer, a self-attention layer, a merge layer, and an MLP output layer, the method specifically comprising the following steps:
after preprocessing the sentence in the input layer, using a pre-trained word-vector table and a matrix generated by uniform random initialization to provide a mathematical representation of each word and its part of speech, so that each sentence is converted into a semantic word-vector matrix and a part-of-speech word-vector matrix;
in the shared bidirectional LSTM layer, learning the contextual information of the words or parts of speech in the sentence through LSTM layers running in two opposite directions, and concatenating the learning results of each step before output;
in the self-attention layer, using a self-attention mechanism with a dot-product function to learn the important local features at each position of the sentence from the semantic word-vector sequence and the part-of-speech word-vector sequence respectively, obtaining the corresponding semantic attention vector and part-of-speech attention vector, and constraining them by a KL distance, the aim being to keep their distributions over the positions of the sentence as consistent as possible;
in the merge layer, using the semantic attention vector and part-of-speech attention vector obtained from the self-attention layer to compute weighted sums of the output sequence of the bidirectional LSTM layer, obtaining the semantic and part-of-speech representations of the sentence, and then obtaining the final sentence representation by comparing weighted average, concatenation, summation, and maximum;
finally performing prediction and classification output through an MLP output layer comprising a fully connected hidden layer and a fully connected softmax layer.
2. The sentence classification method based on LSTM combined with part-of-speech and multi-attention mechanisms according to claim 1, characterized in that the preprocessing of the sentence in the input layer comprises word segmentation, filtering of illegal characters, and length padding.
3. The sentence classification method based on LSTM combined with part-of-speech and multi-attention mechanisms according to claim 1, characterized in that the number of neurons in the fully connected hidden layer of the MLP output layer is the square root of the product of the number of input-layer nodes and the number of MLP output-layer nodes, and the number of neurons in the fully connected softmax layer is the number of categories in the corresponding classification scheme.
4. The sentence classification method based on LSTM combined with part-of-speech and multi-attention mechanisms according to claim 1, characterized in that, during the training of the five-layer neural network model, the semantic word vectors remain unchanged, while the part-of-speech word vectors are adjusted using the back-propagation algorithm.
5. The sentence classification method based on LSTM combined with part-of-speech and multi-attention mechanisms according to claim 1, characterized in that, to keep the KL distance as small as possible, the KL distance is added to the loss function as one of the targets of neural network model optimization.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811430542.7A CN109635109B (en) | 2018-11-28 | 2018-11-28 | Sentence classification method based on LSTM and combined with part-of-speech and multi-attention mechanism |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811430542.7A CN109635109B (en) | 2018-11-28 | 2018-11-28 | Sentence classification method based on LSTM and combined with part-of-speech and multi-attention mechanism |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109635109A true CN109635109A (en) | 2019-04-16 |
CN109635109B CN109635109B (en) | 2022-12-16 |
Family
ID=66069692
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811430542.7A Active CN109635109B (en) | 2018-11-28 | 2018-11-28 | Sentence classification method based on LSTM and combined with part-of-speech and multi-attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109635109B (en) |
Cited By (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110147452A (en) * | 2019-05-17 | 2019-08-20 | 北京理工大学 | A kind of coarseness sentiment analysis method based on level BERT neural network |
CN110347831A (en) * | 2019-06-28 | 2019-10-18 | 西安理工大学 | Based on the sensibility classification method from attention mechanism |
CN110427627A (en) * | 2019-08-02 | 2019-11-08 | 北京百度网讯科技有限公司 | Task processing method and device based on semantic expressiveness model |
CN110457682A (en) * | 2019-07-11 | 2019-11-15 | 新华三大数据技术有限公司 | Electronic health record part-of-speech tagging method, model training method and relevant apparatus |
CN110532378A (en) * | 2019-05-13 | 2019-12-03 | 南京大学 | A kind of short text aspect extracting method based on topic model |
CN110569499A (en) * | 2019-07-18 | 2019-12-13 | 中国科学院信息工程研究所 | Generating type dialog system coding method and coder based on multi-mode word vectors |
CN110781306A (en) * | 2019-10-31 | 2020-02-11 | 山东师范大学 | English text aspect layer emotion classification method and system |
CN110795563A (en) * | 2019-10-31 | 2020-02-14 | 支付宝(杭州)信息技术有限公司 | Text classification model training method, event detection method and corresponding devices |
CN110929033A (en) * | 2019-11-26 | 2020-03-27 | 深圳市信联征信有限公司 | Long text classification method and device, computer equipment and storage medium |
CN110941700A (en) * | 2019-11-22 | 2020-03-31 | 福州大学 | Multi-task joint learning-based argument mining system and working method thereof |
CN111339772A (en) * | 2020-03-16 | 2020-06-26 | 大连外国语大学 | Russian text emotion analysis method, electronic device and storage medium |
CN111581351A (en) * | 2020-04-30 | 2020-08-25 | 识因智能科技(北京)有限公司 | Dynamic element embedding method based on multi-head self-attention mechanism |
CN111709230A (en) * | 2020-04-30 | 2020-09-25 | 昆明理工大学 | Short text automatic summarization method based on part-of-speech soft template attention mechanism |
CN111737467A (en) * | 2020-06-22 | 2020-10-02 | 华南师范大学 | Object-level emotion classification method based on segmented convolutional neural network |
CN111914085A (en) * | 2020-06-18 | 2020-11-10 | 华南理工大学 | Text fine-grained emotion classification method, system, device and storage medium |
CN112084336A (en) * | 2020-09-09 | 2020-12-15 | 浙江综合交通大数据中心有限公司 | Entity extraction and event classification method and device for expressway emergency |
CN112163429A (en) * | 2020-09-27 | 2021-01-01 | 华南理工大学 | Sentence relevancy obtaining method, system and medium combining cycle network and BERT |
CN112287689A (en) * | 2020-10-27 | 2021-01-29 | 山东省计算中心(国家超级计算济南中心) | Judicial second-examination case situation auxiliary analysis method and system |
CN112417890A (en) * | 2020-11-29 | 2021-02-26 | 中国科学院电子学研究所苏州研究院 | Fine-grained entity classification method based on diversified semantic attention model |
CN112487796A (en) * | 2020-11-27 | 2021-03-12 | 北京智源人工智能研究院 | Method and device for sequence labeling and electronic equipment |
CN112651225A (en) * | 2020-12-29 | 2021-04-13 | 昆明理工大学 | Multi-item selection machine reading understanding method based on multi-stage maximum attention |
CN113268565A (en) * | 2021-04-27 | 2021-08-17 | 山东大学 | Method and device for quickly generating word vector based on concept text |
CN113535948A (en) * | 2021-06-02 | 2021-10-22 | 中国人民解放军海军工程大学 | LSTM-Attention text classification method introducing essential point information |
US20220019741A1 (en) * | 2020-07-16 | 2022-01-20 | Optum Technology, Inc. | An unsupervised approach to assignment of pre-defined labels to text documents |
CN114048319A (en) * | 2021-11-29 | 2022-02-15 | 中国平安人寿保险股份有限公司 | Attention mechanism-based humor text classification method, device, equipment and medium |
CN114492420A (en) * | 2022-04-02 | 2022-05-13 | 北京中科闻歌科技股份有限公司 | Text classification method, device and equipment and computer readable storage medium |
CN114547287A (en) * | 2021-11-18 | 2022-05-27 | 电子科技大学 | Generation type text abstract method |
CN114579707A (en) * | 2022-03-07 | 2022-06-03 | 桂林旅游学院 | BERT neural network and multi-semantic learning-based aspect-level emotion analysis method |
CN115906863A (en) * | 2022-10-25 | 2023-04-04 | 华南师范大学 | Emotion analysis method, device and equipment based on comparative learning and storage medium |
US11941357B2 (en) | 2021-06-23 | 2024-03-26 | Optum Technology, Inc. | Machine learning techniques for word-based text similarity determinations |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107590138A (en) * | 2017-08-18 | 2018-01-16 | 浙江大学 | A kind of neural machine translation method based on part of speech notice mechanism |
US20180121787A1 (en) * | 2016-11-03 | 2018-05-03 | Salesforce.Com, Inc. | Joint Many-Task Neural Network Model for Multiple Natural Language Processing (NLP) Tasks |
US20180165554A1 (en) * | 2016-12-09 | 2018-06-14 | The Research Foundation For The State University Of New York | Semisupervised autoencoder for sentiment analysis |
CN108446275A (en) * | 2018-03-21 | 2018-08-24 | 北京理工大学 | Long text emotional orientation analytical method based on attention bilayer LSTM |
CN108549658A (en) * | 2018-03-12 | 2018-09-18 | 浙江大学 | A kind of deep learning video answering method and system based on the upper attention mechanism of syntactic analysis tree |
US20180329883A1 (en) * | 2017-05-15 | 2018-11-15 | Thomson Reuters Global Resources Unlimited Company | Neural paraphrase generator |
-
2018
- 2018-11-28 CN CN201811430542.7A patent/CN109635109B/en active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180121787A1 (en) * | 2016-11-03 | 2018-05-03 | Salesforce.Com, Inc. | Joint Many-Task Neural Network Model for Multiple Natural Language Processing (NLP) Tasks |
US20180165554A1 (en) * | 2016-12-09 | 2018-06-14 | The Research Foundation For The State University Of New York | Semisupervised autoencoder for sentiment analysis |
US20180329883A1 (en) * | 2017-05-15 | 2018-11-15 | Thomson Reuters Global Resources Unlimited Company | Neural paraphrase generator |
CN107590138A (en) * | 2017-08-18 | 2018-01-16 | 浙江大学 | A kind of neural machine translation method based on part of speech notice mechanism |
CN108549658A (en) * | 2018-03-12 | 2018-09-18 | 浙江大学 | A kind of deep learning video answering method and system based on the upper attention mechanism of syntactic analysis tree |
CN108446275A (en) * | 2018-03-21 | 2018-08-24 | 北京理工大学 | Long text emotional orientation analytical method based on attention bilayer LSTM |
Non-Patent Citations (1)
Title |
---|
ZHOUHAN LIN ET AL.: "A Structured Self-Attentive Sentence Embedding", arXiv *
Cited By (51)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110532378A (en) * | 2019-05-13 | 2019-12-03 | 南京大学 | A kind of short text aspect extracting method based on topic model |
CN110532378B (en) * | 2019-05-13 | 2021-10-26 | 南京大学 | Short text aspect extraction method based on topic model |
CN110147452A (en) * | 2019-05-17 | 2019-08-20 | 北京理工大学 | A kind of coarseness sentiment analysis method based on level BERT neural network |
CN110147452B (en) * | 2019-05-17 | 2022-03-01 | 北京理工大学 | Coarse grain emotion analysis method based on hierarchy BERT neural network |
CN110347831A (en) * | 2019-06-28 | 2019-10-18 | 西安理工大学 | Sentiment classification method based on self-attention mechanism |
CN110457682B (en) * | 2019-07-11 | 2022-08-09 | 新华三大数据技术有限公司 | Part-of-speech tagging method for electronic medical record, model training method and related device |
CN110457682A (en) * | 2019-07-11 | 2019-11-15 | 新华三大数据技术有限公司 | Electronic health record part-of-speech tagging method, model training method and relevant apparatus |
CN110569499A (en) * | 2019-07-18 | 2019-12-13 | 中国科学院信息工程研究所 | Generating type dialog system coding method and coder based on multi-mode word vectors |
CN110569499B (en) * | 2019-07-18 | 2021-10-08 | 中国科学院信息工程研究所 | Generating type dialog system coding method and coder based on multi-mode word vectors |
CN110427627A (en) * | 2019-08-02 | 2019-11-08 | 北京百度网讯科技有限公司 | Task processing method and device based on semantic expressiveness model |
CN110427627B (en) * | 2019-08-02 | 2023-04-28 | 北京百度网讯科技有限公司 | Task processing method and device based on semantic representation model |
CN110781306A (en) * | 2019-10-31 | 2020-02-11 | 山东师范大学 | English text aspect layer emotion classification method and system |
CN110795563A (en) * | 2019-10-31 | 2020-02-14 | 支付宝(杭州)信息技术有限公司 | Text classification model training method, event detection method and corresponding devices |
CN110781306B (en) * | 2019-10-31 | 2022-06-28 | 山东师范大学 | English text aspect layer emotion classification method and system |
CN110941700B (en) * | 2019-11-22 | 2022-08-09 | 福州大学 | Multi-task joint learning-based argument mining system and working method thereof |
CN110941700A (en) * | 2019-11-22 | 2020-03-31 | 福州大学 | Multi-task joint learning-based argument mining system and working method thereof |
CN110929033A (en) * | 2019-11-26 | 2020-03-27 | 深圳市信联征信有限公司 | Long text classification method and device, computer equipment and storage medium |
CN111339772B (en) * | 2020-03-16 | 2023-11-14 | 大连外国语大学 | Russian text emotion analysis method, electronic device and storage medium |
CN111339772A (en) * | 2020-03-16 | 2020-06-26 | 大连外国语大学 | Russian text emotion analysis method, electronic device and storage medium |
CN111709230A (en) * | 2020-04-30 | 2020-09-25 | 昆明理工大学 | Short text automatic summarization method based on part-of-speech soft template attention mechanism |
CN111581351A (en) * | 2020-04-30 | 2020-08-25 | 识因智能科技(北京)有限公司 | Dynamic element embedding method based on multi-head self-attention mechanism |
CN111581351B (en) * | 2020-04-30 | 2023-05-02 | 识因智能科技(北京)有限公司 | Dynamic element embedding method based on multi-head self-attention mechanism |
CN111914085B (en) * | 2020-06-18 | 2024-04-23 | 华南理工大学 | Text fine granularity emotion classification method, system, device and storage medium |
CN111914085A (en) * | 2020-06-18 | 2020-11-10 | 华南理工大学 | Text fine-grained emotion classification method, system, device and storage medium |
CN111737467B (en) * | 2020-06-22 | 2023-05-23 | 华南师范大学 | Object-level emotion classification method based on segmented convolutional neural network |
CN111737467A (en) * | 2020-06-22 | 2020-10-02 | 华南师范大学 | Object-level emotion classification method based on segmented convolutional neural network |
US20220019741A1 (en) * | 2020-07-16 | 2022-01-20 | Optum Technology, Inc. | An unsupervised approach to assignment of pre-defined labels to text documents |
CN112084336A (en) * | 2020-09-09 | 2020-12-15 | 浙江综合交通大数据中心有限公司 | Entity extraction and event classification method and device for expressway emergency |
CN112163429A (en) * | 2020-09-27 | 2021-01-01 | 华南理工大学 | Sentence relevancy obtaining method, system and medium combining cycle network and BERT |
CN112163429B (en) * | 2020-09-27 | 2023-08-29 | 华南理工大学 | Sentence correlation obtaining method, system and medium combining cyclic network and BERT |
CN112287689A (en) * | 2020-10-27 | 2021-01-29 | 山东省计算中心(国家超级计算济南中心) | Judicial second-examination case situation auxiliary analysis method and system |
CN112287689B (en) * | 2020-10-27 | 2022-06-24 | 山东省计算中心(国家超级计算济南中心) | Judicial second-examination case situation auxiliary analysis method and system |
CN112487796A (en) * | 2020-11-27 | 2021-03-12 | 北京智源人工智能研究院 | Method and device for sequence labeling and electronic equipment |
CN112417890B (en) * | 2020-11-29 | 2023-11-24 | 中国科学院电子学研究所苏州研究院 | Fine granularity entity classification method based on diversified semantic attention model |
CN112417890A (en) * | 2020-11-29 | 2021-02-26 | 中国科学院电子学研究所苏州研究院 | Fine-grained entity classification method based on diversified semantic attention model |
CN112651225A (en) * | 2020-12-29 | 2021-04-13 | 昆明理工大学 | Multi-item selection machine reading understanding method based on multi-stage maximum attention |
CN112651225B (en) * | 2020-12-29 | 2022-06-14 | 昆明理工大学 | Multi-item selection machine reading understanding method based on multi-stage maximum attention |
CN113268565A (en) * | 2021-04-27 | 2021-08-17 | 山东大学 | Method and device for quickly generating word vector based on concept text |
CN113268565B (en) * | 2021-04-27 | 2022-03-25 | 山东大学 | Method and device for quickly generating word vector based on concept text |
CN113535948A (en) * | 2021-06-02 | 2021-10-22 | 中国人民解放军海军工程大学 | LSTM-Attention text classification method introducing essential point information |
CN113535948B (en) * | 2021-06-02 | 2022-08-16 | 中国人民解放军海军工程大学 | LSTM-Attention text classification method introducing essential point information |
US11941357B2 (en) | 2021-06-23 | 2024-03-26 | Optum Technology, Inc. | Machine learning techniques for word-based text similarity determinations |
CN114547287B (en) * | 2021-11-18 | 2023-04-07 | 电子科技大学 | Generation type text abstract method |
CN114547287A (en) * | 2021-11-18 | 2022-05-27 | 电子科技大学 | Generation type text abstract method |
CN114048319A (en) * | 2021-11-29 | 2022-02-15 | 中国平安人寿保险股份有限公司 | Attention mechanism-based humor text classification method, device, equipment and medium |
CN114048319B (en) * | 2021-11-29 | 2024-04-23 | 中国平安人寿保险股份有限公司 | Humor text classification method, device, equipment and medium based on attention mechanism |
CN114579707B (en) * | 2022-03-07 | 2023-07-28 | 桂林旅游学院 | Aspect-level emotion analysis method based on BERT neural network and multi-semantic learning |
CN114579707A (en) * | 2022-03-07 | 2022-06-03 | 桂林旅游学院 | BERT neural network and multi-semantic learning-based aspect-level emotion analysis method |
CN114492420A (en) * | 2022-04-02 | 2022-05-13 | 北京中科闻歌科技股份有限公司 | Text classification method, device and equipment and computer readable storage medium |
CN115906863B (en) * | 2022-10-25 | 2023-09-12 | 华南师范大学 | Emotion analysis method, device, equipment and storage medium based on contrast learning |
CN115906863A (en) * | 2022-10-25 | 2023-04-04 | 华南师范大学 | Emotion analysis method, device and equipment based on comparative learning and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109635109B (en) | 2022-12-16 |
Similar Documents
Publication | Title |
---|---|
CN109635109A (en) | Sentence classification method based on LSTM combining part of speech and a multi-attention mechanism |
Abdullah et al. | SEDAT: Sentiment and emotion detection in Arabic text using CNN-LSTM deep learning |
CN108984745B (en) | Neural network text classification method fusing multiple knowledge graphs |
US20220147836A1 (en) | Method and device for text-enhanced knowledge graph joint representation learning |
CN109558487A (en) | Document classification method based on hierarchical multi-attention networks |
CN109325112B (en) | Cross-language sentiment analysis method and apparatus based on emoji |
CN108829722A (en) | Dual-attention relation classification method and system for distant supervision |
CN107562792A (en) | Question-answer matching method based on deep learning |
CN107832400A (en) | Relation classification method using position-based joint LSTM and CNN models |
US11580975B2 (en) | Systems and methods for response selection in multi-party conversations with dynamic topic tracking |
CN109614487A (en) | Emotion classification method based on a tensor fusion model |
CN109598387A (en) | Stock price forecasting method and system based on a bidirectional cross-modal attention network model |
CN109710769A (en) | "Internet water army" (spammer) comment detection system and method based on capsule networks |
CN110321563A (en) | Text sentiment analysis method based on a hybrid supervision model |
CN107679225A (en) | Keyword-based reply generation method |
CN114881042B (en) | Chinese sentiment analysis method based on a graph convolution network fusing syntactic dependency and part of speech |
Marreddy et al. | Clickbait detection in Telugu: Overcoming NLP challenges in resource-poor languages using benchmarked techniques |
CN109871449A (en) | End-to-end zero-shot learning method based on semantic description |
CN111914553B (en) | Financial information negative subject determination method based on machine learning |
CN113157919A (en) | Aspect-level sentiment classification method and system for sentence text |
Cheng et al. | Document classification based on convolutional neural network and hierarchical attention network |
CN112182227A (en) | Text emotion classification system and method based on TransD knowledge graph embedding |
Wei et al. | Named entity recognition method for the educational emergency field based on BERT |
Zhu et al. | A semantic similarity computing model based on Siamese network for duplicate questions identification |
Wang et al. | Emotion analysis of microblog based on emotion dictionary and Bi-GRU |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||