CN107169035A - Text classification method combining a long short-term memory network and a convolutional neural network - Google Patents
Text classification method combining a long short-term memory network and a convolutional neural network
- Publication number
- CN107169035A CN107169035A CN201710257132.6A CN201710257132A CN107169035A CN 107169035 A CN107169035 A CN 107169035A CN 201710257132 A CN201710257132 A CN 201710257132A CN 107169035 A CN107169035 A CN 107169035A
- Authority
- CN
- China
- Prior art keywords
- sentence
- layer
- convolutional neural network
- long short-term memory network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a text classification method that combines a long short-term memory (LSTM) network with a convolutional neural network (CNN). The method fully exploits the strength of a bidirectional LSTM network at learning the contextual information of text and the strength of a CNN at learning local features of text. After a bidirectional LSTM network learns the context of each word, a CNN further extracts local features from the context-enriched word vectors; a second bidirectional LSTM then learns the context of these local features and forms an output of fixed dimension, which is finally classified by a multilayer perceptron. The method further improves classification accuracy, generalizes well, and achieves good results on all of the test corpora.
Description
Technical field
The present invention relates to the field of natural language processing, and in particular to a text classification method that combines a long short-term memory network with a convolutional neural network.
Background technology
Automatic text classification based on machine learning has been one of the most active research directions in natural language processing in recent years, and has found wide application in information retrieval, search engines, automatic question answering, e-commerce, digital libraries, automatic summarization, news portals, and many other fields. Automatic text classification is the process of analyzing the content of a text by means of machine learning and automatically assigning it to categories from a given taxonomy. Before the 1990s, automatic text classification relied mainly on knowledge engineering, i.e., manual classification by domain experts, which was costly, slow, and labor-intensive. Since the 1990s, many researchers have applied statistical and machine learning methods to automatic text classification, such as support vector machines, the AdaBoost algorithm, the naive Bayes algorithm, the k-nearest-neighbor (KNN) algorithm, and logistic regression. In recent years, with the rapid development of deep learning and neural network models, text classification methods based on deep learning have attracted close attention and research from academia and industry. Typical neural network models, such as recurrent neural networks (chiefly the long short-term memory network LSTM and the GRU) and convolutional neural networks (CNN), have been widely used in text classification and have achieved good results. Existing research and applications have shown that recurrent neural networks are suited to learning long-range dependencies between linguistic units in a sentence, while convolutional neural networks are suited to learning local features of a sentence. Current research, however, does not adequately combine the respective advantages of recurrent and convolutional neural networks, nor does it take into account the contextual information of the linguistic units in a sentence.
Summary of the invention
The purpose of the present invention is to address the above deficiencies of the prior art by providing a text classification method that combines a long short-term memory network with a convolutional neural network. A bidirectional LSTM learns the preceding and following context of each word in a text sentence; a CNN then further extracts local features from the learned representations; a second bidirectional LSTM layer learns the relations between these local features; and finally a multilayer perceptron classifies the result and produces the output.
The purpose of the present invention can be achieved through the following technical solution:
A text classification method combining a long short-term memory network and a convolutional neural network, the method comprising the following steps:
Step 1: preprocess the sentences in the text; determine a sentence length threshold from the length distribution and variance of the sentences in the training corpus and normalize all sentences to that length; obtain the vector representation of each word in the input text from a pretrained word-vector table, forming a continuous, dense real-valued vector matrix.
Step 2: for the word vectors of an input sentence, learn the preceding context of each word with a forward LSTM network and the following context with a backward LSTM network, and concatenate the two results, converting the word-vector representation, which carries only semantic information, into a representation that carries both semantic and contextual information.
Step 3: apply two-dimensional convolutions to the word-vector matrix output by the bidirectional LSTM, using several kernel matrices of different widths and different weights, to extract local convolution features and generate multiple local convolution feature maps.
Step 4: down-sample the local convolution feature maps with a one-dimensional max-pooling algorithm to obtain the sentence's multi-layer global feature matrices, and concatenate the results.
Step 5: learn the long-range dependencies between the sentence's local features with two LSTM networks running in opposite directions, and output the final learning result.
Step 6: pass the output of step 5 through a fully connected hidden layer and then a softmax layer to predict the category of the sentence.
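The concatenation ("series connection merging") of step 2 is simple to state precisely. The sketch below is illustrative only, not the patented implementation: the per-word hidden states and the toy hidden size of 2 per direction are hypothetical. It shows how the forward and backward LSTM outputs for the same word are joined, doubling the dimension.

```python
# Toy sketch of step 2's merge: a forward LSTM and a backward LSTM each emit
# one hidden vector per word; the two vectors for the same word are
# concatenated, so the per-word dimension doubles.

def concat_bidirectional(forward_states, backward_states):
    """Concatenate per-word forward and backward hidden states."""
    assert len(forward_states) == len(backward_states)
    return [f + b for f, b in zip(forward_states, backward_states)]

# 3 words, hidden size 2 per direction -> concatenated size 4 per word
fwd = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]   # forward LSTM outputs (hypothetical)
bwd = [[0.9, 0.8], [0.7, 0.6], [0.5, 0.4]]   # backward LSTM outputs (hypothetical)
merged = concat_bidirectional(fwd, bwd)
```

In a real model the two directions would be trained LSTMs; only the merge itself is shown here.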
Further, the text classification method combining a long short-term memory network and a convolutional neural network is carried out in a single multilayer neural network: step 1 is carried out in the first layer, the input layer; step 2 in the second layer, a bidirectional LSTM layer; step 3 in the third layer, a CNN layer; step 4 in the fourth layer, a pooling layer; step 5 in the fifth layer, a bidirectional LSTM layer; and step 6 in the sixth layer, the output layer.
Further, the second-layer bidirectional LSTM layer learns the contextual information of each word of the original input sentence and outputs the concatenated learning results of all words, while the fifth-layer bidirectional LSTM layer learns the contextual information between the post-convolution sentence features and outputs only the learning result of the final step.
Further, the preprocessing of sentences in step 1 includes punctuation filtering, abbreviation completion, whitespace removal, word segmentation, and illegal-character filtering.
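A minimal preprocessing sketch, for illustration only: the patent does not fix an exact algorithm, so the regex, the crude whitespace tokenizer, and the `<pad>` token below are all assumptions.

```python
# Illustrative step-1 preprocessing: filter punctuation, segment into words,
# then pad/truncate every sentence to one unified length.
import re

PAD = "<pad>"  # hypothetical padding token

def preprocess(sentence, max_len):
    sentence = re.sub(r"[^\w\s]", " ", sentence)      # punctuation filtering
    tokens = sentence.lower().split()                 # crude word segmentation
    tokens = tokens[:max_len]                         # truncate long sentences
    tokens += [PAD] * (max_len - len(tokens))         # pad short sentences
    return tokens

tokens = preprocess("Hybrid LSTM-CNN models, in short, work well!", 8)
```

After this step each token would be looked up in the pretrained word-vector table to build the input matrix.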
Further, step 3 is a local-feature learning process: two-dimensional convolution windows of several different widths (in words) and their convolution kernels are applied to the context-enriched word vectors, yielding phrase information at different granularities.
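The windowed convolution of step 3 can be sketched in pure Python with toy values. The kernel weights and dimensions below are illustrative, not trained parameters: a window of width k slides over the L word positions and each step takes a dot product with the kernel, giving a feature map of length L - k + 1.

```python
# Illustrative step-3 convolution over a word-vector matrix (no padding).

def conv_feature_map(matrix, kernel):
    """matrix: L word vectors of dimension d; kernel: k x d weights."""
    k = len(kernel)
    length = len(matrix) - k + 1          # "valid" convolution length
    feature_map = []
    for i in range(length):
        window = matrix[i:i + k]          # k consecutive word vectors
        s = sum(window[r][c] * kernel[r][c]
                for r in range(k) for c in range(len(kernel[0])))
        feature_map.append(s)
    return feature_map

words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]  # L=4, d=2 (toy)
kernel2 = [[1.0, 1.0], [1.0, 1.0]]                         # width-2 kernel (toy)
fmap = conv_feature_map(words, kernel2)                    # length 4-2+1 = 3
```

Running kernels of widths 2, 3, and 4 in parallel, as the method describes, would yield one feature map per width, capturing phrase features at different granularities.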
Further, step 4 is a sampling and dimension-reduction process: the multi-layer local convolution feature maps are down-sampled by a one-dimensional max-pooling algorithm, which keeps the most significant feature value in each pooling window of the sentence as the feature representation of that local window.
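Step 4's one-dimensional max pooling can be sketched with toy feature values; the non-overlapping stride below is an assumption for illustration.

```python
# Illustrative step-4 max pooling: each window keeps only its largest
# feature value, down-sampling the feature map.

def max_pool_1d(feature_map, window, stride=None):
    stride = stride or window             # non-overlapping by default (assumed)
    return [max(feature_map[i:i + window])
            for i in range(0, len(feature_map) - window + 1, stride)]

fmap = [0.2, 0.9, 0.1, 0.4, 0.8, 0.3]     # toy convolution feature map
pooled = max_pool_1d(fmap, window=2)
```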
Further, step 5 is context learning over the local features: a bidirectional LSTM learns the contextual information between the local features and outputs the learning result of the last position, forming a one-dimensional output of fixed dimension.
Further, step 6 is the classification output: a fully connected multilayer perceptron makes the classification decision, and the final output is derived from the probability distribution over the specified taxonomy.
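The softmax-and-argmax mechanics of this output stage can be illustrated as follows; the class scores ("logits") are hypothetical, and only the probability computation is shown.

```python
# Illustrative step-6 output: turn class scores into a probability
# distribution with softmax, then predict the highest-probability class.
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]   # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]          # hypothetical scores from the hidden layer
probs = softmax(logits)           # probability distribution over the taxonomy
predicted = probs.index(max(probs))
```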
Further, step 6 is carried out in a two-layer multilayer perceptron consisting of a fully connected hidden layer and a softmax layer; the output of step 6 is the predicted category of the corresponding text.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
By fully combining the strength of a bidirectional LSTM at learning the contextual information of text with the strength of a CNN at learning local features of text, the invention proposes a hybrid LSTM-CNN text classification method: a bidirectional LSTM first learns the context of each word, a CNN then extracts local features from the context-enriched word vectors, a second bidirectional LSTM learns the context of these local features and forms an output of fixed dimension, and a multilayer perceptron finally produces the classification output. The method further improves classification accuracy, generalizes well, and achieves good results on all of the test corpora.
Brief description of the drawings
Fig. 1 is the overall architecture diagram of the multilayer neural network model of an embodiment of the present invention.
Embodiment
The present invention is described in further detail below with reference to an embodiment and the accompanying drawing, but embodiments of the present invention are not limited thereto.
Embodiment:
This embodiment provides a text classification method combining a long short-term memory network and a convolutional neural network, the method comprising the following steps:
Step 1: preprocess the sentences in the text, including punctuation filtering, abbreviation completion, whitespace removal, word segmentation, and illegal-character filtering; determine a sentence length threshold from the length distribution and variance of the sentences in the training corpus and normalize all sentences to that length; then obtain the vector representation of each word in the input text from a pretrained word-vector table, forming a continuous, dense real-valued vector matrix.
Step 2: for the word vectors of an input sentence, learn the preceding context of each word with a forward LSTM network and the following context with a backward LSTM network, and concatenate the two results, converting the word-vector representation, which carries only semantic information, into a representation that carries both semantic and contextual information.
Step 3: apply two-dimensional convolutions to the word-vector matrix output by the bidirectional LSTM, using several kernel matrices of different widths and different weights, to extract local convolution features and generate multiple local convolution feature maps.
Step 4: down-sample the local convolution feature maps with a one-dimensional max-pooling algorithm to obtain the sentence's multi-layer global feature matrices, and concatenate the results.
Step 5: learn the long-range dependencies between the sentence's local features with two LSTM networks running in opposite directions, and output the final learning result.
Step 6: pass the output of step 5 through a fully connected hidden layer and then a softmax layer to predict the category of the sentence.
The above text classification method combining a long short-term memory network and a convolutional neural network is carried out in a single multilayer neural network, whose architecture is shown in Fig. 1. Step 1 is carried out in the first layer, the input layer. Step 2 is carried out in the second layer, a bidirectional LSTM layer whose output dimension is 256. Step 3 is carried out in the third layer, a CNN layer whose convolution window widths are 2, 3, and 4 words and whose output dimension is 128. Step 4 is carried out in the fourth layer, the pooling layer, using one-dimensional max pooling with pooling window widths of 2, 3, and 4 words. Step 5 is carried out in the fifth layer, a bidirectional LSTM layer whose output dimension is 128 and which outputs only the learning result of the last word. Step 6 is carried out in the sixth layer, the output layer, a two-layer multilayer perceptron consisting of a fully connected hidden layer of dimension 128 with a dropout rate of 0.5 and a softmax layer; the output of step 6 is the predicted category of the corresponding text. During training, the loss function is the categorical cross-entropy, used with the RMSProp optimizer.
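Under one reading of these hyperparameters, the tensor shapes through the six layers can be traced as follows. This is an illustrative sketch, not the patent's specification: it assumes the stated 256 and 128 dimensions are the concatenated bidirectional outputs, "valid" (unpadded) convolutions, non-overlapping pooling, and hypothetical values for sentence length (40), embedding dimension (300), and number of classes (5).

```python
# Shape walkthrough for the embodiment's stated hyperparameters, under the
# assumptions named above. Everything here is plain shape arithmetic.

def layer_shapes(sent_len, embed_dim, n_classes):
    shapes = {"input": (sent_len, embed_dim)}
    shapes["bilstm_1"] = (sent_len, 256)               # per-word BiLSTM output
    # one feature map per window width; 128 filters each, valid convolution
    shapes["cnn"] = {k: (sent_len - k + 1, 128) for k in (2, 3, 4)}
    # non-overlapping max pooling with window width k over each feature map
    shapes["pool"] = {k: ((sent_len - k + 1) // k, 128) for k in (2, 3, 4)}
    shapes["bilstm_2"] = (128,)                        # last time step only
    shapes["hidden"] = (128,)                          # fully connected, dropout 0.5
    shapes["softmax"] = (n_classes,)
    return shapes

shapes = layer_shapes(sent_len=40, embed_dim=300, n_classes=5)
```

The fixed-dimension vector from the fifth layer is what makes the final two-layer perceptron independent of sentence length.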
Here, the second-layer bidirectional LSTM layer learns the contextual information of each word of the original input sentence and outputs the concatenated learning results of all words, while the fifth-layer bidirectional LSTM layer learns the contextual information between the post-convolution sentence features and outputs only the learning result of the final step.
Step 3 is a local-feature learning process: two-dimensional convolution windows of several different widths (in words) and their convolution kernels are applied to the context-enriched word vectors, yielding phrase information at different granularities. Step 4 is a sampling and dimension-reduction process: the multi-layer local convolution feature maps are down-sampled by a one-dimensional max-pooling algorithm, which keeps the most significant feature value in each pooling window of the sentence as the feature representation of that local window. Step 5 is context learning over the local features: a bidirectional LSTM learns the contextual information between the local features and outputs the learning result of the last position, forming a one-dimensional output of fixed dimension. Step 6 is the classification output: a fully connected multilayer perceptron makes the classification decision, and the final output is derived from the probability distribution over the specified taxonomy.
The above is only a preferred embodiment of the present patent, but the scope of protection of the present patent is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art within the scope disclosed by the present patent, in accordance with its technical solution and inventive concept, falls within the scope of protection of the present patent.
Claims (9)
1. A text classification method combining a long short-term memory network and a convolutional neural network, characterized in that the method comprises the following steps:
step 1: preprocessing the sentences in the text, determining a sentence length threshold from the length distribution and variance of the sentences in the training corpus and normalizing all sentences to that length, and obtaining the vector representation of each word in the input text from a pretrained word-vector table, forming a continuous, dense real-valued vector matrix;
step 2: for the word vectors of an input sentence, learning the preceding context of each word with a forward LSTM network and the following context with a backward LSTM network, and concatenating the two results, so as to convert the word-vector representation, which carries only semantic information, into a representation carrying both semantic and contextual information;
step 3: applying two-dimensional convolutions to the word-vector matrix output by the bidirectional LSTM, using several kernel matrices of different widths and different weights, so as to extract local convolution features and generate multiple local convolution feature maps;
step 4: down-sampling the local convolution feature maps with a one-dimensional max-pooling algorithm to obtain the multi-layer global feature matrices of the sentence, and concatenating the results;
step 5: learning the long-range dependencies between the local features of the sentence with two LSTM networks running in opposite directions, and outputting the final learning result;
step 6: passing the output of step 5 through a fully connected hidden layer and then a softmax layer to predict the category of the sentence.
2. The text classification method combining a long short-term memory network and a convolutional neural network according to claim 1, characterized in that: the method is carried out in a single multilayer neural network; step 1 is carried out in the first layer, the input layer; step 2 in the second layer, a bidirectional LSTM layer; step 3 in the third layer, a CNN layer; step 4 in the fourth layer, a pooling layer; step 5 in the fifth layer, a bidirectional LSTM layer; and step 6 in the sixth layer, the output layer.
3. The text classification method combining a long short-term memory network and a convolutional neural network according to claim 2, characterized in that: the second-layer bidirectional LSTM layer learns the contextual information of each word of the original input sentence and outputs the concatenated learning results of all words, while the fifth-layer bidirectional LSTM layer learns the contextual information between the post-convolution sentence features and outputs only the learning result of the final step.
4. The text classification method combining a long short-term memory network and a convolutional neural network according to claim 1, characterized in that: the preprocessing of sentences in step 1 comprises punctuation filtering, abbreviation completion, whitespace removal, word segmentation, and illegal-character filtering.
5. The text classification method combining a long short-term memory network and a convolutional neural network according to claim 1, characterized in that: step 3 is a local-feature learning process in which two-dimensional convolution windows of several different widths, in words, and their convolution kernels are applied to the context-enriched word vectors, yielding phrase information at different granularities.
6. The text classification method combining a long short-term memory network and a convolutional neural network according to claim 1, characterized in that: step 4 is a sampling and dimension-reduction process in which the multi-layer local convolution feature maps are down-sampled by a one-dimensional max-pooling algorithm, the most significant feature value in each pooling window of the sentence being kept as the feature representation of that local window.
7. The text classification method combining a long short-term memory network and a convolutional neural network according to claim 1, characterized in that: step 5 is context learning over the local features, in which a bidirectional LSTM learns the contextual information between the local features and outputs the learning result of the last position, forming a one-dimensional output of fixed dimension.
8. The text classification method combining a long short-term memory network and a convolutional neural network according to claim 1, characterized in that: step 6 is the classification output, in which a fully connected multilayer perceptron makes the classification decision and the final output is derived from the probability distribution over the specified taxonomy.
9. The text classification method combining a long short-term memory network and a convolutional neural network according to claim 1, characterized in that: step 6 is carried out in a two-layer multilayer perceptron consisting of a fully connected hidden layer and a softmax layer, and the output of step 6 is the predicted category of the corresponding text.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710257132.6A CN107169035B (en) | 2017-04-19 | 2017-04-19 | Text classification method combining a long short-term memory network and a convolutional neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107169035A true CN107169035A (en) | 2017-09-15 |
CN107169035B CN107169035B (en) | 2019-10-18 |
Family
ID=59812256
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710257132.6A Expired - Fee Related CN107169035B (en) | 2017-04-19 | 2017-04-19 | Text classification method combining a long short-term memory network and a convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107169035B (en) |
2017-04-19 CN CN201710257132.6A patent/CN107169035B/en not_active Expired - Fee Related
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104572892A (en) * | 2014-12-24 | 2015-04-29 | 中国科学院自动化研究所 | Text classification method based on recurrent convolutional network |
CN104834747A (en) * | 2015-05-25 | 2015-08-12 | 中国科学院自动化研究所 | Short text classification method based on convolutional neural network |
US20170032221A1 (en) * | 2015-07-29 | 2017-02-02 | Htc Corporation | Method, electronic apparatus, and computer readable medium of constructing classifier for disease detection |
CN106547735A (en) * | 2016-10-25 | 2017-03-29 | 复旦大学 | Construction and usage of context-aware dynamic word or character vectors based on deep learning |
Non-Patent Citations (2)
Title |
---|
Minlie Huang: "Modeling Rich Contexts for Sentiment Classification with LSTM", arXiv preprint arXiv:1605.01478 * |
Huang Lei et al.: "Research on Text Classification Based on Recursive Neural Networks", Journal of Beijing University of Chemical Technology * |
Cited By (98)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110019784A (en) * | 2017-09-29 | 2019-07-16 | 北京国双科技有限公司 | A text classification method and device |
CN107679199A (en) * | 2017-10-11 | 2018-02-09 | 北京邮电大学 | A kind of external the Chinese text readability analysis method based on depth local feature |
CN108415923A (en) * | 2017-10-18 | 2018-08-17 | 北京邮电大学 | The intelligent interactive system of closed domain |
CN108415923B (en) * | 2017-10-18 | 2020-12-11 | 北京邮电大学 | Intelligent man-machine conversation system of closed domain |
CN109947932A (en) * | 2017-10-27 | 2019-06-28 | 中移(苏州)软件技术有限公司 | A kind of pushed information classification method and system |
CN110019793A (en) * | 2017-10-27 | 2019-07-16 | 阿里巴巴集团控股有限公司 | A kind of text semantic coding method and device |
WO2019080864A1 (en) * | 2017-10-27 | 2019-05-02 | 阿里巴巴集团控股有限公司 | Semantic encoding method and device for text |
JP2021501390A (en) * | 2017-10-27 | 2021-01-14 | アリババ・グループ・ホールディング・リミテッドAlibaba Group Holding Limited | Text Semantic Coding Methods and Devices |
CN107832400B (en) * | 2017-11-01 | 2019-04-16 | 山东大学 | A method for relation classification using position-based combined LSTM and CNN models |
CN107832400A (en) * | 2017-11-01 | 2018-03-23 | 山东大学 | A method for relation classification using position-based combined LSTM and CNN models |
CN107908620A (en) * | 2017-11-15 | 2018-04-13 | 珠海金山网络游戏科技有限公司 | A method and apparatus for predicting a user's occupation from job documents |
CN108376558A (en) * | 2018-01-24 | 2018-08-07 | 复旦大学 | A multi-modal nuclear magnetic resonance image medical record report automatic generation method |
CN108376558B (en) * | 2018-01-24 | 2021-08-20 | 复旦大学 | Automatic generation method for multi-modal nuclear magnetic resonance image medical record report |
CN108334499A (en) * | 2018-02-08 | 2018-07-27 | 海南云江科技有限公司 | A kind of text label tagging equipment, method and computing device |
CN108415972A (en) * | 2018-02-08 | 2018-08-17 | 合肥工业大学 | text emotion processing method |
CN108334499B (en) * | 2018-02-08 | 2022-03-18 | 海南云江科技有限公司 | Text label labeling device and method and computing device |
CN109033413B (en) * | 2018-03-12 | 2022-12-23 | 上海大学 | Neural network-based demand document and service document matching method |
CN109033413A (en) * | 2018-03-12 | 2018-12-18 | 上海大学 | A neural-network-based requirement document and service document matching method |
CN108595409A (en) * | 2018-03-16 | 2018-09-28 | 上海大学 | A requirement document and service document matching method based on neural networks |
CN108520320A (en) * | 2018-03-30 | 2018-09-11 | 华中科技大学 | An equipment life prediction method based on multiple long short-term memory networks and empirical Bayes |
CN108536825A (en) * | 2018-04-10 | 2018-09-14 | 苏州市中地行信息技术有限公司 | A method for identifying duplicate housing listing data |
CN108595429A (en) * | 2018-04-25 | 2018-09-28 | 杭州闪捷信息科技股份有限公司 | A method for text feature extraction based on deep convolutional neural networks |
CN108614815A (en) * | 2018-05-07 | 2018-10-02 | 华东师范大学 | Sentence exchange method and device |
CN108710651A (en) * | 2018-05-08 | 2018-10-26 | 华南理工大学 | A kind of large scale customer complaint data automatic classification method |
CN108710651B (en) * | 2018-05-08 | 2022-03-25 | 华南理工大学 | Automatic classification method for large-scale customer complaint data |
CN108595440A (en) * | 2018-05-11 | 2018-09-28 | 厦门市美亚柏科信息股份有限公司 | Short text content categorizing method and system |
CN108595440B (en) * | 2018-05-11 | 2022-03-18 | 厦门市美亚柏科信息股份有限公司 | Short text content classification method and system |
CN108717439A (en) * | 2018-05-16 | 2018-10-30 | 哈尔滨理工大学 | A Chinese text classification method based on fused attention mechanism and feature enhancement |
CN108829737B (en) * | 2018-05-21 | 2021-11-05 | 浙江大学 | Text cross combination classification method based on bidirectional long-short term memory network |
CN108829737A (en) * | 2018-05-21 | 2018-11-16 | 浙江大学 | Text cross-combination classification method based on bidirectional long short-term memory network |
CN108804591A (en) * | 2018-05-28 | 2018-11-13 | 杭州依图医疗技术有限公司 | A text classification method and device for medical record text |
CN108874776A (en) * | 2018-06-11 | 2018-11-23 | 北京奇艺世纪科技有限公司 | A junk text recognition method and device |
CN108874776B (en) * | 2018-06-11 | 2022-06-03 | 北京奇艺世纪科技有限公司 | Junk text recognition method and device |
CN109086892A (en) * | 2018-06-15 | 2018-12-25 | 中山大学 | A visual question reasoning model and system based on general dependency trees |
CN109086892B (en) * | 2018-06-15 | 2022-02-18 | 中山大学 | General dependency tree-based visual problem reasoning model and system |
CN109062996A (en) * | 2018-07-05 | 2018-12-21 | 贵州威爱教育科技有限公司 | A kind of management method and system of cloud file |
CN109101552B (en) * | 2018-07-10 | 2022-01-28 | 东南大学 | Phishing website URL detection method based on deep learning |
CN109101552A (en) * | 2018-07-10 | 2018-12-28 | 东南大学 | A kind of fishing website URL detection method based on deep learning |
CN108984745B (en) * | 2018-07-16 | 2021-11-02 | 福州大学 | Neural network text classification method fusing multiple knowledge maps |
CN108984745A (en) * | 2018-07-16 | 2018-12-11 | 福州大学 | A neural network text classification method fusing multiple knowledge graphs |
CN108961816A (en) * | 2018-07-19 | 2018-12-07 | 泰华智慧产业集团股份有限公司 | Road parking berth prediction technique based on optimization LSTM model |
CN109213896B (en) * | 2018-08-06 | 2021-06-01 | 杭州电子科技大学 | Underwater video abstract generation method based on long-short term memory network reinforcement learning |
CN109213896A (en) * | 2018-08-06 | 2019-01-15 | 杭州电子科技大学 | Underwater video summary generation method based on long short-term memory network reinforcement learning |
CN109271537A (en) * | 2018-08-10 | 2019-01-25 | 北京大学 | A text-to-image generation method and system based on distillation learning |
CN109271537B (en) * | 2018-08-10 | 2021-11-23 | 北京大学 | Text-to-image generation method and system based on distillation learning |
CN110837227A (en) * | 2018-08-15 | 2020-02-25 | 格力电器(武汉)有限公司 | Electric appliance control method and device |
CN109241284A (en) * | 2018-08-27 | 2019-01-18 | 中国人民解放军国防科技大学 | Document classification method and device |
CN109726268A (en) * | 2018-08-29 | 2019-05-07 | 中国人民解放军国防科技大学 | Text representation method and device based on hierarchical neural network |
CN109308355A (en) * | 2018-09-17 | 2019-02-05 | 清华大学 | Legal judgment result prediction method and device |
CN109308355B (en) * | 2018-09-17 | 2020-03-13 | 清华大学 | Legal judgment result prediction method and device |
CN109508811A (en) * | 2018-09-30 | 2019-03-22 | 中冶华天工程技术有限公司 | Sewage treatment effluent parameter prediction method based on principal component analysis and long short-term memory network |
CN111126556B (en) * | 2018-10-31 | 2023-07-25 | 百度在线网络技术(北京)有限公司 | Training method and device for artificial neural network model |
CN111126556A (en) * | 2018-10-31 | 2020-05-08 | 百度在线网络技术(北京)有限公司 | Training method and device of artificial neural network model |
US11010560B2 (en) | 2018-11-08 | 2021-05-18 | International Business Machines Corporation | Multi-resolution convolutional neural networks for sequence modeling |
CN109542585A (en) * | 2018-11-14 | 2019-03-29 | 山东大学 | A kind of Virtual Machine Worker load predicting method for supporting irregular time interval |
CN109542585B (en) * | 2018-11-14 | 2020-06-16 | 山东大学 | Virtual machine workload prediction method supporting irregular time intervals |
CN109271521A (en) * | 2018-11-16 | 2019-01-25 | 北京九狐时代智能科技有限公司 | A text classification method and device |
CN109508377A (en) * | 2018-11-26 | 2019-03-22 | 南京云思创智信息科技有限公司 | Text feature, device, chat robots and storage medium based on Fusion Model |
CN109582794A (en) * | 2018-11-29 | 2019-04-05 | 南京信息工程大学 | Long article classification method based on deep learning |
CN109359198A (en) * | 2018-12-04 | 2019-02-19 | 北京容联易通信息技术有限公司 | A text classification method and device |
CN109743732B (en) * | 2018-12-20 | 2022-05-10 | 重庆邮电大学 | Junk short message distinguishing method based on improved CNN-LSTM |
CN109743732A (en) * | 2018-12-20 | 2019-05-10 | 重庆邮电大学 | Junk short message discrimination method based on improved CNN-LSTM |
CN110222953A (en) * | 2018-12-29 | 2019-09-10 | 北京理工大学 | A kind of power quality hybrid perturbation analysis method based on deep learning |
CN109840279A (en) * | 2019-01-10 | 2019-06-04 | 山东亿云信息技术有限公司 | Text classification method based on convolutional recurrent neural network |
CN109918503A (en) * | 2019-01-29 | 2019-06-21 | 华南理工大学 | Slot filling method extracting semantic features based on a dynamic-window self-attention mechanism |
CN109918503B (en) * | 2019-01-29 | 2020-12-22 | 华南理工大学 | Groove filling method for extracting semantic features based on dynamic window self-attention mechanism |
CN109902293A (en) * | 2019-01-30 | 2019-06-18 | 华南理工大学 | A text classification method based on local and global mutual attention mechanisms |
CN109815456A (en) * | 2019-02-13 | 2019-05-28 | 北京航空航天大学 | A method for compressing word vector storage space based on character pair encoding |
CN109902301B (en) * | 2019-02-26 | 2023-02-10 | 广东工业大学 | Deep neural network-based relationship reasoning method, device and equipment |
CN109902301A (en) * | 2019-02-26 | 2019-06-18 | 广东工业大学 | Relation inference method, device and equipment based on deep neural network |
CN110020431A (en) * | 2019-03-06 | 2019-07-16 | 平安科技(深圳)有限公司 | Feature extraction method, device, computer equipment and storage medium for text information |
CN109992781A (en) * | 2019-04-02 | 2019-07-09 | 腾讯科技(深圳)有限公司 | Text feature processing method, device, storage medium and processor |
CN110046253A (en) * | 2019-04-10 | 2019-07-23 | 广州大学 | A kind of prediction technique of language conflict |
CN110046253B (en) * | 2019-04-10 | 2022-01-04 | 广州大学 | Language conflict prediction method |
CN110083832A (en) * | 2019-04-17 | 2019-08-02 | 北大方正集团有限公司 | Recognition methods, device, equipment and the readable storage medium storing program for executing of article reprinting relationship |
CN110083832B (en) * | 2019-04-17 | 2020-12-29 | 北大方正集团有限公司 | Article reprint relation identification method, device, equipment and readable storage medium |
CN110263152B (en) * | 2019-05-07 | 2024-04-09 | 平安科技(深圳)有限公司 | Text classification method, system and computer equipment based on neural network |
CN110263152A (en) * | 2019-05-07 | 2019-09-20 | 平安科技(深圳)有限公司 | Text classification method, system and computer equipment based on neural networks |
CN110059192A (en) * | 2019-05-15 | 2019-07-26 | 北京信息科技大学 | Character-level text classification method based on five codes |
CN110196913A (en) * | 2019-05-23 | 2019-09-03 | 北京邮电大学 | Joint multiple entity relation extraction method and device based on text generation |
WO2021004118A1 (en) * | 2019-07-05 | 2021-01-14 | 深圳壹账通智能科技有限公司 | Correlation value determination method and apparatus |
CN110704890A (en) * | 2019-08-12 | 2020-01-17 | 上海大学 | Automatic text causal relationship extraction method fusing convolutional neural network and cyclic neural network |
CN110781939A (en) * | 2019-10-17 | 2020-02-11 | 中国铁塔股份有限公司 | Method and device for detecting similar pictures and project management system |
CN111371806A (en) * | 2020-03-18 | 2020-07-03 | 北京邮电大学 | Web attack detection method and device |
CN111552808A (en) * | 2020-04-20 | 2020-08-18 | 北京北大软件工程股份有限公司 | Administrative illegal case law prediction method and tool based on convolutional neural network |
CN111914085A (en) * | 2020-06-18 | 2020-11-10 | 华南理工大学 | Text fine-grained emotion classification method, system, device and storage medium |
CN111914085B (en) * | 2020-06-18 | 2024-04-23 | 华南理工大学 | Text fine granularity emotion classification method, system, device and storage medium |
CN112052675A (en) * | 2020-08-21 | 2020-12-08 | 北京邮电大学 | Method and device for detecting sensitive information of unstructured text |
CN112434156A (en) * | 2020-11-02 | 2021-03-02 | 浙江大有实业有限公司杭州科技发展分公司 | Power grid operation warning method and device based on mixed text classification model |
CN113780610A (en) * | 2020-12-02 | 2021-12-10 | 北京沃东天骏信息技术有限公司 | Customer service profile construction method and device |
CN112883708A (en) * | 2021-02-25 | 2021-06-01 | 哈尔滨工业大学 | Textual entailment recognition method based on 2D-LSTM |
WO2022227211A1 (en) * | 2021-04-30 | 2022-11-03 | 平安科技(深圳)有限公司 | Bert-based multi-intention recognition method for discourse, and device and readable storage medium |
CN115563286A (en) * | 2022-11-10 | 2023-01-03 | 东北农业大学 | Knowledge-driven milk cow disease text classification method |
CN115563286B (en) * | 2022-11-10 | 2023-12-01 | 东北农业大学 | Knowledge-driven dairy cow disease text classification method |
CN116308464B (en) * | 2023-05-11 | 2023-09-08 | 广州市沃钛移动科技有限公司 | Target client acquisition system and method |
CN116308464A (en) * | 2023-05-11 | 2023-06-23 | 广州钛动科技股份有限公司 | Target client acquisition system and method |
CN116721361A (en) * | 2023-06-09 | 2023-09-08 | 中国测绘科学研究院 | Wetland remote sensing extraction method compatible with space-time discontinuous images |
CN116721361B (en) * | 2023-06-09 | 2024-01-02 | 中国测绘科学研究院 | Wetland remote sensing extraction method compatible with space-time discontinuous images |
Also Published As
Publication number | Publication date |
---|---|
CN107169035B (en) | 2019-10-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107169035B (en) | A text classification method mixing long short-term memory network and convolutional neural networks | |
CN110866117B (en) | Short text classification method based on semantic enhancement and multi-level label embedding | |
CN107133213B (en) | Method and system for automatically extracting text abstract based on algorithm | |
CN107066553B (en) | Short text classification method based on convolutional neural network and random forest | |
Wang et al. | Research on Web text classification algorithm based on improved CNN and SVM | |
CN107832400A (en) | A method for relation classification using position-based combined LSTM and CNN models | |
CN109189925A (en) | Word vector model based on mutual information and CNN-based text classification method | |
CN107818164A (en) | An intelligent question answering method and system | |
CN107918782A (en) | A method and system for generating natural language describing picture content | |
CN110321563B (en) | Text emotion analysis method based on hybrid supervision model | |
CN106599933A (en) | Text emotion classification method based on the joint deep learning model | |
CN107193801A (en) | A short text feature optimization and sentiment analysis method based on deep belief network | |
CN110765260A (en) | Information recommendation method based on convolutional neural network and joint attention mechanism | |
CN107451278A (en) | Chinese text classification method based on multi-hidden-layer extreme learning machines | |
CN111143563A (en) | Text classification method based on integration of BERT, LSTM and CNN | |
CN110415071B (en) | Automobile competitive product comparison method based on viewpoint mining analysis | |
CN111858878B (en) | Method, system and storage medium for automatically extracting answer from natural language text | |
Zhang | Research on text classification method based on LSTM neural network model | |
CN112287106A (en) | Online comment emotion classification method based on dual-channel hybrid neural network | |
CN106570170A (en) | Integrated text classification and named entity recognition method and system based on deep recurrent neural network | |
CN116484262B (en) | Textile equipment fault auxiliary processing method based on text classification | |
CN113220890A (en) | Deep learning method combining news headlines and news long text contents based on pre-training | |
CN111651602A (en) | Text classification method and system | |
CN114239585A (en) | Biomedical nested named entity recognition method | |
CN114462420A (en) | False news detection method based on feature fusion model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20191018 |
|