CN109241287A - Text classification model and method based on reinforcement learning and capsule network - Google Patents
Text classification model and method based on reinforcement learning and capsule network Download PDF Info
- Publication number
- CN109241287A (application CN201811109798.8A)
- Authority
- CN
- China
- Prior art keywords
- capsule
- reinforcement learning
- word
- network
- classification model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
Abstract
The present invention relates to the technical field of natural language processing and text classification, and more particularly to a text classification model and method based on reinforcement learning and a capsule network. The present invention takes the reinforcement learning algorithm Actor-Critic and the capsule network CapsNet as its basic framework: the capsule network extracts features from the text, and reinforcement learning determines the connections between capsule layers. The innovation of the invention is to introduce reinforcement learning to learn the routing relations between capsule network layers, and to introduce the capsule network to solve the multi-label classification task in text classification models. By exploiting the advantage of capsule networks on multi-label classification tasks, better results are achieved when the model is applied to multi-label text classification; the trial-and-error mechanism of reinforcement learning is used to learn better routing connections.
Description
Technical field
The present invention relates to the technical field of natural language processing and text classification, and more particularly to a text classification model and method based on reinforcement learning and a capsule network.
Background art
Feature learning is a fundamental problem in artificial intelligence, and feature extraction is especially important in natural language processing. Within natural language processing, text classification is a very basic and common process that depends heavily on feature learning. Unlike the image domain, the semantic logic of text is harder to capture and express, which makes classification tasks on text more difficult. The ultimate goal of general artificial intelligence in natural language processing is for machines to understand human language, that is, to understand the semantic information of text so as to carry out assigned tasks. Text classification is a basic task within machine understanding of text semantics and therefore has important research significance. Current text classification models do not perform as well on multi-label text classification tasks as on single-label tasks, whereas text classification models based on capsule networks have a clear advantage in this respect; their routing algorithm is, to some extent, an unsupervised clustering algorithm, and reinforcement learning performs well at clustering.
Mainstream text representation methods can be broadly divided into four classes. 1. Bag-of-words feature models are text representation methods that ignore the order of words in a sentence: each word in the sentence is encoded, and the length of a word's feature vector equals the size of the bag of words. For example, the DAN model proposed by Mohit Iyyer et al. splits a sentence into word tokens and feeds them into a neural network; these tokens do not retain their original positional information. The fastText model proposed by Joulin et al. passes words directly through a lookup table into a neural network model and likewise does not consider word order. 2. Sequence representation models do consider word order, for example the Convolutional Neural Network and the Recurrent Neural Network, but one of their shortcomings is that they do not use sentence structure information. Kim et al. proposed the TextCNN model, which uses the convolution property of a CNN to combine the information of K adjacent words, thereby retaining partial word-order information within the sentence; however, it does not capture the structural information of words, and every word is treated identically regardless of its part of speech. 3. Structural feature models consider sentence structure information. For example, the tree-structured LSTM model proposed by Tai et al. modifies the original chain LSTM into a tree structure following the syntax tree, thereby retaining the structural information of the words in a sentence; the Recursive Autoencoder model proposed by Qian et al. also uses a pre-built syntax tree to construct structural features of sentences. 4. Models based on the attention mechanism: since each constituent of a sentence contributes a different share of its semantics, the attention mechanism assigns relatively high scores to specific words; compared with unmarked words, attention raises the contribution of particular words to the semantic description. Yang et al. proposed the HAN model based on the attention mechanism, which encodes words and then applies attention, improving text classification performance.
The shortcomings of the above prior art are as follows. 1. Bag-of-words models: they do not consider the order structure within a sentence. A text classification model based on bag-of-words collects the words appearing in a sentence and stores their frequencies, but such a model discards much sentence information; from the perspective of information theory, more information can bring better results. 2. Sequence representation models: they use only the information in the sequential order of a sentence, and for relationships between words that span across other words, sequence models have no way of capturing the information; sequence representation models therefore also lose information to some extent. 3. Structural feature models: they use a pre-built syntax tree to construct the structural features of words; although grammatical information is used, this does not capture semantically targeted structure. 4. Attention-based models: they have a clear advantage in capturing correspondences between input and output, but attention-based text models do not consider word-order information, which is a serious drawback for natural language, because word order carries a great deal of information.
Summary of the invention
In order to overcome at least one of the drawbacks of the prior art described above, the present invention provides a text classification model and method based on reinforcement learning and a capsule network, combining a capsule network with reinforcement learning to improve its performance on multi-label text classification.
For multi-label text classification tasks, capsule networks have a clear advantage. First, TextCNN is used to obtain the word order between words and partial structural information; after its features are obtained, the capsule network connects these features and can fuse the word-order and structural information. Within the capsule network, the fused feature information can be parsed by the neural network into multi-label classification results. Reinforcement learning, used as the routing mechanism between capsule network layers, can to some extent obtain a better connection pattern. Therefore, a text classification model combining reinforcement learning and a capsule network can remedy the word-order and structure defects that other models cannot solve.
The technical scheme of the present invention is as follows: the invention takes the reinforcement learning algorithm Actor-Critic and the capsule network CapsNet as its basic framework; the capsule network extracts text features, and reinforcement learning determines the connections between capsule layers. The concrete implementation steps are as follows.
The invention is divided into two parts: the reinforcement learning framework and the capsule network framework.
The reinforcement learning framework comprises:
State: the current state, which mainly contains the environment in which the Agent is situated and the Agent's own state; concretely, it is the connection relationship between two capsule layers, and the states of all capsule-layer connections constitute the state of one step in reinforcement learning.
Action: the action of the Agent, which here is mainly whether a connection exists between capsule layers, or the connection probability.
Reward: the reward obtained by the Agent, generally divided into immediate reward and future reward; for text classification the reward is the classification performance, generally measured with metrics such as F1 and Precision.
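Purely as an illustration of this State/Action/Reward framing, the routing problem can be sketched as a toy environment; the class name, shapes and stand-in loss below are invented for the example and are not the patent's implementation:

```python
import numpy as np

class RoutingEnv:
    """Toy RL framing of capsule-layer routing (illustrative only).

    State  : the matrix of connection weights between two capsule layers.
    Action : modify one connection weight (or connection probability).
    Reward : the negated classification loss under the current weights.
    """

    def __init__(self, n_in=4, n_out=3, seed=0):
        self._rng = np.random.default_rng(seed)
        self.weights = self._rng.random((n_in, n_out))  # State

    def state(self):
        return self.weights.copy()

    def step(self, i, j, delta, loss_fn):
        """Apply one Action (tweak weight i,j) and return the Reward (-loss)."""
        self.weights[i, j] = np.clip(self.weights[i, j] + delta, 0.0, 1.0)
        return -loss_fn(self.weights)


# Stand-in loss: how far each output capsule's incoming weights are from 1.
env = RoutingEnv()
loss = lambda w: float(np.abs(w.sum(axis=0) - 1.0).sum())
r0 = env.step(0, 0, 0.1, loss)
assert env.state().shape == (4, 3) and r0 <= 0
```

In the actual invention the reward would instead be a metric such as F1 or Precision on the classification output.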
The capsule network framework comprises the following steps:
S1. Split the original raw text into words or characters by word segmentation, and convert them into embedding-form words or characters using a lookup table;
S2. Following the TextCNN method, convolve the embedding-form words or characters to obtain the Primary Capsules;
S3. Connect the Primary Capsules, after Routing, with the next Capsule Layer, and then with a Fully Connected Network, outputting the probability of each label;
S4. Modify, via the BP algorithm, the weights of the Fully Connected Layer and the representation of each word in the Embedding layer's lookup table.
In step S1, the detailed process is:
S11. First initialize the lookup table from existing word embeddings, where the embedding depth is 300; words that do not occur in the pretrained embeddings are initialized to 0 or to random numbers between 0 and 1;
S12. Then, by lookup, convert each word or character of the raw text into embedding format; in this way each sentence of the raw text is converted into a matrix of shape long*300.
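The lookup of steps S11-S12 can be illustrated with a minimal sketch; the vocabulary, sample sentence and function names below are invented for the example (only the embedding depth of 300 and the random initialization of unseen words come from the text):

```python
import numpy as np

DIM = 300                                   # embedding depth per step S11
rng = np.random.default_rng(0)

# Toy "pretrained" embeddings; a real table would be loaded from word vectors.
pretrained = {"text": rng.random(DIM), "classification": rng.random(DIM)}
lookup = dict(pretrained)

def embed_sentence(tokens):
    """Convert a tokenized sentence into a (len(tokens), DIM) matrix (S12).

    Tokens absent from the pretrained table get a random vector in [0, 1),
    matching the initialization described in step S11.
    """
    rows = []
    for tok in tokens:
        if tok not in lookup:
            lookup[tok] = rng.random(DIM)   # random init for unseen word
        rows.append(lookup[tok])
    return np.stack(rows)

matrix = embed_sentence(["text", "classification", "capsule"])
assert matrix.shape == (3, DIM)             # the long*300 matrix of S12
```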
In step S2, the detailed process is:
S21. Convolve the long*300 matrix obtained in step S1 with convolution kernels of sizes 3*300, 4*300 and 5*300 respectively to obtain the corresponding feature vectors; the number of kernels of each size is 32;
S22. From the Vector_Size (VS) * 32 features produced in step S21, obtain the Primary Capsules of shape VS*32*16 through a 32*32*16 transition matrix; here the dimension of each Capsule is 16.
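Steps S21-S22 can be illustrated with the following sketch; the sentence length and the random kernel and transition-matrix values are toy stand-ins, and only the kernel sizes 3/4/5*300, the 32 filters per size and the capsule dimension 16 come from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
L, DIM, N_FILTERS, CAP_DIM = 10, 300, 32, 16

sentence = rng.random((L, DIM))             # the long*300 matrix from step S1

def conv_features(x, k, n_filters):
    """Valid 1-D convolution with n_filters random kernels of shape k*DIM (S21)."""
    kernels = rng.standard_normal((n_filters, k, DIM))
    return np.stack([[np.sum(x[i:i + k] * kern)   # one window x one kernel
                      for kern in kernels]
                     for i in range(x.shape[0] - k + 1)])   # (L-k+1, n_filters)

# Kernel heights 3, 4, 5 with 32 filters each, concatenated over positions.
feats = np.concatenate([conv_features(sentence, k, N_FILTERS)
                        for k in (3, 4, 5)])               # (VS, 32)
VS = feats.shape[0]

# S22: a 32 -> 32*16 transition matrix reshaped into primary capsules.
W = rng.standard_normal((N_FILTERS, N_FILTERS * CAP_DIM))
primary_caps = (feats @ W).reshape(VS, N_FILTERS, CAP_DIM)
assert primary_caps.shape == (VS, 32, 16)   # the VS*32*16 Primary Capsules
```

Here VS = (10-3+1) + (10-4+1) + (10-5+1) = 21 windows for a 10-word sentence.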
In step S3, the detailed process is:
S31. Given the VS*32*16 Primary Capsules obtained in S22, set up a 32*16*16 Capsule Layer; 16 filters are used here;
S32. Take the VS*32*32*16 weight-matrix values as the State in reinforcement learning; each action modifies weight values therein;
S33. Compute the 32*16*16 Capsule Layer from the weight matrix, flatten it through a Fully Connected neural network, and then pass it through a Softmax layer to obtain the probability of each label;
S34. Compare with the correct result, take the resulting loss value as the Reward in reinforcement learning, and improve the weight matrix of S32 using the A3C algorithm.
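The patent optimizes the routing weight matrix with the A3C algorithm. As a greatly simplified, single-process stand-in for steps S32-S34 (and for Fig. 1, where the Critic evaluates the current weight matrix from the loss and the Actor randomly modifies weights), the loop can be sketched as follows; the shapes, the stand-in loss and the acceptance rule are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Routing weights between capsule layers: the RL State (a toy stand-in for
# the patent's VS*32*32*16 weight matrix).
weights = rng.random((4, 3))
target = np.full_like(weights, 0.5)        # stand-in "good" routing

def loss_fn(w):
    # Stand-in for the classification loss compared in step S34.
    return float(((w - target) ** 2).sum())

start_loss = loss_fn(weights)

for _ in range(200):
    value = -loss_fn(weights)              # Critic: value of current State
    noise = rng.standard_normal(weights.shape) * 0.05   # Actor: random tweak
    reward = -loss_fn(weights + noise)     # Reward: -loss after the Action
    if reward > value:                     # positive advantage: keep the tweak
        weights = weights + noise

assert loss_fn(weights) < start_loss       # routing weights improved
```

A3C would instead run several such actor-learners asynchronously and update actor and critic networks by gradient, but the State/Action/Reward roles are the same.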
Compared with the prior art, the beneficial effects are: the innovation of the invention is to introduce reinforcement learning to learn the routing relations between capsule network layers, and to introduce the capsule network to solve the multi-label classification task in text classification models. By exploiting the advantage of capsule networks on multi-label classification tasks, better results are achieved when applied to multi-label text classification; the trial-and-error mechanism of reinforcement learning is used to learn better routing connections.
Description of the drawings
Fig. 1 is a schematic diagram of the reinforcement learning part of the present invention.
Fig. 2 is a schematic diagram of the capsule network part of the present invention.
Specific embodiments
The attached figures are for illustrative purposes only and shall not be understood as limiting this patent. To better illustrate this embodiment, certain components in the figures may be omitted, enlarged or reduced, and do not represent the size of the actual product. For those skilled in the art, the omission of some known structures and their descriptions in the figures is understandable. The positional relationships described in the figures are for illustration only and shall not be understood as limiting this patent.
As shown in Fig. 1:
S1: the state is the weight values of the current weight matrix, and the Critic evaluates the value of the current weight matrix according to the finally obtained loss value;
S2: the Policy is the operation of modifying the weight matrix; the Actor randomly chooses the value by which to modify the weight matrix;
S3: according to the operation of S2, a new loss value is obtained.
As shown in Fig. 2:
S1: split the original raw text into words or characters by word segmentation, and convert them into embedding-form words or characters using a lookup table;
S2: following the TextCNN method, convolve the embedding-form words or characters to obtain the Primary Capsules;
S3: connect the Primary Capsules, after Routing, with the next Capsule Layer, and then with a Fully Connected Network, outputting the probability of each label;
S4: modify, via the BP algorithm, the weights of the Fully Connected Layer and the representation of each word in the Embedding layer's lookup table.
Obviously, the above embodiment of the present invention is merely an example for clearly illustrating the present invention and is not a limitation of the embodiments of the present invention. For those of ordinary skill in the art, other variations or changes in different forms may also be made on the basis of the above description. There is no need, and no way, to exhaust all embodiments. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present invention shall be included within the protection scope of the claims of the present invention.
Claims (5)
1. A text classification model based on reinforcement learning and a capsule network, characterized by comprising a reinforcement learning framework and a capsule network framework;
the reinforcement learning framework comprises:
State: the current state, mainly containing the environment in which the Agent is situated and the Agent's own state;
Action: the action of the Agent, which here is mainly whether a connection exists between capsule layers, or the connection probability;
Reward: the reward obtained by the Agent, divided into immediate reward and future reward.
2. The method of the text classification model based on reinforcement learning and a capsule network according to claim 1, characterized in that the capsule network framework comprises the following steps:
S1. splitting the original raw text into words or characters by word segmentation, and converting them into embedding-form words or characters using a lookup table;
S2. following the TextCNN method, convolving the embedding-form words or characters to obtain the Primary Capsules;
S3. connecting the Primary Capsules, after Routing, with the next Capsule Layer, and then with a Fully Connected Network, outputting the probability of each label;
S4. modifying, via the BP algorithm, the weights of the Fully Connected Layer and the representation of each word in the Embedding layer's lookup table.
3. The method of the text classification model based on reinforcement learning and a capsule network according to claim 2, characterized in that in step S1 the detailed process is:
S11. first initializing the lookup table from existing word embeddings, where the embedding depth is 300, and words that do not occur are initialized to 0 or to random numbers between 0 and 1;
S12. then, by lookup, converting each word or character of the raw text into embedding format, so that each sentence of the raw text is converted into a matrix of shape long*300.
4. The method of the text classification model based on reinforcement learning and a capsule network according to claim 2, characterized in that in step S2 the detailed process is:
S21. convolving the long*300 matrix obtained in step S1 with convolution kernels of sizes 3*300, 4*300 and 5*300 respectively to obtain the corresponding feature vectors, the number of kernels of each size being 32;
S22. from the Vector_Size (VS) * 32 features produced in step S21, obtaining the Primary Capsules of shape VS*32*16 through a 32*32*16 transition matrix, the dimension of each Capsule here being 16.
5. The method of the text classification model based on reinforcement learning and a capsule network according to claim 4, characterized in that in step S3 the detailed process is:
S31. given the VS*32*16 Primary Capsules obtained in S22, setting up a 32*16*16 Capsule Layer, 16 filters being used here;
S32. taking the VS*32*32*16 weight-matrix values as the State in reinforcement learning, each action modifying weight values therein;
S33. computing the 32*16*16 Capsule Layer from the weight matrix, flattening it through a Fully Connected neural network, and then passing it through a Softmax layer to obtain the probability of each label;
S34. comparing with the correct result, taking the resulting loss value as the Reward in reinforcement learning, and improving the weight matrix of S32 using the A3C algorithm.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811109798.8A CN109241287B (en) | 2018-09-21 | 2018-09-21 | Text classification model and method based on reinforcement learning and capsule network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109241287A true CN109241287A (en) | 2019-01-18 |
CN109241287B CN109241287B (en) | 2021-10-15 |
Family
ID=65057494
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811109798.8A Active CN109241287B (en) | 2018-09-21 | 2018-09-21 | Text classification model and method based on reinforcement learning and capsule network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109241287B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110250274A1 (en) * | 2008-09-19 | 2011-10-13 | Shaked Ze Ev | Estriol formulations |
CN106919702A (en) * | 2017-02-14 | 2017-07-04 | Beijing Time Co., Ltd. | Document-based keyword pushing method and device |
JP2018041443A (en) * | 2016-07-14 | 2018-03-15 | Seaside Japan Co., Ltd. | Deep learning artificial neural network-based task provision platform |
CN108170736A (en) * | 2017-12-15 | 2018-06-15 | NARI Group Co., Ltd. | A document rapid-scanning qualitative method based on a recurrent attention mechanism |
Non-Patent Citations (2)
Title |
---|
PER-ARNE ANDERSEN: "Deep Reinforcement Learning using Capsules", Master's thesis, Department of ICT, Faculty of Engineering and Science, University of Agder, 2018 * |
XI OUYANG ET AL.: "Audio-Visual Emotion Recognition with Capsule-like Feature Representation and Model-Based Reinforcement Learning", 2018 FIRST ASIAN CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII ASIA) * |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110059601A (en) * | 2019-04-10 | 2019-07-26 | Xi'an Jiaotong University | An intelligent fault diagnosis method based on multi-feature extraction and fusion |
CN110046671A (en) * | 2019-04-24 | 2019-07-23 | Jilin University | A text classification method based on a capsule network |
CN110188195A (en) * | 2019-04-29 | 2019-08-30 | Suning.com Group Co., Ltd. | A text intent recognition method, apparatus and device based on deep learning |
CN110188195B (en) * | 2019-04-29 | 2021-12-17 | Nanjing Xingyun Digital Technology Co., Ltd. | Text intent recognition method, apparatus and device based on deep learning |
CN110111365A (en) * | 2019-05-06 | 2019-08-09 | Shenzhen University | Deep-learning-based training method and apparatus, and target tracking method and apparatus |
CN112084327B (en) * | 2019-06-14 | 2024-04-16 | International Business Machines Corporation | Classification of sparsely labeled text documents while preserving semantics |
US11455527B2 (en) | 2019-06-14 | 2022-09-27 | International Business Machines Corporation | Classification of sparsely labeled text documents while preserving semantics |
CN112084327A (en) * | 2019-06-14 | 2020-12-15 | International Business Machines Corporation | Classification of sparsely labeled text documents while preserving semantics |
CN110263855B (en) * | 2019-06-20 | 2021-12-14 | Shenzhen University | Method for classifying images using common-basis capsule projection |
CN110263855A (en) * | 2019-06-20 | 2019-09-20 | Shenzhen University | A method for image classification using common-basis capsule projection |
CN110866113A (en) * | 2019-09-30 | 2020-03-06 | Zhejiang University | Text classification method based on a sparse self-attention mechanism for fine-tuning a BERT model |
CN110866113B (en) * | 2019-09-30 | 2022-07-26 | Zhejiang University | Text classification method based on a sparse self-attention mechanism for fine-tuning a BERT model |
CN110570425A (en) * | 2019-10-18 | 2019-12-13 | Beijing Institute of Technology | Lung nodule analysis method and device based on a deep reinforcement learning algorithm |
CN110570425B (en) * | 2019-10-18 | 2023-09-08 | Beijing Institute of Technology | Pulmonary nodule analysis method and device based on a deep reinforcement learning algorithm |
CN111046181B (en) * | 2019-12-05 | 2023-04-07 | Guizhou University | Actor-critic method for automatic classification induction |
CN111046181A (en) * | 2019-12-05 | 2020-04-21 | Guizhou University | Actor-critic algorithm for automatic classification induction |
CN111274425A (en) * | 2020-01-20 | 2020-06-12 | Ping An Technology (Shenzhen) Co., Ltd. | Medical image classification method, apparatus, medium and electronic device |
CN111274425B (en) * | 2020-01-20 | 2023-08-22 | Ping An Technology (Shenzhen) Co., Ltd. | Medical image classification method, apparatus, medium and electronic device |
CN111460160A (en) * | 2020-04-02 | 2020-07-28 | Fudan University | Event clustering method for streaming text data based on reinforcement learning |
CN111460160B (en) * | 2020-04-02 | 2023-08-18 | Fudan University | Event clustering method for streaming text data based on reinforcement learning |
CN111402014A (en) * | 2020-06-04 | 2020-07-10 | Jiangsu Institute of Quality and Standardization | Capsule-network-based e-commerce defective product prediction method |
CN113190681A (en) * | 2021-03-02 | 2021-07-30 | Northeastern University | Fine-grained text classification method based on capsule network masked memory attention |
CN113190681B (en) * | 2021-03-02 | 2023-07-25 | Northeastern University | Fine-grained text classification method based on capsule network masked memory attention |
Also Published As
Publication number | Publication date |
---|---|
CN109241287B (en) | 2021-10-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109241287A (en) | Text classification model and method based on reinforcement learning and capsule network | |
CN112199511B (en) | Cross-language multi-source vertical domain knowledge graph construction method | |
Liu et al. | Implicit discourse relation classification via multi-task neural networks | |
CN105930314B (en) | Text summary generation system and method based on an encoder-decoder deep neural network | |
CN108182295A (en) | An enterprise knowledge graph attribute extraction method and system | |
Dohare et al. | Text summarization using abstract meaning representation | |
CN102591988B (en) | Short text classification method based on semantic graphs | |
CN107967261A (en) | Interactive question semanteme understanding method in intelligent customer service | |
CN107704892A (en) | A kind of commodity code sorting technique and system based on Bayesian model | |
CN103123618B (en) | Text similarity acquisition methods and device | |
CN103559199B (en) | Method for abstracting web page information and device | |
CN109840322A (en) | A cloze-style reading comprehension analysis model and method based on reinforcement learning | |
CN111538848A (en) | Knowledge representation learning method fusing multi-source information | |
CN109359297A (en) | A kind of Relation extraction method and system | |
CN110197284A (en) | A fake address recognition method, apparatus and device | |
CN107357785A (en) | Topic feature word extraction method and system, and sentiment polarity determination method and system | |
CN108932278A (en) | Interactive method and system based on semantic frame | |
WO2022241913A1 (en) | Heterogeneous graph-based text summarization method and apparatus, storage medium, and terminal | |
CN111143553A (en) | Method and system for identifying specific information of real-time text data stream | |
CN109920476A (en) | miRNA-disease association prediction method based on a chaotic game algorithm | |
CN114490953B (en) | Method for training event extraction model, method, device and medium for extracting event | |
CN107908757A (en) | Website classification method and system | |
CN110334340B (en) | Semantic analysis method and device based on rule fusion and readable storage medium | |
CN111814450A (en) | Aspect-level emotion analysis method based on residual attention | |
Gao et al. | A hybrid GCN and RNN structure based on attention mechanism for text classification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||