CN111897957A - Capsule neural network integrating multi-scale feature attention and text classification method - Google Patents

Capsule neural network integrating multi-scale feature attention and text classification method

Info

Publication number
CN111897957A
CN111897957A (application CN202010683462.3A)
Authority
CN
China
Prior art keywords
capsule
representation
target text
layer
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010683462.3A
Other languages
Chinese (zh)
Other versions
CN111897957B (en)
Inventor
琚生根
王超凡
周刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University filed Critical Sichuan University
Priority to CN202010683462.3A priority Critical patent/CN111897957B/en
Publication of CN111897957A publication Critical patent/CN111897957A/en
Application granted granted Critical
Publication of CN111897957B publication Critical patent/CN111897957B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a capsule neural network integrating multi-scale feature attention and a text classification method. The network comprises a bidirectional recurrent layer, a multi-scale feature attention layer, a partially connected capsule layer and a category capsule layer. The multi-scale feature attention layer performs convolution operations, through convolution windows, on the global feature representation of the target text sent by the bidirectional recurrent layer to obtain multi-element (n-gram) grammatical features, and weights the multi-element grammatical features of each word at different scales. The partially connected capsule layer comprises child capsule units and parent capsule units; the child capsule units receive the weighted multi-element grammatical features and transmit the information to the parent capsule units through a routing method, finally obtaining the feature representation of the parent capsules. The category capsule layer expresses the probability that the target text belongs to each category. Through attention among multi-element features of different scales, the network accurately captures the multi-element grammatical features of the text and avoids the parameter growth caused by adopting several similar complete capsule layers.

Description

Capsule neural network integrating multi-scale feature attention and text classification method
Technical Field
The invention belongs to the technical field of text classification of text mining, and particularly relates to a capsule neural network integrating multi-scale feature attention and a text classification method.
Background
Text classification is an important component of text mining applications and includes question classification, sentiment analysis, topic classification and the like. Most mainstream text classification models today are based on Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs) and Transformers; Kim first proposed encoding sentences with multiple convolution kernels for text classification, after which various CNN-based models began to appear in the text classification task.
Research on CNN-based text classification has matured, but problems remain. For example, Yang et al. exploit the multi-scale multi-element grammatical features of a text through average pooling, but this way of fusing the features is unreasonable: it ignores that the grammatical features of each scale corresponding to a word should not be equally important but should be determined by the specific context, and it quietly expands the parameter scale of the model to 3 times the original. The capsule neural network model proposed by Zheng et al. uses only the bigram features of the text and simply ignores the other multi-element grammatical features the text may contain. It can be seen that existing CapsNet-based work cannot capture rich multi-element grammatical features well, which directly affects the model's understanding of the whole text; only when the most important multi-element grammatical features are accurately extracted can the model correctly understand the meaning of a word in its specific context.
In capsule neural networks, the information exchange between child capsules and parent capsules is also an important research topic. In common routing algorithms, the information in every child capsule is routed to every parent capsule; in this way redundant information in the child capsules is passed to the parent capsules, inflating the data volume and increasing the system burden.
Disclosure of Invention
In view of the above, an object of the present invention is to provide a capsule neural network integrating multi-scale feature attention that can accurately capture the multi-element grammatical features of a text.
To achieve this object, the technical scheme of the invention is as follows: the network comprises a bidirectional recurrent layer, a multi-scale feature attention layer, a partially connected capsule layer and a category capsule layer; wherein,
the bidirectional recurrent layer comprises an RNN encoder and is used for receiving the word vector sequence of the target text and obtaining, through the RNN encoder, the feature representation of the preceding and following context of each word of the target text, wherein the feature representations of the preceding and following contexts of all the words of the target text form the global feature representation of the target text;
the multi-scale feature attention layer is connected to the bidirectional recurrent layer and is used for performing convolution on the received global feature representation of the target text through convolution windows to obtain multi-element grammatical features, and for weighting the multi-element grammatical features of each word at different scales;
the partially connected capsule layer is connected to the multi-scale feature attention layer and comprises child capsule units and parent capsule units; the child capsule units receive the weighted multi-element grammatical features and route the information to the parent capsule units, finally obtaining the feature representation of the parent capsules;
the category capsule layer is connected to the partially connected capsule layer and comprises at least 2 category capsules, each category capsule corresponding to one category and being used for expressing the probability that the target text belongs to that category.
Further, the multi-scale feature attention layer comprises a convolutional network unit, a convolutional feature aggregation unit and a scale feature weighting unit;
the convolutional network unit receives the global feature representation of the target text sent by the bidirectional recurrent layer and obtains the grammatical feature representation of the target text through a plurality of convolution windows;
the convolutional feature aggregation unit is connected to the convolutional network unit and aggregates, per convolution kernel, the grammatical feature representation of the target text into a corresponding scalar representation;
and the scale feature weighting unit is connected to the convolutional feature aggregation unit and receives the scalar representation of the target text, generates attention weights for the grammatical features at each scale, and obtains the weighted representation of the target text.
Further, the convolutional network unit obtains the grammatical feature representation as follows:

z_i^l = f(W^l · X_{i:i+l-1} + b^l);

z^l = [z_1^l; z_2^l; ...; z_n^l];

H = [z^1; z^2; ...; z^m];

where i indexes the i-th word of the target text, l is the size of the convolution window, z_i^l is the grammatical feature representation of the i-th word under a convolution window of size l, z^l is the grammatical feature representation of all words of the target text under convolution windows of size l, and H is the grammatical feature representation of all words of the target text under m convolution windows of different sizes;
the convolutional feature aggregation unit obtains the scalar representation as follows:

s_i^l = F_ensem(z_i^l) = Σ_{j=1}^{k} z_{i,j}^l;

where F_ensem denotes summing the individual components of the input vector, k is the number of convolution kernels, z_i^l is the result of applying the convolution operation at the i-th word, z_{i,j}^l is the grammatical feature of the i-th word under the j-th convolution kernel, and j is the summation index;
the scale feature weighting unit obtains the weighted representation as follows:

s_i = [s_i^1, s_i^2, ..., s_i^L];

a_i = softmax(MLP(s_i));

z_i^atten = Σ_{l=1}^{L} a_i^l · z_i^l;

Z_atten = [z_1^atten; z_2^atten; ...; z_n^atten];

where s_i is the aggregated feature representation of the i-th word, s_i^l is the scalar representation of the i-th word under the convolution window of size l, a_i is the attention weight vector corresponding to the i-th word, MLP is a multi-layer perceptron, a_i^l is the attention weight of the i-th word under the l-gram features, z_i^atten is the weighted representation of the i-th word over the L-gram scales, L is the number of different convolution windows, and Z_atten is the weighted representation of the target text.
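As a rough illustration of how these formulas fit together, the following NumPy sketch computes n-gram features at several window sizes, aggregates each scale with F_ensem, derives per-word attention weights with a small MLP and softmax, and forms the weighted representation. The window sizes, dimensions, helper names and random weights are assumptions made for the example, not values taken from the patent:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_scale_attention(X, window_sizes=(1, 3, 5), k=64, hidden=32, seed=0):
    """Sketch of the multi-scale feature attention layer described above.

    X: global feature representation, shape (n_words, d), from the recurrent layer.
    For each window size l, a 1-D convolution produces k-dimensional n-gram
    features z_i^l; F_ensem sums their components into a scalar s_i^l; an MLP
    plus softmax turns [s_i^1 .. s_i^L] into attention weights a_i; the output
    is the attention-weighted sum of the per-scale features. Weights are random
    here - the whole function is an illustrative assumption, not patented code.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    L = len(window_sizes)
    Z = np.zeros((L, n, k))
    for li, l in enumerate(window_sizes):
        Wc = rng.normal(size=(l * d, k)) * 0.1            # convolution kernel for window size l
        pad = np.vstack([X, np.zeros((l - 1, d))])        # pad so every word has a full window
        for i in range(n):
            window = pad[i:i + l].reshape(-1)             # X_{i:i+l-1}
            Z[li, i] = np.tanh(window @ Wc)               # z_i^l
    S = Z.sum(axis=2).T                                   # s_i^l = F_ensem(z_i^l), shape (n, L)
    W1 = rng.normal(size=(L, hidden)) * 0.1               # tiny MLP for the attention scores
    W2 = rng.normal(size=(hidden, L)) * 0.1
    A = softmax(np.tanh(S @ W1) @ W2, axis=1)             # a_i over the L scales
    Z_atten = np.einsum('nl,lnk->nk', A, Z)               # weighted sum over the scales
    return Z_atten

print(multi_scale_attention(np.random.default_rng(1).normal(size=(7, 20))).shape)  # (7, 64)
```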
Further, the routing method for routing information to the parent capsule units is as follows:
obtaining the prediction vectors from the child capsule layer to the next parent capsule layer of the capsule neural network through preset weight matrices;
performing routing iterations on the information of the child capsule layer and calculating the coupling coefficients of the dynamic routing algorithm;
during the last routing iteration, comparing each coupling coefficient with a preset threshold;
if a coupling coefficient is smaller than the threshold, discarding it and re-weighting the remaining values so that their sum stays 1;
and obtaining the parent capsule representation routed to the parent capsule layer through the coupling coefficients and the prediction vectors, and scaling the routed parent capsule representation in the parent capsule layer to obtain the final parent capsule representation.
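The pruning-and-renormalization step above can be pictured with the following minimal NumPy sketch; the function name and toy numbers are illustrative, and the 0.05 default mirrors the threshold mentioned later in Embodiment 4:

```python
import numpy as np

def prune_and_renormalize(c, threshold=0.05):
    """Illustrative sketch of the pruning step described above.

    c: coupling coefficients of one child capsule over all parent capsules
       (non-negative and summing to 1, as produced by softmax).
    Coefficients below `threshold` are discarded (set to 0) and the remaining
    values are re-weighted so that they again sum to 1.
    """
    c = np.where(c < threshold, 0.0, c)
    total = c.sum()
    if total > 0:                      # avoid division by zero if everything was pruned
        c = c / total
    return c

# Example: a child capsule only weakly coupled to the first parent capsule
print(prune_and_renormalize(np.array([0.03, 0.55, 0.42])))
# -> approximately [0. 0.567 0.433]; the weak connection is dropped
```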
Further, the coupling coefficient of the dynamic routing algorithm is calculated as:

c_ij = softmax(b_ij) = exp(b_ij) / Σ_k exp(b_ik);

where b_ij is the initial, unweighted coupling coefficient and c_ij is the coupling coefficient obtained by softmax weighting of the initial coupling coefficient.
Further, the step of scaling, in the parent capsule layer, the parent capsule representation obtained by routing comprises:

s_j = Σ_i c_ij · u_{j|i};

v_j = (||s_j||² / (1 + ||s_j||²)) · (s_j / ||s_j||);

where s_j is the parent capsule representation obtained by routing, u_{j|i} is the prediction vector, and v_j is the final parent capsule representation obtained after scaling.
In view of the above, a second object of the present invention is to provide a text classification method that uses the capsule neural network of the first object.
To achieve this object, the technical scheme of the invention is as follows: a text classification method using the above capsule neural network, comprising the following steps:
receiving a word vector sequence corresponding to a target text, and encoding the word vector sequence of the target text to obtain the global feature representation of the target text; the global feature representation is composed of the feature representations of the preceding and following contexts of all words of the target text;
convolving the global feature representation with convolution windows to obtain multi-element grammatical features, and weighting the multi-element grammatical features of each word at different scales to obtain the weighted representation of the target text;
and the child capsule units receive the weighted representation of the target text and route the information to the parent capsule units to obtain the feature representation of the parent capsules, which is then sent to the category capsule layer to obtain the probability that the target text belongs to each category.
Further, the step of encoding the word vector sequence of the target text to obtain the global feature representation of the target text specifically comprises:
encoding each word vector of the target text to obtain its preceding-context and following-context feature representations:

c_l(w_i) = RNN_fwd(c_l(w_{i-1}), w_{i-1});

c_r(w_i) = RNN_bwd(c_r(w_{i+1}), w_{i+1});

where i indexes the i-th word of the target text, w_i is the i-th word vector of the target text, c_l(w_i) is the preceding-context feature representation of the i-th word, c_r(w_i) is the following-context feature representation of the i-th word, and RNN_fwd and RNN_bwd denote the forward and backward RNN encoders;
concatenating the preceding-context and following-context feature representations of a word to obtain the feature representation of that word's preceding and following contexts;
and encoding all word vectors of the target text to finally obtain the global feature representation of the target text; the global feature representation is composed of the feature representations of the preceding and following contexts of all words of the target text.
Further, the step of convolving the global feature representation with convolution windows to obtain the multi-element grammatical features, and weighting the multi-element grammatical features of each word at different scales to obtain the weighted representation of the target text, specifically comprises:
obtaining the grammatical feature representation of the target text by passing the received global feature representation through a plurality of convolution windows;
aggregating, per convolution kernel, the grammatical feature representation of the target text obtained under the plurality of convolution windows into corresponding scalar representations;
and generating attention weights for the grammatical features at each scale from the scalar representation of the target text to obtain the weighted representation of the target text.
Further, the category capsule layer comprises at least 2 category capsules, each category capsule corresponds to one category, the category capsule layer receives the characteristic representation of the parent capsule, and each category capsule represents the probability that the target text belongs to the corresponding category.
Advantageous effects
The invention provides a capsule neural network and a text classification method integrating multi-scale feature attention, which have the beneficial effects that: the capsule neural network integrated with multi-scale feature attention provided by the invention can accurately capture the multi-element grammatical features of a text through the attention among the multi-element features with different scales, and avoids the increase of parameter scale caused by adopting a plurality of similar complete capsule layers; meanwhile, the invention also provides an application method based on the capsule neural network.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description are some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive exercise.
FIG. 1 is a flow chart of a routing method of the present invention;
FIG. 2 is a block diagram of a capsule neural network incorporating multi-scale feature attention in accordance with the present invention;
FIG. 3 is a flow chart of a text classification method using capsule neural networks with multi-scale feature attention fused in accordance with the present invention;
FIG. 4 is a schematic structural diagram of an inventive capsule neural network incorporating multi-scale feature attention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The examples are given for the purpose of better illustration of the invention, but the invention is not limited to the examples. Therefore, those skilled in the art should make insubstantial modifications and adaptations to the embodiments of the present invention in light of the above teachings and remain within the scope of the invention.
Example 1
Referring to fig. 1, which is a flowchart of the routing method used in the capsule neural network of the present invention, the method includes the following steps:
S100: obtaining the prediction vectors from the child capsule layer to the next parent capsule layer of the capsule neural network through preset weight matrices; then step S102 is executed;
In this embodiment, between two adjacent capsule layers, in order to obtain the prediction vector u_{j|i} from the child capsule u_i of layer t to the parent capsule s_j of layer t+1, the child capsule u_i of layer t is multiplied by a weight matrix W_ij, which is obtained by random initialization; the prediction vector may be calculated as:

u_{j|i} = W_ij · u_i;

where the subscripts i and j denote the indices of the child capsule and the parent capsule, respectively;
S102: performing routing iterations on the information of the child capsule layer and calculating the coupling coefficients of the dynamic routing algorithm;
In this embodiment, the routing iteration computes:

c_ij = softmax(b_ij) = exp(b_ij) / Σ_k exp(b_ik);

where b_ij is the initial, unweighted coupling coefficient and c_ij is the coupling coefficient determined by the dynamic routing algorithm, i.e. the coefficient obtained by applying the softmax function to the original b_ij; in this embodiment the coupling coefficient can be regarded as the prior probability that the child capsule u_i is coupled to the parent capsule s_j;
S104: during the last routing iteration, comparing each coupling coefficient with a preset threshold; if the coupling coefficient is smaller than the threshold, executing step S106, otherwise executing step S108;
S106: discarding the coupling coefficient; then step S110 is executed;
In this embodiment, the threshold may be set according to actual needs; a coupling coefficient smaller than the threshold is regarded as a weak connection (with a small weight) between the parent capsule layer and the child capsule layer and is discarded, so that only the child capsules most closely related to a parent capsule are routed to it. A higher-layer capsule thus receives information only from the lower-layer capsules most related to it, which helps reduce redundant information transfer between child and parent capsules.
S108: re-weighting the other coupling coefficients so that their sum stays 1; then step S110 is executed;
S110: obtaining the parent capsule representation routed to the parent capsule layer through the coupling coefficients and the prediction vectors, and scaling the routed parent capsule representation in the parent capsule layer to obtain the final parent capsule representation.
In this embodiment, the information routed to the parent capsule is s_j:

s_j = Σ_i c_ij · u_{j|i};

Further, the parent capsule layer scales the routed information (the parent capsule representation), which is equivalent to a vector version of an activation function, to obtain the final parent capsule v_j; specifically:

v_j = (||s_j||² / (1 + ||s_j||²)) · (s_j / ||s_j||);

where v_j is the final representation obtained after scaling.
Collecting the parent capsules of the whole parent capsule layer over their number and dimensionality gives the output v ∈ R^{m×d} of the parent capsule layer (d and m denote the number and the dimension of the parent capsules, respectively), which is input to the next layer for the final classification routing decision.
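For illustration only, the following is a minimal NumPy sketch of how the routing of this embodiment (prediction vectors, softmax coupling, pruning of weak connections on the last iteration, and vector scaling) could be put together; the shapes, number of iterations and helper names are assumptions, not the patented implementation:

```python
import numpy as np

def squash(s, eps=1e-8):
    """Vector 'activation': scale s to length in [0, 1) while keeping its direction."""
    norm = np.linalg.norm(s, axis=-1, keepdims=True)
    return (norm ** 2 / (1.0 + norm ** 2)) * s / (norm + eps)

def partial_routing(u, W, iterations=3, threshold=0.05):
    """Sketch of the partial-connection dynamic routing described in Embodiment 1.

    u: child capsules, shape (n_child, d_child)
    W: weight matrices, shape (n_child, n_parent, d_parent, d_child)
    Returns parent capsules v, shape (n_parent, d_parent).
    All shapes and names are assumptions for illustration.
    """
    u_hat = np.einsum('ijpd,id->ijp', W, u)                 # prediction vectors u_hat[i, j] = W_ij @ u_i
    b = np.zeros((W.shape[0], W.shape[1]))                  # initial coupling logits b_ij
    for it in range(iterations):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)    # c_ij = softmax over parents
        if it == iterations - 1:                            # last iteration: prune weak connections
            c = np.where(c < threshold, 0.0, c)
            c = c / np.maximum(c.sum(axis=1, keepdims=True), 1e-8)
        s = np.einsum('ij,ijp->jp', c, u_hat)               # s_j = sum_i c_ij * u_hat_{j|i}
        v = squash(s)                                       # v_j: scaled parent capsule
        if it < iterations - 1:
            b = b + np.einsum('ijp,jp->ij', u_hat, v)       # agreement update of the logits
    return v

# Toy usage with random weights
rng = np.random.default_rng(0)
u = rng.normal(size=(30, 100))                              # 30 child capsules of length 100
W = rng.normal(size=(30, 5, 16, 100)) * 0.01                # route to 5 parent capsules of length 16
print(partial_routing(u, W).shape)                          # (5, 16)
```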
Example 2
Referring to fig. 2, which is a schematic block diagram of the capsule neural network integrating multi-scale feature attention in this embodiment, the network includes a bidirectional recurrent layer 2, a multi-scale feature attention layer 3, a partially connected capsule layer 4 and a category capsule layer 5; wherein,
the bidirectional recurrent layer 2 comprises an RNN encoder and is used for receiving the word vector sequence of the target text and obtaining, through the RNN encoder, the feature representation of the preceding and following context of each word of the target text; the feature representations of the preceding and following contexts of all words together form the global feature representation of the target text;
the multi-scale feature attention layer 3 is connected to the bidirectional recurrent layer 2 and is used for performing convolution on the received global feature representation of the target text through convolution windows to obtain multi-element grammatical features, and for weighting the multi-element grammatical features of each word at different scales;
In this embodiment, the multi-scale feature attention layer 3 further includes a convolutional network unit 301, a convolutional feature aggregation unit 302 and a scale feature weighting unit 303;
the convolutional network unit 301 receives the global feature representation of the target text sent by the bidirectional recurrent layer 2 and obtains the grammatical feature representation of the target text through a plurality of convolution windows; the grammatical feature representation may be calculated as:

z_i^l = f(W^l · X_{i:i+l-1} + b^l);

z^l = [z_1^l; z_2^l; ...; z_n^l];

H = [z^1; z^2; ...; z^m];

where i indexes the i-th word of the target text, l is the size of the convolution window, z_i^l is the grammatical feature representation of the i-th word under a convolution window of size l, z^l is the grammatical feature representation of all words of the target text under convolution windows of size l, and H is the grammatical feature representation of all words of the target text under m convolution windows of different sizes;
the convolutional feature aggregation unit 302 is connected to the convolutional network unit 301 and aggregates, per convolution kernel, the grammatical feature representation of the target text into a corresponding scalar representation, which can be obtained by:

s_i^l = F_ensem(z_i^l) = Σ_{j=1}^{k} z_{i,j}^l;

where F_ensem denotes summing the individual components of the input vector, k is the number of convolution kernels, z_i^l is the result of applying the convolution operation at the i-th word, z_{i,j}^l is the grammatical feature of the i-th word under the j-th convolution kernel, and j is the summation index;
the scale feature weighting unit 303 is connected to the convolutional feature aggregation unit 302 and is used for receiving the scalar representation of the target text and generating attention weights for the grammatical features at each scale to obtain the weighted representation of the target text; the scale feature weighting unit 303 obtains the weighted representation by:

s_i = [s_i^1, s_i^2, ..., s_i^L];

a_i = softmax(MLP(s_i));

z_i^atten = Σ_{l=1}^{L} a_i^l · z_i^l;

Z_atten = [z_1^atten; z_2^atten; ...; z_n^atten];

where s_i is the aggregated feature representation of the i-th word, s_i^l is the scalar representation of the i-th word under the convolution window of size l, a_i is the attention weight vector corresponding to the i-th word, MLP is a multi-layer perceptron, a_i^l is the attention weight of the i-th word under the l-gram features, z_i^atten is the weighted representation of the i-th word over the L-gram scales, L is the number of different convolution windows, and Z_atten is the weighted representation of the target text.
The partially connected capsule layer 4 is connected to the multi-scale feature attention layer 3 and comprises a child capsule unit 401 and a parent capsule unit 402; the child capsule unit 401 receives the weighted multi-element grammatical features and transmits the information to the parent capsule unit 402 through a routing algorithm, finally obtaining the feature representation of the parent capsules; the routing method in this embodiment may specifically refer to the routing method provided in Embodiment 1.
The category capsule layer 5 is connected to the partially connected capsule layer 4 and comprises at least 2 category capsules; each category capsule corresponds to one category and expresses the probability that the target text belongs to that category.
Example 3
Based on the routing algorithm of Embodiment 1 and the capsule neural network of Embodiment 2, this embodiment provides a text classification method; a flowchart of the method is shown in fig. 3. Specifically, the method includes the following steps:
S600: receiving a word vector sequence corresponding to a target text, and encoding the word vector sequence of the target text to obtain the global feature representation of the target text; then step S602 is executed;
In this example, the input to the capsule neural network of Embodiment 2 is the word vector sequence of a series of words w_1, w_2, ..., w_n; the word vectors, together with the word vectors of the words on their left and right, are fed into the RNN encoder in sequence to obtain the global feature representation of the target text. Specifically, each word vector of the target text is encoded to obtain the preceding-context and following-context feature representations of the word:

c_l(w_i) = RNN_fwd(c_l(w_{i-1}), w_{i-1});

c_r(w_i) = RNN_bwd(c_r(w_{i+1}), w_{i+1});

where i indexes the i-th word of the target text, w_i is the i-th word vector of the target text, c_l(w_i) is the preceding-context feature representation of the i-th word, c_r(w_i) is the following-context feature representation of the i-th word, and RNN_fwd and RNN_bwd denote the forward and backward RNN encoders;
then the feature representation x_i of the preceding and following contexts of the word is obtained by concatenating the two context feature representations:

x_i = [c_l(w_i), c_r(w_i)];

encoding all word vectors of the target text finally yields the global feature representation X = [x_1, ..., x_i, ..., x_n]; the global feature representation is composed of the feature representations of the preceding and following contexts of all words of the target text.
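A small NumPy sketch of this encoding step follows, assuming a simple tanh recurrence in place of the RNN encoder (the patent does not fix a particular RNN cell); the dimensions and weight names are illustrative only:

```python
import numpy as np

def encode_contexts(E, d_ctx=50, seed=0):
    """Sketch of the bidirectional context encoding of step S600.

    E: word vectors of the target text, shape (n_words, d_word).
    c_l(w_i) is built from the previous word and the previous left context;
    c_r(w_i) from the next word and the next right context; the per-word
    representation x_i concatenates both. A single tanh recurrence stands in
    for the RNN encoder - an assumption made for illustration only.
    """
    rng = np.random.default_rng(seed)
    n, d_word = E.shape
    Wl = rng.normal(size=(d_ctx, d_ctx)) * 0.1   # recurrent weights, left-to-right
    Wsl = rng.normal(size=(d_word, d_ctx)) * 0.1
    Wr = rng.normal(size=(d_ctx, d_ctx)) * 0.1   # recurrent weights, right-to-left
    Wsr = rng.normal(size=(d_word, d_ctx)) * 0.1
    cl = np.zeros((n, d_ctx))
    cr = np.zeros((n, d_ctx))
    for i in range(1, n):                        # c_l(w_i) from w_{i-1} and c_l(w_{i-1})
        cl[i] = np.tanh(cl[i - 1] @ Wl + E[i - 1] @ Wsl)
    for i in range(n - 2, -1, -1):               # c_r(w_i) from w_{i+1} and c_r(w_{i+1})
        cr[i] = np.tanh(cr[i + 1] @ Wr + E[i + 1] @ Wsr)
    return np.concatenate([cl, cr], axis=1)      # x_i = [c_l(w_i), c_r(w_i)]

X = encode_contexts(np.random.default_rng(1).normal(size=(6, 100)))
print(X.shape)   # (6, 100): the global feature representation [x_1, ..., x_n]
```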
S602: convolving the global feature representation with convolution windows to obtain multi-element grammatical features, and weighting the multi-element grammatical features of each word at different scales to obtain the weighted representation of the target text; then step S604 is executed;
In this step, the global feature representation obtained in step S600 enters the multi-scale feature attention layer 3 of Embodiment 2, which obtains the grammatical feature representation of the target text by passing the received global feature representation through a plurality of convolution windows:

z_i^l = f(W^l · X_{i:i+l-1} + b^l);

z^l = [z_1^l; z_2^l; ...; z_n^l];

H = [z^1; z^2; ...; z^m];

where i indexes the i-th word of the target text, l is the size of the convolution window, z_i^l is the grammatical feature representation of the i-th word under a convolution window of size l, z^l is the grammatical feature representation of all words of the target text under convolution windows of size l, and H is the grammatical feature representation of all words of the target text under m convolution windows of different sizes;
For a given word, not all multi-element grammatical features are equally important, so the scale feature weighting unit 303 and the convolutional feature aggregation unit 302 determine which scales matter more for that word. Specifically, the grammatical feature representation of the target text obtained under the plurality of convolution windows is aggregated per convolution kernel into corresponding scalar representations:

s_i^l = F_ensem(z_i^l) = Σ_{j=1}^{k} z_{i,j}^l;

where F_ensem denotes summing the individual components of the input vector, k is the number of convolution kernels, z_i^l is the result of applying the convolution operation at the i-th word, z_{i,j}^l is the grammatical feature of the i-th word under the j-th convolution kernel, and j is the summation index;
Finally, attention weights for the grammatical features at each scale are generated from the scalar representation of the target text to obtain the weighted representation of the target text:

s_i = [s_i^1, s_i^2, ..., s_i^L];

a_i = softmax(MLP(s_i));

z_i^atten = Σ_{l=1}^{L} a_i^l · z_i^l;

Z_atten = [z_1^atten; z_2^atten; ...; z_n^atten];

where s_i is the aggregated feature representation of the i-th word, s_i^l is the scalar representation of the i-th word under the convolution window of size l, a_i is the attention weight vector corresponding to the i-th word, MLP is a multi-layer perceptron, a_i^l is the attention weight of the i-th word under the l-gram features, z_i^atten is the weighted representation of the i-th word over the L-gram scales, L is the number of different convolution windows, and Z_atten is the weighted representation of the target text, containing accurate and rich multi-element grammatical features.
S604: the child capsule unit 401 receives the weighted representation of the target text and routes the information to the parent capsule unit 402 to obtain the feature representation of the parent capsules; then step S606 is executed;
In this embodiment, the output Z_atten of step S602 is fed into the partially connected capsule layer. In classic CapsNets the information in every child capsule is routed to every parent capsule, which also passes redundant information from the child capsules to the parent capsules; therefore, in this embodiment, following the routing method of Embodiment 1, the weak connections (those with smaller weights) between parent and child capsules are discarded, and only the child capsules most closely related to a parent capsule are routed to it.
The final scaled representation v_j can be obtained by following the routing algorithm of Embodiment 1; note that the differently drawn lines of the partial connection routes in the partially connected capsule layer of fig. 4 represent the routes from different child capsules to the parent capsules.
S606: sending the feature representation of the parent capsules to the category capsule layer to obtain the probability that the target text belongs to each category.
In this embodiment, the category capsule layer includes at least two category capsules, each corresponding to one category. The category capsule layer receives the feature representation of the parent capsules; the length of the vector in each category capsule represents the probability that the input text belongs to that category, and the direction of each vector also retains characteristics of the features and can be regarded as a feature encoding vector of the input sample. To increase the difference between the category lengths, the model uses a separate margin loss function:

L_j = G_j · max(0, m+ − ||v_j||)² + λ · (1 − G_j) · max(0, ||v_j|| − m-)²;

where m+ and m- are the upper and lower boundaries, respectively; G_j = 1 if and only if v_j corresponds to the correct classification; λ is a hyper-parameter, which may be set to λ = 0.5 in one implementation.
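To make the loss concrete, here is a minimal NumPy sketch; the values m+ = 0.9 and m- = 0.1 are the common choices from the capsule-network literature and are assumptions (the patent only names them as upper and lower boundaries), while λ = 0.5 follows the text:

```python
import numpy as np

def margin_loss(v, labels, m_plus=0.9, m_minus=0.1, lam=0.5):
    """Sketch of the margin loss above.

    v: category capsules, shape (n_classes, d); their vector lengths are the
       class probabilities. labels: one-hot ground truth G_j, shape (n_classes,).
    m_plus/m_minus are assumed boundary values; lam = 0.5 follows the text.
    """
    lengths = np.linalg.norm(v, axis=1)                        # ||v_j|| = class probability
    loss_present = labels * np.maximum(0.0, m_plus - lengths) ** 2
    loss_absent = lam * (1 - labels) * np.maximum(0.0, lengths - m_minus) ** 2
    return (loss_present + loss_absent).sum()

v = np.random.default_rng(0).normal(size=(5, 16)) * 0.2        # 5 category capsules
labels = np.array([0, 0, 1, 0, 0])                             # true class is index 2
print(float(margin_loss(v, labels)))
```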
Example 4
In this embodiment, the validity of the network of Embodiment 2 and the method of Embodiment 3 is verified on 5 commonly used large-scale text classification datasets. The Yelp and Amazon corpora are user reviews for sentiment prediction, where P denotes that only the polarity of a review needs to be predicted and F denotes that the number of stars (1 to 5) needs to be predicted; Yah.A is a question-and-answer dataset. The convolution kernel sizes and vocabulary sizes are set for these datasets, and several common text classification models are selected for comparison, including linear text classification models: the classification model using h-softmax proposed in Joulin2017 and the bag-of-words model applied to text classification proposed in Qiao2018; RNNs and their variants: the generative text classification model built with long short-term memory networks (LSTMs) in Yogatama2018 and the hierarchical attention network for document classification proposed in Yang2016; CNNs and their variants: the deep CNN (29 layers) proposed in Conneau2017 to improve text classification accuracy and the multi-scale feature attention CNN proposed in Wang2018 to capture variable-length grammatical features of a text; and capsule network models: Ren2018 simplifies the parameters of the CapsNet model with a compression coding method and improves the routing algorithm with a k-means method, and Yang2018 first proposed applying CapsNets to text classification.
In this embodiment, Adam proposed by Zeiler in 2012 is used to optimize all trainable parameters; the dimensions of the input vectors and hidden states are set to 100 or 128, the number of capsules in the partially connected capsule layer is 30 and their feature length is 100, and the dimension of the category capsule layer is set to 16. In addition, to reduce memory and time overhead, the weights in the capsule network are shared, and the threshold of the partial connection routing algorithm is set to 0.05. The network model of the invention is shown in fig. 4.
The final experimental results are given in table 1 below:
TABLE 1 Classification accuracy of different models under various data sets
[Table 1 is provided as an image in the original publication and is not reproduced here.]
Referring to Table 1, the evaluation metric used is accuracy. On all 5 datasets the network of the present invention performs best; in particular, the accuracy on the Yahoo and Amaz-F datasets is improved by 0.9 and 0.5 respectively over the best CNN model. The average text length of these two datasets is relatively long and the number of target classes relatively large, so such texts contain a large amount of complex grammatical feature information, and only models with strong feature learning ability can classify them well; the network of the invention performs well on these datasets. On the other hand, Mul-capsules is a variant of the proposed network that does not contain the partial connection routing; the complete network improves on the Mul-capsules model by about 0.3 percent on each dataset, which shows that the connections discarded by the proposed network are exactly those that would degrade the network, i.e., intuitively, the redundant information connections between child and parent capsules.
In addition, Table 1 also compares the capsule network models. The feature learning ability of capsule networks on the text classification task is already far better than that of the CNN and RNN models, and the present invention further introduces multi-scale feature attention; according to Table 1, the classification accuracy of the proposed network on all 5 datasets is higher than that of the other capsule network models, i.e., the proposed network has feature learning capability far exceeding that of the other capsule-based models.
On the other hand, this embodiment also evaluates the routing method of Embodiment 1: the Mul-capsules model in Table 1 is a capsule network model that does not use the partial connection routing algorithm of Embodiment 1, and the result data show that the model using the routing method of Embodiment 1 achieves higher accuracy than the Mul-capsules model.
Preferably, this embodiment also examines the parameter scale of the proposed network and selects the following models for comparison: the first is capsule-B proposed by Yang in 2018, which uses multi-scale multi-element grammatical features with convolution window sizes of 3, 4 and 5; the second and third models extract single-scale multi-element grammatical features, with convolution window sizes of 3 and 2, respectively, as shown in Table 2:
TABLE 2 comparison of different model parameter scales
[Table 2 is provided as an image in the original publication and is not reproduced here.]
It can be seen that the multi-element grammatical features used by the present invention are the richest while its parameters are the fewest. Unlike the classic text classification capsule networks, the proposed network does not need several similar complete capsule network layers to obtain comprehensive multi-element grammatical features, because it captures accurate text grammatical information with multi-scale feature attention before the text feature representation is input to the capsule network, and thus obtains richer multi-element grammatical features with fewer parameters. For example, capsule-B uses 24M parameters to obtain the 3-, 4- and 5-gram features of the text, while the proposed network uses 2M parameters to obtain the 1-, 3-, 5-, 7- and 9-gram features. Similarly, although Ren2018 reduces parameters with a compression coding scheme, it only uses the bigram features of the text, so its text feature learning ability is far below that of the present invention. On the other hand, the proposed network sets a smaller number of capsules than the other models because the text features input to the capsule layer are already very refined and accurate after the multi-scale feature attention layer, so in theory fewer capsules are sufficient to extract the underlying low-level features.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (10)

1. A capsule neural network integrating multi-scale feature attention, characterized by comprising a bidirectional recurrent layer, a multi-scale feature attention layer, a partially connected capsule layer and a category capsule layer; wherein,
the bidirectional recurrent layer comprises an RNN encoder and is used for receiving the word vector sequence of the target text and obtaining, through the RNN encoder, the feature representation of the preceding and following context of each word of the target text, wherein the feature representations of the preceding and following contexts of all the words of the target text form the global feature representation of the target text;
the multi-scale feature attention layer is connected to the bidirectional recurrent layer and is used for obtaining multi-element grammatical features through convolution operations on the received global feature representation of the target text, and for weighting the multi-element grammatical features of each word at different scales;
the partially connected capsule layer is connected to the multi-scale feature attention layer and comprises child capsule units and parent capsule units; the child capsule units receive the weighted multi-element grammatical features and route the information to the parent capsule units, finally obtaining the feature representation of the parent capsules;
the category capsule layer is connected to the partially connected capsule layer and comprises at least 2 category capsules, each category capsule corresponding to one category and being used for expressing the probability that the target text belongs to that category.
2. The capsule neural network of claim 1, wherein the multi-scale feature attention layer comprises a convolutional network unit, a convolutional feature aggregation unit and a scale feature weighting unit;
the convolutional network unit receives the global feature representation of the target text sent by the bidirectional recurrent layer and obtains the grammatical feature representation of the target text through a plurality of convolution windows;
the convolutional feature aggregation unit is connected to the convolutional network unit and aggregates, per convolution kernel, the grammatical feature representation of the target text into a corresponding scalar representation;
and the scale feature weighting unit is connected to the convolutional feature aggregation unit and receives the scalar representation of the target text, generates attention weights for the grammatical features at each scale, and obtains the weighted representation of the target text.
3. The capsule neural network of claim 2, wherein the convolutional network unit obtains the grammatical feature representation by:

z_i^l = f(W^l · X_{i:i+l-1} + b^l);

z^l = [z_1^l; z_2^l; ...; z_n^l];

H = [z^1; z^2; ...; z^m];

where i indexes the i-th word of the target text, l is the size of the convolution window, z_i^l is the grammatical feature representation of the i-th word under a convolution window of size l, z^l is the grammatical feature representation of all words of the target text under convolution windows of size l, and H is the grammatical feature representation of all words of the target text under m convolution windows of different sizes;
the convolutional feature aggregation unit obtains the scalar representation by:

s_i^l = F_ensem(z_i^l) = Σ_{j=1}^{k} z_{i,j}^l;

where F_ensem denotes summing the individual components of the input vector, k is the number of convolution kernels, z_i^l is the result of applying the convolution operation at the i-th word, z_{i,j}^l is the grammatical feature of the i-th word under the j-th convolution kernel, and j is the summation index;
the scale feature weighting unit obtains the weighted representation by:

s_i = [s_i^1, s_i^2, ..., s_i^L];

a_i = softmax(MLP(s_i));

z_i^atten = Σ_{l=1}^{L} a_i^l · z_i^l;

Z_atten = [z_1^atten; z_2^atten; ...; z_n^atten];

where s_i is the aggregated feature representation of the i-th word, s_i^l is the scalar representation of the i-th word under the convolution window of size l, a_i is the attention weight vector corresponding to the i-th word, MLP is a multi-layer perceptron, a_i^l is the attention weight of the i-th word under the l-gram features, z_i^atten is the weighted representation of the i-th word over the L-gram scales, L is the number of different convolution windows, and Z_atten is the weighted representation of the target text.
4. The capsule neural network integrating multi-scale feature attention of claim 3, wherein the routing method for routing information to the parent capsule units is as follows:
obtaining the prediction vectors from the child capsule layer to the next parent capsule layer of the capsule neural network through preset weight matrices;
performing routing iterations on the information of the child capsule layer and calculating the coupling coefficients of the dynamic routing algorithm;
during the last routing iteration, comparing each coupling coefficient with a preset threshold;
if a coupling coefficient is smaller than the threshold, discarding it and re-weighting the remaining values so that their sum stays 1;
and obtaining the parent capsule representation routed to the parent capsule layer through the coupling coefficients and the prediction vectors, and scaling the routed parent capsule representation in the parent capsule layer to obtain the final parent capsule representation.
5. The capsule neural network integrating multi-scale feature attention of claim 4, wherein the coupling coefficient of the dynamic routing algorithm is calculated as:

c_ij = softmax(b_ij) = exp(b_ij) / Σ_k exp(b_ik);

where b_ij is the initial coupling coefficient and c_ij is the coupling coefficient obtained after softmax weighting.
6. The capsule neural network integrating multi-scale feature attention of claim 5, wherein the step of scaling, in the parent capsule layer, the parent capsule representation obtained by routing comprises:

s_j = Σ_i c_ij · u_{j|i};

v_j = (||s_j||² / (1 + ||s_j||²)) · (s_j / ||s_j||);

where s_j is the parent capsule representation obtained by routing, u_{j|i} is the prediction vector, and v_j is the final parent capsule representation obtained after scaling.
7. A text classification method using the capsule neural network of any one of claims 1 to 6, comprising the steps of:
receiving a word vector sequence corresponding to a target text, and encoding the word vector sequence of the target text to obtain the global feature representation of the target text; the global feature representation is composed of the feature representations of the preceding and following contexts of all words of the target text;
convolving the global feature representation with convolution windows to obtain multi-element grammatical features, and weighting the multi-element grammatical features of each word at different scales to obtain the weighted representation of the target text;
and the child capsule units receive the weighted representation of the target text and route the information to the parent capsule units to obtain the feature representation of the parent capsules, which is then sent to the category capsule layer to obtain the probability that the target text belongs to each category.
8. The method according to claim 7, wherein the step of encoding the word vector sequence of the target text to obtain the global feature representation of the target text specifically comprises:
encoding each word vector of the target text to obtain its preceding-context and following-context feature representations:

c_l(w_i) = RNN_fwd(c_l(w_{i-1}), w_{i-1});

c_r(w_i) = RNN_bwd(c_r(w_{i+1}), w_{i+1});

where i indexes the i-th word of the target text, w_i is the i-th word vector of the target text, c_l(w_i) is the preceding-context feature representation of the i-th word, c_r(w_i) is the following-context feature representation of the i-th word, and RNN_fwd and RNN_bwd denote the forward and backward RNN encoders;
concatenating the preceding-context and following-context feature representations of a word to obtain the feature representation of that word's preceding and following contexts;
and encoding all word vectors of the target text to finally obtain the global feature representation of the target text; the global feature representation is composed of the feature representations of the preceding and following contexts of all words of the target text.
9. The method according to claim 8, wherein the step of convolving the global feature representation with convolution windows to obtain the multi-element grammatical features, and weighting the multi-element grammatical features of each word at different scales to obtain the weighted representation of the target text, specifically comprises:
obtaining the grammatical feature representation of the target text by passing the received global feature representation through a plurality of convolution windows;
aggregating, per convolution kernel, the grammatical feature representation of the target text obtained under the plurality of convolution windows into corresponding scalar representations;
and generating attention weights for the grammatical features at each scale from the scalar representation of the target text to obtain the weighted representation of the target text.
10. The method of claim 9, wherein the category capsule layer comprises at least 2 category capsules, each category capsule corresponding to a category, the category capsule layer receiving a characterization of a parent capsule, each category capsule representing a probability that the target text belongs to the corresponding category.
CN202010683462.3A 2020-07-15 2020-07-15 Capsule neural network integrating multi-scale feature attention and text classification method Active CN111897957B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010683462.3A CN111897957B (en) 2020-07-15 2020-07-15 Capsule neural network integrating multi-scale feature attention and text classification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010683462.3A CN111897957B (en) 2020-07-15 2020-07-15 Capsule neural network integrating multi-scale feature attention and text classification method

Publications (2)

Publication Number Publication Date
CN111897957A true CN111897957A (en) 2020-11-06
CN111897957B CN111897957B (en) 2021-03-16

Family

ID=73192060

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010683462.3A Active CN111897957B (en) 2020-07-15 2020-07-15 Capsule neural network integrating multi-scale feature attention and text classification method

Country Status (1)

Country Link
CN (1) CN111897957B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112699215A (en) * 2020-12-24 2021-04-23 齐鲁工业大学 Grading prediction method and system based on capsule network and interactive attention mechanism
CN113190681A (en) * 2021-03-02 2021-07-30 东北大学 Fine-grained text classification method based on capsule network mask memory attention
CN114581965A (en) * 2022-03-04 2022-06-03 长春工业大学 Training method of finger vein recognition model, recognition method, system and terminal
CN116304842A (en) * 2023-05-18 2023-06-23 南京信息工程大学 Capsule network text classification method based on CFC structure improvement

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109558487A (en) * 2018-11-06 2019-04-02 华南师范大学 Document Classification Method based on the more attention networks of hierarchy
CN110046671A (en) * 2019-04-24 2019-07-23 吉林大学 A kind of file classification method based on capsule network
US20190370972A1 (en) * 2018-06-04 2019-12-05 University Of Central Florida Research Foundation, Inc. Capsules for image analysis
US10509860B2 (en) * 2016-02-10 2019-12-17 Weber State University Research Foundation Electronic message information retrieval system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10509860B2 (en) * 2016-02-10 2019-12-17 Weber State University Research Foundation Electronic message information retrieval system
US20190370972A1 (en) * 2018-06-04 2019-12-05 University Of Central Florida Research Foundation, Inc. Capsules for image analysis
CN109558487A (en) * 2018-11-06 2019-04-02 华南师范大学 Document Classification Method based on the more attention networks of hierarchy
CN110046671A (en) * 2019-04-24 2019-07-23 吉林大学 A kind of file classification method based on capsule network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
PENG H, LI J, WANG S, et al.: "Hierarchical taxonomy-aware and attentional graph capsule RCNNs for large-scale multi-label text classification" *
SABOUR S, FROSST N, HINTON G E.: "Dynamic routing between capsules", Advances in Neural Information Processing Systems *
ZHAO Xiaozheng: "Research on sentiment classification methods for short texts based on the Attention mechanism", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112699215A (en) * 2020-12-24 2021-04-23 齐鲁工业大学 Grading prediction method and system based on capsule network and interactive attention mechanism
CN113190681A (en) * 2021-03-02 2021-07-30 东北大学 Fine-grained text classification method based on capsule network mask memory attention
CN113190681B (en) * 2021-03-02 2023-07-25 东北大学 Fine granularity text classification method based on capsule network mask memory attention
CN114581965A (en) * 2022-03-04 2022-06-03 长春工业大学 Training method of finger vein recognition model, recognition method, system and terminal
CN114581965B (en) * 2022-03-04 2024-05-14 长春工业大学 Finger vein recognition model training method, finger vein recognition model training system and terminal
CN116304842A (en) * 2023-05-18 2023-06-23 南京信息工程大学 Capsule network text classification method based on CFC structure improvement

Also Published As

Publication number Publication date
CN111897957B (en) 2021-03-16

Similar Documents

Publication Publication Date Title
CN111897957B (en) Capsule neural network integrating multi-scale feature attention and text classification method
CN111897908B (en) Event extraction method and system integrating dependency information and pre-training language model
CN109597891B (en) Text emotion analysis method based on bidirectional long-and-short-term memory neural network
CN110287320B (en) Deep learning multi-classification emotion analysis model combining attention mechanism
CN106980683B (en) Blog text abstract generating method based on deep learning
CN108830287A (en) The Chinese image, semantic of Inception network integration multilayer GRU based on residual error connection describes method
CN111881262B (en) Text emotion analysis method based on multi-channel neural network
CN110866542B (en) Depth representation learning method based on feature controllable fusion
CN111078833B (en) Text classification method based on neural network
CN109977413A (en) A kind of sentiment analysis method based on improvement CNN-LDA
CN111598183B (en) Multi-feature fusion image description method
CN111078866B (en) Chinese text abstract generation method based on sequence-to-sequence model
CN109214006B (en) Natural language reasoning method for image enhanced hierarchical semantic representation
CN112417098A (en) Short text emotion classification method based on CNN-BiMGU model
CN110046223B (en) Film evaluation emotion analysis method based on improved convolutional neural network model
CN112527966B (en) Network text emotion analysis method based on Bi-GRU neural network and self-attention mechanism
CN112732921B (en) False user comment detection method and system
CN112749274A (en) Chinese text classification method based on attention mechanism and interference word deletion
CN113657115A (en) Multi-modal Mongolian emotion analysis method based on ironic recognition and fine-grained feature fusion
CN116383387A (en) Combined event extraction method based on event logic
CN110991515B (en) Image description method fusing visual context
CN116579347A (en) Comment text emotion analysis method, system, equipment and medium based on dynamic semantic feature fusion
CN112052889A (en) Laryngoscope image identification method based on double-gating recursive unit decoding
CN111259147A (en) Sentence-level emotion prediction method and system based on adaptive attention mechanism
CN115544260B (en) Contrast optimization coding and decoding method for text emotion analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant