CN111897957A - Capsule neural network integrating multi-scale feature attention and text classification method - Google Patents

Capsule neural network integrating multi-scale feature attention and text classification method

Info

Publication number
CN111897957A
CN111897957A (application CN202010683462.3A)
Authority
CN
China
Prior art keywords
capsule
representation
target text
layer
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010683462.3A
Other languages
Chinese (zh)
Other versions
CN111897957B (en)
Inventor
琚生根
王超凡
周刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University filed Critical Sichuan University
Priority to CN202010683462.3A priority Critical patent/CN111897957B/en
Publication of CN111897957A publication Critical patent/CN111897957A/en
Application granted granted Critical
Publication of CN111897957B publication Critical patent/CN111897957B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a capsule neural network integrating multi-scale feature attention and a text classification method. The network comprises a bidirectional recurrent layer, a multi-scale feature attention layer, a partially connected capsule layer and a category capsule layer. The multi-scale feature attention layer performs convolution operations, through convolution windows, on the global feature representation of the target text sent by the bidirectional recurrent layer to obtain multi-element (n-gram) grammatical features, and weights the multi-element grammatical features of each word at different scales. The partially connected capsule layer comprises child capsule units and parent capsule units; the child capsule units receive the weighted multi-element grammatical features and transmit the information to the parent capsule units through a routing method, finally obtaining the feature representation of the parent capsules. The category capsule layer expresses the probability that the target text belongs to each category. Through attention among multi-element features of different scales, the network accurately captures the multi-element grammatical features of the text and avoids the parameter growth caused by adopting several similar complete capsule layers.

Description

Capsule neural network integrating multi-scale feature attention and text classification method
Technical Field
The invention belongs to the technical field of text classification of text mining, and particularly relates to a capsule neural network integrating multi-scale feature attention and a text classification method.
Background
Text classification is an important component of text mining applications and includes question classification, sentiment analysis, topic classification and the like. Most mainstream text classification models today are based on Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs) and Transformers; Kim first proposed encoding sentences with multiple convolution kernels for text classification, after which various CNN-based models began to appear in the text classification task.
Research on CNN-based text classification has matured, but problems remain. For example, Yang et al. exploit the multi-scale multi-element grammatical features of a text through average pooling, but this way of fusing the features is unreasonable: it ignores that the grammatical features of each scale corresponding to a word should not be equally important but should be determined by the specific context, and it quietly expands the parameter scale of the model to 3 times the original. The capsule neural network model proposed by Zheng et al. uses only the bigram features of the text and simply ignores the other multi-element grammatical features the text may contain. It can be seen that existing CapsNet-based work cannot capture rich multi-element grammatical features well, which directly affects the model's understanding of the whole text; only when the most important multi-element grammatical features are accurately extracted can the model correctly understand the meaning of a word in its specific context.
In capsule neural networks, the information exchange between child capsules and parent capsules is also an important research topic. In common routing algorithms, the information in every child capsule is routed to every parent capsule; in this way redundant information in the child capsules is passed to the parent capsules, inflating the data volume and increasing the system burden.
Disclosure of Invention
In view of the above, an object of the present invention is to provide a capsule neural network integrating multi-scale feature attention that can accurately capture the multi-element grammatical features of a text.
To achieve this object, the technical scheme of the invention is as follows: the network comprises a bidirectional recurrent layer, a multi-scale feature attention layer, a partially connected capsule layer and a category capsule layer; wherein,
the bidirectional recurrent layer comprises an RNN encoder and is used for receiving the word vector sequence of the target text and obtaining, through the RNN encoder, the feature representation of the preceding and following context of each word of the target text, wherein the feature representations of the preceding and following contexts of all the words of the target text form the global feature representation of the target text;
the multi-scale feature attention layer is connected to the bidirectional recurrent layer and is used for performing convolution on the received global feature representation of the target text through convolution windows to obtain multi-element grammatical features, and for weighting the multi-element grammatical features of each word at different scales;
the partially connected capsule layer is connected to the multi-scale feature attention layer and comprises child capsule units and parent capsule units; the child capsule units receive the weighted multi-element grammatical features and route the information to the parent capsule units, finally obtaining the feature representation of the parent capsules;
the category capsule layer is connected to the partially connected capsule layer and comprises at least 2 category capsules, each category capsule corresponding to one category and being used for expressing the probability that the target text belongs to that category.
Further, the multi-scale feature attention layer comprises a convolutional network unit, a convolutional feature aggregation unit and a scale feature weighting unit;
the convolutional network unit receives the global feature representation of the target text sent by the bidirectional recurrent layer and obtains the grammatical feature representation of the target text through a plurality of convolution windows;
the convolutional feature aggregation unit is connected to the convolutional network unit and aggregates, per convolution kernel, the grammatical feature representation of the target text into a corresponding scalar representation;
and the scale feature weighting unit is connected to the convolutional feature aggregation unit and receives the scalar representation of the target text, generates attention weights for the grammatical features at each scale, and obtains the weighted representation of the target text.
Further, the convolutional network unit obtains the grammatical feature representation as follows:

z_i^l = f(W^l · X_{i:i+l-1} + b^l);

z^l = [z_1^l; z_2^l; ...; z_n^l];

H = [z^1; z^2; ...; z^m];

where i indexes the i-th word of the target text, l is the size of the convolution window, z_i^l is the grammatical feature representation of the i-th word under a convolution window of size l, z^l is the grammatical feature representation of all words of the target text under convolution windows of size l, and H is the grammatical feature representation of all words of the target text under m convolution windows of different sizes;
the convolutional feature aggregation unit obtains the scalar representation as follows:

s_i^l = F_ensem(z_i^l) = Σ_{j=1}^{k} z_{i,j}^l;

where F_ensem denotes summing the individual components of the input vector, k is the number of convolution kernels, z_i^l is the result of applying the convolution operation at the i-th word, z_{i,j}^l is the grammatical feature of the i-th word under the j-th convolution kernel, and j is the summation index;
the scale feature weighting unit obtains the weighted representation as follows:

s_i = [s_i^1, s_i^2, ..., s_i^L];

a_i = softmax(MLP(s_i));

z_i^atten = Σ_{l=1}^{L} a_i^l · z_i^l;

Z_atten = [z_1^atten; z_2^atten; ...; z_n^atten];

where s_i is the aggregated feature representation of the i-th word, s_i^l is the scalar representation of the i-th word under the convolution window of size l, a_i is the attention weight vector corresponding to the i-th word, MLP is a multi-layer perceptron, a_i^l is the attention weight of the i-th word under the l-gram features, z_i^atten is the weighted representation of the i-th word over the L-gram scales, L is the number of different convolution windows, and Z_atten is the weighted representation of the target text.
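As a rough illustration of how these formulas fit together, the following NumPy sketch computes n-gram features at several window sizes, aggregates each scale with F_ensem, derives per-word attention weights with a small MLP and softmax, and forms the weighted representation. The window sizes, dimensions, helper names and random weights are assumptions made for the example, not values taken from the patent:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_scale_attention(X, window_sizes=(1, 3, 5), k=64, hidden=32, seed=0):
    """Sketch of the multi-scale feature attention layer described above.

    X: global feature representation, shape (n_words, d), from the recurrent layer.
    For each window size l, a 1-D convolution produces k-dimensional n-gram
    features z_i^l; F_ensem sums their components into a scalar s_i^l; an MLP
    plus softmax turns [s_i^1 .. s_i^L] into attention weights a_i; the output
    is the attention-weighted sum of the per-scale features. Weights are random
    here - the whole function is an illustrative assumption, not patented code.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    L = len(window_sizes)
    Z = np.zeros((L, n, k))
    for li, l in enumerate(window_sizes):
        Wc = rng.normal(size=(l * d, k)) * 0.1            # convolution kernel for window size l
        pad = np.vstack([X, np.zeros((l - 1, d))])        # pad so every word has a full window
        for i in range(n):
            window = pad[i:i + l].reshape(-1)             # X_{i:i+l-1}
            Z[li, i] = np.tanh(window @ Wc)               # z_i^l
    S = Z.sum(axis=2).T                                   # s_i^l = F_ensem(z_i^l), shape (n, L)
    W1 = rng.normal(size=(L, hidden)) * 0.1               # tiny MLP for the attention scores
    W2 = rng.normal(size=(hidden, L)) * 0.1
    A = softmax(np.tanh(S @ W1) @ W2, axis=1)             # a_i over the L scales
    Z_atten = np.einsum('nl,lnk->nk', A, Z)               # weighted sum over the scales
    return Z_atten

print(multi_scale_attention(np.random.default_rng(1).normal(size=(7, 20))).shape)  # (7, 64)
```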
Further, the routing method for routing information to the parent capsule units is as follows:
obtaining the prediction vectors from the child capsule layer to the next parent capsule layer of the capsule neural network through preset weight matrices;
performing routing iterations on the information of the child capsule layer and calculating the coupling coefficients of the dynamic routing algorithm;
during the last routing iteration, comparing each coupling coefficient with a preset threshold;
if a coupling coefficient is smaller than the threshold, discarding it and re-weighting the remaining values so that their sum stays 1;
and obtaining the parent capsule representation routed to the parent capsule layer through the coupling coefficients and the prediction vectors, and scaling the routed parent capsule representation in the parent capsule layer to obtain the final parent capsule representation.
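The pruning-and-renormalization step above can be pictured with the following minimal NumPy sketch; the function name and toy numbers are illustrative, and the 0.05 default mirrors the threshold mentioned later in Embodiment 4:

```python
import numpy as np

def prune_and_renormalize(c, threshold=0.05):
    """Illustrative sketch of the pruning step described above.

    c: coupling coefficients of one child capsule over all parent capsules
       (non-negative and summing to 1, as produced by softmax).
    Coefficients below `threshold` are discarded (set to 0) and the remaining
    values are re-weighted so that they again sum to 1.
    """
    c = np.where(c < threshold, 0.0, c)
    total = c.sum()
    if total > 0:                      # avoid division by zero if everything was pruned
        c = c / total
    return c

# Example: a child capsule only weakly coupled to the first parent capsule
print(prune_and_renormalize(np.array([0.03, 0.55, 0.42])))
# -> approximately [0. 0.567 0.433]; the weak connection is dropped
```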
Further, the coupling coefficient of the dynamic routing algorithm is calculated as:

c_ij = softmax(b_ij) = exp(b_ij) / Σ_k exp(b_ik);

where b_ij is the initial, unweighted coupling coefficient and c_ij is the coupling coefficient obtained by softmax weighting of the initial coupling coefficient.
Further, the step of scaling, in the parent capsule layer, the parent capsule representation obtained by routing comprises:

s_j = Σ_i c_ij · u_{j|i};

v_j = (||s_j||² / (1 + ||s_j||²)) · (s_j / ||s_j||);

where s_j is the parent capsule representation obtained by routing, u_{j|i} is the prediction vector, and v_j is the final parent capsule representation obtained after scaling.
In view of the above, a second object of the present invention is to provide a text classification method that uses the capsule neural network of the first object.
To achieve this object, the technical scheme of the invention is as follows: a text classification method using the above capsule neural network, comprising the following steps:
receiving a word vector sequence corresponding to a target text, and encoding the word vector sequence of the target text to obtain the global feature representation of the target text; the global feature representation is composed of the feature representations of the preceding and following contexts of all words of the target text;
convolving the global feature representation with convolution windows to obtain multi-element grammatical features, and weighting the multi-element grammatical features of each word at different scales to obtain the weighted representation of the target text;
and the child capsule units receive the weighted representation of the target text and route the information to the parent capsule units to obtain the feature representation of the parent capsules, which is then sent to the category capsule layer to obtain the probability that the target text belongs to each category.
Further, the step of encoding the word vector sequence of the target text to obtain the global feature representation of the target text specifically comprises:
encoding each word vector of the target text to obtain its preceding-context and following-context feature representations:

c_l(w_i) = RNN_fwd(c_l(w_{i-1}), w_{i-1});

c_r(w_i) = RNN_bwd(c_r(w_{i+1}), w_{i+1});

where i indexes the i-th word of the target text, w_i is the i-th word vector of the target text, c_l(w_i) is the preceding-context feature representation of the i-th word, c_r(w_i) is the following-context feature representation of the i-th word, and RNN_fwd and RNN_bwd denote the forward and backward RNN encoders;
concatenating the preceding-context and following-context feature representations of a word to obtain the feature representation of that word's preceding and following contexts;
and encoding all word vectors of the target text to finally obtain the global feature representation of the target text; the global feature representation is composed of the feature representations of the preceding and following contexts of all words of the target text.
Further, the step of convolving the global feature representation with convolution windows to obtain the multi-element grammatical features, and weighting the multi-element grammatical features of each word at different scales to obtain the weighted representation of the target text, specifically comprises:
obtaining the grammatical feature representation of the target text by passing the received global feature representation through a plurality of convolution windows;
aggregating, per convolution kernel, the grammatical feature representation of the target text obtained under the plurality of convolution windows into corresponding scalar representations;
and generating attention weights for the grammatical features at each scale from the scalar representation of the target text to obtain the weighted representation of the target text.
Further, the category capsule layer comprises at least 2 category capsules, each category capsule corresponds to one category, the category capsule layer receives the characteristic representation of the parent capsule, and each category capsule represents the probability that the target text belongs to the corresponding category.
Advantageous effects
The invention provides a capsule neural network and a text classification method integrating multi-scale feature attention, which have the beneficial effects that: the capsule neural network integrated with multi-scale feature attention provided by the invention can accurately capture the multi-element grammatical features of a text through the attention among the multi-element features with different scales, and avoids the increase of parameter scale caused by adopting a plurality of similar complete capsule layers; meanwhile, the invention also provides an application method based on the capsule neural network.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description are some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive exercise.
FIG. 1 is a flow chart of a routing method of the present invention;
FIG. 2 is a block diagram of a capsule neural network incorporating multi-scale feature attention in accordance with the present invention;
FIG. 3 is a flow chart of a text classification method using capsule neural networks with multi-scale feature attention fused in accordance with the present invention;
FIG. 4 is a schematic structural diagram of an inventive capsule neural network incorporating multi-scale feature attention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The examples are given for the purpose of better illustration of the invention, but the invention is not limited to the examples. Therefore, those skilled in the art should make insubstantial modifications and adaptations to the embodiments of the present invention in light of the above teachings and remain within the scope of the invention.
Example 1
Referring to fig. 1, which is a flowchart of the routing method used in the capsule neural network of the present invention, the method includes the following steps:
S100: obtaining the prediction vectors from the child capsule layer to the next parent capsule layer of the capsule neural network through preset weight matrices; then step S102 is executed;
In this embodiment, between two adjacent capsule layers, in order to obtain the prediction vector u_{j|i} from the child capsule u_i of layer t to the parent capsule s_j of layer t+1, the child capsule u_i of layer t is multiplied by a weight matrix W_ij, which is obtained by random initialization; the prediction vector may be calculated as:

u_{j|i} = W_ij · u_i;

where the subscripts i and j denote the indices of the child capsule and the parent capsule, respectively;
S102: performing routing iterations on the information of the child capsule layer and calculating the coupling coefficients of the dynamic routing algorithm;
In this embodiment, the routing iteration computes:

c_ij = softmax(b_ij) = exp(b_ij) / Σ_k exp(b_ik);

where b_ij is the initial, unweighted coupling coefficient and c_ij is the coupling coefficient determined by the dynamic routing algorithm, i.e. the coefficient obtained by applying the softmax function to the original b_ij; in this embodiment the coupling coefficient can be regarded as the prior probability that the child capsule u_i is coupled to the parent capsule s_j;
S104: during the last routing iteration, comparing each coupling coefficient with a preset threshold; if the coupling coefficient is smaller than the threshold, executing step S106, otherwise executing step S108;
S106: discarding the coupling coefficient; then step S110 is executed;
In this embodiment, the threshold may be set according to actual needs; a coupling coefficient smaller than the threshold is regarded as a weak connection (with a small weight) between the parent capsule layer and the child capsule layer and is discarded, so that only the child capsules most closely related to a parent capsule are routed to it. A higher-layer capsule thus receives information only from the lower-layer capsules most related to it, which helps reduce redundant information transfer between child and parent capsules.
S108: re-weighting the other coupling coefficients so that their sum stays 1; then step S110 is executed;
S110: obtaining the parent capsule representation routed to the parent capsule layer through the coupling coefficients and the prediction vectors, and scaling the routed parent capsule representation in the parent capsule layer to obtain the final parent capsule representation.
In this embodiment, the information routed to the parent capsule is s_j:

s_j = Σ_i c_ij · u_{j|i};

Further, the parent capsule layer scales the routed information (the parent capsule representation), which is equivalent to a vector version of an activation function, to obtain the final parent capsule v_j; specifically:

v_j = (||s_j||² / (1 + ||s_j||²)) · (s_j / ||s_j||);

where v_j is the final representation obtained after scaling.
Collecting the parent capsules of the whole parent capsule layer over their number and dimensionality gives the output v ∈ R^{m×d} of the parent capsule layer (d and m denote the number and the dimension of the parent capsules, respectively), which is input to the next layer for the final classification routing decision.
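For illustration only, the following is a minimal NumPy sketch of how the routing of this embodiment (prediction vectors, softmax coupling, pruning of weak connections on the last iteration, and vector scaling) could be put together; the shapes, number of iterations and helper names are assumptions, not the patented implementation:

```python
import numpy as np

def squash(s, eps=1e-8):
    """Vector 'activation': scale s to length in [0, 1) while keeping its direction."""
    norm = np.linalg.norm(s, axis=-1, keepdims=True)
    return (norm ** 2 / (1.0 + norm ** 2)) * s / (norm + eps)

def partial_routing(u, W, iterations=3, threshold=0.05):
    """Sketch of the partial-connection dynamic routing described in Embodiment 1.

    u: child capsules, shape (n_child, d_child)
    W: weight matrices, shape (n_child, n_parent, d_parent, d_child)
    Returns parent capsules v, shape (n_parent, d_parent).
    All shapes and names are assumptions for illustration.
    """
    u_hat = np.einsum('ijpd,id->ijp', W, u)                 # prediction vectors u_hat[i, j] = W_ij @ u_i
    b = np.zeros((W.shape[0], W.shape[1]))                  # initial coupling logits b_ij
    for it in range(iterations):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)    # c_ij = softmax over parents
        if it == iterations - 1:                            # last iteration: prune weak connections
            c = np.where(c < threshold, 0.0, c)
            c = c / np.maximum(c.sum(axis=1, keepdims=True), 1e-8)
        s = np.einsum('ij,ijp->jp', c, u_hat)               # s_j = sum_i c_ij * u_hat_{j|i}
        v = squash(s)                                       # v_j: scaled parent capsule
        if it < iterations - 1:
            b = b + np.einsum('ijp,jp->ij', u_hat, v)       # agreement update of the logits
    return v

# Toy usage with random weights
rng = np.random.default_rng(0)
u = rng.normal(size=(30, 100))                              # 30 child capsules of length 100
W = rng.normal(size=(30, 5, 16, 100)) * 0.01                # route to 5 parent capsules of length 16
print(partial_routing(u, W).shape)                          # (5, 16)
```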
Example 2
Referring to fig. 2, which is a schematic block diagram of the capsule neural network integrating multi-scale feature attention in this embodiment, the network includes a bidirectional recurrent layer 2, a multi-scale feature attention layer 3, a partially connected capsule layer 4 and a category capsule layer 5; wherein,
the bidirectional recurrent layer 2 comprises an RNN encoder and is used for receiving the word vector sequence of the target text and obtaining, through the RNN encoder, the feature representation of the preceding and following context of each word of the target text; the feature representations of the preceding and following contexts of all words together form the global feature representation of the target text;
the multi-scale feature attention layer 3 is connected to the bidirectional recurrent layer 2 and is used for performing convolution on the received global feature representation of the target text through convolution windows to obtain multi-element grammatical features, and for weighting the multi-element grammatical features of each word at different scales;
In this embodiment, the multi-scale feature attention layer 3 further includes a convolutional network unit 301, a convolutional feature aggregation unit 302 and a scale feature weighting unit 303;
the convolutional network unit 301 receives the global feature representation of the target text sent by the bidirectional recurrent layer 2 and obtains the grammatical feature representation of the target text through a plurality of convolution windows; the grammatical feature representation may be calculated as:

z_i^l = f(W^l · X_{i:i+l-1} + b^l);

z^l = [z_1^l; z_2^l; ...; z_n^l];

H = [z^1; z^2; ...; z^m];

where i indexes the i-th word of the target text, l is the size of the convolution window, z_i^l is the grammatical feature representation of the i-th word under a convolution window of size l, z^l is the grammatical feature representation of all words of the target text under convolution windows of size l, and H is the grammatical feature representation of all words of the target text under m convolution windows of different sizes;
the convolutional feature aggregation unit 302 is connected to the convolutional network unit 301 and aggregates, per convolution kernel, the grammatical feature representation of the target text into a corresponding scalar representation, which can be obtained by:

s_i^l = F_ensem(z_i^l) = Σ_{j=1}^{k} z_{i,j}^l;

where F_ensem denotes summing the individual components of the input vector, k is the number of convolution kernels, z_i^l is the result of applying the convolution operation at the i-th word, z_{i,j}^l is the grammatical feature of the i-th word under the j-th convolution kernel, and j is the summation index;
the scale feature weighting unit 303 is connected to the convolutional feature aggregation unit 302 and is used for receiving the scalar representation of the target text and generating attention weights for the grammatical features at each scale to obtain the weighted representation of the target text; the scale feature weighting unit 303 obtains the weighted representation by:

s_i = [s_i^1, s_i^2, ..., s_i^L];

a_i = softmax(MLP(s_i));

z_i^atten = Σ_{l=1}^{L} a_i^l · z_i^l;

Z_atten = [z_1^atten; z_2^atten; ...; z_n^atten];

where s_i is the aggregated feature representation of the i-th word, s_i^l is the scalar representation of the i-th word under the convolution window of size l, a_i is the attention weight vector corresponding to the i-th word, MLP is a multi-layer perceptron, a_i^l is the attention weight of the i-th word under the l-gram features, z_i^atten is the weighted representation of the i-th word over the L-gram scales, L is the number of different convolution windows, and Z_atten is the weighted representation of the target text.
The partially connected capsule layer 4 is connected to the multi-scale feature attention layer 3 and comprises a child capsule unit 401 and a parent capsule unit 402; the child capsule unit 401 receives the weighted multi-element grammatical features and transmits the information to the parent capsule unit 402 through a routing algorithm, finally obtaining the feature representation of the parent capsules; the routing method in this embodiment may specifically refer to the routing method provided in Embodiment 1.
The category capsule layer 5 is connected to the partially connected capsule layer 4 and comprises at least 2 category capsules; each category capsule corresponds to one category and expresses the probability that the target text belongs to that category.
Example 3
Based on the routing algorithm of Embodiment 1 and the capsule neural network of Embodiment 2, this embodiment provides a text classification method; a flowchart of the method is shown in fig. 3. Specifically, the method includes the following steps:
S600: receiving a word vector sequence corresponding to a target text, and encoding the word vector sequence of the target text to obtain the global feature representation of the target text; then step S602 is executed;
In this example, the input to the capsule neural network of Embodiment 2 is the word vector sequence of a series of words w_1, w_2, ..., w_n; the word vectors, together with the word vectors of the words on their left and right, are fed into the RNN encoder in sequence to obtain the global feature representation of the target text. Specifically, each word vector of the target text is encoded to obtain the preceding-context and following-context feature representations of the word:

c_l(w_i) = RNN_fwd(c_l(w_{i-1}), w_{i-1});

c_r(w_i) = RNN_bwd(c_r(w_{i+1}), w_{i+1});

where i indexes the i-th word of the target text, w_i is the i-th word vector of the target text, c_l(w_i) is the preceding-context feature representation of the i-th word, c_r(w_i) is the following-context feature representation of the i-th word, and RNN_fwd and RNN_bwd denote the forward and backward RNN encoders;
then the feature representation x_i of the preceding and following contexts of the word is obtained by concatenating the two context feature representations:

x_i = [c_l(w_i), c_r(w_i)];

encoding all word vectors of the target text finally yields the global feature representation X = [x_1, ..., x_i, ..., x_n]; the global feature representation is composed of the feature representations of the preceding and following contexts of all words of the target text.
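A small NumPy sketch of this encoding step follows, assuming a simple tanh recurrence in place of the RNN encoder (the patent does not fix a particular RNN cell); the dimensions and weight names are illustrative only:

```python
import numpy as np

def encode_contexts(E, d_ctx=50, seed=0):
    """Sketch of the bidirectional context encoding of step S600.

    E: word vectors of the target text, shape (n_words, d_word).
    c_l(w_i) is built from the previous word and the previous left context;
    c_r(w_i) from the next word and the next right context; the per-word
    representation x_i concatenates both. A single tanh recurrence stands in
    for the RNN encoder - an assumption made for illustration only.
    """
    rng = np.random.default_rng(seed)
    n, d_word = E.shape
    Wl = rng.normal(size=(d_ctx, d_ctx)) * 0.1   # recurrent weights, left-to-right
    Wsl = rng.normal(size=(d_word, d_ctx)) * 0.1
    Wr = rng.normal(size=(d_ctx, d_ctx)) * 0.1   # recurrent weights, right-to-left
    Wsr = rng.normal(size=(d_word, d_ctx)) * 0.1
    cl = np.zeros((n, d_ctx))
    cr = np.zeros((n, d_ctx))
    for i in range(1, n):                        # c_l(w_i) from w_{i-1} and c_l(w_{i-1})
        cl[i] = np.tanh(cl[i - 1] @ Wl + E[i - 1] @ Wsl)
    for i in range(n - 2, -1, -1):               # c_r(w_i) from w_{i+1} and c_r(w_{i+1})
        cr[i] = np.tanh(cr[i + 1] @ Wr + E[i + 1] @ Wsr)
    return np.concatenate([cl, cr], axis=1)      # x_i = [c_l(w_i), c_r(w_i)]

X = encode_contexts(np.random.default_rng(1).normal(size=(6, 100)))
print(X.shape)   # (6, 100): the global feature representation [x_1, ..., x_n]
```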
S602: convolving the global feature representation with convolution windows to obtain multi-element grammatical features, and weighting the multi-element grammatical features of each word at different scales to obtain the weighted representation of the target text; then step S604 is executed;
In this step, the global feature representation obtained in step S600 enters the multi-scale feature attention layer 3 of Embodiment 2, which obtains the grammatical feature representation of the target text by passing the received global feature representation through a plurality of convolution windows:

z_i^l = f(W^l · X_{i:i+l-1} + b^l);

z^l = [z_1^l; z_2^l; ...; z_n^l];

H = [z^1; z^2; ...; z^m];

where i indexes the i-th word of the target text, l is the size of the convolution window, z_i^l is the grammatical feature representation of the i-th word under a convolution window of size l, z^l is the grammatical feature representation of all words of the target text under convolution windows of size l, and H is the grammatical feature representation of all words of the target text under m convolution windows of different sizes;
For a given word, not all multi-element grammatical features are equally important, so the scale feature weighting unit 303 and the convolutional feature aggregation unit 302 determine which scales matter more for that word. Specifically, the grammatical feature representation of the target text obtained under the plurality of convolution windows is aggregated per convolution kernel into corresponding scalar representations:

s_i^l = F_ensem(z_i^l) = Σ_{j=1}^{k} z_{i,j}^l;

where F_ensem denotes summing the individual components of the input vector, k is the number of convolution kernels, z_i^l is the result of applying the convolution operation at the i-th word, z_{i,j}^l is the grammatical feature of the i-th word under the j-th convolution kernel, and j is the summation index;
Finally, attention weights for the grammatical features at each scale are generated from the scalar representation of the target text to obtain the weighted representation of the target text:

s_i = [s_i^1, s_i^2, ..., s_i^L];

a_i = softmax(MLP(s_i));

z_i^atten = Σ_{l=1}^{L} a_i^l · z_i^l;

Z_atten = [z_1^atten; z_2^atten; ...; z_n^atten];

where s_i is the aggregated feature representation of the i-th word, s_i^l is the scalar representation of the i-th word under the convolution window of size l, a_i is the attention weight vector corresponding to the i-th word, MLP is a multi-layer perceptron, a_i^l is the attention weight of the i-th word under the l-gram features, z_i^atten is the weighted representation of the i-th word over the L-gram scales, L is the number of different convolution windows, and Z_atten is the weighted representation of the target text, containing accurate and rich multi-element grammatical features.
S604: the child capsule unit 401 receives the weighted representation of the target text and routes the information to the parent capsule unit 402 to obtain the feature representation of the parent capsules; then step S606 is executed;
In this embodiment, the output Z_atten of step S602 is fed into the partially connected capsule layer. In classic CapsNets the information in every child capsule is routed to every parent capsule, which also passes redundant information from the child capsules to the parent capsules; therefore, in this embodiment, following the routing method of Embodiment 1, the weak connections (those with smaller weights) between parent and child capsules are discarded, and only the child capsules most closely related to a parent capsule are routed to it.
The final scaled representation v_j can be obtained by following the routing algorithm of Embodiment 1; note that the differently drawn lines of the partial connection routes in the partially connected capsule layer of fig. 4 represent the routes from different child capsules to the parent capsules.
S606: sending the feature representation of the parent capsules to the category capsule layer to obtain the probability that the target text belongs to each category.
In this embodiment, the category capsule layer includes at least two category capsules, each corresponding to one category. The category capsule layer receives the feature representation of the parent capsules; the length of the vector in each category capsule represents the probability that the input text belongs to that category, and the direction of each vector also retains characteristics of the features and can be regarded as a feature encoding vector of the input sample. To increase the difference between the category lengths, the model uses a separate margin loss function:

L_j = G_j · max(0, m+ − ||v_j||)² + λ · (1 − G_j) · max(0, ||v_j|| − m-)²;

where m+ and m- are the upper and lower boundaries, respectively; G_j = 1 if and only if v_j corresponds to the correct classification; λ is a hyper-parameter, which may be set to λ = 0.5 in one implementation.
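To make the loss concrete, here is a minimal NumPy sketch; the values m+ = 0.9 and m- = 0.1 are the common choices from the capsule-network literature and are assumptions (the patent only names them as upper and lower boundaries), while λ = 0.5 follows the text:

```python
import numpy as np

def margin_loss(v, labels, m_plus=0.9, m_minus=0.1, lam=0.5):
    """Sketch of the margin loss above.

    v: category capsules, shape (n_classes, d); their vector lengths are the
       class probabilities. labels: one-hot ground truth G_j, shape (n_classes,).
    m_plus/m_minus are assumed boundary values; lam = 0.5 follows the text.
    """
    lengths = np.linalg.norm(v, axis=1)                        # ||v_j|| = class probability
    loss_present = labels * np.maximum(0.0, m_plus - lengths) ** 2
    loss_absent = lam * (1 - labels) * np.maximum(0.0, lengths - m_minus) ** 2
    return (loss_present + loss_absent).sum()

v = np.random.default_rng(0).normal(size=(5, 16)) * 0.2        # 5 category capsules
labels = np.array([0, 0, 1, 0, 0])                             # true class is index 2
print(float(margin_loss(v, labels)))
```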
Example 4
In this embodiment, the validity of the network of Embodiment 2 and the method of Embodiment 3 is verified on 5 commonly used large-scale text classification datasets. The Yelp and Amazon corpora are user reviews for sentiment prediction, where P denotes that only the polarity of a review needs to be predicted and F denotes that the number of stars (1 to 5) needs to be predicted; Yah.A is a question-and-answer dataset. The convolution kernel sizes and vocabulary sizes are set for these datasets, and several common text classification models are selected for comparison, including linear text classification models: the classification model using h-softmax proposed in Joulin2017 and the bag-of-words model applied to text classification proposed in Qiao2018; RNNs and their variants: the generative text classification model built with long short-term memory networks (LSTMs) in Yogatama2018 and the hierarchical attention network for document classification proposed in Yang2016; CNNs and their variants: the deep CNN (29 layers) proposed in Conneau2017 to improve text classification accuracy and the multi-scale feature attention CNN proposed in Wang2018 to capture variable-length grammatical features of a text; and capsule network models: Ren2018 simplifies the parameters of the CapsNet model with a compression coding method and improves the routing algorithm with a k-means method, and Yang2018 first proposed applying CapsNets to text classification.
In this embodiment, Adam proposed by Zeiler in 2012 is used to optimize all trainable parameters; the dimensions of the input vectors and hidden states are set to 100 or 128, the number of capsules in the partially connected capsule layer is 30 and their feature length is 100, and the dimension of the category capsule layer is set to 16. In addition, to reduce memory and time overhead, the weights in the capsule network are shared, and the threshold of the partial connection routing algorithm is set to 0.05. The network model of the invention is shown in fig. 4.
The final experimental results are given in table 1 below:
TABLE 1 Classification accuracy of different models under various data sets
[Table 1 is provided as an image in the original publication and is not reproduced here.]
Referring to Table 1, the evaluation metric used is accuracy. On all 5 datasets the network of the present invention performs best; in particular, the accuracy on the Yahoo and Amaz-F datasets is improved by 0.9 and 0.5 respectively over the best CNN model. The average text length of these two datasets is relatively long and the number of target classes relatively large, so such texts contain a large amount of complex grammatical feature information, and only models with strong feature learning ability can classify them well; the network of the invention performs well on these datasets. On the other hand, Mul-capsules is a variant of the proposed network that does not contain the partial connection routing; the complete network improves on the Mul-capsules model by about 0.3 percent on each dataset, which shows that the connections discarded by the proposed network are exactly those that would degrade the network, i.e., intuitively, the redundant information connections between child and parent capsules.
In addition, Table 1 also compares the capsule network models. The feature learning ability of capsule networks on the text classification task is already far better than that of the CNN and RNN models, and the present invention further introduces multi-scale feature attention; according to Table 1, the classification accuracy of the proposed network on all 5 datasets is higher than that of the other capsule network models, i.e., the proposed network has feature learning capability far exceeding that of the other capsule-based models.
On the other hand, this embodiment also evaluates the routing method of Embodiment 1: the Mul-capsules model in Table 1 is a capsule network model that does not use the partial connection routing algorithm of Embodiment 1, and the result data show that the model using the routing method of Embodiment 1 achieves higher accuracy than the Mul-capsules model.
Preferably, this embodiment also examines the parameter scale of the proposed network and selects the following models for comparison: the first is capsule-B proposed by Yang in 2018, which uses multi-scale multi-element grammatical features with convolution window sizes of 3, 4 and 5; the second and third models extract single-scale multi-element grammatical features, with convolution window sizes of 3 and 2, respectively, as shown in Table 2:
TABLE 2 comparison of different model parameter scales
[Table 2 is provided as an image in the original publication and is not reproduced here.]
It can be seen that the multi-element grammatical features used by the present invention are the richest while its parameters are the fewest. Unlike the classic text classification capsule networks, the proposed network does not need several similar complete capsule network layers to obtain comprehensive multi-element grammatical features, because it captures accurate text grammatical information with multi-scale feature attention before the text feature representation is input to the capsule network, and thus obtains richer multi-element grammatical features with fewer parameters. For example, capsule-B uses 24M parameters to obtain the 3-, 4- and 5-gram features of the text, while the proposed network uses 2M parameters to obtain the 1-, 3-, 5-, 7- and 9-gram features. Similarly, although Ren2018 reduces parameters with a compression coding scheme, it only uses the bigram features of the text, so its text feature learning ability is far below that of the present invention. On the other hand, the proposed network sets a smaller number of capsules than the other models because the text features input to the capsule layer are already very refined and accurate after the multi-scale feature attention layer, so in theory fewer capsules are sufficient to extract the underlying low-level features.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (10)

1. A capsule neural network integrating multi-scale feature attention, characterized by comprising a bidirectional recurrent layer, a multi-scale feature attention layer, a partially connected capsule layer and a category capsule layer; wherein,
the bidirectional recurrent layer comprises an RNN encoder and is used for receiving the word vector sequence of the target text and obtaining, through the RNN encoder, the feature representation of the preceding and following context of each word of the target text, wherein the feature representations of the preceding and following contexts of all the words of the target text form the global feature representation of the target text;
the multi-scale feature attention layer is connected to the bidirectional recurrent layer and is used for obtaining multi-element grammatical features through convolution operations on the received global feature representation of the target text, and for weighting the multi-element grammatical features of each word at different scales;
the partially connected capsule layer is connected to the multi-scale feature attention layer and comprises child capsule units and parent capsule units; the child capsule units receive the weighted multi-element grammatical features and route the information to the parent capsule units, finally obtaining the feature representation of the parent capsules;
the category capsule layer is connected to the partially connected capsule layer and comprises at least 2 category capsules, each category capsule corresponding to one category and being used for expressing the probability that the target text belongs to that category.
2. The capsule neural network of claim 1, wherein the multi-scale feature attention layer comprises a convolutional network unit, a convolutional feature aggregation unit and a scale feature weighting unit;
the convolutional network unit receives the global feature representation of the target text sent by the bidirectional recurrent layer and obtains the grammatical feature representation of the target text through a plurality of convolution windows;
the convolutional feature aggregation unit is connected to the convolutional network unit and aggregates, per convolution kernel, the grammatical feature representation of the target text into a corresponding scalar representation;
and the scale feature weighting unit is connected to the convolutional feature aggregation unit and receives the scalar representation of the target text, generates attention weights for the grammatical features at each scale, and obtains the weighted representation of the target text.
3. The capsule neural network of claim 2, wherein the convolutional network unit obtains the grammatical feature representation by:

z_i^l = f(W^l · X_{i:i+l-1} + b^l);

z^l = [z_1^l; z_2^l; ...; z_n^l];

H = [z^1; z^2; ...; z^m];

where i indexes the i-th word of the target text, l is the size of the convolution window, z_i^l is the grammatical feature representation of the i-th word under a convolution window of size l, z^l is the grammatical feature representation of all words of the target text under convolution windows of size l, and H is the grammatical feature representation of all words of the target text under m convolution windows of different sizes;
the convolutional feature aggregation unit obtains the scalar representation by:

s_i^l = F_ensem(z_i^l) = Σ_{j=1}^{k} z_{i,j}^l;

where F_ensem denotes summing the individual components of the input vector, k is the number of convolution kernels, z_i^l is the result of applying the convolution operation at the i-th word, z_{i,j}^l is the grammatical feature of the i-th word under the j-th convolution kernel, and j is the summation index;
the scale feature weighting unit obtains the weighted representation by:

s_i = [s_i^1, s_i^2, ..., s_i^L];

a_i = softmax(MLP(s_i));

z_i^atten = Σ_{l=1}^{L} a_i^l · z_i^l;

Z_atten = [z_1^atten; z_2^atten; ...; z_n^atten];

where s_i is the aggregated feature representation of the i-th word, s_i^l is the scalar representation of the i-th word under the convolution window of size l, a_i is the attention weight vector corresponding to the i-th word, MLP is a multi-layer perceptron, a_i^l is the attention weight of the i-th word under the l-gram features, z_i^atten is the weighted representation of the i-th word over the L-gram scales, L is the number of different convolution windows, and Z_atten is the weighted representation of the target text.
4. The capsule neural network integrating multi-scale feature attention of claim 3, wherein the routing method for routing information to the parent capsule units is as follows:
obtaining the prediction vectors from the child capsule layer to the next parent capsule layer of the capsule neural network through preset weight matrices;
performing routing iterations on the information of the child capsule layer and calculating the coupling coefficients of the dynamic routing algorithm;
during the last routing iteration, comparing each coupling coefficient with a preset threshold;
if a coupling coefficient is smaller than the threshold, discarding it and re-weighting the remaining values so that their sum stays 1;
and obtaining the parent capsule representation routed to the parent capsule layer through the coupling coefficients and the prediction vectors, and scaling the routed parent capsule representation in the parent capsule layer to obtain the final parent capsule representation.
5. The capsule neural network integrating multi-scale feature attention of claim 4, wherein the coupling coefficient of the dynamic routing algorithm is calculated as:

c_ij = softmax(b_ij) = exp(b_ij) / Σ_k exp(b_ik);

where b_ij is the initial coupling coefficient and c_ij is the coupling coefficient obtained after softmax weighting.
6. The capsule neural network integrating multi-scale feature attention of claim 5, wherein the step of scaling, in the parent capsule layer, the parent capsule representation obtained by routing comprises:

s_j = Σ_i c_ij · u_{j|i};

v_j = (||s_j||² / (1 + ||s_j||²)) · (s_j / ||s_j||);

where s_j is the parent capsule representation obtained by routing, u_{j|i} is the prediction vector, and v_j is the final parent capsule representation obtained after scaling.
7. A text classification method using the capsule neural network of any one of claims 1 to 6, comprising the steps of:
receiving a word vector sequence corresponding to a target text, and encoding the word vector sequence of the target text to obtain the global feature representation of the target text; the global feature representation is composed of the feature representations of the preceding and following contexts of all words of the target text;
convolving the global feature representation with convolution windows to obtain multi-element grammatical features, and weighting the multi-element grammatical features of each word at different scales to obtain the weighted representation of the target text;
and the child capsule units receive the weighted representation of the target text and route the information to the parent capsule units to obtain the feature representation of the parent capsules, which is then sent to the category capsule layer to obtain the probability that the target text belongs to each category.
8. The method according to claim 7, wherein the step of encoding the word vector sequence of the target text to obtain the global feature representation of the target text specifically comprises:
encoding each word vector of the target text to obtain its preceding-context and following-context feature representations:

c_l(w_i) = RNN_fwd(c_l(w_{i-1}), w_{i-1});

c_r(w_i) = RNN_bwd(c_r(w_{i+1}), w_{i+1});

where i indexes the i-th word of the target text, w_i is the i-th word vector of the target text, c_l(w_i) is the preceding-context feature representation of the i-th word, c_r(w_i) is the following-context feature representation of the i-th word, and RNN_fwd and RNN_bwd denote the forward and backward RNN encoders;
concatenating the preceding-context and following-context feature representations of a word to obtain the feature representation of that word's preceding and following contexts;
and encoding all word vectors of the target text to finally obtain the global feature representation of the target text; the global feature representation is composed of the feature representations of the preceding and following contexts of all words of the target text.
9. The method according to claim 8, wherein the step of convolving the global feature representation with convolution windows to obtain the multi-element grammatical features, and weighting the multi-element grammatical features of each word at different scales to obtain the weighted representation of the target text, specifically comprises:
obtaining the grammatical feature representation of the target text by passing the received global feature representation through a plurality of convolution windows;
aggregating, per convolution kernel, the grammatical feature representation of the target text obtained under the plurality of convolution windows into corresponding scalar representations;
and generating attention weights for the grammatical features at each scale from the scalar representation of the target text to obtain the weighted representation of the target text.
10. The method of claim 9, wherein the category capsule layer comprises at least 2 category capsules, each category capsule corresponding to a category, the category capsule layer receiving a characterization of a parent capsule, each category capsule representing a probability that the target text belongs to the corresponding category.
CN202010683462.3A 2020-07-15 2020-07-15 Capsule neural network integrating multi-scale feature attention and text classification method Active CN111897957B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010683462.3A CN111897957B (en) 2020-07-15 2020-07-15 Capsule neural network integrating multi-scale feature attention and text classification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010683462.3A CN111897957B (en) 2020-07-15 2020-07-15 Capsule neural network integrating multi-scale feature attention and text classification method

Publications (2)

Publication Number Publication Date
CN111897957A true CN111897957A (en) 2020-11-06
CN111897957B CN111897957B (en) 2021-03-16

Family

ID=73192060

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010683462.3A Active CN111897957B (en) 2020-07-15 2020-07-15 Capsule neural network integrating multi-scale feature attention and text classification method

Country Status (1)

Country Link
CN (1) CN111897957B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112699215A (en) * 2020-12-24 2021-04-23 齐鲁工业大学 Grading prediction method and system based on capsule network and interactive attention mechanism
CN113190681A (en) * 2021-03-02 2021-07-30 东北大学 Fine-grained text classification method based on capsule network mask memory attention
CN114581965A (en) * 2022-03-04 2022-06-03 长春工业大学 Training method of finger vein recognition model, recognition method, system and terminal
CN116304842A (en) * 2023-05-18 2023-06-23 南京信息工程大学 Capsule network text classification method based on CFC structure improvement

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109558487A (en) * 2018-11-06 2019-04-02 华南师范大学 Document Classification Method based on the more attention networks of hierarchy
CN110046671A (en) * 2019-04-24 2019-07-23 吉林大学 A kind of file classification method based on capsule network
US20190370972A1 (en) * 2018-06-04 2019-12-05 University Of Central Florida Research Foundation, Inc. Capsules for image analysis
US10509860B2 (en) * 2016-02-10 2019-12-17 Weber State University Research Foundation Electronic message information retrieval system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10509860B2 (en) * 2016-02-10 2019-12-17 Weber State University Research Foundation Electronic message information retrieval system
US20190370972A1 (en) * 2018-06-04 2019-12-05 University Of Central Florida Research Foundation, Inc. Capsules for image analysis
CN109558487A (en) * 2018-11-06 2019-04-02 华南师范大学 Document Classification Method based on the more attention networks of hierarchy
CN110046671A (en) * 2019-04-24 2019-07-23 吉林大学 A kind of file classification method based on capsule network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
PENG H, LI J, WANG S, et al.: "Hierarchical taxonomy-aware and attentional graph capsule RCNNs for large-scale multi-label text classification" *
SABOUR S, FROSST N, HINTON G E.: "Dynamic routing between capsules", Advances in Neural Information Processing Systems *
ZHAO Xiaozheng: "Research on sentiment classification methods for short texts based on the Attention mechanism", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112699215A (en) * 2020-12-24 2021-04-23 齐鲁工业大学 Grading prediction method and system based on capsule network and interactive attention mechanism
CN113190681A (en) * 2021-03-02 2021-07-30 东北大学 Fine-grained text classification method based on capsule network mask memory attention
CN113190681B (en) * 2021-03-02 2023-07-25 东北大学 Fine granularity text classification method based on capsule network mask memory attention
CN114581965A (en) * 2022-03-04 2022-06-03 长春工业大学 Training method of finger vein recognition model, recognition method, system and terminal
CN114581965B (en) * 2022-03-04 2024-05-14 长春工业大学 Finger vein recognition model training method, finger vein recognition model training system and terminal
CN116304842A (en) * 2023-05-18 2023-06-23 南京信息工程大学 Capsule network text classification method based on CFC structure improvement

Also Published As

Publication number Publication date
CN111897957B (en) 2021-03-16

Similar Documents

Publication Publication Date Title
CN111897957B (en) Capsule neural network integrating multi-scale feature attention and text classification method
CN111897908B (en) Event extraction method and system integrating dependency information and pre-training language model
CN109597891B (en) Text emotion analysis method based on bidirectional long-and-short-term memory neural network
CN110287320B (en) Deep learning multi-classification emotion analysis model combining attention mechanism
CN106980683B (en) Blog text abstract generating method based on deep learning
CN108830287A (en) The Chinese image, semantic of Inception network integration multilayer GRU based on residual error connection describes method
CN111881262B (en) Text emotion analysis method based on multi-channel neural network
CN110866542B (en) Depth representation learning method based on feature controllable fusion
CN111078833B (en) Text classification method based on neural network
CN109977413A (en) A kind of sentiment analysis method based on improvement CNN-LDA
CN111598183B (en) Multi-feature fusion image description method
CN111078866B (en) Chinese text abstract generation method based on sequence-to-sequence model
CN109214006B (en) Natural language reasoning method for image enhanced hierarchical semantic representation
CN112417098A (en) Short text emotion classification method based on CNN-BiMGU model
CN110046223B (en) Film evaluation emotion analysis method based on improved convolutional neural network model
CN112527966B (en) Network text emotion analysis method based on Bi-GRU neural network and self-attention mechanism
CN112732921B (en) False user comment detection method and system
CN112749274A (en) Chinese text classification method based on attention mechanism and interference word deletion
CN113657115A (en) Multi-modal Mongolian emotion analysis method based on ironic recognition and fine-grained feature fusion
CN116383387A (en) Combined event extraction method based on event logic
CN110991515B (en) Image description method fusing visual context
CN116579347A (en) Comment text emotion analysis method, system, equipment and medium based on dynamic semantic feature fusion
CN112052889A (en) Laryngoscope image identification method based on double-gating recursive unit decoding
CN111259147A (en) Sentence-level emotion prediction method and system based on adaptive attention mechanism
CN115544260B (en) Contrast optimization coding and decoding method for text emotion analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant