CN112836056A - Text classification method based on network feature fusion - Google Patents

Text classification method based on network feature fusion

Info

Publication number
CN112836056A
Authority
CN
China
Prior art keywords
representing, features, text, network, input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110266934.XA
Other languages
Chinese (zh)
Other versions
CN112836056B (en)
Inventor
覃晓
廖兆琪
元昌安
乔少杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu University of Information Technology
Nanning Normal University
Original Assignee
Chengdu University of Information Technology
Nanning Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu University of Information Technology and Nanning Normal University
Priority to CN202110266934.XA
Publication of CN112836056A
Application granted
Publication of CN112836056B
Legal status: Active (current)
Anticipated expiration legal status

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a text classification method based on network feature fusion. To address the facts that a traditional convolutional neural network cannot attend to the contextual meaning of text and that a traditional recurrent neural network suffers from short-term memory and vanishing gradients, a model fusing the Res2Net and BiLSTM networks is proposed, which effectively overcomes these problems and classifies texts better. The method uses the multi-scale residual network Res2Net to extract local features of the text, combines the bidirectional long short-term memory network BiLSTM to extract contextual features of the text, and adds a traditional machine-learning method, the conditional random field (CRF), after the BiLSTM network layer to predict the relationships between labels, thereby classifying texts correctly. Through this fusion, the method can effectively improve the accuracy of text classification without excessively increasing the number of network parameters.

Description

Text classification method based on network feature fusion
Technical Field
The invention belongs to the technical field of deep learning and natural language processing, and in particular relates to the design of a text classification method based on network feature fusion.
Background
With the large-scale use of the Internet in today's society, information resources on the network are growing at an exponential rate, and among the various forms of information, unstructured text is one of the most important. Within this massive volume of text, how to obtain the most useful information is an urgent problem. Text classification, using efficient and simple algorithms or models, helps people manage and organize complicated text information so that the required information can be obtained quickly and accurately. However, traditional machine-learning text classification algorithms require a large amount of preprocessing, such as manually designed features, which increases complexity. Extracting text features with a deep learning model instead can significantly improve the speed of text classification, avoids most manual preprocessing, and achieves better classification results than traditional text classification.
Among the many deep learning network models, the traditional convolutional neural network is able to handle high-dimensional, non-linear mapping relationships; it can take preprocessed word vectors as input and perform sentence-level classification. However, the traditional convolutional neural network focuses on the local features of the input vectors and ignores the contextual meaning of words, which affects the accuracy of text classification. From the point of view of context awareness, this problem can be addressed with a recurrent neural network. A traditional recurrent neural network takes the previous output into account when computing the current output and thus forms a memory-like mechanism for time-series problems; concretely, the network state at the previous moment is applied to the network state at the next moment. However, although this memory lets it attend to the context of the text, the traditional recurrent neural network only considers the network state at the immediately preceding moment and requires a large number of derivative operations over the time series during training, so it cannot memorize information over long sequences and suffers from vanishing gradients.
Disclosure of Invention
The invention aims to solve the problems that a traditional convolutional neural network cannot attend to the contextual meaning of text and that a traditional recurrent neural network suffers from short-term memory and vanishing gradients. It provides a text classification method based on network feature fusion that adopts a model fusing Res2Net (a multi-scale residual network) and BiLSTM (a bidirectional long short-term memory network), which can effectively overcome these problems and classify texts better.
The technical solution of the invention is as follows: a text classification method based on network feature fusion comprises the following steps:
S1, preprocessing the text to be classified, and converting the preprocessed text data set into a set of word vectors by a word-vector representation method.
S2, splicing the word-vector set into a matrix, inputting the matrix into a Res2Net network for training, and outputting the local features of the text data set.
S3, inputting the word-vector set into a BiLSTM network for training, and outputting the contextual features of the text data set.
S4, scoring the contextual features of the text data set with a conditional random field (CRF) scoring mechanism, and selecting the label sequence set with the highest score as the optimal contextual feature sequence set of the text data set.
S5, splicing and fusing the local features and the optimal contextual features of the text data set to obtain the fused features.
S6, inputting the fused features into a softmax classifier for classification to obtain the text classification result.
Further, the method for preprocessing the text to be classified in step S1 specifically comprises: removing useless symbols so that the text data set contains only Chinese characters, and removing stop words.
Further, the Res2Net network in step S2 includes a first 1×1 convolutional layer, a 3×3 convolutional layer, and a second 1×1 convolutional layer connected in sequence. Each convolutional layer includes a ReLU activation function, and a residual connection is added before the ReLU activation function of the second 1×1 convolutional layer.
The number of channels of the first 1×1 convolutional layer is n. The feature map of the input matrix is divided evenly into s groups of features along the channel dimension; if the number of channels of each group is w, then n = s × w. Each group of divided features is denoted x_i, where i ∈ {1, 2, ..., s}.
The 3×3 convolutional layer processes each group of features x_i: the first group undergoes no convolution, while every other group has a corresponding convolution operation k_i(·). Let y_i denote the output of k_i(·). Starting from the second group, before each convolution operation k_i(·), the output y_{i-1} of the previous group is residual-connected with the current feature x_i and used as the input of k_i(·), and so on until the last group.
The second 1×1 convolutional layer channel-concatenates the outputs y_i of the 3×3 convolutional layer groups, fuses the multi-scale features, and outputs the local features of the text data set.
Further, the objective function of the Res2Net network in step S2 is:

\[
y_i =
\begin{cases}
x_i, & i = 1 \\
k_i\!\left(x_i + y_{i-1}\right), & 1 < i \le s
\end{cases}
\]

where x_i denotes the i-th group of evenly divided features, k_i(·) denotes the convolution operation on the i-th group of features, and y_i denotes the output of the i-th group of features after the convolution operation.
Further, the basic expressions of the BiLSTM network in step S3 are:

\[
\overrightarrow{h_t} = f\!\left(\overrightarrow{W} x_t + \overrightarrow{V}\,\overrightarrow{h_{t-1}} + \overrightarrow{b}\right)
\]
\[
\overleftarrow{h_t} = f\!\left(\overleftarrow{W} x_t + \overleftarrow{V}\,\overleftarrow{h_{t+1}} + \overleftarrow{b}\right)
\]
\[
y_t = g\!\left(U\left[\overrightarrow{h_t};\, \overleftarrow{h_t}\right] + c\right)
\]

where \overrightarrow{h_t} denotes the current-layer hidden state of the forward LSTM, \overrightarrow{W} the input-gate weight matrix of the forward LSTM, \overrightarrow{V} the current-input cell-state weight matrix of the forward LSTM, \overrightarrow{h_{t-1}} the hidden state of the forward LSTM at the previous time step, \overrightarrow{b} the input-cell bias term of the forward LSTM, \overleftarrow{h_t} the current-layer hidden state of the backward LSTM, \overleftarrow{W} the input-gate weight matrix of the backward LSTM, \overleftarrow{V} the current-input cell-state weight matrix of the backward LSTM, \overleftarrow{h_{t+1}} the hidden state of the backward LSTM at the next time step, \overleftarrow{b} the input-cell bias term of the backward LSTM, U the matrix that stitches the forward and backward output cells together, c the overall output-cell bias term, x_t the input value of the BiLSTM hidden layer, f(·) the activation function used when computing the BiLSTM hidden layer, g(·) the activation function used when computing the BiLSTM output layer, and y_t the output value of the BiLSTM network.
Further, in step S4, the formula for scoring the contextual features of the text data set with the CRF conditional random field scoring mechanism is:

\[
S(X, y) = \sum_{i=0}^{n} A_{tag_i,\, tag_{i+1}} + \sum_{i=1}^{n} P_{v_i,\, tag_i}
\]

where S(X, y) denotes the score of the output label sequence y for the input word-vector sequence X of the BiLSTM network, A_{tag_i, tag_{i+1}} denotes the transition probability from the i-th label tag_i to the (i+1)-th label tag_{i+1}, and P_{v_i, tag_i} denotes the scalar probability that the i-th word v_i in the input word-vector sequence X is mapped to the i-th label tag_i.
Normalizing the scores of the possible output label sequences y of the input word-vector sequence X gives:

\[
p(y \mid X) = \frac{e^{S(X, y)}}{\sum_{\tilde{y} \in Y_X} e^{S(X, \tilde{y})}}
\]

where p(y|X) denotes the normalized score of the output label sequence y of the input word-vector sequence X, \tilde{y} denotes one particular sequence among all possible output label sequences, and Y_X denotes the set of all possible output label sequences. Optimizing the log-likelihood function gives:

\[
\log p(y \mid X) = S(X, y) - \log \sum_{\tilde{y} \in Y_X} e^{S(X, \tilde{y})}
\]
further, in step S5, the local feature and the optimal context feature of the text data set are merged and fused by using a concat () method in the tensoflow frame, so as to obtain a fused feature.
Further, step S6 specifically comprises: storing the fused features and using them as the input of a first fully connected layer; introducing a dropout mechanism between the first and second fully connected layers so that part of the trained parameters are discarded at each iteration, making the weight updates no longer depend on a subset of inherent features and preventing overfitting; and finally inputting the iteration result into a softmax classifier for classification to obtain the text classification result.
Further, the probability P(y^(i) = j | x^(i); θ) that the softmax classifier classifies the text x into category j in step S6 is:

\[
P\!\left(y^{(i)} = j \mid x^{(i)}; \theta\right) = \frac{e^{\theta_j^{\mathrm{T}} x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^{\mathrm{T}} x^{(i)}}}
\]

where x^(i) denotes the input of each sample, y^(i) denotes the probability value of each category j, θ denotes the training-model parameters with the aim of maximizing the likelihood function exp(·), θ_j denotes the training parameters used when training each category j in order to maximize the likelihood function exp(·), and k denotes the number of training-model parameter vectors θ_j (one per category).
The beneficial effects of the invention are as follows:
(1) The invention extracts features of the text from multiple aspects by means of multi-network feature fusion, overcoming the shortcomings of a traditional single network in extracting text features and improving the precision of text classification.
(2) The method adopts the Res2Net residual network to extract the local features of the text data set; compared with a traditional CNN, this network extracts local text features better through multi-scale feature learning.
(3) The method adopts the BiLSTM network to extract the contextual features of the text data set; compared with traditional RNN and LSTM networks, it also takes into account the influence of the information after the current word on the whole sentence, so the contextual features of the text are extracted more accurately.
(4) The invention also adopts traditional machine-learning methods, namely a CRF conditional random field and a softmax classifier. The CRF scores the vectors output by the BiLSTM network and reorders the sentence to obtain a more reasonably ordered text; the softmax classifier scores each classified sample, computes a probability value through its function, and determines the category of the text according to the final probability value.
(5) The invention combines deep-learning network feature fusion with traditional machine-learning methods to solve the problem of low precision when a traditional single network extracts text features, providing a solid basis for better text classification.
Drawings
Fig. 1 is a flowchart of a text classification method based on network feature fusion according to an embodiment of the present invention.
Fig. 2 is a general architecture diagram of a text classification method based on network feature fusion according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It is to be understood that the embodiments shown and described in the drawings are merely exemplary and are intended to illustrate the principles and spirit of the invention, not to limit the scope of the invention.
The embodiment of the invention provides a text classification method based on network feature fusion which, as shown jointly in Fig. 1 and Fig. 2, comprises the following steps S1-S6:
S1, preprocessing the text to be classified, and converting the preprocessed text data set into a set of word vectors by a word-vector representation method.
In the embodiment of the present invention, the method for preprocessing the text to be classified specifically includes: removing useless symbols so that the text data set contains only Chinese characters, and removing stop words.
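As an illustration of step S1, the following is a minimal preprocessing and word-vector sketch in Python. The patent only specifies symbol removal, keeping Chinese characters, stop-word removal, and "a word-vector representation method"; the use of jieba for word segmentation and gensim's Word2Vec for the word vectors, as well as the example sentence and stop-word set, are assumptions made here for illustration.

```python
import re
import jieba                         # assumed segmenter; not named in the patent
from gensim.models import Word2Vec   # assumed word-vector method; not named in the patent

def preprocess(texts, stopwords):
    """Keep only Chinese characters, segment into words, and drop stop words."""
    cleaned = []
    for t in texts:
        t = re.sub(r"[^\u4e00-\u9fa5]", "", t)                  # remove non-Chinese symbols
        words = [w for w in jieba.lcut(t) if w not in stopwords]  # segment and drop stop words
        cleaned.append(words)
    return cleaned

corpus = preprocess(["这是一个基于网络特征融合的文本分类示例！123"], stopwords={"的", "是"})
w2v = Word2Vec(sentences=corpus, vector_size=128, window=5, min_count=1)
word_vectors = [[w2v.wv[w] for w in doc] for doc in corpus]      # word-vector set of step S1
```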
S2, splicing the word-vector set into a matrix, inputting the matrix into a Res2Net network for training, and outputting the local features of the text data set.
As shown in Fig. 2, the Res2Net network includes a first 1×1 convolutional layer, a 3×3 convolutional layer, and a second 1×1 convolutional layer connected in sequence. Each convolutional layer includes a ReLU activation function, and a residual connection is added before the ReLU activation function of the second 1×1 convolutional layer.
The number of channels of the first 1×1 convolutional layer is n. The feature map of the input matrix is divided evenly into s groups of features along the channel dimension; if the number of channels of each group is w, then n = s × w. Each group of divided features is denoted x_i, where i ∈ {1, 2, ..., s}. As shown in Fig. 2, s = 4 in the embodiment of the present invention.
The 3×3 convolutional layer processes each group of features x_i: the first group undergoes no convolution, while every other group has a corresponding convolution operation k_i(·). Let y_i denote the output of k_i(·). Starting from the second group, before each convolution operation k_i(·), the output y_{i-1} of the previous group is residual-connected with the current feature x_i and used as the input of k_i(·), and so on until the last group.
The second 1×1 convolutional layer channel-concatenates the outputs y_i of the 3×3 convolutional layer groups, fuses the multi-scale features, and outputs the local features of the text data set.
In summary, the objective function of the Res2Net network is:

\[
y_i =
\begin{cases}
x_i, & i = 1 \\
k_i\!\left(x_i + y_{i-1}\right), & 1 < i \le s
\end{cases}
\]

where x_i denotes the i-th group of evenly divided features, k_i(·) denotes the convolution operation on the i-th group of features, and y_i denotes the output of the i-th group of features after the convolution operation.
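To make the structure above concrete, the following is a minimal sketch of a Res2Net-style block written with the Keras API of TensorFlow (the framework named in step S5). The use of 1-D convolutions over the word-vector matrix, the default group width w = 16, and the projection of the shortcut are assumptions of this sketch rather than details taken from the patent; only the split into s = 4 groups, the hierarchical convolutions with residual connections, the channel concatenation, and the residual connection before the final ReLU follow the description.

```python
import tensorflow as tf
from tensorflow.keras import layers

def res2net_block(x, s=4, w=16):
    """Res2Net-style block: first 1x1 conv -> split into s channel groups ->
    hierarchical kernel-3 convolutions with residual connections between groups ->
    channel concatenation -> second 1x1 conv, with a residual connection added
    before the final ReLU activation."""
    shortcut = x
    # First 1x1 convolution: project the input to n = s * w channels.
    out = layers.Conv1D(s * w, kernel_size=1, activation="relu")(x)
    # Split the feature map evenly into s groups along the channel axis.
    groups = tf.split(out, num_or_size_splits=s, axis=-1)
    outputs = [groups[0]]          # y_1 = x_1: the first group is not convolved
    prev = groups[0]
    for i in range(1, s):
        # Residual-connect the previous group's output with the current group,
        # then apply this group's own convolution k_i(.)
        y_i = layers.Conv1D(w, kernel_size=3, padding="same",
                            activation="relu")(groups[i] + prev)
        outputs.append(y_i)
        prev = y_i
    # Channel concatenation of all group outputs, then the second 1x1 convolution.
    out = layers.Conv1D(s * w, kernel_size=1)(layers.Concatenate(axis=-1)(outputs))
    # Residual connection before the final ReLU, as described above.
    if shortcut.shape[-1] != s * w:
        shortcut = layers.Conv1D(s * w, kernel_size=1)(shortcut)
    return layers.ReLU()(out + shortcut)
```

Applied to a word-vector matrix of shape (batch, sequence_length, embedding_dim), the pooled or flattened output of this block can serve as the local-feature vector of step S2.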
S3, inputting the word-vector set into a BiLSTM network for training, and outputting the contextual features of the text data set.
A traditional LSTM network can only learn the information before the current word and cannot make use of the information after it, so the embodiment of the present invention uses a BiLSTM (bidirectional LSTM) network to also extract the information after the current word.
As shown in Fig. 2, the basic expressions of the BiLSTM network are:

\[
\overrightarrow{h_t} = f\!\left(\overrightarrow{W} x_t + \overrightarrow{V}\,\overrightarrow{h_{t-1}} + \overrightarrow{b}\right)
\]
\[
\overleftarrow{h_t} = f\!\left(\overleftarrow{W} x_t + \overleftarrow{V}\,\overleftarrow{h_{t+1}} + \overleftarrow{b}\right)
\]
\[
y_t = g\!\left(U\left[\overrightarrow{h_t};\, \overleftarrow{h_t}\right] + c\right)
\]

where \overrightarrow{h_t} denotes the current-layer hidden state of the forward LSTM, \overrightarrow{W} the input-gate weight matrix of the forward LSTM, \overrightarrow{V} the current-input cell-state weight matrix of the forward LSTM, \overrightarrow{h_{t-1}} the hidden state of the forward LSTM at the previous time step, \overrightarrow{b} the input-cell bias term of the forward LSTM, \overleftarrow{h_t} the current-layer hidden state of the backward LSTM, \overleftarrow{W} the input-gate weight matrix of the backward LSTM, \overleftarrow{V} the current-input cell-state weight matrix of the backward LSTM, \overleftarrow{h_{t+1}} the hidden state of the backward LSTM at the next time step, \overleftarrow{b} the input-cell bias term of the backward LSTM, U the matrix that stitches the forward and backward output cells together, c the overall output-cell bias term, x_t the input value of the BiLSTM hidden layer, f(·) the activation function used when computing the BiLSTM hidden layer, g(·) the activation function used when computing the BiLSTM output layer, and y_t the output value of the BiLSTM network.
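A minimal sketch of the BiLSTM extractor of step S3 using the Bidirectional wrapper of Keras; the sequence length, embedding dimension, and hidden size of 128 are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_bilstm_extractor(seq_len, embed_dim, hidden_units=128):
    """Runs a forward and a backward LSTM over the word-vector sequence and
    concatenates their hidden states at every time step (the role of U above)."""
    inputs = layers.Input(shape=(seq_len, embed_dim))
    # return_sequences=True keeps one concatenated hidden state per word,
    # which is what the CRF scoring of step S4 operates on.
    context = layers.Bidirectional(
        layers.LSTM(hidden_units, return_sequences=True),
        merge_mode="concat")(inputs)
    return tf.keras.Model(inputs, context, name="bilstm_context_features")
```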
S4, scoring the contextual features of the text data set with a CRF conditional random field scoring mechanism, and selecting the label sequence set with the highest score as the optimal contextual feature sequence set of the text data set.
In the embodiment of the invention, the formula for scoring the contextual features of the text data set with the CRF conditional random field scoring mechanism is:

\[
S(X, y) = \sum_{i=0}^{n} A_{tag_i,\, tag_{i+1}} + \sum_{i=1}^{n} P_{v_i,\, tag_i}
\]

where S(X, y) denotes the score of the output label sequence y for the input word-vector sequence X of the BiLSTM network, A_{tag_i, tag_{i+1}} denotes the transition probability from the i-th label tag_i to the (i+1)-th label tag_{i+1}, and P_{v_i, tag_i} denotes the scalar probability that the i-th word v_i in the input word-vector sequence X is mapped to the i-th label tag_i.
Normalizing the scores of the possible output label sequences y of the input word-vector sequence X gives:

\[
p(y \mid X) = \frac{e^{S(X, y)}}{\sum_{\tilde{y} \in Y_X} e^{S(X, \tilde{y})}}
\]

where p(y|X) denotes the normalized score of the output label sequence y of the input word-vector sequence X, \tilde{y} denotes one particular sequence among all possible output label sequences, and Y_X denotes the set of all possible output label sequences. Optimizing the log-likelihood function gives:

\[
\log p(y \mid X) = S(X, y) - \log \sum_{\tilde{y} \in Y_X} e^{S(X, \tilde{y})}
\]
since the label sequence output by the BiLSTM network is based on the maximum probability value obtained by softmax, and the word order problem of the label is not considered, so that the output word order is unreasonable, in step S4, the maximum likelihood probability log (p (y | X)) of p (y | X) is obtained, and by this probability, the CRF considers the sequentiality between the output label sequences, and adds a constraint rule to the last predicted label to make the predicted label word order reasonable.
S5, splicing and fusing the local features and the optimal contextual features of the text data set to obtain the fused features.
In the embodiment of the invention, the local features and the optimal contextual features of the text data set are spliced and fused using the concat() method of the TensorFlow framework to obtain the fused features.
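As a one-line illustration of this fusion, assuming both branches have already been pooled to fixed-length vectors (the batch size and feature dimensions below are made up):

```python
import tensorflow as tf

local_features = tf.random.normal([32, 256])    # assumed output of the Res2Net branch
context_features = tf.random.normal([32, 256])  # assumed output of the BiLSTM/CRF branch
fused = tf.concat([local_features, context_features], axis=-1)  # shape (32, 512)
```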
S6, inputting the fused features into a softmax classifier for classification to obtain the text classification result.
In the embodiment of the invention, the fused features are stored and used as the input of a first fully connected layer; a dropout mechanism is introduced between the first and second fully connected layers so that part of the trained parameters are discarded at each iteration, making the weight updates no longer depend on a subset of inherent features and preventing overfitting; finally, the iteration result is input into a softmax classifier for classification to obtain the text classification result.
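A minimal sketch of this classification head: two fully connected layers with dropout between them and a softmax output. The layer sizes, dropout rate, and number of classes are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_classifier_head(fused_dim, num_classes, dropout_rate=0.5):
    """Fused features -> first FC layer -> dropout -> second FC layer -> softmax."""
    fused = layers.Input(shape=(fused_dim,))
    x = layers.Dense(256, activation="relu")(fused)   # first fully connected layer
    x = layers.Dropout(dropout_rate)(x)               # randomly discards part of the units each iteration
    x = layers.Dense(128, activation="relu")(x)       # second fully connected layer
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(fused, outputs, name="fusion_classifier")
```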
In the embodiment of the invention, the probability P(y^(i) = j | x^(i); θ) that the softmax classifier classifies the text x into category j is:

\[
P\!\left(y^{(i)} = j \mid x^{(i)}; \theta\right) = \frac{e^{\theta_j^{\mathrm{T}} x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^{\mathrm{T}} x^{(i)}}}
\]

where x^(i) denotes the input of each sample, y^(i) denotes the probability value of each category j, θ denotes the training-model parameters with the aim of maximizing the likelihood function exp(·), θ_j denotes the training parameters used when training each category j in order to maximize the likelihood function exp(·), and k denotes the number of training-model parameter vectors θ_j (one per category).
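A small numerical illustration of the probability above; the parameter matrix θ and the feature vector x are made-up values with k = 3 categories.

```python
import numpy as np

theta = np.array([[0.2, -0.5],
                  [1.0,  0.3],
                  [-0.4, 0.8]])          # one parameter vector theta_j per category (assumed)
x = np.array([1.5, -0.7])                # one fused feature vector (assumed)

logits = theta @ x                       # theta_j^T x for each category j
probs = np.exp(logits) / np.sum(np.exp(logits))
print(probs, probs.sum())                # probabilities over the k categories, summing to 1
```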
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to assist the reader in understanding the principles of the invention and are to be construed as being without limitation to such specifically recited embodiments and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from the spirit of the invention, and these changes and combinations are within the scope of the invention.

Claims (9)

1. A text classification method based on network feature fusion is characterized by comprising the following steps:
S1, preprocessing the text to be classified, and converting the preprocessed text data set into a set of word vectors by a word-vector representation method;
S2, splicing the word-vector set into a matrix, inputting the matrix into a Res2Net network for training, and outputting the local features of the text data set;
S3, inputting the word-vector set into a BiLSTM network for training, and outputting the contextual features of the text data set;
S4, scoring the contextual features of the text data set with a conditional random field (CRF) scoring mechanism, and selecting the label sequence set with the highest score as the optimal contextual feature sequence set of the text data set;
S5, splicing and fusing the local features and the optimal contextual features of the text data set to obtain the fused features;
S6, inputting the fused features into a softmax classifier for classification to obtain the text classification result.
2. The method for classifying texts according to claim 1, wherein the method for preprocessing the texts to be classified in step S1 specifically comprises: removing useless symbols so that the text data set contains only Chinese characters, and removing stop words.
3. The text classification method according to claim 1, wherein the Res2Net network in step S2 comprises a first 1×1 convolutional layer, a 3×3 convolutional layer, and a second 1×1 convolutional layer connected in sequence, each convolutional layer comprises a ReLU activation function, and a residual connection is added before the ReLU activation function of the second 1×1 convolutional layer;
the number of channels of the first 1×1 convolutional layer is n, the feature map of the input matrix is divided evenly into s groups of features along the channel dimension, and if the number of channels of each group is w then n = s × w; each group of divided features is denoted x_i, where i ∈ {1, 2, ..., s};
the 3×3 convolutional layer processes each group of features x_i: the first group undergoes no convolution, while every other group has a corresponding convolution operation k_i(·); let y_i denote the output of k_i(·); starting from the second group, before each convolution operation k_i(·), the output y_{i-1} of the previous group is residual-connected with the current feature x_i and used as the input of k_i(·), until the last group;
the second 1×1 convolutional layer channel-concatenates the outputs y_i of the 3×3 convolutional layer groups, fuses the multi-scale features, and outputs the local features of the text data set.
4. The text classification method according to claim 3, wherein the objective function of the Res2Net network in step S2 is:

\[
y_i =
\begin{cases}
x_i, & i = 1 \\
k_i\!\left(x_i + y_{i-1}\right), & 1 < i \le s
\end{cases}
\]

where x_i denotes the i-th group of evenly divided features, k_i(·) denotes the convolution operation on the i-th group of features, and y_i denotes the output of the i-th group of features after the convolution operation.
5. The text classification method according to claim 1, wherein the basic expressions of the BiLSTM network in step S3 are:

\[
\overrightarrow{h_t} = f\!\left(\overrightarrow{W} x_t + \overrightarrow{V}\,\overrightarrow{h_{t-1}} + \overrightarrow{b}\right)
\]
\[
\overleftarrow{h_t} = f\!\left(\overleftarrow{W} x_t + \overleftarrow{V}\,\overleftarrow{h_{t+1}} + \overleftarrow{b}\right)
\]
\[
y_t = g\!\left(U\left[\overrightarrow{h_t};\, \overleftarrow{h_t}\right] + c\right)
\]

where \overrightarrow{h_t} denotes the current-layer hidden state of the forward LSTM, \overrightarrow{W} the input-gate weight matrix of the forward LSTM, \overrightarrow{V} the current-input cell-state weight matrix of the forward LSTM, \overrightarrow{h_{t-1}} the hidden state of the forward LSTM at the previous time step, \overrightarrow{b} the input-cell bias term of the forward LSTM, \overleftarrow{h_t} the current-layer hidden state of the backward LSTM, \overleftarrow{W} the input-gate weight matrix of the backward LSTM, \overleftarrow{V} the current-input cell-state weight matrix of the backward LSTM, \overleftarrow{h_{t+1}} the hidden state of the backward LSTM at the next time step, \overleftarrow{b} the input-cell bias term of the backward LSTM, U the matrix that stitches the forward and backward output cells together, c the overall output-cell bias term, x_t the input value of the BiLSTM hidden layer, f(·) the activation function used when computing the BiLSTM hidden layer, g(·) the activation function used when computing the BiLSTM output layer, and y_t the output value of the BiLSTM network.
6. The method for classifying text according to claim 1, wherein the formula for scoring the contextual features of the text data set with the CRF conditional random field scoring mechanism in step S4 is:

\[
S(X, y) = \sum_{i=0}^{n} A_{tag_i,\, tag_{i+1}} + \sum_{i=1}^{n} P_{v_i,\, tag_i}
\]

where S(X, y) denotes the score of the output label sequence y for the input word-vector sequence X of the BiLSTM network, A_{tag_i, tag_{i+1}} denotes the transition probability from the i-th label tag_i to the (i+1)-th label tag_{i+1}, and P_{v_i, tag_i} denotes the scalar probability that the i-th word v_i in the input word-vector sequence X is mapped to the i-th label tag_i;
normalizing the scores of the possible output label sequences y of the input word-vector sequence X gives:

\[
p(y \mid X) = \frac{e^{S(X, y)}}{\sum_{\tilde{y} \in Y_X} e^{S(X, \tilde{y})}}
\]

where p(y|X) denotes the normalized score of the output label sequence y of the input word-vector sequence X, \tilde{y} denotes one particular sequence among all possible output label sequences, and Y_X denotes the set of all possible output label sequences; optimizing the log-likelihood function gives:

\[
\log p(y \mid X) = S(X, y) - \log \sum_{\tilde{y} \in Y_X} e^{S(X, \tilde{y})}
\]
7. the method for classifying text according to claim 1, wherein in step S5, a concat () method in a tensoflow framework is used to splice and fuse the local features and the best context features of the text data set, so as to obtain a fused feature.
8. The text classification method according to claim 1, wherein step S6 specifically comprises: storing the fused features and using them as the input of a first fully connected layer; introducing a dropout mechanism between the first and second fully connected layers so that part of the trained parameters are discarded at each iteration, making the weight updates no longer depend on a subset of inherent features and preventing overfitting; and finally inputting the iteration result into a softmax classifier for classification to obtain the text classification result.
9. The text classification method according to claim 1, wherein the probability P(y^(i) = j | x^(i); θ) that the softmax classifier classifies the text x into category j in step S6 is:

\[
P\!\left(y^{(i)} = j \mid x^{(i)}; \theta\right) = \frac{e^{\theta_j^{\mathrm{T}} x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^{\mathrm{T}} x^{(i)}}}
\]

where x^(i) denotes the input of each sample, y^(i) denotes the probability value of each category j, θ denotes the training-model parameters with the aim of maximizing the likelihood function exp(·), θ_j denotes the training parameters used when training each category j in order to maximize the likelihood function exp(·), and k denotes the number of training-model parameter vectors θ_j (one per category).
CN202110266934.XA 2021-03-12 2021-03-12 Text classification method based on network feature fusion Active CN112836056B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110266934.XA CN112836056B (en) 2021-03-12 2021-03-12 Text classification method based on network feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110266934.XA CN112836056B (en) 2021-03-12 2021-03-12 Text classification method based on network feature fusion

Publications (2)

Publication Number Publication Date
CN112836056A true CN112836056A (en) 2021-05-25
CN112836056B CN112836056B (en) 2023-04-18

Family

ID=75930136

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110266934.XA Active CN112836056B (en) 2021-03-12 2021-03-12 Text classification method based on network feature fusion

Country Status (1)

Country Link
CN (1) CN112836056B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113901801A (en) * 2021-09-14 2022-01-07 燕山大学 Text content safety detection method based on deep learning

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110580458A (en) * 2019-08-25 2019-12-17 天津大学 music score image recognition method combining multi-scale residual error type CNN and SRU
US20200175015A1 (en) * 2018-11-29 2020-06-04 Koninklijke Philips N.V. Crf-based span prediction for fine machine learning comprehension
CN111444726A (en) * 2020-03-27 2020-07-24 河海大学常州校区 Method and device for extracting Chinese semantic information of long-time and short-time memory network based on bidirectional lattice structure
CN111783462A (en) * 2020-06-30 2020-10-16 大连民族大学 Chinese named entity recognition model and method based on dual neural network fusion
WO2020215870A1 (en) * 2019-04-22 2020-10-29 京东方科技集团股份有限公司 Named entity identification method and apparatus
CN112163089A (en) * 2020-09-24 2021-01-01 中国电子科技集团公司第十五研究所 Military high-technology text classification method and system fusing named entity recognition
CN112464663A (en) * 2020-12-01 2021-03-09 小牛思拓(北京)科技有限公司 Multi-feature fusion Chinese word segmentation method

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200175015A1 (en) * 2018-11-29 2020-06-04 Koninklijke Philips N.V. Crf-based span prediction for fine machine learning comprehension
WO2020215870A1 (en) * 2019-04-22 2020-10-29 京东方科技集团股份有限公司 Named entity identification method and apparatus
CN110580458A (en) * 2019-08-25 2019-12-17 天津大学 music score image recognition method combining multi-scale residual error type CNN and SRU
CN111444726A (en) * 2020-03-27 2020-07-24 河海大学常州校区 Method and device for extracting Chinese semantic information of long-time and short-time memory network based on bidirectional lattice structure
CN111783462A (en) * 2020-06-30 2020-10-16 大连民族大学 Chinese named entity recognition model and method based on dual neural network fusion
CN112163089A (en) * 2020-09-24 2021-01-01 中国电子科技集团公司第十五研究所 Military high-technology text classification method and system fusing named entity recognition
CN112464663A (en) * 2020-12-01 2021-03-09 小牛思拓(北京)科技有限公司 Multi-feature fusion Chinese word segmentation method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
LILI WANG et al.: "Classification Method for Tibetan Texts Based on In-depth Learning", 2019 IEEE 8th Joint International Information Technology and Artificial Intelligence Conference (ITAIC) *
刘建兴 et al.: "Text sentiment classification based on a deep bidirectional long short-term memory network", Journal of Guilin University of Electronic Technology *
李洋 et al.: "Text sentiment analysis based on feature fusion of CNN and BiLSTM networks", Journal of Computer Applications *
王立荣: "Word2vec-CNN-BiLSTM short text sentiment classification", Fujian Computer *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113901801A (en) * 2021-09-14 2022-01-07 燕山大学 Text content safety detection method based on deep learning
CN113901801B (en) * 2021-09-14 2024-05-07 燕山大学 Text content safety detection method based on deep learning

Also Published As

Publication number Publication date
CN112836056B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
CN110298037B (en) Convolutional neural network matching text recognition method based on enhanced attention mechanism
CN109657239B (en) Chinese named entity recognition method based on attention mechanism and language model learning
CN107729309B (en) Deep learning-based Chinese semantic analysis method and device
CN108959252B (en) Semi-supervised Chinese named entity recognition method based on deep learning
CN110245229B (en) Deep learning theme emotion classification method based on data enhancement
CN112528676B (en) Document-level event argument extraction method
CN111046179B (en) Text classification method for open network question in specific field
CN111709242B (en) Chinese punctuation mark adding method based on named entity recognition
CN111291566B (en) Event main body recognition method, device and storage medium
CN109902177A (en) Text emotion analysis method based on binary channels convolution Memory Neural Networks
CN110263325A (en) Chinese automatic word-cut
CN110276069A (en) A kind of Chinese braille mistake automatic testing method, system and storage medium
CN112163089B (en) High-technology text classification method and system integrating named entity recognition
CN111339260A (en) BERT and QA thought-based fine-grained emotion analysis method
CN111274804A (en) Case information extraction method based on named entity recognition
CN109033073B (en) Text inclusion recognition method and device based on vocabulary dependency triple
CN113901170A (en) Event extraction method and system combining Bert model and template matching and electronic equipment
CN111159345A (en) Chinese knowledge base answer obtaining method and device
CN114417851A (en) Emotion analysis method based on keyword weighted information
CN115630156A (en) Mongolian emotion analysis method and system fusing Prompt and SRU
CN111967267A (en) XLNET-based news text region extraction method and system
CN115238693A (en) Chinese named entity recognition method based on multi-word segmentation and multi-layer bidirectional long-short term memory
CN113641809B (en) Intelligent question-answering method based on XLnet model and knowledge graph
CN112836056B (en) Text classification method based on network feature fusion
CN114048314A (en) Natural language steganalysis method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant