CN111144094A - Text classification method based on CNN and Bi-GRU - Google Patents

Text classification method based on CNN and Bi-GRU

Info

Publication number
CN111144094A
Authority
CN
China
Prior art keywords
text
gru
neural network
input
filter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911247824.8A
Other languages
Chinese (zh)
Inventor
Ji Shaopei (姬少培)
Yan Liang (颜亮)
Dong Guishan (董贵山)
Liu Dong (刘栋)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 30 Research Institute
Original Assignee
CETC 30 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 30 Research Institute
Priority to CN201911247824.8A
Publication of CN111144094A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2411: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a text classification method based on CNN and Bi-GRU, which comprises the following steps: step one, performing convolutional neural network modeling on the text data to obtain a first text feature representation containing local hidden information; step two, performing Bi-GRU neural network modeling on the text data to obtain a second text feature representation containing the sequence information of the whole sentence in both directions; and step three, performing feature fusion on the two text feature representations obtained in steps one and two, and classifying with an LSSVM classifier. The method not only captures the local features and the contextual semantic information of a sentence, but also fuses two different text feature representations into a more diverse and richer representation of the text, thereby further improving classification accuracy.

Description

Text classification method based on CNN and Bi-GRU
Technical Field
The invention relates to a text classification method based on CNN and Bi-GRU.
Background
Text classification is an important basis for information retrieval and text mining; its main task is to determine the category of a text from its content under a preset set of category labels. Text classification is widely applied in natural language processing and understanding, information organization and management, content filtering, and related fields. Common approaches include unsupervised methods based on dictionaries and rules and supervised methods based on machine learning. Dictionary-based methods rely on an authoritative dictionary and features constructed manually from experience; their precision is high, but their recall is low because dictionary coverage is limited. Supervised machine learning methods model the task with learners such as the maximum entropy model, naive Bayes, and KNN. These methods are mature, theoretically well founded, widely applied, and effective, but they are limited by the scale of the text corpus: they require texts with category labels as training input, and labeling texts consumes considerable manpower and material resources, so datasets are generally small. Recently, methods based on deep learning have attracted broad attention. They need only a small amount of labeled text together with a large amount of unlabeled text. Unlike traditional machine learning, deep learning does not require manually constructed features; features are learned automatically through a hierarchical structure in which high-level features are built from different combinations of low-level features, yielding representations with richer abstract expressive power.
Methods of obtaining the sentence vector of an input text fall into two categories. One builds sentence vectors from word vectors through different composition schemes and is called the composition method. The other trains sentence vectors directly, without word vectors, and is called the distribution method.
In the composition method, different neural network structures can be used to compose the sentence vector, such as convolutional neural networks and recurrent neural networks. The convolutional neural network is a classical structure with local perception and parameter sharing, so it captures local information well; however, an ordinary convolutional neural network fixes the filter and pooling types, so the granularity of the captured local information is fixed, rigid, and lacks diversity. Recurrent neural networks, which model time sequences, suffer from the vanishing gradient problem; LSTM and GRU were proposed to solve it, mitigating the long-term dependency problem by introducing gating (including a forget gate) and capturing sequence information better. However, an ordinary recurrent neural network models the sequence in a single direction only, while text has no inherent direction, so the captured sequence information is one-sided.
Convolutional neural networks (CNN) and recurrent neural networks (RNN) are widely applied in natural language processing. However, because natural language has sequential dependencies, text classification using only a convolutional network ignores the contextual meaning of words, while traditional recurrent networks suffer from vanishing or exploding gradients; both limit the accuracy of text classification.
Disclosure of Invention
In order to overcome the deficiencies of the prior art, the invention provides a text classification method based on CNN and Bi-GRU. The method obtains a rich feature representation of the text with CNN and Bi-GRU neural networks and uses a mature LSSVM classifier in place of the neural network's last softmax layer as the text classifier, combining deep learning's ability to obtain abstract high-level feature representations with the maturity, solid theoretical foundation, good classification effect, and wide applicability of machine learning methods. The method not only captures the local features and contextual semantic information of a sentence, but also fuses two different text feature representations into a more diverse and richer representation of the text, thereby further improving classification accuracy.
The technical solution adopted by the invention to solve the technical problem is as follows: a text classification method based on CNN and Bi-GRU, comprising the following steps:
step one, performing convolutional neural network modeling on the text data to obtain a first text feature representation containing local hidden information;
step two, performing Bi-GRU neural network modeling on the text data to obtain a second text feature representation containing the sequence information of the whole sentence in both directions;
and step three, performing feature fusion on the two text feature representations obtained in steps one and two, and classifying with an LSSVM classifier.
Compared with the prior art, the invention has the following positive effects:
(1) The convolutional neural network used by the invention obtains a text feature representation containing local hidden information and captures more comprehensive local information.
(2) The invention uses a Bi-GRU recurrent neural network to obtain a text feature representation of the sequence information of the whole sentence in both directions and captures more complete sequence information.
(3) The invention uses the convolutional and recurrent neural networks to obtain a rich feature representation of the text and uses a mature LSSVM classifier in place of the neural network's last softmax layer as the text classifier, combining deep learning's ability to obtain abstract high-level feature representations with the maturity, solid theoretical foundation, good classification effect, and wide applicability of machine learning methods.
(4) The invention fuses two different text feature representations through feature fusion to obtain a more diverse and richer representation of the text.
Drawings
The invention will now be described, by way of example, with reference to the accompanying drawings, in which:
FIG. 1 is a schematic diagram of a text classification algorithm framework;
FIG. 2 is a basic structure diagram of a Bi-GRU recurrent neural network.
Detailed Description
A text classification method based on CNN and Bi-GRU is provided; its framework is shown in figure 1. The method obtains two abstract high-level feature representations of a text through two neural network structures, a convolutional neural network and a bidirectional GRU (gated recurrent unit) recurrent neural network, and classifies the text with a classifier after feature fusion.
The method comprises the following steps:
1) Model the text with a multi-angle convolutional neural network covering different filter types and pooling types, and remove the last softmax layer to obtain a feature representation of the local hidden information. The specific steps are as follows:
1.1) Establish two different types of filters: integral filters, which match whole word vectors, and single-dimensional filters, which match each dimension of the word vectors separately. Suppose the sentence $Input \in \mathbb{R}^{length \times Dim}$ is a sequence of $length$ words, each represented by a $Dim$-dimensional word vector; $Input_i \in \mathbb{R}^{Dim}$ denotes the $i$-th word vector in the sequence, $Input_{i:j}$ the concatenation of word vectors $i$ through $j$, $Input_i^{[m]}$ the $m$-th dimension of the $i$-th word vector, and $Input_{i:j}^{[m]}$ the $m$-th dimensions of word vectors $i$ through $j$. An integral filter $F$ is a quadruple $\langle ws, wf, bf, hf \rangle$, where $ws$ is the width of the sliding window, $wf \in \mathbb{R}^{ws \times Dim}$ is the weight vector of filter $F$, $bf \in \mathbb{R}$ is the bias, and $hf$ is the activation function. When filter $F$ is applied to the input word sequence $Input$, $wf$ is inner-multiplied with each word-vector window of length $ws$ in $Input$, the bias $bf$ is added, and the activation function $hf$ is applied, yielding the output vector $out_F \in \mathbb{R}^{1+length-ws}$, whose $i$-th entry is
$$out_F[i] = hf\left(wf \cdot Input_{i:i+ws-1} + bf\right), \qquad i \in [1,\, 1+length-ws].$$
A single-dimensional filter $F^{[m]}$, applied to the $m$-th dimension of the word vectors, is represented by the tuple $\langle ws, wf_m, bf_m, hf_m \rangle$, where $ws$ is the width of the sliding window, $wf_m \in \mathbb{R}^{ws}$ is the weight vector of $F^{[m]}$, $bf_m$ is the bias, and $hf_m$ is the activation function. Its output vector $out_{F^{[m]}} \in \mathbb{R}^{1+length-ws}$ has $i$-th entry
$$out_{F^{[m]}}[i] = hf_m\left(wf_m \cdot Input_{i:i+ws-1}^{[m]} + bf_m\right).$$
1.2) Apply different pooling operations to the convolution-layer output vectors. Let $group(ws, pooling, Input)$ denote the object of the convolution and pooling operations with sliding-window width $ws$ on the input sentence $Input$, where $pooling \in \{max, min, mean\}$. Suppose the convolutional layer of $group(ws, pooling, Input)$ consists of $Num$ filters, including both integral and single-dimensional filters, and the output vector of the pooling layer is $oG \in \mathbb{R}^{Num}$, whose $j$-th entry is
$$oG[j] = pooling\left(out_{F_j}\right),$$
where $F_j$ denotes the $j$-th filter of the group.
1.3) Build the multi-angle convolutional neural network and input the text for training.
1.4) After training is finished, remove the last softmax layer; the network then takes a text as input and outputs the first feature representation of the text.
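As an illustration of steps 1.1) to 1.4), the following is a minimal sketch of the multi-angle convolution block, assuming PyTorch; the embedding size, filter count, window width ws = 3, tanh activation, and the concatenation of the three pooling types {max, min, mean} are illustrative assumptions rather than parameters fixed by this method.

```python
import torch
import torch.nn as nn

class MultiAngleConv(nn.Module):
    """Sketch of integral + single-dimensional filters with multi-type pooling."""
    def __init__(self, dim=300, num_filters=100, ws=3):
        super().__init__()
        # Integral filters: each kernel spans the full Dim-dimensional word
        # vector and slides over a window of ws words.
        self.integral = nn.Conv1d(dim, num_filters, kernel_size=ws)
        # Single-dimensional filters: groups=dim gives every embedding
        # dimension its own length-ws kernel, matching the filters F[m].
        self.per_dim = nn.Conv1d(dim, dim, kernel_size=ws, groups=dim)
        self.act = nn.Tanh()

    @staticmethod
    def _pool(x):
        # Apply max, min, and mean pooling over the sequence axis, then
        # concatenate the three pooled vectors.
        return torch.cat([x.max(dim=2).values,
                          x.min(dim=2).values,
                          x.mean(dim=2)], dim=1)

    def forward(self, emb):                # emb: (batch, length, dim)
        x = emb.transpose(1, 2)            # -> (batch, dim, length)
        h = self.act(self.integral(x))     # (batch, num_filters, 1+length-ws)
        p = self.act(self.per_dim(x))      # (batch, dim, 1+length-ws)
        return torch.cat([self._pool(h), self._pool(p)], dim=1)
```

During training, a temporary softmax classification head would sit on top of this output and be discarded afterwards, as described in step 1.4).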
2) Model the text data with a Bi-GRU neural network. The specific steps are as follows:
2.1) Build the Bi-GRU recurrent neural network structure (as shown in figure 2): train one GRU recurrent neural network layer on the forward sequence and one on the backward sequence of the input text, connect both GRU layers to the same output layer, and train on the text with this structure.
2.2) After training is finished, remove the last softmax layer; the network then takes a text as input and outputs the second feature representation of the text.
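As an illustration of steps 2.1) and 2.2), here is a minimal sketch of the Bi-GRU feature extractor, again assuming PyTorch; the hidden size and the use of the final forward and backward hidden states as the sentence-level representation are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BiGRUEncoder(nn.Module):
    """Sketch of the bidirectional GRU feature extractor."""
    def __init__(self, dim=300, hidden=128):
        super().__init__()
        # bidirectional=True runs one GRU over the forward sequence and one
        # over the backward sequence, corresponding to the two GRU layers
        # connected to the same output layer in Fig. 2.
        self.gru = nn.GRU(dim, hidden, batch_first=True, bidirectional=True)

    def forward(self, emb):               # emb: (batch, length, dim)
        _, h_n = self.gru(emb)            # h_n: (2, batch, hidden)
        # Concatenate the final forward and backward hidden states into one
        # sentence feature containing sequence information in both directions.
        return torch.cat([h_n[0], h_n[1]], dim=1)   # (batch, 2*hidden)
```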
3) Fuse the two feature representations of the text and classify with an LSSVM classifier.
The specific steps are as follows:
3.1) Fuse the two text feature representations by direct concatenation: let $Feature_1$ be the first feature representation and $Feature_2$ the second; the fused representation is $Feature_1 \oplus Feature_2$, where $\oplus$ denotes the direct vector concatenation operation.
3.2) Input the fused feature representation to an LSSVM classifier and train it to obtain the classification model.
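The patent does not spell out the LSSVM internals, so the following is a hedged sketch of a binary least-squares SVM in the common Suykens-style formulation, using NumPy; the RBF kernel and the hyperparameters gamma and sigma are illustrative assumptions, and labels are taken as y ∈ {−1, +1} (a multi-class setting would need e.g. a one-vs-rest wrapper).

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    # Gram matrix of the RBF kernel between row vectors of A and B.
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d / (2 * sigma ** 2))

def lssvm_train(X, y, gamma=10.0, sigma=1.0):
    # LSSVM replaces the SVM's quadratic program with one linear KKT system:
    # [[0, 1^T], [1, K + I/gamma]] @ [b; alpha] = [0; y].
    n = len(y)
    K = rbf_kernel(X, X, sigma)
    A = np.block([[np.zeros((1, 1)), np.ones((1, n))],
                  [np.ones((n, 1)),  K + np.eye(n) / gamma]])
    sol = np.linalg.solve(A, np.concatenate([[0.0], np.asarray(y, float)]))
    return sol[1:], sol[0]                     # alpha, b

def lssvm_predict(X_train, alpha, b, X_test, sigma=1.0):
    # Decision function f(x) = sum_k alpha_k K(x, x_k) + b, thresholded at 0.
    return np.sign(rbf_kernel(X_test, X_train, sigma) @ alpha + b)
```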
In summary, the invention provides a new classification idea for text classification combining deep learning and machine learning: two abstract high-level feature representations of the text are obtained with a convolutional neural network and a Bi-GRU recurrent neural network, and an LSSVM classification model is built on the fused features to produce the text classification result. The method captures different high-level feature representations of the text, obtains a richer representation of the text through feature fusion, builds a classification model, improves the classification effect, and is worth promoting.
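Under the definitions sketched above, a hypothetical end-to-end flow (with toy data standing in for trained networks and real text features) could look like:

```python
import numpy as np
import torch

# Toy stand-ins: 8 sentences of 20 words with 300-dim embeddings, labels in {-1, +1}.
emb = torch.randn(8, 20, 300)
y = np.array([1, -1, 1, 1, -1, -1, 1, -1], dtype=float)

cnn, gru = MultiAngleConv(), BiGRUEncoder()    # in practice: trained, softmax heads removed
with torch.no_grad():
    fused = torch.cat([cnn(emb), gru(emb)], dim=1).numpy()   # Feature1 ⊕ Feature2
alpha, b = lssvm_train(fused, y)               # LSSVM in place of the softmax layer
pred = lssvm_predict(fused, alpha, b, fused)   # resubstitution check on the toy set
```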

Claims (8)

1. A text classification method based on CNN and Bi-GRU, characterized by comprising the following steps:
step one, performing convolutional neural network modeling on the text data to obtain a first text feature representation containing local hidden information;
step two, performing Bi-GRU neural network modeling on the text data to obtain a second text feature representation containing the sequence information of the whole sentence in both directions;
and step three, performing feature fusion on the two text feature representations obtained in steps one and two, and classifying with an LSSVM classifier.
2. The text classification method based on CNN and Bi-GRU as claimed in claim 1, wherein the convolutional neural network modeling of the text data in step one comprises:
1.1) establishing two different types of filters: integral filters for matching whole word vectors, and single-dimensional filters for matching each dimension of the word vectors;
1.2) applying different pooling operations to the convolution-layer output vectors;
1.3) building a multi-angle convolutional neural network and inputting the text for training;
1.4) after training is finished, removing the last softmax layer, so that the network takes a text as input and outputs the first text feature representation.
3. The text classification method based on CNN and Bi-GRU as claimed in claim 2, wherein the integral filter matches whole word vectors as follows:
the sentence $Input \in \mathbb{R}^{length \times Dim}$ is a sequence of $length$ words, each represented by a $Dim$-dimensional word vector; $Input_i \in \mathbb{R}^{Dim}$ denotes the $i$-th word vector in the sequence, and $Input_{i:j}$ the concatenation of word vectors $i$ through $j$;
an integral filter $F$ is set as a quadruple $\langle ws, wf, bf, hf \rangle$, where $ws$ is the width of the sliding window, $wf \in \mathbb{R}^{ws \times Dim}$ is the weight vector of filter $F$, $bf \in \mathbb{R}$ is the bias, and $hf$ is the activation function;
when filter $F$ is applied to the input word sequence $Input$, $wf$ is inner-multiplied with each word-vector window of length $ws$ in $Input$, the bias $bf$ is added, and the activation function $hf$ is applied, yielding the output vector $out_F \in \mathbb{R}^{1+length-ws}$, whose $i$-th entry is $out_F[i] = hf(wf \cdot Input_{i:i+ws-1} + bf)$, $i \in [1,\, 1+length-ws]$.
4. The text classification method based on CNN and Bi-GRU as claimed in claim 3, wherein the single-dimensional filter matches each dimension of the word vectors as follows:
a single-dimensional filter $F^{[m]}$ is set, represented by the tuple $\langle ws, wf_m, bf_m, hf_m \rangle$, where $ws$ is the width of the sliding window, $wf_m \in \mathbb{R}^{ws}$ is the weight vector of $F^{[m]}$, $bf_m$ is the bias, and $hf_m$ is the activation function;
when filter $F^{[m]}$ is applied to the $m$-th dimension of the word vectors, the output vector $out_{F^{[m]}} \in \mathbb{R}^{1+length-ws}$ is obtained, whose $i$-th entry is $out_{F^{[m]}}[i] = hf_m(wf_m \cdot Input_{i:i+ws-1}^{[m]} + bf_m)$,
where $Input_i^{[m]}$ denotes the $m$-th dimension of the $i$-th word vector, and $Input_{i:j}^{[m]}$ the $m$-th dimensions of word vectors $i$ through $j$.
5. The text classification method based on CNN and Bi-GRU as claimed in claim 4, wherein the different pooling operations are applied to the convolution-layer output vectors as follows:
$group(ws, pooling, Input)$ is set as the object of the convolution and pooling operations with sliding-window width $ws$ on the input sentence $Input$, where $pooling \in \{max, min, mean\}$; the convolutional layer of $group(ws, pooling, Input)$ has $Num$ filters, and the output vector of the pooling layer is $oG \in \mathbb{R}^{Num}$, whose $j$-th entry is $oG[j] = pooling(out_{F_j})$, where $F_j$ denotes the $j$-th filter.
6. The text classification method based on CNN and Bi-GRU as claimed in claim 2, wherein the Bi-GRU neural network modeling of the text data in step two comprises:
2.1) building a Bi-GRU (bidirectional gated recurrent unit) recurrent neural network structure, training one GRU recurrent neural network on the forward sequence and one on the backward sequence of the input text, connecting both to the same output layer, and training on the text with this structure;
2.2) after training is finished, removing the last softmax layer, so that the network takes a text as input and outputs the second text feature representation.
7. The text classification method based on CNN and Bi-GRU as claimed in claim 1, wherein the text feature representation obtained in step three after feature fusion of the two text feature representations is $Feature_1 \oplus Feature_2$, where $\oplus$ denotes the direct vector concatenation operation, $Feature_1$ is the first text feature representation, and $Feature_2$ is the second text feature representation.
8. The text classification method based on CNN and Bi-GRU as claimed in claim 7, wherein the classification with the LSSVM classifier in step three is performed as follows: the LSSVM classifier is used in place of the neural network's last softmax layer as the text classifier; the fused representation $Feature_1 \oplus Feature_2$ is input, and the classifier is trained to obtain the classification model.
CN201911247824.8A 2019-12-09 2019-12-09 Text classification method based on CNN and Bi-GRU Pending CN111144094A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911247824.8A CN111144094A (en) 2019-12-09 2019-12-09 Text classification method based on CNN and Bi-GRU

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911247824.8A CN111144094A (en) 2019-12-09 2019-12-09 Text classification method based on CNN and Bi-GRU

Publications (1)

Publication Number Publication Date
CN111144094A true CN111144094A (en) 2020-05-12

Family

ID=70517943

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911247824.8A Pending CN111144094A (en) 2019-12-09 2019-12-09 Text classification method based on CNN and Bi-GRU

Country Status (1)

Country Link
CN (1) CN111144094A (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107943967A (en) * 2017-11-28 2018-04-20 South China University of Technology (华南理工大学) Document classification algorithm based on multi-angle convolutional neural network and recurrent neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JINGREN ZHANG et al.: "Feature Fusion Text Classification Model Combining CNN and BiGRU with Multi-Attention Mechanism", Future Internet *
ZHANG Hengyu et al.: "Chinese Text Sentiment Classification Based on LS-SVM", Science and Technology Innovation (科学技术创新) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111563159A (en) * 2020-07-16 2020-08-21 Zhizhe Sihai (Beijing) Technology Co., Ltd. (智者四海(北京)技术有限公司) Text sorting method and device
CN111563159B (en) * 2020-07-16 2021-05-07 Zhizhe Sihai (Beijing) Technology Co., Ltd. (智者四海(北京)技术有限公司) Text sorting method and device
CN113590818A (en) * 2021-06-30 2021-11-02 CETC 30 Research Institute (中国电子科技集团公司第三十研究所) Government affair text data classification method based on integration of CNN, GRU and KNN
CN113590818B (en) * 2021-06-30 2023-05-26 CETC 30 Research Institute (中国电子科技集团公司第三十研究所) Government affair text data classification method based on integration of CNN (convolutional neural network), GRU (gated recurrent unit) and KNN (K-nearest neighbor)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200512