CN111144094A - Text classification method based on CNN and Bi-GRU - Google Patents

Text classification method based on CNN and Bi-GRU

Info

Publication number
CN111144094A
Authority
CN
China
Prior art keywords
text
gru
neural network
input
filter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911247824.8A
Other languages
Chinese (zh)
Inventor
Ji Shaopei (姬少培)
Yan Liang (颜亮)
Dong Guishan (董贵山)
Liu Dong (刘栋)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 30 Research Institute
Original Assignee
CETC 30 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 30 Research Institute
Priority to CN201911247824.8A
Publication of CN111144094A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2411: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a text classification method based on CNN and Bi-GRU, which comprises the following steps: step one, performing convolutional neural network modeling on the text data to obtain a first text feature representation containing local hidden information; step two, performing Bi-GRU neural network modeling on the text data to obtain a second text feature representation containing the sequence information of the whole sentence in both directions; and step three, performing feature fusion on the two text feature representations obtained in steps one and two, and classifying with an LSSVM classifier. The method not only captures the local features and the contextual semantic information of a sentence, but also fuses two different text feature representations into a more diverse and richer representation of the text, thereby further improving classification accuracy.

Description

Text classification method based on CNN and Bi-GRU
Technical Field
The invention relates to a text classification method based on CNN and Bi-GRU.
Background
Text classification is an important basis for information retrieval and text mining; its main task is to determine the category of a text from its content under a preset set of category labels. Text classification is widely applied in natural language processing and understanding, information organization and management, content filtering, and related fields. Common approaches include unsupervised methods based on dictionaries and rules and supervised methods based on machine learning. Dictionary-based methods rely on an authoritative dictionary and features constructed manually from experience; their precision is high, but their recall is low because dictionary coverage is limited. Supervised machine learning methods model the task with learners such as the maximum entropy model, naive Bayes, and KNN. These methods are mature, theoretically well founded, widely applied, and effective, but they are limited by the scale of the text corpus: they require texts with category labels as training input, and labeling texts consumes considerable manpower and material resources, so datasets are generally small. Recently, methods based on deep learning have attracted broad attention. They need only a small amount of labeled text together with a large amount of unlabeled text. Unlike traditional machine learning, deep learning does not require manually constructed features; features are learned automatically through a hierarchical structure in which high-level features are built from different combinations of low-level features, yielding representations with richer abstract expressive power.
Methods of obtaining the sentence vector of an input text fall into two categories. One builds sentence vectors from word vectors through different composition schemes and is called the composition method. The other trains sentence vectors directly, without word vectors, and is called the distribution method.
In the composition method, different neural network structures can be used to compose the sentence vector, such as convolutional neural networks and recurrent neural networks. The convolutional neural network is a classical structure with local perception and parameter sharing, so it captures local information well; however, an ordinary convolutional neural network fixes the filter and pooling types, so the granularity of the captured local information is fixed, rigid, and lacks diversity. Recurrent neural networks, which model time sequences, suffer from the vanishing gradient problem; LSTM and GRU were proposed to solve it, mitigating the long-term dependency problem by introducing gating (including a forget gate) and capturing sequence information better. However, an ordinary recurrent neural network models the sequence in a single direction only, while text has no inherent direction, so the captured sequence information is one-sided.
Convolutional neural networks (CNN) and recurrent neural networks (RNN) are widely applied in natural language processing. However, because natural language has sequential dependencies, text classification using only a convolutional network ignores the contextual meaning of words, while traditional recurrent networks suffer from vanishing or exploding gradients; both limit the accuracy of text classification.
Disclosure of Invention
In order to overcome the deficiencies of the prior art, the invention provides a text classification method based on CNN and Bi-GRU. The method obtains a rich feature representation of the text with CNN and Bi-GRU neural networks and uses a mature LSSVM classifier in place of the neural network's last softmax layer as the text classifier, combining deep learning's ability to obtain abstract high-level feature representations with the maturity, solid theoretical foundation, good classification effect, and wide applicability of machine learning methods. The method not only captures the local features and contextual semantic information of a sentence, but also fuses two different text feature representations into a more diverse and richer representation of the text, thereby further improving classification accuracy.
The technical solution adopted by the invention to solve the technical problem is as follows: a text classification method based on CNN and Bi-GRU, comprising the following steps:
step one, performing convolutional neural network modeling on the text data to obtain a first text feature representation containing local hidden information;
step two, performing Bi-GRU neural network modeling on the text data to obtain a second text feature representation containing the sequence information of the whole sentence in both directions;
and step three, performing feature fusion on the two text feature representations obtained in steps one and two, and classifying with an LSSVM classifier.
Compared with the prior art, the invention has the following positive effects:
(1) The convolutional neural network used by the invention obtains a text feature representation containing local hidden information and captures more comprehensive local information.
(2) The invention uses a Bi-GRU recurrent neural network to obtain a text feature representation of the sequence information of the whole sentence in both directions and captures more complete sequence information.
(3) The invention uses the convolutional and recurrent neural networks to obtain a rich feature representation of the text and uses a mature LSSVM classifier in place of the neural network's last softmax layer as the text classifier, combining deep learning's ability to obtain abstract high-level feature representations with the maturity, solid theoretical foundation, good classification effect, and wide applicability of machine learning methods.
(4) The invention fuses two different text feature representations through feature fusion to obtain a more diverse and richer representation of the text.
Drawings
The invention will now be described, by way of example, with reference to the accompanying drawings, in which:
FIG. 1 is a schematic diagram of a text classification algorithm framework;
FIG. 2 is a basic structure diagram of a Bi-GRU recurrent neural network.
Detailed Description
A text classification method based on CNN and Bi-GRU is provided; its framework is shown in figure 1. The method obtains two abstract high-level feature representations of a text through two neural network structures, a convolutional neural network and a bidirectional GRU (gated recurrent unit) recurrent neural network, and classifies the text with a classifier after feature fusion.
The method comprises the following steps:
1) Model the text with a multi-angle convolutional neural network covering different filter types and pooling types, and remove the last softmax layer to obtain a feature representation of the local hidden information. The specific steps are as follows:
1.1) Establish two different types of filters: integral filters, which match whole word vectors, and single-dimensional filters, which match each dimension of the word vectors separately. Suppose the sentence $Input \in \mathbb{R}^{length \times Dim}$ is a sequence of $length$ words, each represented by a $Dim$-dimensional word vector; $Input_i \in \mathbb{R}^{Dim}$ denotes the $i$-th word vector in the sequence, $Input_{i:j}$ the concatenation of word vectors $i$ through $j$, $Input_i^{[m]}$ the $m$-th dimension of the $i$-th word vector, and $Input_{i:j}^{[m]}$ the $m$-th dimensions of word vectors $i$ through $j$. An integral filter $F$ is a quadruple $\langle ws, wf, bf, hf \rangle$, where $ws$ is the width of the sliding window, $wf \in \mathbb{R}^{ws \times Dim}$ is the weight vector of filter $F$, $bf \in \mathbb{R}$ is the bias, and $hf$ is the activation function. When filter $F$ is applied to the input word sequence $Input$, $wf$ is inner-multiplied with each word-vector window of length $ws$ in $Input$, the bias $bf$ is added, and the activation function $hf$ is applied, yielding the output vector $out_F \in \mathbb{R}^{1+length-ws}$, whose $i$-th entry is
$$out_F[i] = hf\left(wf \cdot Input_{i:i+ws-1} + bf\right), \qquad i \in [1,\, 1+length-ws].$$
A single-dimensional filter $F^{[m]}$, applied to the $m$-th dimension of the word vectors, is represented by the tuple $\langle ws, wf_m, bf_m, hf_m \rangle$, where $ws$ is the width of the sliding window, $wf_m \in \mathbb{R}^{ws}$ is the weight vector of $F^{[m]}$, $bf_m$ is the bias, and $hf_m$ is the activation function. Its output vector $out_{F^{[m]}} \in \mathbb{R}^{1+length-ws}$ has $i$-th entry
$$out_{F^{[m]}}[i] = hf_m\left(wf_m \cdot Input_{i:i+ws-1}^{[m]} + bf_m\right).$$
1.2) Apply different pooling operations to the convolution-layer output vectors. Let $group(ws, pooling, Input)$ denote the object of the convolution and pooling operations with sliding-window width $ws$ on the input sentence $Input$, where $pooling \in \{max, min, mean\}$. Suppose the convolutional layer of $group(ws, pooling, Input)$ consists of $Num$ filters, including both integral and single-dimensional filters, and the output vector of the pooling layer is $oG \in \mathbb{R}^{Num}$, whose $j$-th entry is
$$oG[j] = pooling\left(out_{F_j}\right),$$
where $F_j$ denotes the $j$-th filter of the group.
1.3) Build the multi-angle convolutional neural network and input the text for training.
1.4) After training is finished, remove the last softmax layer; the network then takes a text as input and outputs the first feature representation of the text.
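As an illustration of steps 1.1) to 1.4), the following is a minimal sketch of the multi-angle convolution block, assuming PyTorch; the embedding size, filter count, window width ws = 3, tanh activation, and the concatenation of the three pooling types {max, min, mean} are illustrative assumptions rather than parameters fixed by this method.

```python
import torch
import torch.nn as nn

class MultiAngleConv(nn.Module):
    """Sketch of integral + single-dimensional filters with multi-type pooling."""
    def __init__(self, dim=300, num_filters=100, ws=3):
        super().__init__()
        # Integral filters: each kernel spans the full Dim-dimensional word
        # vector and slides over a window of ws words.
        self.integral = nn.Conv1d(dim, num_filters, kernel_size=ws)
        # Single-dimensional filters: groups=dim gives every embedding
        # dimension its own length-ws kernel, matching the filters F[m].
        self.per_dim = nn.Conv1d(dim, dim, kernel_size=ws, groups=dim)
        self.act = nn.Tanh()

    @staticmethod
    def _pool(x):
        # Apply max, min, and mean pooling over the sequence axis, then
        # concatenate the three pooled vectors.
        return torch.cat([x.max(dim=2).values,
                          x.min(dim=2).values,
                          x.mean(dim=2)], dim=1)

    def forward(self, emb):                # emb: (batch, length, dim)
        x = emb.transpose(1, 2)            # -> (batch, dim, length)
        h = self.act(self.integral(x))     # (batch, num_filters, 1+length-ws)
        p = self.act(self.per_dim(x))      # (batch, dim, 1+length-ws)
        return torch.cat([self._pool(h), self._pool(p)], dim=1)
```

During training, a temporary softmax classification head would sit on top of this output and be discarded afterwards, as described in step 1.4).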
2) Model the text data with a Bi-GRU neural network. The specific steps are as follows:
2.1) Build the Bi-GRU recurrent neural network structure (as shown in figure 2): train one GRU recurrent neural network layer on the forward sequence and one on the backward sequence of the input text, connect both GRU layers to the same output layer, and train on the text with this structure.
2.2) After training is finished, remove the last softmax layer; the network then takes a text as input and outputs the second feature representation of the text.
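As an illustration of steps 2.1) and 2.2), here is a minimal sketch of the Bi-GRU feature extractor, again assuming PyTorch; the hidden size and the use of the final forward and backward hidden states as the sentence-level representation are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BiGRUEncoder(nn.Module):
    """Sketch of the bidirectional GRU feature extractor."""
    def __init__(self, dim=300, hidden=128):
        super().__init__()
        # bidirectional=True runs one GRU over the forward sequence and one
        # over the backward sequence, corresponding to the two GRU layers
        # connected to the same output layer in Fig. 2.
        self.gru = nn.GRU(dim, hidden, batch_first=True, bidirectional=True)

    def forward(self, emb):               # emb: (batch, length, dim)
        _, h_n = self.gru(emb)            # h_n: (2, batch, hidden)
        # Concatenate the final forward and backward hidden states into one
        # sentence feature containing sequence information in both directions.
        return torch.cat([h_n[0], h_n[1]], dim=1)   # (batch, 2*hidden)
```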
3) Fuse the two feature representations of the text and classify with an LSSVM classifier.
The specific steps are as follows:
3.1) Fuse the two text feature representations by direct concatenation: let $Feature_1$ be the first feature representation and $Feature_2$ the second; the fused representation is $Feature_1 \oplus Feature_2$, where $\oplus$ denotes the direct vector concatenation operation.
3.2) Input the fused feature representation to an LSSVM classifier and train it to obtain the classification model.
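The patent does not spell out the LSSVM internals, so the following is a hedged sketch of a binary least-squares SVM in the common Suykens-style formulation, using NumPy; the RBF kernel and the hyperparameters gamma and sigma are illustrative assumptions, and labels are taken as y ∈ {−1, +1} (a multi-class setting would need e.g. a one-vs-rest wrapper).

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    # Gram matrix of the RBF kernel between row vectors of A and B.
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d / (2 * sigma ** 2))

def lssvm_train(X, y, gamma=10.0, sigma=1.0):
    # LSSVM replaces the SVM's quadratic program with one linear KKT system:
    # [[0, 1^T], [1, K + I/gamma]] @ [b; alpha] = [0; y].
    n = len(y)
    K = rbf_kernel(X, X, sigma)
    A = np.block([[np.zeros((1, 1)), np.ones((1, n))],
                  [np.ones((n, 1)),  K + np.eye(n) / gamma]])
    sol = np.linalg.solve(A, np.concatenate([[0.0], np.asarray(y, float)]))
    return sol[1:], sol[0]                     # alpha, b

def lssvm_predict(X_train, alpha, b, X_test, sigma=1.0):
    # Decision function f(x) = sum_k alpha_k K(x, x_k) + b, thresholded at 0.
    return np.sign(rbf_kernel(X_test, X_train, sigma) @ alpha + b)
```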
In summary, the invention provides a new classification idea for text classification combining deep learning and machine learning: two abstract high-level feature representations of the text are obtained with a convolutional neural network and a Bi-GRU recurrent neural network, and an LSSVM classification model is built on the fused features to produce the text classification result. The method captures different high-level feature representations of the text, obtains a richer representation of the text through feature fusion, builds a classification model, improves the classification effect, and is worth promoting.
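Under the definitions sketched above, a hypothetical end-to-end flow (with toy data standing in for trained networks and real text features) could look like:

```python
import numpy as np
import torch

# Toy stand-ins: 8 sentences of 20 words with 300-dim embeddings, labels in {-1, +1}.
emb = torch.randn(8, 20, 300)
y = np.array([1, -1, 1, 1, -1, -1, 1, -1], dtype=float)

cnn, gru = MultiAngleConv(), BiGRUEncoder()    # in practice: trained, softmax heads removed
with torch.no_grad():
    fused = torch.cat([cnn(emb), gru(emb)], dim=1).numpy()   # Feature1 ⊕ Feature2
alpha, b = lssvm_train(fused, y)               # LSSVM in place of the softmax layer
pred = lssvm_predict(fused, alpha, b, fused)   # resubstitution check on the toy set
```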

Claims (8)

1. A text classification method based on CNN and Bi-GRU, characterized by comprising the following steps:
step one, performing convolutional neural network modeling on the text data to obtain a first text feature representation containing local hidden information;
step two, performing Bi-GRU neural network modeling on the text data to obtain a second text feature representation containing the sequence information of the whole sentence in both directions;
and step three, performing feature fusion on the two text feature representations obtained in steps one and two, and classifying with an LSSVM classifier.
2. The text classification method based on CNN and Bi-GRU as claimed in claim 1, wherein the convolutional neural network modeling of the text data in step one comprises:
1.1) establishing two different types of filters: integral filters for matching whole word vectors, and single-dimensional filters for matching each dimension of the word vectors;
1.2) applying different pooling operations to the convolution-layer output vectors;
1.3) building a multi-angle convolutional neural network and inputting the text for training;
1.4) after training is finished, removing the last softmax layer, so that the network takes a text as input and outputs the first text feature representation.
3. The text classification method based on CNN and Bi-GRU as claimed in claim 2, wherein the integral filter matches whole word vectors as follows:
the sentence $Input \in \mathbb{R}^{length \times Dim}$ is a sequence of $length$ words, each represented by a $Dim$-dimensional word vector; $Input_i \in \mathbb{R}^{Dim}$ denotes the $i$-th word vector in the sequence, and $Input_{i:j}$ the concatenation of word vectors $i$ through $j$;
an integral filter $F$ is set as a quadruple $\langle ws, wf, bf, hf \rangle$, where $ws$ is the width of the sliding window, $wf \in \mathbb{R}^{ws \times Dim}$ is the weight vector of filter $F$, $bf \in \mathbb{R}$ is the bias, and $hf$ is the activation function;
when filter $F$ is applied to the input word sequence $Input$, $wf$ is inner-multiplied with each word-vector window of length $ws$ in $Input$, the bias $bf$ is added, and the activation function $hf$ is applied, yielding the output vector $out_F \in \mathbb{R}^{1+length-ws}$, whose $i$-th entry is $out_F[i] = hf(wf \cdot Input_{i:i+ws-1} + bf)$, $i \in [1,\, 1+length-ws]$.
4. The text classification method based on CNN and Bi-GRU as claimed in claim 3, wherein the single-dimensional filter matches each dimension of the word vectors as follows:
a single-dimensional filter $F^{[m]}$ is set, represented by the tuple $\langle ws, wf_m, bf_m, hf_m \rangle$, where $ws$ is the width of the sliding window, $wf_m \in \mathbb{R}^{ws}$ is the weight vector of $F^{[m]}$, $bf_m$ is the bias, and $hf_m$ is the activation function;
when filter $F^{[m]}$ is applied to the $m$-th dimension of the word vectors, the output vector $out_{F^{[m]}} \in \mathbb{R}^{1+length-ws}$ is obtained, whose $i$-th entry is $out_{F^{[m]}}[i] = hf_m(wf_m \cdot Input_{i:i+ws-1}^{[m]} + bf_m)$,
where $Input_i^{[m]}$ denotes the $m$-th dimension of the $i$-th word vector, and $Input_{i:j}^{[m]}$ the $m$-th dimensions of word vectors $i$ through $j$.
5. The text classification method based on CNN and Bi-GRU as claimed in claim 4, wherein the different pooling operations are applied to the convolution-layer output vectors as follows:
$group(ws, pooling, Input)$ is set as the object of the convolution and pooling operations with sliding-window width $ws$ on the input sentence $Input$, where $pooling \in \{max, min, mean\}$; the convolutional layer of $group(ws, pooling, Input)$ has $Num$ filters, and the output vector of the pooling layer is $oG \in \mathbb{R}^{Num}$, whose $j$-th entry is $oG[j] = pooling(out_{F_j})$, where $F_j$ denotes the $j$-th filter.
6. The text classification method based on CNN and Bi-GRU as claimed in claim 2, wherein the Bi-GRU neural network modeling of the text data in step two comprises:
2.1) building a Bi-GRU (bidirectional gated recurrent unit) recurrent neural network structure, training one GRU recurrent neural network on the forward sequence and one on the backward sequence of the input text, connecting both to the same output layer, and training on the text with this structure;
2.2) after training is finished, removing the last softmax layer, so that the network takes a text as input and outputs the second text feature representation.
7. The text classification method based on CNN and Bi-GRU as claimed in claim 1, wherein the text feature representation obtained in step three after feature fusion of the two text feature representations is $Feature_1 \oplus Feature_2$, where $\oplus$ denotes the direct vector concatenation operation, $Feature_1$ is the first text feature representation, and $Feature_2$ is the second text feature representation.
8. The text classification method based on CNN and Bi-GRU as claimed in claim 7, wherein the classification with the LSSVM classifier in step three is performed as follows: the LSSVM classifier is used in place of the neural network's last softmax layer as the text classifier; the fused representation $Feature_1 \oplus Feature_2$ is input, and the classifier is trained to obtain the classification model.
CN201911247824.8A 2019-12-09 2019-12-09 Text classification method based on CNN and Bi-GRU Pending CN111144094A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911247824.8A CN111144094A (en) 2019-12-09 2019-12-09 Text classification method based on CNN and Bi-GRU

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911247824.8A CN111144094A (en) 2019-12-09 2019-12-09 Text classification method based on CNN and Bi-GRU

Publications (1)

Publication Number Publication Date
CN111144094A true CN111144094A (en) 2020-05-12

Family

ID=70517943

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911247824.8A Pending CN111144094A (en) 2019-12-09 2019-12-09 Text classification method based on CNN and Bi-GRU

Country Status (1)

Country Link
CN (1) CN111144094A (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107943967A (en) * 2017-11-28 2018-04-20 South China University of Technology (华南理工大学) Document classification algorithm based on multi-angle convolutional neural network and recurrent neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JINGREN ZHANG et al.: "Feature Fusion Text Classification Model Combining CNN and BiGRU with Multi-Attention Mechanism", Future Internet *
ZHANG Hengyu et al.: "Chinese Text Sentiment Classification Based on LS-SVM", Science and Technology Innovation (科学技术创新) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111563159A (en) * 2020-07-16 2020-08-21 Zhizhe Sihai (Beijing) Technology Co., Ltd. (智者四海(北京)技术有限公司) Text sorting method and device
CN111563159B (en) * 2020-07-16 2021-05-07 Zhizhe Sihai (Beijing) Technology Co., Ltd. (智者四海(北京)技术有限公司) Text sorting method and device
CN113590818A (en) * 2021-06-30 2021-11-02 CETC 30 Research Institute (中国电子科技集团公司第三十研究所) Government affair text data classification method based on integration of CNN, GRU and KNN
CN113590818B (en) * 2021-06-30 2023-05-26 CETC 30 Research Institute (中国电子科技集团公司第三十研究所) Government affair text data classification method based on integration of CNN (convolutional neural network), GRU (gated recurrent unit) and KNN (K-nearest neighbor)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200512