CN110399482B - Text classification method, model and device - Google Patents

Text classification method, model and device

Info

Publication number
CN110399482B
CN110399482B (application CN201910492286.2A)
Authority
CN
China
Prior art keywords
feature vector
inputting
model
vector
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910492286.2A
Other languages
Chinese (zh)
Other versions
CN110399482A (en)
Inventor
杨志明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ideepwise Artificial Intelligence Robot Technology Beijing Co ltd
Original Assignee
Ideepwise Artificial Intelligence Robot Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ideepwise Artificial Intelligence Robot Technology Beijing Co ltd
Priority to CN201910492286.2A
Publication of CN110399482A
Application granted
Publication of CN110399482B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a text classification method, model and device. The method comprises: converting the text to be classified into a word vector V1; inputting the word vector V1 into the convolution part of a CNN model, which outputs a feature vector V3; inputting the feature vector V3 into a first pooling layer, which outputs a feature vector V4; in parallel, inputting the word vector V1 into a second pooling layer, which outputs a feature vector V5; merging the feature vector V4 and the feature vector V5 into a feature vector V6; and inputting the feature vector V6 into a fully connected layer, which outputs the text classification of the text to be classified. By combining an RNN model with a CNN model, the method improves the accuracy of the text classification result.

Description

Text classification method, model and device
Technical Field
The invention relates to the field of computers, in particular to a text classification method, a text classification model and a text classification device.
Background
With the development of the internet and social media, a great amount of text information now exists on the network, including Wikipedia entries, academic articles, news reports and all kinds of after-sales service comments, and this text contains a great amount of valuable information. Existing text classification technology can extract specific information from it: for example, sentiment analysis of after-sales comments reveals how satisfied consumers are with a product or service, classifying news data distinguishes the field a news report belongs to, and classifying sentences of Wikipedia data yields the relations in a knowledge graph. In summary, text classification is an extremely important technology. At present, the more common methods include traditional text classification methods such as SVM, nearest neighbor and decision trees, as well as deep learning models.
Currently, popular deep learning models include the RNN (Recurrent Neural Network), the CNN (Convolutional Neural Network) and the Transformer.
The RNN is good at classifying long text sequences. The CNN was first applied to image processing and later to other artificial intelligence fields; its advantage is that it recognizes local text information well. The Transformer is a new generation of encoder proposed by Google: it removes the RNN's dependence on the preceding state of the sequence and performs better than the RNN and CNN in most artificial intelligence tasks. However, the Transformer performs worse on small and medium data sets, its training is very unstable, and its long-distance dependency modeling is not as good as the traditional RNN's.
Disclosure of Invention
In view of the above, the present invention provides a text classification method, a text classification model and a text classification device that overcome the shortcomings of the existing deep learning models.
The invention provides a text classification method, which comprises the following steps:
converting the text to be classified into a word vector V1;
inputting the word vector V1 into the convolution part of a CNN model, which outputs a feature vector V3; inputting the feature vector V3 into a first pooling layer, which outputs a feature vector V4; and inputting the word vector V1 into a second pooling layer, which outputs a feature vector V5;
merging the feature vector V4 and the feature vector V5 into a feature vector V6;
and inputting the feature vector V6 into a fully connected layer, which outputs the text classification of the text to be classified.
The present invention also provides a text classification model, which includes:
vector conversion layer: for converting the text to be classified into a word vector V1;
a feature extraction layer: for inputting the word vector V1 into the convolution part of the CNN model, which outputs the feature vector V3, and the feature vector V3 into the first pooling layer, which outputs the feature vector V4; and, for inputting the word vector V1 into the second pooling layer, the second pooling layer outputting the feature vector V5;
a characteristic merging layer: for merging the feature vector V4 and the feature vector V5 into a feature vector V6;
full connection layer: for receiving the feature vector V6 and outputting the text classification of the text to be classified.
The present invention also provides a non-transitory computer readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the steps in the text classification method described above.
The invention also provides a text classification device which is characterized by comprising a processor and the non-transitory computer readable storage medium.
According to the text classification method of the invention, the series structure of RNN and CNN, together with a modeling mode matched to that structure, yields richer word vector features at different semantic levels and improves classification accuracy.
The method (or model) combines the RNN's excellent long-sequence modeling capability with the CNN's strength in local modeling; its classification effect in most text classification tasks is superior to that of the traditional RNN and CNN models.
Compared with the Transformer, the text classification method or model of the invention trains stably and, because it has fewer model parameters, requires less hardware resource overhead.
Drawings
FIG. 1 is a first flowchart of a text classification method according to the present invention;
FIG. 2 is a second flowchart of the text classification method of the present invention;
FIG. 3 is a first block diagram of the text classification method of the present invention;
fig. 4 is a second structural diagram of the text classification method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
The invention provides a text classification method, as shown in figure 1, the method comprises
S11: converting the text to be classified into a word vector V1;
S13, which includes S13-1, S13-2 and S13-3:
S13-1: inputting the word vector V1 into the convolution part of the CNN model, which outputs a feature vector V3;
S13-2: inputting the feature vector V3 into a first pooling layer, which outputs a feature vector V4;
S13-3: inputting the word vector V1 into a second pooling layer, which outputs a feature vector V5;
S15: merging the feature vector V4 and the feature vector V5 into a feature vector V6;
S17: inputting the feature vector V6 into a fully connected layer, which outputs the text classification of the text to be classified.
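The data flow of S11 to S17 can be sketched numerically as follows. This is a minimal illustration, not the patent's implementation: all sizes (sequence length, embedding dimension, filter and class counts), the random weights, the single width-3 convolution, and the choice of max pooling and concatenation for the merge are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: T time steps, D embedding dims, F conv filters, C classes.
T, D, F, C = 10, 8, 6, 3

# S11: the text to be classified, already converted into word vectors V1.
V1 = rng.standard_normal((T, D))

# S13-1: convolution part of the CNN model (here a single width-3 1-D
# convolution over the time axis) outputs feature vector V3.
W = rng.standard_normal((3, D, F)) * 0.1
V3 = np.stack([np.tensordot(V1[t:t + 3], W, axes=([0, 1], [0, 1]))
               for t in range(T - 2)])          # shape (T - 2, F)

# S13-2: first pooling layer (max over time) outputs V4.
V4 = V3.max(axis=0)                             # shape (F,)

# S13-3: second pooling layer applied directly to V1 outputs V5.
V5 = V1.max(axis=0)                             # shape (D,)

# S15: merge V4 and V5 into V6 (here by concatenation).
V6 = np.concatenate([V4, V5])                   # shape (F + D,)

# S17: fully connected layer maps V6 to class scores for the text.
W_fc = rng.standard_normal((F + D, C)) * 0.1
scores = V6 @ W_fc
predicted_class = int(scores.argmax())          # index of the predicted text class
```

Running the sketch end to end traces the path V1 → V3 → V4 (and, in parallel, V1 → V5) → V6 → classification.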
The convolution part of the CNN model comprises one convolution layer or several convolution layers in series, and each convolution layer consists of a multi-scale convolution (convolution kernels of several widths) followed by a pooling layer.
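A multi-scale convolution of this kind can be sketched as follows; the kernel widths (2, 3 and 4), the filter count and the random weights are illustrative assumptions, with max pooling standing in for the pooling layer that follows each convolution:

```python
import numpy as np

rng = np.random.default_rng(1)
T, D, F = 12, 8, 4   # hypothetical sizes: time steps, embedding dims, filters per scale
V1 = rng.standard_normal((T, D))

def conv1d(x, width, filters, rng):
    """Valid 1-D convolution over the time axis with `filters` output channels."""
    W = rng.standard_normal((width, x.shape[1], filters)) * 0.1
    return np.stack([np.tensordot(x[t:t + width], W, axes=([0, 1], [0, 1]))
                     for t in range(x.shape[0] - width + 1)])

# Multi-scale convolution: kernels of widths 2, 3 and 4 run in parallel,
# each followed by max pooling over time, and the results concatenated.
pooled = [conv1d(V1, w, F, rng).max(axis=0) for w in (2, 3, 4)]
features = np.concatenate(pooled)   # shape (3 * F,): one block per kernel width
```

Each kernel width captures n-gram-like local patterns of a different span, which is why the patent combines several scales in one layer.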
The first pooling layer and the second pooling layer may each be a max pooling layer or an average pooling layer.
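The difference between the two pooling choices is easy to see on a toy feature map: max pooling keeps the strongest activation of each channel over time, while average pooling keeps the mean.

```python
import numpy as np

# A toy feature map with 4 time steps and 3 channels.
V3 = np.array([[1.0, -2.0, 0.5],
               [3.0,  0.0, 0.5],
               [2.0,  4.0, 0.5],
               [0.0,  2.0, 0.5]])

# Max pooling over the time axis: strongest activation per channel.
max_pooled = V3.max(axis=0)    # [3.0, 4.0, 0.5]

# Average pooling over the time axis: mean activation per channel.
avg_pooled = V3.mean(axis=0)   # [1.5, 1.0, 0.5]
```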
Optionally, as shown in fig. 2, the following may be added between S11 and S13:
S12: inputting the word vector V1 into a BLSTM (Bidirectional Long Short-Term Memory) model, which outputs a feature vector V1';
accordingly, S13-1 is adjusted to: inputting the feature vector V1' into the convolution part of the CNN model, which outputs the feature vector V3;
and S13-3 is adjusted to: inputting the feature vector V1' into the second pooling layer, which outputs the feature vector V5.
The BLSTM can extract, in both directions, the correlation between long-distance words in the text to be classified, which helps improve the accuracy of the subsequent text classification.
According to the text classification method of the invention, the series structure of RNN and CNN, together with a modeling mode matched to that structure, yields richer word vector features at different semantic levels and improves classification accuracy.
The method combines the RNN's excellent long-sequence modeling capability with the CNN's strength in local modeling; its classification effect in most text classification tasks is superior to that of the traditional RNN and CNN models.
Compared with the Transformer, the text classification method of the invention trains stably and, because it has fewer model parameters, requires less hardware resource overhead.
The present invention also provides a text classification model, as shown in fig. 3, comprising: the device comprises a vector conversion layer, a feature extraction layer, a feature merging layer and a full connection layer.
Vector conversion layer: for converting the text to be classified into a word vector V1;
the characteristic extraction layer comprises a convolution part of the CNN model, a first pooling layer and a second pooling layer;
convolution part of CNN model: for inputting the word vector V1 into the convolution portion of the CNN model, which outputs a feature vector V3;
a first pooling layer: for inputting the feature vector V3 into the first pooling layer, which outputs the feature vector V4;
a second pooling layer: for inputting the word vector V1 into the second pooling layer, which outputs the feature vector V5;
a characteristic merging layer: for merging the feature vector V4 and the feature vector V5 into a feature vector V6;
full connection layer: for receiving the feature vector V6 and outputting the text classification of the text to be classified.
As shown in fig. 4, between the vector conversion layer and the feature extraction layer, there may be further included:
the BLMST model, the BLMST model input word vector V1, outputs the feature vector V1'.
Accordingly, the applicability of the convolution portion of the CNN model is adjusted to: the convolution part is used for inputting the characteristic vector V1' into the CNN model and outputting a characteristic vector V3;
the second pooling layer suitability was adjusted to: for inputting the feature vector V1' into the second pooling layer, which outputs the feature vector V5.
The present invention also provides a non-transitory computer readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the steps in the text classification method described above.
The invention also provides a text classification device which is characterized by comprising a processor and the non-transitory computer readable storage medium.
It should be noted that the embodiment of the text classification model or apparatus of the present invention has the same principle as the embodiment of the text classification method, and the related parts can be referred to each other.
The above description is only exemplary of the present invention and should not be taken as limiting the scope of the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (4)

1. A method of text classification, the method comprising:
converting the text to be classified into a word vector V1;
inputting the word vector V1 into a convolution part of a CNN model, the convolution part of the CNN model outputting a feature vector V3, inputting the feature vector V3 into a first pooling layer, the first pooling layer outputting a feature vector V4; and, inputting the word vector V1 into a second pooling layer, the second pooling layer outputting a feature vector V5;
merging the feature vector V4 and the feature vector V5 into a feature vector V6;
inputting the feature vector V6 into a fully connected layer, and outputting the text classification of the text to be classified by the fully connected layer;
the inputting of the word vector V1 into the convolution portion of the CNN model includes: inputting the word vector V1 into a BLSTM model, the BLSTM model outputting a feature vector V1', and inputting the feature vector V1' into the convolution part of the CNN model;
and/or, the inputting of the word vector V1 into the second pooling layer comprises: inputting the word vector V1 into the BLSTM model, the BLSTM model outputting the feature vector V1', and inputting the feature vector V1' into the second pooling layer;
the BLSTM model is a shared model.
2. A text classification model, the model comprising:
vector conversion layer: for converting the text to be classified into a word vector V1;
a feature extraction layer: for inputting the word vector V1 into the convolution part of a CNN model, which outputs a feature vector V3, inputting the feature vector V3 into a first pooling layer, which outputs a feature vector V4; and, for inputting the word vector V1 into a second pooling layer, the second pooling layer outputting a feature vector V5;
a characteristic merging layer: for merging the feature vector V4 and feature vector V5 into a feature vector V6;
full connection layer: for receiving the feature vector V6 and outputting the text classification of the text to be classified;
the inputting of the word vector V1 into the convolution portion of the CNN model includes: inputting the word vector V1 into a BLSTM model, the BLSTM model outputting a feature vector V1', and inputting the feature vector V1' into the convolution part of the CNN model;
and/or, the inputting of the word vector V1 into the second pooling layer comprises: inputting the word vector V1 into the BLSTM model, the BLSTM model outputting the feature vector V1', and inputting the feature vector V1' into the second pooling layer;
the BLSTM model is a shared model.
3. A non-transitory computer readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the steps in the text classification method of claim 1.
4. A text classification apparatus comprising a processor and the non-transitory computer readable storage medium of claim 3.
CN201910492286.2A 2019-06-06 2019-06-06 Text classification method, model and device Active CN110399482B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910492286.2A CN110399482B (en) 2019-06-06 2019-06-06 Text classification method, model and device

Publications (2)

Publication Number Publication Date
CN110399482A CN110399482A (en) 2019-11-01
CN110399482B true CN110399482B (en) 2021-12-03

Family

ID=68323125

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910492286.2A Active CN110399482B (en) 2019-06-06 2019-06-06 Text classification method, model and device

Country Status (1)

Country Link
CN (1) CN110399482B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107885853A (en) * 2017-11-14 2018-04-06 同济大学 A kind of combined type file classification method based on deep learning
CN109241283A (en) * 2018-08-08 2019-01-18 广东工业大学 A kind of file classification method based on multi-angle capsule network
US10304208B1 (en) * 2018-02-12 2019-05-28 Avodah Labs, Inc. Automated gesture identification using neural networks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Text Sentiment Analysis Based on Feature Fusion of CNN and BiLSTM Networks (基于CNN和BiLSTM网络特征融合的文本情感分析); 李洋, 董红斌; Journal of Computer Applications (计算机应用), Nov. 2018, vol. 38, no. 11, pp. 3075-3080 *
Localized Bidirectional Long Short-Term Memory for Text Classification (用于文本分类的局部化双向长短时记忆); 万圣贤, 兰艳艳; Journal of Chinese Information Processing (中文信息学报), May 2017, vol. 31, no. 3, pp. 62-68 *

Also Published As

Publication number Publication date
CN110399482A (en) 2019-11-01

Similar Documents

Publication Publication Date Title
CN109086303B (en) Intelligent conversation method, device and terminal based on machine reading understanding
WO2018006727A1 (en) Method and apparatus for transferring from robot customer service to human customer service
CN112164391B (en) Statement processing method, device, electronic equipment and storage medium
US20190163742A1 (en) Method and apparatus for generating information
CN112685565A (en) Text classification method based on multi-mode information fusion and related equipment thereof
US20210271823A1 (en) Content generation using target content derived modeling and unsupervised language modeling
JP2023535709A (en) Language expression model system, pre-training method, device, device and medium
KR102576344B1 (en) Method and apparatus for processing video, electronic device, medium and computer program
CN106227792B (en) Method and apparatus for pushed information
CN110162766B (en) Word vector updating method and device
CN111783903B (en) Text processing method, text model processing method and device and computer equipment
CN112084307B (en) Data processing method, device, server and computer readable storage medium
CN113158554B (en) Model optimization method and device, computer equipment and storage medium
CN111831826A (en) Training method, classification method and device of cross-domain text classification model
CN113392179A (en) Text labeling method and device, electronic equipment and storage medium
CN111625715A (en) Information extraction method and device, electronic equipment and storage medium
WO2020006488A1 (en) Corpus generating method and apparatus, and human-machine interaction processing method and apparatus
CN114817478A (en) Text-based question and answer method and device, computer equipment and storage medium
Song Sentiment analysis of Japanese text and vocabulary learning based on natural language processing and SVM
CN110969005A (en) Method and device for determining similarity between entity corpora
CN116881462A (en) Text data processing, text representation and text clustering method and equipment
CN110399482B (en) Text classification method, model and device
CN116975221A (en) Text reading and understanding method, device, equipment and storage medium
JP2023554210A (en) Sort model training method and apparatus for intelligent recommendation, intelligent recommendation method and apparatus, electronic equipment, storage medium, and computer program
CN113010664A (en) Data processing method and device and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant