CN111597340A - Text classification method and device and readable storage medium - Google Patents

Text classification method and device and readable storage medium

Info

Publication number
CN111597340A
CN111597340A
Authority
CN
China
Prior art keywords
text
classified
feature vector
model
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010442697.3A
Other languages
Chinese (zh)
Inventor
杜渂
邱祥平
雷霆
王聚全
王孟轩
王月
彭明喜
陈健
林永生
杨博
刘冉东
索涛
和传志
曹若麟
李帅帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ds Information Technology Co ltd
Original Assignee
Ds Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ds Information Technology Co ltd filed Critical Ds Information Technology Co ltd
Priority to CN202010442697.3A priority Critical patent/CN111597340A/en
Publication of CN111597340A publication Critical patent/CN111597340A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a text classification method and device and a readable storage medium, wherein the text classification method comprises the following steps: acquiring a text to be classified; inputting the text to be classified into a pre-constructed text classification model to obtain the type of the text to be classified. The processing of the text to be classified by the text classification model comprises the following steps: performing a word embedding operation on the text to be classified to obtain a corresponding digital vector matrix; inputting the digital vector matrix into a CNN model to obtain a first feature vector; inputting the first feature vector into a BiLSTM model to obtain a second feature vector; and inputting the second feature vector into a classifier to obtain the type of the text to be classified. According to the invention, the CNN, the BiLSTM and the MLP are organically combined, a text classification model suited to the characteristics of alert texts is built, and the accuracy of text classification is improved.

Description

Text classification method and device and readable storage medium
Technical Field
The present invention relates to the field of natural language processing technologies, and in particular, to a text classification method and apparatus, and a readable storage medium.
Background
After receiving a telephone alarm, the 110 alarm-receiving platforms in each area need to fill in an alarm receiving list, record the caller's description, and preliminarily determine the incident category within a short time based on that description. Police officers then receive the alarm and go to the scene. The officers handling the incident on site fill in an alarm handling list according to the situation found and dealt with at the scene; the alarm handling list contains a more specific description of the incident, and the incident category is judged according to feedback on the actual situation.
It can be seen that, at present, determining the incident category depends mainly on manual classification: the process is complex, timeliness is poor, accurate classification of alarms is difficult to guarantee, and the efficiency of alarm handling is hard to improve.
In recent years, the public security industry has used big data technologies to build public security big data application platforms covering the various police services and has collected massive amounts of data, including a huge number of alert record texts. This text information provides data support for text classification methods based on machine learning algorithms.
Many neural-network-based text classification methods have emerged with the development of deep learning. Because text is time-series data, academia mainly adopts recurrent neural networks to capture text information. Goles et al. improved the conventional feed-forward neural network with cyclic recursion in the hidden layer and proposed the RNN, which uses this recursion to mine sequence information from the data; however, the recursion also makes the network structure highly complex, so data processing is time-consuming, and the RNN suffers from problems such as gradient explosion and gradient vanishing. Schuster et al. proposed the BiLSTM network, a variant of the RNN, which can not only capture longer sequence information but also better express context information through its bidirectional structure; the improved BiLSTM network alleviates gradient explosion and gradient vanishing to some extent, but further increases the amount of computation.
Yoon Kim proposed the Text-CNN model in 2014, applying the Convolutional Neural Network (CNN) to the text classification task. The CNN's sparse connectivity and parameter sharing markedly improve the time cost, but the CNN mainly extracts key features of different lengths through the size of its convolution kernels; sentences with clear referential relations in context would require a very large convolution kernel to cover those relations, which is clearly unreasonable for a CNN. So, while the CNN can efficiently mine local semantic features of text data and trains very fast, it cannot capture context information.
From the above analysis, each network structure has its own suitable processing range and its own characteristics. It is therefore necessary to improve on the prior art and select an appropriate classification model according to the characteristics of alert texts, so as to improve the classification accuracy of alert texts.
Disclosure of Invention
One of the objectives of the present invention is to provide a text classification method and apparatus, and a readable storage medium, to overcome some of the disadvantages in the prior art.
The technical scheme provided by the invention is as follows:
a method of text classification, comprising: acquiring a text to be classified; inputting the text to be classified into a pre-constructed text classification model to obtain the type of the text to be classified; the processing of the text to be classified by the text classification model comprising the following steps: performing a word embedding operation on the text to be classified to obtain a corresponding digital vector matrix; inputting the digital vector matrix into a CNN model to obtain a first feature vector; inputting the first feature vector into a BiLSTM model to obtain a second feature vector; and inputting the second feature vector into a classifier to obtain the type of the text to be classified.
Further, the word embedding operation on the text to be classified includes: performing the word embedding operation on the text to be classified according to a BERT model.
Further, the inputting the digital vector matrix into a CNN model to obtain a first feature vector includes: adopting a plurality of convolution kernels with different lengths to respectively perform sliding convolution on the digital vector matrix to obtain convolution results of different scales; respectively performing a completion operation on the convolution results of different scales to obtain convolution results of the same scale; rectifying the convolution results of the same scale with a rectified linear unit; and performing a pooling operation on the rectified convolution results to obtain the first feature vector.
Further, after the inputting the first feature vector into the BiLSTM model to obtain a second feature vector, the method includes: inputting the second feature vector into an attention mechanism model to obtain a second feature vector carrying attention weight; the inputting the second feature vector into the classifier includes: and inputting the second feature vector carrying the attention weight into a classifier.
Further, the inputting the second feature vector into a classifier to obtain the type of the text to be classified includes: and inputting the second feature vector into a multilayer perceptron to obtain the type of the text to be classified.
The present invention also provides a text classification apparatus, comprising: a text acquisition module for acquiring a text to be classified; and a text classification module for inputting the text to be classified into a pre-constructed text classification model to obtain the type of the text to be classified. The text classification module comprises: a word embedding unit for performing a word embedding operation on the text to be classified to obtain a corresponding digital vector matrix; a convolution unit for inputting the digital vector matrix into a CNN model to obtain a first feature vector; a long short-term memory unit for inputting the first feature vector into a BiLSTM model to obtain a second feature vector; and a classification unit for inputting the second feature vector into a classifier to obtain the type of the text to be classified.
Further, the word embedding unit is further configured to perform a word embedding operation on the text to be classified according to a BERT model.
Further, the convolution unit is further configured to perform sliding convolution on the digital vector matrix using a plurality of convolution kernels with different lengths to obtain convolution results of different scales; respectively perform a completion operation on the convolution results of different scales to obtain convolution results of the same scale; rectify the convolution results of the same scale with a rectified linear unit; and perform a pooling operation on the rectified convolution results to obtain the first feature vector.
Further, the text classification module further comprises: the attention unit is used for inputting the second feature vector into an attention mechanism model to obtain a second feature vector carrying attention weight; the classification unit is further configured to input the second feature vector carrying the attention weight into a classifier, so as to obtain the type of the text to be classified.
The invention also provides a computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the text classification method as described above.
The text classification method, the text classification device and the readable storage medium provided by the invention at least have the following beneficial effects:
1. The method organically combines the CNN model, the BiLSTM model and the MLP, exploits the distinct advantages of the different deep learning networks in extracting features from unstructured data, is optimized for short texts, and preserves the context correlation within the alert text; it thus builds a text classification model suited to the characteristics of alert texts and improves the accuracy of text classification.
2. The invention further improves the accuracy of text classification by performing the word embedding operation with a BERT model and introducing an Attention mechanism after the BiLSTM model.
Drawings
The above features, technical features, advantages and implementations of a text classification method and apparatus, a readable storage medium will be further described in the following detailed description of preferred embodiments in a clearly understandable manner, with reference to the accompanying drawings.
FIG. 1 is a flow diagram of one embodiment of a text classification method of the present invention;
FIG. 2 is a flow diagram of another embodiment of a text classification method of the present invention;
FIG. 3 is a schematic structural diagram of one of the text classification models of FIGS. 2 and 5;
FIG. 4 is a schematic structural diagram of an embodiment of a text classification apparatus according to the present invention;
fig. 5 is a schematic structural diagram of another embodiment of a text classification apparatus according to the present invention.
The reference numbers illustrate:
100. text acquisition module; 200. text classification module; 210. word embedding unit; 220. convolution unit; 230. long short-term memory unit; 240. attention unit; 250. classification unit; 300. model construction module; 310. preprocessing unit; 320. model training unit; 330. model evaluation unit.
Detailed Description
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the following description will be made with reference to the accompanying drawings. It is obvious that the drawings in the following description are only some examples of the invention, and that for a person skilled in the art, other drawings and embodiments can be derived from them without inventive effort.
For the sake of simplicity, the drawings only schematically show the parts relevant to the present invention, and they do not represent the actual structure as a product. In addition, in order to make the drawings concise and understandable, components having the same structure or function in some of the drawings are only schematically depicted, or only one of them is labeled. In this document, "one" means not only "only one" but also a case of "more than one".
One embodiment of the present invention, as shown in fig. 1, is a text classification method, including:
step S100 acquires a text to be classified.
Specifically, the original text is subjected to word segmentation processing to obtain a text to be classified. The word segmentation process can be performed by using a Chinese word segmentation tool in the prior art, such as: NLPIR word segmentation tool, Jieba word segmentation tool, Pkuseg word segmentation tool.
Taking the alert situation text as an example, the original text can be derived from the description of the alert situation in the alarm receiving list or the alarm handling list; and obtaining the text to be classified of the alert situation category to be determined after word segmentation processing.
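As an illustration, the word segmentation step with the Jieba tool mentioned above can be sketched as follows; the sample alert description and the space-joined output format are assumptions for illustration, not part of the claimed method.

```python
# Minimal word-segmentation sketch using the Jieba tool.
import jieba

raw_text = "某小区发生入室盗窃，报警人称家中财物被盗"   # hypothetical alert description
tokens = jieba.lcut(raw_text)                          # segment the raw text into words
text_to_classify = " ".join(tokens)                    # segmented text to be classified
print(text_to_classify)
```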
Step S200, inputting the text to be classified into a pre-constructed text classification model to obtain the type of the text to be classified.
The text classification model is a supervised learning model which, through training, automatically classifies documents into predefined categories. The text classification model comprises a word embedding layer, a CNN layer, a BiLSTM layer and a classifier. The word embedding layer completes the word embedding operation, the CNN layer extracts local features of the text, the BiLSTM layer extracts the text's time-sequence features and context correlation, and the classifier assigns the text to a predefined category.
The step S200 includes:
step S210 performs word embedding operation on the text to be classified to obtain a corresponding digital vector matrix.
Specifically, a precondition for handling text classification with natural language technology is converting the text data into a vectorized form that a computer can recognize and process. The Word Embedding operation converts each word in the text into a corresponding numeric vector. Optionally, the word embedding operation is performed on the segmented text using a Word2vec (word-to-vector) model.
The Word2vec model solves the problem of high dimensionality and high sparsity of discrete representation models (such as One-hot models and bag-of-words models), but the model cannot identify ambiguous words.
Further optionally, word embedding operation is performed on the text to be classified according to a BERT model.
The BERT model performs an Embedding operation (i.e., a text encoding operation) on the token, the position, and the sentence segment of each word in the text, and concatenates the word vectors obtained from these three Embedding results (namely Token Embedding, Position Embedding and Segment Embedding) to obtain the final BERT word vector.
Token Embedding converts each word into a vector of fixed dimension, Segment Embedding is used to judge whether two pieces of text are semantically related, and Position Embedding handles the situation where the same word has different meanings at different positions.
The model can accurately represent the characteristics of each word in the text under different conditions, which further increases the generalization capability of the word vectors and solves the representation problem of ambiguous words.
Using the BERT model to map the alert text features into a high-dimensional space further reduces the loss of semantic information caused by traditional encodings.
After the word embedding operation, the text is converted into a matrix of numeric vectors. Assuming that the text to be classified contains n words, each word corresponds to a d-dimensional vector, and the dimensionality of the corresponding digital vector matrix obtained through word embedding operation is n x d.
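For illustration, a minimal sketch of obtaining the n x d digital vector matrix with a BERT encoder is given below; the use of the HuggingFace transformers library and the bert-base-chinese checkpoint are assumptions, since the patent only specifies that a BERT model is used.

```python
# Sketch of the BERT word-embedding step (assumed libraries: transformers + TensorFlow/Keras).
from transformers import BertTokenizer, TFBertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")   # assumed checkpoint
bert = TFBertModel.from_pretrained("bert-base-chinese")

text = "某小区 发生 入室 盗窃"                      # hypothetical segmented alert text
inputs = tokenizer(text, return_tensors="tf")       # token, segment and position ids
outputs = bert(inputs)

# n x d matrix: one d-dimensional vector per token (d = 768 for BERT-base)
vector_matrix = outputs.last_hidden_state[0]
print(vector_matrix.shape)
```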
Step S220 inputs the number vector matrix into the CNN model to obtain a first feature vector.
In particular, the convolutional neural network CNN can well capture local correlation (i.e., local features, short-distance correlation) by extracting key information in a sentence through a convolution operation using a convolution kernel (also called filter).
Alert texts are mostly short, so the CNN model's strength in extracting the short-distance correlations of text can be exploited to capture well how the keywords in a short text contribute to judging the type of the whole text.
Convolution kernels of different lengths may be used to extract different types of local features.
Preferably, a plurality of convolution kernels with different lengths are adopted to respectively perform sliding convolution on the digital vector matrix, giving convolution results of different scales; a completion operation is respectively performed on the convolution results of different scales to obtain convolution results of the same scale; the convolution results of the same scale are rectified with a rectified linear unit; and a pooling operation is performed on the rectified convolution results to obtain the first feature vector.
For example, for the alert text data, four convolution kernels, i.e., 1 x 3, 1 x 4, 1 x 5 and 1 x 6, are designed to perform sliding convolution on the text to obtain convolution results of different scales. Of course, convolution kernels of other sizes may be designed according to actual text classification requirements. Wherein, the convolution general formula of a single channel is as follows:
C_i = f(∑ w · X_{i:i+h-1} + b)
where w is the convolution kernel weight matrix, h is the convolution kernel size, b is a bias, X_{i:i+h-1} denotes the word vectors at positions i through i+h-1, and f is the activation function.
The activation function adds a nonlinear factor to the neural network, improving the expressive power of the model. Optionally, the activation function f is chosen to be the rectified linear unit (ReLU).
After convolutions of different scales are applied to the same input text, performing the completion operation and concatenating according to the largest-size result keeps the text feature matrices produced by different convolution kernels at a uniform size, which improves the effect of the CNN layer within the overall text classification model.
To keep the network parameters from growing too large and hindering computation, and to reduce the dimensionality of the text feature matrix obtained after the convolution operation, a pooling layer is introduced after the convolution; this reduces the model size, speeds up computation, and at the same time improves the robustness of the extracted features. Optionally, one pooling approach is selected from max pooling, average pooling, and sum pooling. The first feature vector is obtained from the pooled text feature matrix.
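A minimal Keras sketch of this CNN stage is given below. The sequence length, embedding dimension, number of filters, and the use of "same" padding to realise the completion operation are assumptions for illustration.

```python
# Sketch of the CNN stage: kernels of several lengths, completion (padding) to a
# uniform scale, ReLU, and max pooling, concatenated into the first feature vector.
from tensorflow import keras
from tensorflow.keras import layers

n, d = 128, 768                                  # assumed sequence length and embedding dim
emb_input = keras.Input(shape=(n, d))

branches = []
for k in (3, 4, 5, 6):                           # kernel lengths analogous to 1x3 ... 1x6
    conv = layers.Conv1D(64, k, padding="same", activation="relu")(emb_input)
    pooled = layers.MaxPooling1D(pool_size=2)(conv)
    branches.append(pooled)

first_feature = layers.Concatenate(axis=-1)(branches)   # first feature vector (kept as a sequence)
cnn_stage = keras.Model(emb_input, first_feature, name="cnn_stage")
cnn_stage.summary()
```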
Step S230 inputs the first feature vector into the BiLSTM model to obtain a second feature vector.
In particular, many features in text data follow a certain time sequence, and the convolutional neural network cannot effectively extract such time-sequence-related features. The LSTM (long short-term memory) network, by contrast, can effectively extract the time-sequence-related features of the text and learn long-distance correlations in the sequence.
The BiLSTM (bidirectional long short-term memory network) is an extension of the unidirectional LSTM network; it extracts the time-sequence-related features of the text in both the forward and the backward direction and thus better captures the context information in a sentence. Therefore, after the first feature vector carrying the short-distance correlations (i.e., local features) is input into the BiLSTM model, a second feature vector carrying both the long-distance and the short-distance correlations is obtained.
The following equations describe the relationship between the various gates and states in the LSTM:
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)
C_t = f_t * C_{t-1} + i_t * C̃_t
h_t = o_t * tanh(C_t)
where i_t is the input gate function, f_t the forget gate function, o_t the output gate function, σ the sigmoid activation function, h_t the hidden state of the LSTM, C̃_t the candidate cell state and C_t the cell state at time step t, W the parameters to be trained by the model, and x_t the first feature vector extracted by the CNN at time t, whose dimension is consistent with the CNN output dimension. The forward LSTM and the backward LSTM are combined to form the BiLSTM, which effectively maintains long-range dependencies over long sequences.
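A minimal Keras sketch of the BiLSTM stage is shown below; keeping the output as a sequence (one hidden state per time step) and the hidden size of 128 are assumptions made so that an attention layer can follow.

```python
# Sketch of the BiLSTM stage applied to the first feature vector produced by the CNN.
from tensorflow.keras import layers

def bilstm_stage(first_feature_sequence):
    # forward and backward LSTMs extract time-related features in both directions;
    # return_sequences=True keeps one hidden state per time step for the attention layer
    return layers.Bidirectional(layers.LSTM(128, return_sequences=True))(first_feature_sequence)
```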
To highlight how strongly each part of the input influences the output and to optimize the effect of the LSTM model, an attention mechanism is introduced: input portions that contribute more to the output receive larger weights.
Optionally, a self-attention layer is introduced after the BiLSTM layer. The attention weight matrix of the self-attention model is combined with the feature vector output by the BiLSTM (i.e., the second feature vector) to obtain the second feature vector carrying attention weights.
The set of vectors passed from the LSTM layer to the attention layer is denoted H = {h_1, h_2, …, h_T}. The weight matrix of the Self-Attention layer is obtained by the following formulas:
M = tanh(H)
α = softmax(w^T M)
γ = H α^T
where w^T is the transpose of a parameter matrix learned by the BiLSTM layer.
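The formulas above can be realised, for example, as a small custom Keras layer; this is only a sketch in which the attention-weighted sum over time steps stands in for the combination described above, and the variable names mirror the symbols in the text.

```python
# Sketch of the self-attention weighting: M = tanh(H), alpha = softmax(w^T M), gamma = H alpha^T.
import tensorflow as tf
from tensorflow.keras import layers

class SelfAttention(layers.Layer):
    def build(self, input_shape):
        # w is the learned parameter vector whose transpose appears in the formulas
        self.w = self.add_weight(name="w", shape=(input_shape[-1], 1),
                                 initializer="glorot_uniform", trainable=True)

    def call(self, H):                                       # H: (batch, T, hidden)
        M = tf.tanh(H)                                       # M = tanh(H)
        alpha = tf.nn.softmax(tf.matmul(M, self.w), axis=1)  # alpha = softmax(w^T M)
        gamma = tf.reduce_sum(H * alpha, axis=1)             # gamma = H alpha^T (weighted sum)
        return gamma
```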
Step S240 inputs the second feature vector into a classifier to obtain the type of the text to be classified.
Specifically, if an Attention layer is introduced after the BiLSTM layer, the second feature vector carrying attention weights is input into the classifier for text classification. If no Attention layer is introduced after the BiLSTM layer, the second feature vector itself is input into the classifier for text classification.
The classifier includes an MLP (Multi-Layer Perceptron). The MLP layer consists of Dense (fully connected) layers and other auxiliary layers; its core is the Dense layer, which aggregates the network information to produce the output result.
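A minimal sketch of such a classifier head in Keras is given below; the hidden width and the softmax output over num_classes are assumptions for illustration.

```python
# Sketch of the MLP classifier head: Dense layers aggregating the second feature vector
# into class probabilities.
from tensorflow.keras import layers

def mlp_classifier(second_feature, num_classes):
    x = layers.Dense(128, activation="relu")(second_feature)   # assumed hidden width
    return layers.Dense(num_classes, activation="softmax")(x)  # probability per text type
```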
In this embodiment, the CNN model, the BiLSTM model and the MLP are organically combined; the distinct advantages of the different deep learning networks in extracting features from unstructured data are exploited, the key feature extraction process is optimized for short texts, and the context correlation within the alert text is preserved. Text classification accuracy is thereby improved, which in turn further improves the efficiency of alarm receiving and handling.
Another embodiment of the present invention, as shown in fig. 2, is a text classification method, including:
step S100 acquires a text to be classified.
The Jieba word segmentation tool is used to segment the obtained alert text, yielding the text to be classified.
Step S200, inputting the text to be classified into a pre-constructed text classification model to obtain the type of the text to be classified.
The text classification model comprises a word embedding layer, a CNN layer, a BiLSTM layer, an attention mechanism layer and a classifier; its structure is shown in FIG. 3.
The step S200 includes:
step S210, according to the BERT model, word embedding operation is carried out on the text to be classified to obtain a corresponding digital vector matrix.
In particular, selecting an appropriate text representation model during the word embedding stage may improve the accuracy of text classification. The BERT model is a deep text representation method, the network structure of the BERT model is a typical deep network, the characteristics of each word in the text under different conditions can be accurately represented, and good effect is achieved in the test and verification of the alert text classification.
Step S221, adopting various convolution kernels with different lengths to respectively carry out sliding convolution on the digital vector matrix to obtain convolution results with different scales;
step S222, performing completion operation on the convolution results of different scales respectively to obtain convolution results of the same scale;
step S223, correcting the convolution result with the same scale by adopting a correction nonlinear unit;
step S224 performs pooling operation on the corrected convolution result to obtain a first feature vector.
Specifically, in the CNN layer, different types of local features in the warning situation text are extracted by using convolution kernels with different lengths.
For the alert text data, four convolution kernels (1 x 3, 1 x 4, 1 x 5 and 1 x 6) are designed for the convolution layer to perform sliding convolution over the text, giving convolution results of different scales. Of course, convolution kernels of other sizes may be designed according to actual text classification requirements.
The activation function is chosen to be the rectified linear unit (ReLU). To reduce the dimensionality of the text feature matrix obtained after the convolution operation, a pooling layer using max pooling is introduced after the convolution. The first feature vector is obtained from the pooled text feature matrix.
Step S230 inputs the first feature vector into the BiLSTM model to obtain a second feature vector.
Specifically, in the BiLSTM layer, the characteristics of the BiLSTM model are used to extract the long-distance correlations in the text, i.e., to preserve the context correlation within the alert text.
The BiLSTM network consists of a forward LSTM and a backward LSTM. In FIG. 3, the forward semantic feature of the i-th word is obtained by passing its input vector x_i through the forward LSTM, the backward semantic feature is obtained by passing x_i through the backward LSTM, and the hidden state output for that word integrates the forward and backward semantic features.
Step S231 inputs the second feature vector into the attention mechanism model to obtain a second feature vector carrying attention weight.
Specifically, the attention mechanism is used to apply different weights according to how much each word in the text to be classified contributes to the classification result; words with a larger contribution are given higher weights.
Step S241 inputs the second feature vector carrying the attention weight into a classifier to obtain the type of the text to be classified.
Specifically, the classifier includes Dropout and MLP layers. The Dropout layer applies the Dropout algorithm to address the overfitting that easily occurs during deep learning training when there are few training samples and many model parameters.
The MLP layer consists of Dense (fully connected) layers and other auxiliary layers; its core is the Dense layer, which aggregates the network information to produce the output result.
Optionally, the classifier further includes a Batch Normalization (BN) layer added after the MLP layer, used to address the problem that the text classification model trains too slowly when a large learning-rate value is set. The BN layer improves the regularization strategy, thoroughly shuffles the training data, and improves the gradients flowing through the network, so it can speed up model training.
Further comprising: step S300 builds a text classification model.
The step S300 includes:
step S310 acquires a text data set, and preprocesses data of the text data set.
The preprocessing comprises data cleaning, word segmentation and labeling. Data cleaning screens out information irrelevant to classification, including deleting irrelevant and duplicate data and handling outliers and missing values. The cleaned data is then segmented into words and labeled with the correct classification labels.
The preprocessed data set is divided into a training set, a validation set and a test set.
Step S320 builds a text classification model.
The text classification model comprises (1) a word embedding layer, (2) a CNN layer, (3) a BiLSTM layer, (4) an attention mechanism layer, (5) a classification layer and (6) an output layer (i.e., the label); its structure is shown in FIG. 3.
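Putting the layers listed above together, an end-to-end sketch of the text classification model under Keras could look as follows; all sizes (sequence length, embedding dimension, filters, hidden units, number of classes) are assumptions, and the SelfAttention layer refers to the illustrative layer sketched earlier.

```python
# End-to-end sketch: word embedding input -> CNN layer -> BiLSTM layer ->
# attention layer -> classification layer -> output.
from tensorflow import keras
from tensorflow.keras import layers

n, d, num_classes = 128, 768, 9          # assumed input shape and the 9 case types

emb = keras.Input(shape=(n, d), name="bert_embeddings")               # (1) word embedding layer output
branches = [layers.Conv1D(64, k, padding="same", activation="relu")(emb)
            for k in (3, 4, 5, 6)]                                    # (2) CNN layer
cnn_out = layers.Concatenate(axis=-1)(branches)
bilstm_out = layers.Bidirectional(
    layers.LSTM(128, return_sequences=True))(cnn_out)                 # (3) BiLSTM layer
attn_out = SelfAttention()(bilstm_out)                                # (4) attention layer (sketched earlier)
x = layers.Dropout(0.5)(attn_out)                                     # Dropout against overfitting
x = layers.Dense(128, activation="relu")(x)                           # MLP (Dense) layer
x = layers.BatchNormalization()(x)                                    # optional BN layer
outputs = layers.Dense(num_classes, activation="softmax")(x)          # (5)(6) classification and output
model = keras.Model(emb, outputs, name="text_classifier")
```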
Step S330, an objective function is established, and a designed text classification model is trained.
To facilitate evaluation of the model proposed herein, common text classification evaluation metrics are adopted: Precision, Recall, and the F1 value.
For a classifier assigning samples to class C_i, the F1 value is defined as follows:
F1 = (2 · Precision · Recall) / (Precision + Recall)
the F1 value combines the results of accuracy and recall. When the value of F1 is higher, the classification result of the classifier is considered to be more reliable, or the "accuracy" of the classifier is considered to be higher.
The text classification model provided by the embodiment is applied to specific cases:
the method comprises the steps that the types of 9 common cases in criminal police are selected, 18000 pieces of data are obtained, and a data set is divided into a training set, a verification set and a test set, wherein the training set, the verification set and the test set are shown in table 1.
TABLE 1 data set partitioning information
The implementation uses the Python language under the Keras framework. The Adam optimizer is selected for training the network model, and the network parameter settings are shown in Table 2.
Table 2 network parameter settings
The loss function is the cross-entropy loss provided by Keras.
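A minimal training sketch consistent with the settings above (Keras, Adam optimizer, cross-entropy loss) might look as follows; it assumes the model assembled in the earlier sketch and that the preprocessed training and validation arrays are available, with the batch size and epoch count as illustrative values only.

```python
# Training sketch: Adam optimizer and cross-entropy loss under the Keras framework.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",   # cross-entropy loss
              metrics=["accuracy"])

model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          batch_size=64, epochs=10)                     # hypothetical hyperparameters
```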
Model test results: after training, the prediction accuracy of the model on the test set reaches 97%, and the F1 value reaches 97%.
In this embodiment, the word embedding operation is performed with the BERT model and an Attention mechanism is introduced after the BiLSTM model, further improving the accuracy of text classification.
In an embodiment of the present invention, as shown in fig. 4, a text classification apparatus includes:
the text obtaining module 100 is configured to obtain a text to be classified.
Specifically, the original text is subjected to word segmentation processing to obtain a text to be classified. The word segmentation process can be performed by using a Chinese word segmentation tool in the prior art, such as: NLPIR word segmentation tool, Jieba word segmentation tool, Pkuseg word segmentation tool.
And the text classification module 200 is configured to input the text to be classified into a pre-constructed text classification model, so as to obtain the type of the text to be classified.
The text classification model is a supervised learning model which, through training, automatically classifies documents into predefined categories. The text classification model comprises a word embedding layer, a CNN layer, a BiLSTM layer and a classifier.
The text classification module 200 includes:
the word embedding unit 210 is configured to perform word embedding operation on the text to be classified to obtain a corresponding digital vector matrix.
Specifically, a precondition for handling text classification with natural language technology is converting the text data into a vectorized form that a computer can recognize and process. The Word Embedding operation converts each word in the text into a corresponding numeric vector. Optionally, the word embedding operation is performed on the segmented text using a Word2vec (word-to-vector) model.
The Word2vec model solves the problem of high dimensionality and high sparsity of discrete representation models (such as One-hot models and bag-of-words models), but the model cannot identify ambiguous words.
Further optionally, word embedding operation is performed on the text to be classified according to a BERT model.
The BERT model performs an Embedding operation (i.e., a text encoding operation) on the token, the position, and the sentence segment of each word in the text, and concatenates the word vectors obtained from these three Embedding results (namely Token Embedding, Position Embedding and Segment Embedding) to obtain the final BERT word vector.
The model can accurately represent the characteristics of each word in the text under different conditions, which further increases the generalization capability of the word vectors and solves the representation problem of ambiguous words.
Using the BERT model to map the alert text features into a high-dimensional space further reduces the loss of semantic information caused by traditional encodings.
After the word embedding operation, the text is converted into a matrix of numeric vectors. Assuming that the text to be classified contains n words, each word corresponds to a d-dimensional vector, and the dimensionality of the corresponding digital vector matrix obtained through word embedding operation is n x d.
And a convolution unit 220, configured to input the digital vector matrix into a CNN model to obtain a first feature vector.
In particular, the convolutional neural network CNN can well capture local correlation (i.e., local features, short-distance correlation) by extracting key information in a sentence through a convolution operation using a convolution kernel (also called filter).
Alert texts are mostly short, so the CNN model's strength in extracting the short-distance correlations of text can be exploited to capture well how the keywords in a short text contribute to judging the type of the whole text.
Convolution kernels of different lengths may be used to extract different types of local features.
Preferably, a plurality of convolution kernels with different lengths are adopted to respectively perform sliding convolution on the digital vector matrix, giving convolution results of different scales; a completion operation is respectively performed on the convolution results of different scales to obtain convolution results of the same scale; the convolution results of the same scale are rectified with a rectified linear unit; and a pooling operation is performed on the rectified convolution results to obtain the first feature vector.
For example, for the alert text data, four convolution kernels, i.e., 1 x 3, 1 x 4, 1 x 5 and 1 x 6, are designed to perform sliding convolution on the text to obtain convolution results of different scales. Of course, convolution kernels of other sizes may be designed according to actual text classification requirements. Wherein, the convolution general formula of a single channel is as follows:
C_i = f(∑ w · X_{i:i+h-1} + b)
where w is the convolution kernel weight matrix, h is the convolution kernel size, b is a bias, X_{i:i+h-1} denotes the word vectors at positions i through i+h-1, and f is the activation function.
The activation function adds a nonlinear factor to the neural network, improving the expressive power of the model. Optionally, the activation function f is chosen to be the rectified linear unit (ReLU).
After convolutions of different scales are applied to the same input text, performing the completion operation and concatenating according to the largest-size result keeps the text feature matrices produced by different convolution kernels at a uniform size, which improves the effect of the CNN layer within the overall text classification model.
To keep the network parameters from growing too large and hindering computation, and to reduce the dimensionality of the text feature matrix obtained after the convolution operation, a pooling layer is introduced after the convolution; this reduces the model size, speeds up computation, and at the same time improves the robustness of the extracted features. Optionally, one pooling approach is selected from max pooling, average pooling, and sum pooling. The first feature vector is obtained from the pooled text feature matrix.
The long short-term memory unit 230 is configured to input the first feature vector into a BiLSTM model to obtain a second feature vector.
In particular, many features in text data follow a certain time sequence, and the convolutional neural network cannot effectively extract such time-sequence-related features. The LSTM network, by contrast, can effectively extract the time-sequence-related features of the text and learn long-distance correlations in the sequence.
The BiLSTM is an extension of the unidirectional LSTM network; it extracts the time-sequence-related features of the text in both the forward and the backward direction and thus better captures the context information in a sentence. So the first feature vector carrying the short-distance correlations (i.e., local features) is input into the BiLSTM model to obtain a second feature vector carrying both the long-distance and the short-distance correlations.
The following equations describe the relationship between the various gates and states in the LSTM:
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)
C_t = f_t * C_{t-1} + i_t * C̃_t
h_t = o_t * tanh(C_t)
where i_t is the input gate function, f_t the forget gate function, o_t the output gate function, σ the sigmoid activation function, h_t the hidden state of the LSTM, C̃_t the candidate cell state and C_t the cell state at time step t, W the parameters to be trained by the model, and x_t the first feature vector extracted by the CNN at time t, whose dimension is consistent with the CNN output dimension. The forward LSTM and the backward LSTM are combined to form the BiLSTM, which effectively maintains long-range dependencies over long sequences.
To highlight how strongly each part of the input influences the output and to optimize the effect of the LSTM model, an attention mechanism is introduced: input portions that contribute more to the output receive larger weights.
Optionally, a self-attention layer is introduced after the BiLSTM layer. The attention weight matrix of the self-attention model is combined with the feature vector output by the BiLSTM (i.e., the second feature vector) to obtain the second feature vector carrying attention weights.
The set of vectors passed from the LSTM layer to the attention layer is denoted H = {h_1, h_2, …, h_T}. The weight matrix of the Self-Attention layer is obtained by the following formulas:
M = tanh(H)
α = softmax(w^T M)
γ = H α^T
where w^T is the transpose of a parameter matrix learned by the BiLSTM layer.
And the classification unit 250 is configured to input the second feature vector into a classifier, so as to obtain the type of the text to be classified.
Specifically, if an Attention layer is introduced after the BiLSTM layer, the second feature vector carrying attention weights is input into the classifier for text classification. If no Attention layer is introduced after the BiLSTM layer, the second feature vector itself is input into the classifier for text classification.
The classifier includes an MLP (Multi-Layer Perceptron). The MLP layer consists of Dense (fully connected) layers and other auxiliary layers; its core is the Dense layer, which aggregates the network information to produce the output result.
In this embodiment, the CNN model, the BiLSTM model and the MLP are organically combined, and the distinct advantages of the different deep learning networks in extracting features from unstructured data are exploited, so that the model is optimized for short texts and preserves the context correlation within the alert text. Text classification accuracy is thereby improved, which further improves the efficiency of alarm receiving and handling.
Another embodiment of the present invention, as shown in fig. 5, is a text classification apparatus including:
the text obtaining module 100 is configured to obtain a text to be classified.
The Jieba word segmentation tool is used to segment the obtained alert text, yielding the text to be classified.
And the text classification module 200 is configured to input the text to be classified into a pre-constructed text classification model, so as to obtain the type of the text to be classified.
The text classification model comprises a word embedding layer, a CNN layer, a BiLSTM layer, an attention mechanism layer and a classifier; its structure is shown in FIG. 3.
The text classification module 200 includes:
and the word embedding unit 210 is configured to perform word embedding operation on the text to be classified according to the BERT model to obtain a corresponding digital vector matrix.
In particular, selecting an appropriate text representation model during the word embedding stage may improve the accuracy of text classification. The BERT model is a deep text representation method, the network structure of the BERT model is a typical deep network, the characteristics of each word in the text under different conditions can be accurately represented, and good effect is achieved in the test and verification of the alert text classification.
The convolution unit 220 is configured to perform sliding convolution on the digital vector matrix using a plurality of convolution kernels with different lengths to obtain convolution results of different scales; respectively perform a completion operation on the convolution results of different scales to obtain convolution results of the same scale; rectify the convolution results of the same scale with a rectified linear unit; and perform a pooling operation on the rectified convolution results to obtain the first feature vector.
Specifically, in the CNN layer, different types of local features in the warning situation text are extracted by using convolution kernels with different lengths.
The activation function is chosen to be the rectified linear unit (ReLU). To reduce the dimensionality of the text feature matrix obtained after the convolution operation, a pooling layer using max pooling is introduced after the convolution. The first feature vector is obtained from the pooled text feature matrix.
The long short-term memory unit 230 is configured to input the first feature vector into a BiLSTM model to obtain a second feature vector.
Specifically, in the BiLSTM layer, the characteristics of the BiLSTM model are used to extract the long-distance correlations in the text, i.e., to preserve the context correlation within the alert text.
And the attention unit 240 is configured to input the second feature vector into an attention mechanism model, so as to obtain a second feature vector carrying attention weight.
Specifically, the attention mechanism is used to apply different weights according to how much each word in the text to be classified contributes to the classification result; words with a larger contribution are given higher weights.
And the classifying unit 250 is configured to input the second feature vector carrying the attention weight into a classifier, so as to obtain the type of the text to be classified.
Specifically, the classifier includes Dropout and MLP layers. The Dropout layer applies the Dropout algorithm to address the overfitting that easily occurs during deep learning training when there are few training samples and many model parameters.
The MLP layer consists of Dense (fully connected) layers and other auxiliary layers; its core is the Dense layer, which aggregates the network information to produce the output result.
Optionally, the classifier further includes a Batch Normalization (BN) layer added after the MLP layer, used to address the problem that the text classification model trains too slowly when a large learning-rate value is set.
And the model building module 300 is used for building a text classification model.
The model building module 300 includes:
the preprocessing unit 310 is configured to obtain a text data set and preprocess data of the text data set.
The preprocessing comprises data cleaning, word segmentation and labeling. Data cleaning screens out information irrelevant to classification, including deleting irrelevant and duplicate data and handling outliers and missing values. The cleaned data is then segmented into words and labeled with the correct classification labels.
The preprocessed data set is divided into a training set, a validation set and a test set.
And the model training unit 320 is used for establishing an objective function and training the designed text classification model.
The text classification model comprises (1) a word embedding layer, (2) a CNN layer, (3) a BiLSTM layer, (4) an attention mechanism layer, (5) a classification layer and (6) an output layer (i.e., the label); its structure is shown in FIG. 3.
And the evaluation unit 330 is used for evaluating the trained model.
To facilitate evaluation of the model proposed herein, common text classification evaluation metrics are adopted: Precision, Recall, and the F1 value.
In this embodiment, the word embedding operation is performed with the BERT model and an Attention mechanism is introduced after the BiLSTM model, further improving the accuracy of text classification.
It should be noted that the embodiment of the text classification apparatus provided by the present invention and the embodiment of the text classification method provided by the foregoing are all based on the same inventive concept, and can achieve the same technical effects. Therefore, other specific contents of the embodiment of the text classification device can refer to the description of the embodiment of the text classification method.
In an embodiment of the present invention, a computer-readable storage medium has stored thereon a computer program which, when executed by a processor, carries out the text classification method described in the preceding embodiments. That is, when part or all of the technical solution of the embodiments of the present invention that contributes to the prior art is embodied as a computer software product, that product is stored in a computer-readable storage medium. The computer-readable storage medium may be any entity or device capable of carrying the computer program code. For example, it may be a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory, a random access memory, etc.
It should be noted that the above embodiments can be freely combined as necessary. The foregoing is only a preferred embodiment of the present invention; for those skilled in the art, various modifications and improvements can be made without departing from the principle of the present invention, and such modifications and improvements should also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A method of text classification, comprising:
acquiring a text to be classified;
inputting the text to be classified into a pre-constructed text classification model to obtain the type of the text to be classified;
the processing step of the text classification model to the text to be classified comprises the following steps:
performing word embedding operation on the text to be classified to obtain a corresponding digital vector matrix;
inputting the digital vector matrix into a CNN model to obtain a first feature vector;
inputting the first feature vector into a BiLSTM model to obtain a second feature vector;
and inputting the second feature vector into a classifier to obtain the type of the text to be classified.
2. The method for classifying texts according to claim 1, wherein the performing word embedding operation on the texts to be classified comprises:
and performing word embedding operation on the text to be classified according to a BERT model.
3. The method of claim 1, wherein the inputting the digital vector matrix into a CNN model to obtain a first feature vector comprises:
adopting various convolution kernels with different lengths to respectively carry out sliding convolution on the digital vector matrix to obtain convolution results with different scales;
respectively executing completion operation on the convolution results with different scales to obtain convolution results with the same scale;
rectifying the convolution results of the same scale by adopting a rectified linear unit;
and performing a pooling operation on the rectified convolution results to obtain a first feature vector.
4. The text classification method according to claim 1, characterized in that:
after the first feature vector is input into the BiLSTM model to obtain a second feature vector, the method comprises the following steps:
inputting the second feature vector into an attention mechanism model to obtain a second feature vector carrying attention weight;
the inputting the second feature vector into the classifier includes: and inputting the second feature vector carrying the attention weight into a classifier.
5. The method of claim 1, wherein the inputting the second feature vector into a classifier to obtain the type of the text to be classified comprises:
and inputting the second feature vector into a multilayer perceptron to obtain the type of the text to be classified.
6. A text classification apparatus, comprising:
the text acquisition module is used for acquiring texts to be classified;
the text classification module is used for inputting the text to be classified into a pre-constructed text classification model to obtain the type of the text to be classified;
the text classification module comprises:
the word embedding unit is used for carrying out word embedding operation on the text to be classified to obtain a corresponding digital vector matrix;
the convolution unit is used for inputting the digital vector matrix into a CNN model to obtain a first feature vector;
the long short-term memory unit is used for inputting the first feature vector into a BiLSTM model to obtain a second feature vector;
and the classification unit is used for inputting the second feature vector into a classifier to obtain the type of the text to be classified.
7. The text classification apparatus according to claim 6, characterized in that:
and the word embedding unit is further used for carrying out word embedding operation on the text to be classified according to the BERT model.
8. The text classification apparatus according to claim 6, wherein:
the convolution unit is further configured to perform sliding convolutions over the numerical vector matrix with a plurality of convolution kernels of different lengths to obtain convolution results of different scales; pad the convolution results of different scales so that they share the same scale; rectify the convolution results of the same scale with a rectified linear unit (ReLU); and perform a pooling operation on the rectified convolution results to obtain the first feature vector.
9. The text classification apparatus according to claim 8, wherein the text classification module further comprises:
an attention unit, configured to input the second feature vector into an attention mechanism model to obtain a second feature vector carrying attention weights;
and the classification unit is further configured to input the second feature vector carrying the attention weights into the classifier to obtain the type of the text to be classified.
10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the text classification method according to any one of claims 1 to 5.
CN202010442697.3A 2020-05-22 2020-05-22 Text classification method and device and readable storage medium Pending CN111597340A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010442697.3A CN111597340A (en) 2020-05-22 2020-05-22 Text classification method and device and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010442697.3A CN111597340A (en) 2020-05-22 2020-05-22 Text classification method and device and readable storage medium

Publications (1)

Publication Number Publication Date
CN111597340A true CN111597340A (en) 2020-08-28

Family

ID=72186371

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010442697.3A Pending CN111597340A (en) 2020-05-22 2020-05-22 Text classification method and device and readable storage medium

Country Status (1)

Country Link
CN (1) CN111597340A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109685145A (en) * 2018-12-26 2019-04-26 广东工业大学 A kind of small articles detection method based on deep learning and image procossing
CN110334210A (en) * 2019-05-30 2019-10-15 哈尔滨理工大学 A kind of Chinese sentiment analysis method merged based on BERT with LSTM, CNN
CN110263162A (en) * 2019-06-05 2019-09-20 阿里巴巴集团控股有限公司 Convolutional neural networks and its method of progress text classification, document sorting apparatus
CN110309306A (en) * 2019-06-19 2019-10-08 淮阴工学院 A kind of Document Modeling classification method based on WSD level memory network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Rowel Atienza (Philippines): "Advanced Deep Learning with Keras", 31 March 2020, China Machine Press (机械工业出版社), pages: 1-3 *
Du Xueyan; Wang Qiushi; Wang Binjun: "Chinese Short Text Classification Model Based on LSTM-CNN", no. 01 *
Wang Lirong: "Word2vec-CNN-Bilstm Short Text Sentiment Classification", page 3 *
Wang Lirong: "Word2vec-CNN-Bilstm Short Text Sentiment Classification", no. 01, page 3 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112015901A (en) * 2020-09-08 2020-12-01 迪爱斯信息技术股份有限公司 Text classification method and device and warning situation analysis system
CN111930952A (en) * 2020-09-21 2020-11-13 杭州识度科技有限公司 Method, system, equipment and storage medium for long text cascade classification
CN113254656A (en) * 2021-07-06 2021-08-13 北京邮电大学 Patent text classification method, electronic equipment and computer storage medium
CN113254656B (en) * 2021-07-06 2021-10-22 北京邮电大学 Patent text classification method, electronic equipment and computer storage medium
CN113554857A (en) * 2021-07-20 2021-10-26 思必驰科技股份有限公司 Alarm receiving and processing auxiliary method and system for voice call
CN113779996A (en) * 2021-08-31 2021-12-10 中国中医科学院中医药信息研究所 Standard entity text determination method and device based on BilSTM model and storage medium
CN113779996B (en) * 2021-08-31 2023-10-10 中国中医科学院中医药信息研究所 Standard entity text determining method and device based on BiLSTM model and storage medium
WO2023165102A1 (en) * 2022-03-04 2023-09-07 合众新能源汽车股份有限公司 Attention-based text classification method and apparatus, and computer-readable medium
CN117251574A (en) * 2023-11-02 2023-12-19 北京信息科技大学 Text classification extraction method and system based on multi-feature data fusion
CN117251574B (en) * 2023-11-02 2024-04-26 北京信息科技大学 Text classification extraction method and system based on multi-feature data fusion

Similar Documents

Publication Publication Date Title
CN110298037B (en) Convolutional neural network matching text recognition method based on enhanced attention mechanism
CN111597340A (en) Text classification method and device and readable storage medium
CN112711953B (en) Text multi-label classification method and system based on attention mechanism and GCN
CN112732916B (en) BERT-based multi-feature fusion fuzzy text classification system
CN111563166A (en) Pre-training model method for mathematical problem classification
CN112749274B (en) Chinese text classification method based on attention mechanism and interference word deletion
CN113742733B (en) Method and device for extracting trigger words of reading and understanding vulnerability event and identifying vulnerability type
CN112784041B (en) Chinese short text sentiment orientation analysis method
CN114239585A (en) Biomedical nested named entity recognition method
CN113987187A (en) Multi-label embedding-based public opinion text classification method, system, terminal and medium
CN115952292B (en) Multi-label classification method, apparatus and computer readable medium
CN111666373A (en) Chinese news classification method based on Transformer
CN114648031A (en) Text aspect level emotion recognition method based on bidirectional LSTM and multi-head attention mechanism
CN115203406A (en) RoBERTA model-based long text information ground detection method
CN112416358A (en) Intelligent contract code defect detection method based on structured word embedded network
CN117390141B (en) Agricultural socialization service quality user evaluation data analysis method
CN114881173A (en) Resume classification method and device based on self-attention mechanism
CN112905793B (en) Case recommendation method and system based on bilstm+attention text classification
CN111708865B (en) Technology forecasting and patent early warning analysis method based on improved XGboost algorithm
CN111859979A (en) Ironic text collaborative recognition method, ironic text collaborative recognition device, ironic text collaborative recognition equipment and computer readable medium
CN116663539A (en) Chinese entity and relationship joint extraction method and system based on Roberta and pointer network
CN113609294B (en) Fresh cold chain supervision method and system based on emotion analysis
CN115659990A (en) Tobacco emotion analysis method, device and medium
CN116089605A (en) Text emotion analysis method based on transfer learning and improved word bag model
CN115269833A (en) Event information extraction method and system based on deep semantics and multitask learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination