CN115034299A - Text classification method and device based on convolutional neural network multi-channel feature representation - Google Patents

Info

Publication number
CN115034299A
Authority
CN
China
Prior art date
Legal status
Pending
Application number
CN202210628402.0A
Other languages
Chinese (zh)
Inventor
邹瑶
何思略
杜嘉浩
林志腾
Current Assignee
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC filed Critical Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202210628402.0A
Publication of CN115034299A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods


Abstract

The invention provides a text classification method and device based on a convolutional neural network multi-channel feature representation. It relates to the technical field of artificial intelligence and can be applied in the field of finance or other technical fields. The text classification method based on the convolutional neural network multi-channel feature representation comprises the following steps: respectively inputting the text data to be classified into two word vector models to obtain two word vector matrices; respectively inputting the two word vector matrices into the corresponding text information extraction models to obtain multi-channel text feature data; and inputting the multi-channel text feature data into a text classification model to obtain the text type. The invention enriches the text representation and improves the accuracy of text classification.

Description

Text classification method and device based on convolutional neural network multi-channel feature representation
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a text classification method and device based on convolutional neural network multi-channel feature representation.
Background
Fig. 1 is a schematic diagram of a conventional text classification scheme. As shown in fig. 1, the prior art proposes a text classification scheme combining a long short-term memory network with a convolutional neural network. First, the input text is represented as vectors using word embeddings, local features of the text are extracted through three CNN layers, and the full-text semantics are integrated; meanwhile, the LSTM's ability to store historical information is used to acquire the contextual semantics of the text. Second, the input vectors of each CNN layer are separately output and fused, so that the original features are reused. The contextual semantics and the local features of the text are then fused, and after a fully connected layer, the final text feature vector is classified with a Softmax classifier. A dropout mechanism is used before the Softmax classifier to prevent the algorithm from overfitting.
The prior art has the following defects:
1. the text representation uses a single word vector, which cannot comprehensively represent the semantic information in the corpus, so the text semantic information is not rich enough, affecting the final classification result;
2. the extraction of the text time-sequence features uses only a conventional LSTM; although this captures the long-distance temporal relationships of the text, the potential semantic relationships among the time-sequence features are not fully considered, so the classification result is not ideal.
Disclosure of Invention
The embodiment of the invention mainly aims to provide a text classification method and device based on a convolutional neural network multi-channel feature representation, so as to enrich the text representation and improve the accuracy of text classification.
In order to achieve the above object, an embodiment of the present invention provides a text classification method based on a convolutional neural network multi-channel feature representation, including:
respectively inputting the text data to be classified into two word vector models to obtain two word vector matrices;
respectively inputting the two word vector matrices into the corresponding text information extraction models to obtain multi-channel text feature data;
and inputting the multi-channel text feature data into a text classification model to obtain the text type.
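The three steps above can be sketched end to end. The following Python sketch uses random stand-ins for the two word vector models, the extraction models and the classifier (all names, shapes and the trivial classifier are illustrative assumptions, not the patent's implementation); it only shows how the four-channel representation is assembled and passed on:

```python
import numpy as np

rng = np.random.default_rng(0)

def word2vec_embed(tokens, dim=8):
    # Stand-in for a trained Word2Vec lookup: one d-dimensional vector per token.
    return rng.standard_normal((dim, len(tokens)))

def glove_embed(tokens, dim=8):
    # Stand-in for a trained GloVe lookup.
    return rng.standard_normal((dim, len(tokens)))

def extract_features(matrix):
    # Stand-in for a text information extraction model (e.g. a BiLSTM):
    # returns forward- and reverse-order feature maps as two channels.
    forward = matrix.T           # (n, dim)
    backward = matrix.T[::-1]    # (n, dim), reversed in time
    return np.stack([forward, backward])  # (2, n, dim)

def classify(channels):
    # Stand-in for the multi-channel CNN classifier: returns a class index.
    return int(channels.sum() > 0)

tokens = "text to be classified".split()
channels = np.concatenate([extract_features(word2vec_embed(tokens)),
                           extract_features(glove_embed(tokens))])  # 4 channels
label = classify(channels)
```

Each word-vector source contributes two channels (forward and reverse), giving the four-channel input described later in the embodiment.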
In one embodiment, the step of inputting the two word vector matrices into the corresponding text information extraction models to obtain the multi-channel text feature data includes:
inputting the first word vector matrix into a first text information extraction model to obtain first forward-order text feature data and first reverse-order text feature data;
inputting the second word vector matrix into a second text information extraction model to obtain second forward-order text feature data and second reverse-order text feature data;
and obtaining the multi-channel text feature data from the first forward-order text feature data, the first reverse-order text feature data, the second forward-order text feature data and the second reverse-order text feature data.
In one embodiment, inputting the first word vector matrix into the first text information extraction model to obtain the first forward-order text feature data and the first reverse-order text feature data includes:
inputting the first word vector matrix into the first text information extraction model to obtain first forward-order text feature vectors and first reverse-order text feature vectors;
vertically stacking the first forward-order text feature vectors in sequence to obtain the first forward-order text feature data;
and vertically stacking the first reverse-order text feature vectors in sequence to obtain the first reverse-order text feature data.
In one embodiment, inputting the second word vector matrix into the second text information extraction model to obtain the second forward-order text feature data and the second reverse-order text feature data includes:
inputting the second word vector matrix into the second text information extraction model to obtain second forward-order text feature vectors and second reverse-order text feature vectors;
vertically stacking the second forward-order text feature vectors in sequence to obtain the second forward-order text feature data;
and vertically stacking the second reverse-order text feature vectors in sequence to obtain the second reverse-order text feature data.
In one embodiment, inputting the multi-channel text feature data into the text classification model to obtain the text type includes:
inputting the multi-channel text feature data into a multi-scale convolution network layer to obtain multi-dimensional text feature data;
and inputting the multi-dimensional text feature data into a fully connected output network to obtain the text type.
In one embodiment, the method further comprises performing the following iterative process:
respectively inputting training text data into the two word vector models to obtain two training word vector matrices;
respectively inputting the two training word vector matrices into the corresponding bidirectional long short-term memory artificial neural network models to obtain multi-channel text feature training data;
inputting the multi-channel text feature training data into a multi-channel convolution network model to obtain a prediction type;
determining a loss function from the prediction type and the corresponding actual type;
and when the loss function converges, determining the bidirectional long short-term memory artificial neural network models as the text information extraction models and the multi-channel convolution network model as the text classification model; otherwise, updating the bidirectional long short-term memory artificial neural network models and the multi-channel convolution network model according to the loss function.
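The iterative process above amounts to a standard train-until-convergence loop. A minimal generic sketch follows; the quadratic toy objective, learning rate and tolerance are illustrative assumptions, not the patent's models:

```python
def train_until_converged(step_fn, params, tol=1e-6, max_iter=1000):
    """Generic sketch of the iterative process: update parameters from
    the loss until the loss stops decreasing (i.e. has converged)."""
    prev_loss = float("inf")
    loss = prev_loss
    for _ in range(max_iter):
        loss, params = step_fn(params)
        if abs(prev_loss - loss) < tol:   # loss has converged
            break
        prev_loss = loss
    return params, loss

# Toy stand-in for one training step: minimise (w - 3)^2 by gradient descent.
def step(w, lr=0.1):
    grad = 2.0 * (w - 3.0)
    w = w - lr * grad
    return (w - 3.0) ** 2, w

params, final_loss = train_until_converged(step, 5.0)
```

In the patent's setting, `step_fn` would run the two BiLSTMs and the multi-channel CNN on a batch and return the cross-entropy loss together with the updated parameters.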
The embodiment of the invention also provides a text classification device based on the convolutional neural network multi-channel feature representation, which comprises:
the word vector matrix module is used for respectively inputting the text data to be classified into two word vector models to obtain two word vector matrices;
the multi-channel text feature data module is used for respectively inputting the two word vector matrices into the corresponding text information extraction models to obtain multi-channel text feature data;
and the text type module is used for inputting the multi-channel text feature data into the text classification model to obtain the text type.
In one embodiment, the multi-channel text feature data module comprises:
the first text feature data unit is used for inputting the first word vector matrix into the first text information extraction model to obtain first forward-order text feature data and first reverse-order text feature data;
the second text feature data unit is used for inputting the second word vector matrix into the second text information extraction model to obtain second forward-order text feature data and second reverse-order text feature data;
and the multi-channel text feature data unit is used for obtaining the multi-channel text feature data from the first forward-order text feature data, the first reverse-order text feature data, the second forward-order text feature data and the second reverse-order text feature data.
In one embodiment, the first text feature data unit includes:
the first text feature vector subunit is used for inputting the first word vector matrix into the first text information extraction model to obtain first forward-order text feature vectors and first reverse-order text feature vectors;
the first forward-order text feature data subunit is used for vertically stacking the first forward-order text feature vectors in sequence to obtain the first forward-order text feature data;
and the first reverse-order text feature data subunit is used for vertically stacking the first reverse-order text feature vectors in sequence to obtain the first reverse-order text feature data.
In one embodiment, the second text feature data unit includes:
the second text feature vector subunit is used for inputting the second word vector matrix into the second text information extraction model to obtain second forward-order text feature vectors and second reverse-order text feature vectors;
the second forward-order text feature data subunit is used for vertically stacking the second forward-order text feature vectors in sequence to obtain the second forward-order text feature data;
and the second reverse-order text feature data subunit is used for vertically stacking the second reverse-order text feature vectors in sequence to obtain the second reverse-order text feature data.
In one embodiment, the text type module comprises:
the multi-dimensional text feature data unit is used for inputting the multi-channel text feature data into the multi-scale convolution network layer to obtain multi-dimensional text feature data;
and the text type unit is used for inputting the multi-dimensional text feature data into the fully connected output network to obtain the text type.
In one embodiment, the device further comprises:
the training word vector matrix module is used for respectively inputting training text data into the two word vector models to obtain two training word vector matrices;
the text characteristic training data module is used for respectively inputting the two training word vector matrixes into corresponding bidirectional long-short term memory artificial neural network models to obtain multi-channel text characteristic training data;
the prediction type module is used for inputting the multi-channel text characteristic training data into the multi-channel convolution network model to obtain a prediction type;
the loss function module is used for determining a loss function according to the prediction type and the corresponding actual type;
and the iteration module is used for determining the bidirectional long-short term memory artificial neural network model as a text information extraction model and determining the multi-channel convolution network model as a text classification model when the loss function is converged, and otherwise, updating the bidirectional long-short term memory artificial neural network model and the multi-channel convolution network model according to the loss function.
The embodiment of the invention also provides a computer device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the text classification method based on the convolutional neural network multi-channel feature representation.
The embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the text classification method based on the convolutional neural network multi-channel feature representation.
Embodiments of the present invention further provide a computer program product, which includes computer programs/instructions that, when executed by a processor, implement the steps of the text classification method based on the convolutional neural network multi-channel feature representation.
With the text classification method and device based on the convolutional neural network multi-channel feature representation of the embodiment of the invention, the text data to be classified are input into two word vector models to obtain two word vector matrices, the two word vector matrices are respectively input into the corresponding text information extraction models to obtain multi-channel text feature data, and the multi-channel text feature data are then input into the text classification model to obtain the text type, thereby enriching the text representation and improving the accuracy of text classification.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained based on these drawings without creative efforts.
FIG. 1 is a schematic diagram of a prior art text classification scheme;
FIG. 2 is a flowchart of a text classification method based on a convolutional neural network multi-channel feature representation in an embodiment of the present invention;
FIG. 3 is a schematic diagram of a text classification method based on a convolutional neural network multi-channel feature representation in an embodiment of the present invention;
fig. 4 is a flowchart of S102 in the embodiment of the present invention;
FIG. 5 is a flowchart of S201 in the embodiment of the present invention;
FIG. 6 is a flowchart of S202 in the embodiment of the present invention;
fig. 7 is a flowchart of S103 in the embodiment of the present invention;
FIG. 8 is a flow chart of creating a text information extraction model and a text classification model in an embodiment of the present invention;
FIG. 9 is a flow diagram of creating a convolutional neural network text classification model in an embodiment of the present invention;
FIG. 10 is a schematic illustration of a representation of a multi-channel text feature in an embodiment of the invention;
FIG. 11 is a schematic diagram of the layers of a multi-scale convolutional network in an embodiment of the present invention;
FIG. 12 is a block diagram of a text classification apparatus based on a convolutional neural network multi-channel feature representation according to an embodiment of the present invention;
fig. 13 is a block diagram of a computer device in the embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As will be appreciated by one skilled in the art, embodiments of the present invention may be embodied as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
The key terms involved in the present invention are as follows:
convolutional Neural Network (CNN): one class of feed-forward neural networks, which includes convolution calculations and has a deep structure, is one of the representative algorithms for deep learning.
Long Short-Term Memory network (LSTM): a time-cycle neural network is specially designed for solving the long-term dependence problem of a general RNN (recurrent neural network), so that the RNN has a chain form of repeated neural network modules.
Word2Vec and Glove: a cluster of correlation models for generating word vectors, which are shallow, two-level neural networks, are trained to reconstruct linguistic word text.
Word vector: a collective term for a set of language modeling and feature learning techniques in natural language processing, where words or phrases from a vocabulary are mapped to vectors of real numbers. Conceptually, it involves mathematical embedding from a one-dimensional space of each word to a continuous vector space with lower dimensions.
Given that the prior art does not fully consider the potential semantic relationships of the time-sequence features, so that the classification effect is not ideal, the embodiment of the invention provides a text classification method based on a convolutional neural network multi-channel feature representation. The text representation is enriched with word vectors pre-trained by third parties from different sources: one is a Word2Vec model, which better describes local information using a prediction-based method; the other is a GloVe model, which better exploits global information using a counting-based method. Fusing the word vector representations obtained from the two models enriches the text semantic information. Forward- and reverse-order time-sequence features are then extracted from the text sequences of the two word vector sources by two bidirectional long short-term memory networks.
In the prior art, to capture the text context dependency, the forward- and reverse-order output feature vectors at each time step are concatenated as the final output for that step. Differently from the prior art, this method vertically stacks the outputs of all time steps in sequence to form a multi-channel representation: the forward output vectors of all time steps are stacked into one channel and the reverse output vectors into another, so each bidirectional network yields two channels, and the two word vector representations yield four channels in total, further enriching the text semantic information while preserving the text time-sequence features. A multi-scale convolution operation is then performed on this multi-channel feature representation, fully considering the information before and after the current time step and mining the potential semantic relationships among the time-sequence features to obtain rich text features, which are fed to a following fully connected layer and a Softmax classifier to improve the classification accuracy. The present invention will be described in detail below with reference to the accompanying drawings.
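As an illustration of the multi-scale convolution step described above, the following sketch applies kernels spanning different numbers of time steps to a four-channel representation and max-pools each response over time. The kernel sizes, the summation over channels, the pooling and the naive convolution are illustrative assumptions, not the patent's exact network:

```python
import numpy as np

def conv2d_valid(channel, kernel):
    """Naive 'valid' 2-D convolution of one channel with one kernel."""
    kh, kw = kernel.shape
    h, w = channel.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(channel[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
channels = rng.standard_normal((4, 10, 8))    # 4 channels, 10 steps, dim 8

# Multi-scale: kernels spanning 2, 3 and 4 time steps over the full width,
# so each scale sees the context before and after the current time step.
features = []
for size in (2, 3, 4):
    kernel = rng.standard_normal((size, 8))
    response = sum(conv2d_valid(ch, kernel) for ch in channels)
    features.append(response.max())              # max-pool over time
feature_vector = np.array(features)
```

A real implementation would use many kernels per scale and feed the pooled responses to the fully connected layer and Softmax classifier.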
FIG. 2 is a flowchart of a text classification method based on a convolutional neural network multi-channel feature representation in an embodiment of the present invention. FIG. 3 is a schematic diagram of a text classification method based on a convolutional neural network multi-channel feature representation in an embodiment of the present invention. As shown in fig. 2-3, the text classification method based on the convolutional neural network multi-channel feature representation includes:
s101: and respectively inputting the text data to be classified into two word vector models to obtain two word vector matrixes.
In specific implementation, two trained word vector models are used to map the text data to be classified: each word is mapped to a d-dimensional vector, so each sample sentence is mapped to a vector matrix. Here W_1 denotes the trained Word2Vec word vector model and W_2 denotes the trained GloVe word vector model, with W_1, W_2 ∈ R^{d×|v|}, where d is the word vector dimension and |v| is the dictionary (vocabulary) size. The word vector matrix output by the Word2Vec word vector model is J_1 = W_1 · V; the word vector matrix output by the GloVe word vector model is J_2 = W_2 · V, where V ∈ R^{|v|×n} and n is the length of the text data to be classified.
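The mappings J_1 = W_1 · V and J_2 = W_2 · V can be illustrated with random stand-in weight matrices; the dimensions and token ids below are illustrative assumptions, not trained models:

```python
import numpy as np

rng = np.random.default_rng(42)
d, v_size, n = 4, 10, 3          # vector dim, vocabulary size, text length

W1 = rng.standard_normal((d, v_size))   # stand-in for trained Word2Vec weights
W2 = rng.standard_normal((d, v_size))   # stand-in for trained GloVe weights

# V is the one-hot matrix of the token ids of the sample sentence,
# so each column of J selects one column (word vector) of W.
token_ids = [2, 7, 5]
V = np.zeros((v_size, n))
V[token_ids, range(n)] = 1.0

J1 = W1 @ V   # d x n word vector matrix from the first model
J2 = W2 @ V   # d x n word vector matrix from the second model
```

In practice the matrix product is implemented as a plain embedding lookup; the one-hot form above just makes the formula explicit.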
S102: and respectively inputting the two word vector matrixes into corresponding text information extraction models to obtain multi-channel text characteristic data.
Fig. 8 is a flowchart of creating a text information extraction model and a text classification model in the embodiment of the present invention. FIG. 9 is a flow chart of creating a convolutional neural network text classification model in an embodiment of the present invention. As shown in fig. 8 to 9, before executing S102, the method further includes:
the following iterative process is performed:
s601: and respectively inputting the training text data into the two word vector models to obtain two training word vector matrixes.
In specific implementation, the training text data may include three long-text data sets and three short-text data sets, all six being standard English data sets: AG is a data set for classifying news article types; Yelp_P is a user review sentiment classification data set with only positive and negative categories; Yelp_F is also a user review sentiment classification data set, at a finer granularity with five categories in total; SST-2 is a movie review sentiment classification data set divided into negative and positive categories; TREC is a question type classification data set, specifically with six categories such as abbreviations, entities, descriptions, people, places and numbers; MR is a movie review classification data set divided into two major categories, positive and negative.
To reduce the interference of noisy information, a series of prepositions that appear in large numbers in the text but carry no actual meaning are first removed; second, a series of functionless special characters are deleted using regular expressions; then contracted forms such as "I'm" are expanded; finally, all characters are converted to lower case. The lengths of the preprocessed original texts are not uniform. To optimize the training of the model, the text length is set to a default value, adjusted via the hyper-parameter max_length: sample sentences shorter than the default value are padded with the special word "<PAD>" to the uniform length, and the excess part of sample sentences longer than the default value is truncated.
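The preprocessing steps described above (stop-word removal, special-character deletion via regular expressions, contraction expansion, lower-casing, and padding/truncation to max_length) might be sketched as follows; the stop-word and contraction tables are tiny illustrative subsets:

```python
import re

STOPWORDS = {"a", "an", "the", "of", "to"}          # illustrative subset
CONTRACTIONS = {"i'm": "i am", "don't": "do not"}   # illustrative subset
PAD = "<PAD>"                                        # uniform-length filler word

def preprocess(text, max_length=8):
    text = text.lower()                              # lower-case everything
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)             # expand contractions
    text = re.sub(r"[^a-z\s]", " ", text)            # drop special characters
    tokens = [t for t in text.split() if t not in STOPWORDS]
    tokens = tokens[:max_length]                     # truncate long sentences
    tokens += [PAD] * (max_length - len(tokens))     # pad short sentences
    return tokens

tokens = preprocess("I'm watching the film!", max_length=6)
```

The order matters: contractions must be expanded before the regular expression strips apostrophes.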
S602: and respectively inputting the two training word vector matrixes into corresponding bidirectional long-short term memory artificial neural network models to obtain multi-channel text characteristic training data.
The core idea of the long short-term memory network (LSTM) is to forget historical information and filter the input information by introducing an adaptive "gate" mechanism. The three gate structures are called the input gate, the forget gate and the output gate, and involve the following formulas, where i_t, f_t and o_t denote the input gate, forget gate and output gate at time t. The forget gate controls the degree to which the state of the previous time (time t-1) is forgotten, the input gate controls the degree to which the candidate memory information c̃_t at time t is written into the long-term memory, and the output gate controls how the short-term memory is affected by the long-term memory. For this layer of the network, the input at time t is the training word vector x_t and the hidden-layer state vector at time t-1 is h_{t-1}.
Compute the forget gate f_t, selecting the information to be forgotten at time t:
f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
Compute the input (memory) gate i_t, selecting the information to be memorized at time t:
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
and the candidate memory information at time t:
c̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)
Then update the cell state at time t from the cell state C_{t-1} at the previous time (time t-1), the information f_t to be forgotten at time t, the information i_t to be memorized at time t, and the candidate memory information c̃_t:
C_t = f_t * C_{t-1} + i_t * c̃_t
Compute the output gate o_t at time t:
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
and the hidden-layer state at time t:
h_t = o_t * tanh(C_t)
i_t, f_t and c̃_t are all functions of the hidden-layer state h_{t-1} at the previous time and the training word vector x_t at time t. i_t and f_t are sigmoid functions with value range [0, 1]; c̃_t is a tanh function with value range [-1, 1]. W_i is the input gate weight, b_i the input gate bias, W_f the forget gate weight, b_f the forget gate bias, W_o the output gate weight, b_o the output gate bias, W_C the memory weight and b_C the memory bias. An output y = h_t is obtained at each time step via a forward LSTM network. The internal core structure of the bidirectional long short-term memory network is the same as that of the LSTM; it simply considers the text context information in both the forward and the reverse direction.
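The gate equations above can be checked with a minimal single-step implementation; the random weights and dimensions are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One application of the gate equations: returns (h_t, C_t)."""
    z = np.concatenate([h_prev, x_t])             # [h_{t-1}, x_t]
    f_t = sigmoid(p["Wf"] @ z + p["bf"])          # forget gate
    i_t = sigmoid(p["Wi"] @ z + p["bi"])          # input (memory) gate
    c_tilde = np.tanh(p["Wc"] @ z + p["bc"])      # candidate memory
    c_t = f_t * c_prev + i_t * c_tilde            # cell state update
    o_t = sigmoid(p["Wo"] @ z + p["bo"])          # output gate
    h_t = o_t * np.tanh(c_t)                      # hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
d, hidden = 3, 4
p = {name: rng.standard_normal((hidden, hidden + d))
     for name in ("Wf", "Wi", "Wc", "Wo")}
p.update({name: np.zeros(hidden) for name in ("bf", "bi", "bc", "bo")})
h, c = np.zeros(hidden), np.zeros(hidden)
h, c = lstm_step(rng.standard_normal(d), h, c, p)
```

Since the output gate lies in (0, 1) and tanh in (-1, 1), every component of h_t is strictly inside (-1, 1); a bidirectional network simply runs a second such cell over the sequence in reverse.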
S603: and inputting the multi-channel text characteristic training data into a multi-channel convolution network model to obtain a prediction type.
S604: and determining a loss function according to the prediction type and the corresponding actual type.
During specific implementation, a dropout mechanism can be adopted to prevent the algorithm from overfitting. The experiments adopt cross entropy as the objective function, optimize the network model by minimizing the cross-entropy loss, use a gradient descent algorithm for parameter optimization, and introduce L2 regularization into the objective function for a better fit. The cross-entropy loss function is:
loss = -Σ_{p=1}^{D} Σ_{q=1}^{C} ŷ_p^q · log(y_p^q) + λ‖θ‖₂²
where loss is the loss function, D is the size of the training text data, C is the number of sample classes, y_p^q is the predicted probability that the p-th training sample belongs to class q, ŷ_p^q is the probability that the actual class of the p-th training sample is q, λ‖θ‖₂² is the regularization term, λ is the penalty coefficient, and θ are the parameters.
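The cross-entropy objective with the L2 penalty can be sketched directly; the penalty weight `lam` and the toy batch are illustrative assumptions:

```python
import numpy as np

def loss_fn(y_pred, y_true, theta, lam=1e-3):
    """Cross entropy over the batch plus an L2 penalty on the parameters."""
    eps = 1e-12                                   # avoid log(0)
    ce = -np.sum(y_true * np.log(y_pred + eps))   # -sum_p sum_q y_hat * log(y)
    return ce + lam * np.sum(theta ** 2)          # + lambda * ||theta||^2

# Two samples, three classes: predicted class probabilities and one-hot labels.
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1]])
y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
theta = np.zeros(5)
loss = loss_fn(y_pred, y_true, theta)
```

With one-hot labels only the log-probability of the true class of each sample contributes, so here the loss reduces to -log(0.7) - log(0.8) plus the (zero) penalty.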
S605: and judging whether the loss function is converged.
S606: and when the loss function is converged, determining the bidirectional long-short term memory artificial neural network model as a text information extraction model, and determining the multi-channel convolution network model as a text classification model.
The convolutional neural network text classification model is an MC-CNN model and consists of a text information extraction model and a text classification model.
S607: and when the loss function is not converged, updating the bidirectional long-short term memory artificial neural network model and the multi-channel convolution network model according to the loss function.
Fig. 4 is a flowchart of S102 in the embodiment of the present invention. FIG. 10 is a schematic diagram of a representation of a multi-channel text feature in an embodiment of the invention. As shown in fig. 4 and fig. 10, after the bidirectional long short-term memory network, forward- and reverse-order output results are obtained at each time step, and the feature vectors of all time steps are vertically stacked in sequence to form a multi-channel feature representation. S102 includes:
s201: and inputting the first word vector matrix into a first text information extraction model to obtain first forward-sequence text characteristic data and first reverse-sequence text characteristic data.
Fig. 5 is a flowchart of S201 in the embodiment of the present invention. As shown in fig. 5, S201 includes:
s301: and inputting the first word vector matrix into a first text information extraction model to respectively obtain a first forward-sequence text feature vector and a first reverse-sequence text feature vector.
S302: and vertically stacking the first forward-sequence text feature vectors in sequence to obtain first forward-sequence text feature data.
In specific implementation, the first forward text feature data can be obtained through the following formula:
H_1^fw = [h_1^fw(1); h_1^fw(2); …; h_1^fw(l)]

wherein H_1^fw is the first forward-order text feature data, h_1^fw(1) is the first forward-order text feature vector at the first time step, h_1^fw(2) is the first forward-order text feature vector at the second time step, and h_1^fw(l) is the first forward-order text feature vector at the l-th time step, the vectors being stacked vertically in time order.
S303: and vertically stacking the first reverse-order text feature vectors in sequence to obtain first reverse-order text feature data.
In specific implementation, the first reverse-order text feature data can be obtained through the following formula:
H_1^bw = [h_1^bw(1); h_1^bw(2); …; h_1^bw(l)]

wherein H_1^bw is the first reverse-order text feature data, h_1^bw(1) is the first reverse-order text feature vector at the first time step, h_1^bw(2) is the first reverse-order text feature vector at the second time step, and h_1^bw(l) is the first reverse-order text feature vector at the l-th time step.
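The vertical stacking in S302 and S303 can be sketched as row-wise concatenation. In the sketch below the per-time-step hidden states, which in the actual method come from a bidirectional LSTM, are mocked with random vectors; all names and sizes are illustrative:

```python
import numpy as np

l, d = 4, 5  # sequence length and hidden size (illustrative)
rng = np.random.default_rng(0)

# Mocked per-time-step forward and reverse hidden states of a BiLSTM
fwd = [rng.standard_normal(d) for _ in range(l)]
rev = [rng.standard_normal(d) for _ in range(l)]

# Vertically stack in time order: one feature vector per row
H_fwd = np.vstack(fwd)  # (l, d): forward-order text feature data
H_rev = np.vstack(rev)  # (l, d): reverse-order text feature data
```

Each resulting (l, d) matrix later serves as one feature channel of the multi-channel representation.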
S202: and inputting the second word vector matrix into a second text information extraction model to obtain second forward-sequence text characteristic data and second reverse-sequence text characteristic data.
Fig. 6 is a flowchart of S202 in the embodiment of the present invention. As shown in fig. 6, S202 includes:
s401: and inputting the second word vector matrix into a second text information extraction model to respectively obtain a second forward-sequence text feature vector and a second reverse-sequence text feature vector.
S402: and vertically stacking the second forward-sequence text feature vectors in sequence to obtain second forward-sequence text feature data.
In specific implementation, the second forward text feature data can be obtained through the following formula:
H_2^fw = [h_2^fw(1); h_2^fw(2); …; h_2^fw(l)]

wherein H_2^fw is the second forward-order text feature data, h_2^fw(1) is the second forward-order text feature vector at the first time step, h_2^fw(2) is the second forward-order text feature vector at the second time step, and h_2^fw(l) is the second forward-order text feature vector at the l-th time step.
S403: and vertically stacking the second reverse-order text feature vectors in sequence to obtain second reverse-order text feature data.
In specific implementation, the second reverse-order text feature data can be obtained through the following formula:
H_2^bw = [h_2^bw(1); h_2^bw(2); …; h_2^bw(l)]

wherein H_2^bw is the second reverse-order text feature data, h_2^bw(1) is the second reverse-order text feature vector at the first time step, h_2^bw(2) is the second reverse-order text feature vector at the second time step, and h_2^bw(l) is the second reverse-order text feature vector at the l-th time step.
S203: and obtaining multi-channel feature representation data according to the first forward-sequence text feature data, the first reverse-sequence text feature data, the second forward-sequence text feature data and the second reverse-sequence text feature data.
The first forward-order text feature data, the first reverse-order text feature data, the second forward-order text feature data and the second reverse-order text feature data are feature matrices formed by vertically stacking the feature vectors in sequence, and each matrix represents one feature channel, so that a multi-channel feature representation of the text is formed. In this way, the text context dependency relationship is obtained while the latent semantic information of the time-sequence features is retained.
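Assembling the four feature matrices into the multi-channel representation amounts to stacking them along a new channel axis, analogous to a four-channel image. The sketch below mocks the four matrices with random data; in the actual method they would come from the two BiLSTMs, and all names are illustrative:

```python
import numpy as np

l, d = 6, 8  # sequence length and hidden size (illustrative)
rng = np.random.default_rng(1)

# Mocked forward/reverse feature matrices from the two word-vector channels
h1_fwd, h1_rev, h2_fwd, h2_rev = (rng.standard_normal((l, d))
                                  for _ in range(4))

# Channel axis first: shape (channels, l, d), like a 4-channel image
multi_channel = np.stack([h1_fwd, h1_rev, h2_fwd, h2_rev], axis=0)
```

The resulting (4, l, d) tensor is what the multi-scale convolutional network layer consumes in S501.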
S103: and inputting the multi-channel text characteristic data into a text classification model to obtain a text type.
In specific implementation, S103 starts from the multi-channel feature representation of the text, enhances it with multi-scale convolution operations, and further mines the latent semantic information of the text's time-sequence features while taking into account the context information of the text at the current moment.
Fig. 7 is a flowchart of S103 in the embodiment of the present invention. FIG. 11 is a diagram of a multi-scale convolutional network layer in an embodiment of the present invention. As shown in fig. 7 and 11, S103 includes:
s501: and inputting the multi-channel text characteristic data into a multi-scale convolution network layer to obtain multi-dimensional text characteristic data.
Under the action of convolution kernels with different window sizes, the dimensionality of feature vectors output after convolution operation is different. In specific implementation, the multidimensional text characteristic data is obtained through the following formula:
a_i = f(W · H_{i:i+h−1} + b)

wherein a_i is the i-th word feature, W is the weight of the multi-scale convolutional network layer, b is the bias of the multi-scale convolutional network layer, f is a nonlinear activation function, h is the window size of the convolution kernel, and H_{i:i+h−1} is the matrix formed by the i-th to the (i+h−1)-th word feature vectors in the multi-channel text feature data. Therefore, after convolution processing is performed on a sequence of length l, the following word feature matrix is obtained:
A = [a_1, a_2, …, a_l];
therefore, K word feature matrices of different dimensions, [A_1, A_2, …, A_K], can be obtained after processing by convolution kernels of K different window sizes. The most important feature corresponding to each convolution kernel size is then extracted through a max-pooling operation, namely:

â_k = max(A_k), k = 1, 2, …, K
Finally, after the multi-scale convolutional network layer processing, the K-dimensional text feature data is obtained:

Y = [â_1, â_2, …, â_K]

wherein Y is the K-dimensional text feature data, â_1 is the max-pooled word feature of the first dimension, â_2 is the max-pooled word feature of the second dimension, and â_K is the max-pooled word feature of the K-th dimension.
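The multi-scale convolution followed by max pooling can be sketched as follows. The sketch assumes ReLU as the nonlinear activation f and treats each window size as a single kernel for brevity (a real layer would have many kernels per size, and all names here are illustrative):

```python
import numpy as np

def conv_max_feature(H, h, W, b):
    """One convolution kernel of window size h slid over the rows of H,
    followed by max pooling: returns max_i f(W . H[i:i+h] + b)."""
    l = H.shape[0]
    acts = [max(0.0, float(np.sum(W * H[i:i + h]) + b))  # ReLU activation
            for i in range(l - h + 1)]
    return max(acts)  # most important feature for this kernel size

rng = np.random.default_rng(2)
H = rng.standard_normal((10, 4))   # (l, d) text feature matrix
windows = [2, 3, 4]                # K = 3 different kernel window sizes

# K-dimensional text feature data: one max-pooled value per window size
Y = np.array([conv_max_feature(H, h, rng.standard_normal((h, 4)), 0.1)
              for h in windows])
```

Because each window size contributes exactly one pooled value, the output dimensionality K depends only on the number of kernel sizes, not on the sequence length l.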
S502: and inputting the multidimensional text characteristic data into a full-connection output network to obtain the text type.
In specific implementation, the multi-dimensional text feature data passes through a three-layer fully connected network and is then classified by softmax to obtain the text type.
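The classification head of S502 can be sketched as a linear layer plus softmax. The patent uses a three-layer fully connected network; a single layer is shown here for brevity, and all names and sizes are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(3)
K, C = 6, 4                       # feature dimension, number of text types
Y = rng.standard_normal(K)        # K-dimensional text feature data
W = rng.standard_normal((C, K))   # fully connected layer weights
b = rng.standard_normal(C)        # fully connected layer bias

probs = softmax(W @ Y + b)        # class probability distribution
text_type = int(np.argmax(probs)) # predicted text type index
```

Stacking two more such linear layers with nonlinearities in between would give the three-layer head described above.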
The implementation subject of the text classification method based on the convolutional neural network multi-channel feature representation shown in fig. 2 can be a computer. As can be seen from the process shown in fig. 2, in the text classification method based on the convolutional neural network multi-channel feature representation according to the embodiment of the present invention, text data to be classified are respectively input into two word vector models to obtain two word vector matrices, which are respectively input into corresponding text information extraction models to obtain multi-channel text feature data, and then the multi-channel text feature data are input into a text classification model to obtain a text type, so that text representation can be enriched, and text classification accuracy can be improved.
The specific process of the embodiment of the invention is as follows:
1. and respectively inputting the training text data into the two word vector models to obtain two training word vector matrixes.
2. And inputting the two training word vector matrixes into corresponding bidirectional long-short term memory artificial neural network models respectively to obtain multi-channel text characteristic training data.
3. And inputting the multi-channel text characteristic training data into a multi-channel convolution network model to obtain a prediction type.
4. And determining a loss function according to the prediction type and the corresponding actual type.
5. And when the loss function is converged, determining the bidirectional long-short term memory artificial neural network model as a text information extraction model, determining the multi-channel convolution network model as the text classification model, otherwise, updating the bidirectional long-short term memory artificial neural network model and the multi-channel convolution network model according to the loss function, and returning to the step 1.
6. And respectively inputting the text data to be classified into two word vector models to obtain two word vector matrixes.
7. And inputting the first word vector matrix into a first text information extraction model to respectively obtain a first forward-sequence text feature vector and a first reverse-sequence text feature vector.
8. And vertically stacking the first forward-sequence text feature vectors in sequence to obtain first forward-sequence text feature data, and vertically stacking the first reverse-sequence text feature vectors in sequence to obtain first reverse-sequence text feature data.
9. And inputting the second word vector matrix into a second text information extraction model to respectively obtain a second forward-sequence text feature vector and a second reverse-sequence text feature vector.
10. And vertically stacking the second forward-sequence text feature vectors in sequence to obtain second forward-sequence text feature data, and vertically stacking the second reverse-sequence text feature vectors in sequence to obtain second reverse-sequence text feature data.
11. And obtaining multi-channel feature representation data according to the first forward-sequence text feature data, the first reverse-sequence text feature data, the second forward-sequence text feature data and the second reverse-sequence text feature data.
12. And inputting the multi-channel text characteristic data into a multi-scale convolution network layer to obtain multi-dimensional text characteristic data.
13. And inputting the multi-dimensional text characteristic data into a full-connection output network to obtain the text type.
TABLE 1
(Table 1 appears as an image in the original publication and is not reproduced here.)
Table 1 is a comparison table of the classification results of the MC-CNN model adopted by the invention and other models. As shown in Table 1, the DSCNN model first performs feature extraction on the time-sequence information through an LSTM and then extracts spatial features with a CNN, overcoming the disadvantage that the LSTM cannot consider future time information. The VeryDeep-CNN model uses a 29-layer CNN network to capture long-distance dependencies in the text. The TextCNN model was the first to apply a convolutional neural network to this natural language processing task, using only convolutional layers to extract text features and complete the text classification task. The fastText model is a simple and efficient linear classification model that can classify massive amounts of text in a short time, supporting hundreds of millions of samples; it averages the word vectors of all words to obtain a vectorized representation of the text and then performs classification. The WE model is similar to the fastText model, except that the average pooling is simply replaced by max pooling. The core idea of the MCTC-dense model is to adopt convolution kernels of different sizes to extract information of different scales from the text in different time dimensions; the features extracted by the differently sized convolution kernels can also be used to resolve dependency relationships among long sequences.
In Table 1, MR is a film review classification data set divided into two categories, positive and negative; SST-2 is a film review sentiment classification data set, also divided into negative and positive; and TREC is a question-type classification data set comprising six categories: abbreviation, description, entity, person, location and number. AG is a data set for classifying news article types; Yelp_F is a user review data set also used for sentiment classification, a finer-grained data set with five categories in total; and Yelp_P is a user review sentiment classification data set with only two categories, positive and negative. Comparing these models' results with those of the invention shows that, particularly on the long-text data sets, the invention achieves the best effect. This indicates that the invention mines the latent semantic information of the text sequence while capturing the long-range dependency relationships of the text, providing a good solution for the text classification task.
TABLE 2
(Table 2 appears as an image in the original publication and is not reproduced here.)
Table 2 is a comparison table of the classification results of the MC-CNN model adopted by the invention and its own variant models. As shown in Table 2, MC-CNN-w2v is a model that classifies using only Word2vec pre-trained word vectors as input; MC-CNN-glo is a model that classifies using only GloVe pre-trained word vectors; and SC-CNN is a model that fuses the four channels into one channel by assigning them different weights.
From the final results, the MC-CNN model adopted by the invention performs better than its own variant models on all six data sets. Although the improvement is not dramatic, it reflects the advantages of the multi-channel feature representation proposed by the invention as a whole: in the subsequent convolution processing, the algorithm can simultaneously consider the information before and after the current moment while also mining the latent semantic information of the text sequence, so the invention still achieves optimal performance.
In summary, the present invention enriches the text representation by using third-party pre-trained word vectors from different sources, thereby enriching the semantic information of the text. Meanwhile, two different bidirectional long-short term memory networks are used to extract the forward-order and reverse-order time-sequence features of the text sequences represented by the two kinds of word vectors, so as to obtain the text context information at the current moment. Unlike the prior art, which splices the forward-order and reverse-order outputs as the final result, the invention vertically stacks the output at each moment in sequence to form multiple channels, and applies multi-scale convolution operations to the multi-channel feature representation. This further mines the latent semantic relationships among the time-sequence features while fully considering the information before and after the current moment, solving the problems that the semantic information of the text representation is not rich and that the information before and after the current moment and the latent semantic information of the time-sequence features cannot be fully considered, and providing a good solution for the text classification task.
Based on the same inventive concept, the embodiment of the invention also provides a text classification device based on the convolutional neural network multi-channel feature representation, and as the problem solving principle of the device is similar to the text classification method based on the convolutional neural network multi-channel feature representation, the implementation of the device can refer to the implementation of the method, and repeated parts are not repeated.
Fig. 12 is a block diagram of a text classification device based on a convolutional neural network multi-channel feature representation in an embodiment of the present invention. As shown in fig. 12, the text classification apparatus based on the convolutional neural network multi-channel feature representation includes:
the word vector matrix module is used for respectively inputting the text data to be classified into two word vector models to obtain two word vector matrices;
the multi-channel text characteristic data module is used for respectively inputting the two word vector matrixes into corresponding text information extraction models to obtain multi-channel text characteristic data;
and the text type module is used for inputting the multi-channel text characteristic data into the text classification model to obtain the text type.
In one embodiment, the multi-channel text feature data module comprises:
the first text characteristic data unit is used for inputting the first word vector matrix into the first text information extraction model to obtain first forward-sequence text characteristic data and first reverse-sequence text characteristic data;
the second text characteristic data unit is used for inputting the second word vector matrix into the second text information extraction model to obtain second forward-sequence text characteristic data and second reverse-sequence text characteristic data;
and the multichannel feature representation data unit is used for obtaining multichannel feature representation data according to the first forward-sequence text feature data, the first reverse-sequence text feature data, the second forward-sequence text feature data and the second reverse-sequence text feature data.
In one embodiment, the first text feature data unit includes:
the first text feature vector subunit is used for inputting the first word vector matrix into the first text information extraction model to respectively obtain a first forward-sequence text feature vector and a first reverse-sequence text feature vector;
the first positive sequence text feature data subunit is used for vertically stacking the first positive sequence text feature vectors in sequence to obtain first positive sequence text feature data;
and the first reverse-order text feature data subunit is used for vertically stacking the first reverse-order text feature vectors in sequence to obtain first reverse-order text feature data.
In one embodiment, the second text feature data unit includes:
the second text feature vector subunit is used for inputting the second word vector matrix into a second text information extraction model to respectively obtain a second forward-sequence text feature vector and a second reverse-sequence text feature vector;
the second positive sequence text feature data subunit is used for vertically stacking the second positive sequence text feature vectors in sequence to obtain second positive sequence text feature data;
and the second reverse-order text characteristic data subunit is used for vertically stacking the second reverse-order text characteristic vectors in sequence to obtain second reverse-order text characteristic data.
In one embodiment, the text type module comprises:
the multi-dimensional text characteristic data unit is used for inputting the multi-channel text characteristic data into the multi-scale convolution network layer to obtain multi-dimensional text characteristic data;
and the text type unit is used for inputting the multi-dimensional text characteristic data into the full-connection output network to obtain a text type.
In one embodiment, the apparatus further comprises:
the training word vector matrix module is used for respectively inputting training text data into the two word vector models to obtain two training word vector matrices;
the text characteristic training data module is used for respectively inputting the two training word vector matrixes into corresponding bidirectional long-short term memory artificial neural network models to obtain multi-channel text characteristic training data;
the prediction type module is used for inputting the multi-channel text characteristic training data into the multi-channel convolution network model to obtain a prediction type;
the loss function module is used for determining a loss function according to the prediction type and the corresponding actual type;
and the iteration module is used for determining the bidirectional long-short term memory artificial neural network model as a text information extraction model and determining the multi-channel convolution network model as a text classification model when the loss function is converged, and otherwise, updating the bidirectional long-short term memory artificial neural network model and the multi-channel convolution network model according to the loss function.
In summary, the text classification device based on convolutional neural network multichannel feature representation according to the embodiment of the present invention inputs text data to be classified into two word vector models to obtain two word vector matrices, and inputs the two word vector matrices into corresponding text information extraction models to obtain multichannel text feature data, and then inputs the multichannel text feature data into a text classification model to obtain a text type, so that text representation can be enriched, and text classification accuracy can be improved.
The embodiment of the invention also provides a specific implementation mode of computer equipment capable of realizing all the steps in the text classification method based on the convolutional neural network multi-channel feature representation in the embodiment. Fig. 13 is a block diagram of a computer device in an embodiment of the present invention, and referring to fig. 13, the computer device specifically includes the following contents:
a processor (processor)1301 and a memory (memory) 1302.
The processor 1301 is configured to invoke a computer program in the memory 1302, and the processor implements all the steps in the text classification method based on the convolutional neural network multi-channel feature representation in the above embodiments when executing the computer program, for example, the processor implements the following steps when executing the computer program:
respectively inputting text data to be classified into two word vector models to obtain two word vector matrixes;
inputting the two word vector matrixes into corresponding text information extraction models respectively to obtain multi-channel text characteristic data;
and inputting the multi-channel text characteristic data into a text classification model to obtain a text type.
To sum up, the computer device of the embodiment of the present invention inputs the text data to be classified into two word vector models respectively to obtain two word vector matrices, and inputs the two word vector matrices into corresponding text information extraction models respectively to obtain multi-channel text feature data, and then inputs the multi-channel text feature data into a text classification model to obtain a text type, which can enrich text representation and improve text classification accuracy.
An embodiment of the present invention further provides a computer-readable storage medium capable of implementing all the steps in the text classification method based on convolutional neural network multi-channel feature representation in the foregoing embodiments, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the computer program implements all the steps of the text classification method based on convolutional neural network multi-channel feature representation in the foregoing embodiments, for example, when the processor executes the computer program, the processor implements the following steps:
respectively inputting text data to be classified into two word vector models to obtain two word vector matrixes;
inputting the two word vector matrixes into corresponding text information extraction models respectively to obtain multi-channel text characteristic data;
and inputting the multi-channel text characteristic data into a text classification model to obtain a text type.
To sum up, the computer-readable storage medium according to the embodiment of the present invention respectively inputs text data to be classified into two word vector models to obtain two word vector matrices, respectively inputs the two word vector matrices into corresponding text information extraction models to obtain multi-channel text feature data, and then inputs the multi-channel text feature data into a text classification model to obtain a text type, so that text representation can be enriched, and text classification accuracy can be improved.
An embodiment of the present invention further provides a computer program product capable of implementing all the steps in the text classification method based on the convolutional neural network multi-channel feature representation in the above embodiment, where the computer program product includes a computer program/instruction, and when the computer program/instruction is executed by a processor, the computer program/instruction implements all the steps of the text classification method based on the convolutional neural network multi-channel feature representation in the above embodiment, for example, when the processor executes the computer program, the processor implements the following steps:
respectively inputting text data to be classified into two word vector models to obtain two word vector matrixes;
inputting the two word vector matrixes into corresponding text information extraction models respectively to obtain multichannel text characteristic data;
and inputting the multi-channel text characteristic data into a text classification model to obtain a text type.
To sum up, the computer program product of the embodiment of the present invention inputs the text data to be classified into two word vector models respectively to obtain two word vector matrices, and inputs the two word vector matrices into corresponding text information extraction models respectively to obtain multi-channel text feature data, and then inputs the multi-channel text feature data into a text classification model to obtain a text type, which can enrich text representation and improve text classification accuracy.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.
Those of skill in the art will further appreciate that the various illustrative logical blocks, units, and steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate the interchangeability of hardware and software, various illustrative components, elements, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design requirements of the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present embodiments.
The various illustrative logical blocks, or elements, or devices described in connection with the embodiments disclosed herein may be implemented or performed with a general purpose processor, a digital signal processor, an Application Specific Integrated Circuit (ASIC), a field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a digital signal processor and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a digital signal processor core, or any other similar configuration.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. For example, a storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC, which may be located in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
In one or more exemplary designs, the functions described above in connection with the embodiments of the invention may be implemented in hardware, software, firmware, or any combination of the three. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media include both computer storage media and communication media that facilitate transfer of a computer program from one place to another. Storage media may be any available media that can be accessed by a general-purpose or special-purpose computer. For example, such computer-readable media can comprise, but are not limited to, RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store program code in the form of instructions or data structures and that can be read by a general-purpose or special-purpose computer or processor. Additionally, any connection is properly termed a computer-readable medium; thus, if the software is transmitted from a website, server, or other remote source via a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wirelessly, e.g., by infrared, radio, or microwave, those media are included in the definition. Disk and disc, as used herein, include compact disc, laser disc, optical disc, DVD, floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above may also be included in the computer-readable medium.

Claims (10)

1. A text classification method based on convolutional neural network multi-channel feature representation, characterized by comprising the following steps:
inputting text data to be classified into two word vector models respectively to obtain two word vector matrices;
inputting the two word vector matrices into corresponding text information extraction models respectively to obtain multi-channel text feature data;
and inputting the multi-channel text feature data into a text classification model to obtain a text type.
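The three-step pipeline of claim 1 can be illustrated with a minimal sketch, assuming toy dimensions and randomly initialised stand-ins for the two word vector models (e.g. two different pretrained embeddings), the text information extraction models, and the text classification model; all names and sizes below are illustrative assumptions, not the patented implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
SEQ_LEN, EMB_DIM, VOCAB, NUM_CLASSES = 8, 16, 100, 4
token_ids = rng.integers(0, VOCAB, size=SEQ_LEN)   # "text data to be classified"

# Two independent word vector models (stand-ins for, e.g., two pretrained embeddings).
embedding_a = rng.standard_normal((VOCAB, EMB_DIM))
embedding_b = rng.standard_normal((VOCAB, EMB_DIM))
matrix_a = embedding_a[token_ids]                  # word vector matrix 1, shape (8, 16)
matrix_b = embedding_b[token_ids]                  # word vector matrix 2, shape (8, 16)

def extract(matrix):
    """Stand-in for a text information extraction model: returns a
    forward-order and a reverse-order feature map for the input matrix."""
    return matrix, matrix[::-1]                    # placeholder features

# Four channels in total: (forward, reverse) x (model 1, model 2).
channels = np.stack([*extract(matrix_a), *extract(matrix_b)])

# Stand-in text classification model: flatten the channels, apply a linear layer.
weights = rng.standard_normal((channels.size, NUM_CLASSES))
text_type = int(np.argmax(channels.ravel() @ weights))
```

In a real system the `extract` placeholder would be a trained BiLSTM and the linear layer a trained multi-channel CNN classifier, as the later claims describe.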
2. The text classification method based on convolutional neural network multi-channel feature representation according to claim 1, wherein inputting the two word vector matrices into the corresponding text information extraction models respectively to obtain the multi-channel text feature data comprises:
inputting the first word vector matrix into a first text information extraction model to obtain first forward-order text feature data and first reverse-order text feature data;
inputting the second word vector matrix into a second text information extraction model to obtain second forward-order text feature data and second reverse-order text feature data;
and obtaining the multi-channel text feature data according to the first forward-order text feature data, the first reverse-order text feature data, the second forward-order text feature data and the second reverse-order text feature data.
3. The text classification method based on convolutional neural network multi-channel feature representation according to claim 2, wherein inputting the first word vector matrix into the first text information extraction model to obtain the first forward-order text feature data and the first reverse-order text feature data comprises:
inputting the first word vector matrix into the first text information extraction model to obtain first forward-order text feature vectors and first reverse-order text feature vectors respectively;
vertically stacking the first forward-order text feature vectors in sequence to obtain the first forward-order text feature data;
and vertically stacking the first reverse-order text feature vectors in sequence to obtain the first reverse-order text feature data.
4. The text classification method based on convolutional neural network multi-channel feature representation according to claim 2, wherein inputting the second word vector matrix into the second text information extraction model to obtain the second forward-order text feature data and the second reverse-order text feature data comprises:
inputting the second word vector matrix into the second text information extraction model to obtain second forward-order text feature vectors and second reverse-order text feature vectors respectively;
vertically stacking the second forward-order text feature vectors in sequence to obtain the second forward-order text feature data;
and vertically stacking the second reverse-order text feature vectors in sequence to obtain the second reverse-order text feature data.
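The "vertical stacking in sequence" of claims 3 and 4 amounts to making each timestep's feature vector one row of a feature matrix. A minimal sketch with illustrative dimensions (the vectors stand in for per-timestep BiLSTM outputs):

```python
import numpy as np

# Per-timestep feature vectors as a recurrent extractor might emit them
# (three timesteps, feature dimension 2 -- purely illustrative sizes).
forward_vectors = [np.array([1.0, 2.0]),
                   np.array([3.0, 4.0]),
                   np.array([5.0, 6.0])]

# Vertically stacking in sequence: row i of the matrix is timestep i's vector.
forward_features = np.vstack(forward_vectors)

# The reverse-order channel stacks the same vectors in reversed sequence.
reverse_features = np.vstack(forward_vectors[::-1])
```

The resulting matrices have shape (sequence length, feature dimension), which is the 2-D layout a convolutional layer expects per channel.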
5. The text classification method based on convolutional neural network multi-channel feature representation according to claim 1, wherein inputting the multi-channel text feature data into a text classification model to obtain a text type comprises:
inputting the multi-channel text feature data into a multi-scale convolutional network layer to obtain multi-dimensional text feature data;
and inputting the multi-dimensional text feature data into a fully connected output network to obtain the text type.
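A minimal sketch of a multi-scale convolutional layer followed by a fully connected output, in the spirit of TextCNN-style classifiers; the kernel widths, dimensions, and single-channel input below are illustrative assumptions, not the claimed implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
SEQ_LEN, DIM, NUM_CLASSES = 8, 6, 3
feature_map = rng.standard_normal((SEQ_LEN, DIM))   # one channel of text features

def conv1d_valid(x, kernel):
    """Minimal 1-D 'valid' convolution over the sequence axis; the kernel
    spans the full feature dimension, as in TextCNN-style models."""
    k = kernel.shape[0]
    return np.array([np.sum(x[i:i + k] * kernel)
                     for i in range(x.shape[0] - k + 1)])

# Multi-scale: kernels of several widths (here 2, 3, 4) capture n-gram-like
# features at different granularities; global max pooling per scale.
pooled = [conv1d_valid(feature_map, rng.standard_normal((k, DIM))).max()
          for k in (2, 3, 4)]
multi_dim_features = np.array(pooled)               # "multi-dimensional text feature data"

# Fully connected output layer mapping pooled features to class scores.
w = rng.standard_normal((len(pooled), NUM_CLASSES))
text_type = int(np.argmax(multi_dim_features @ w))
```

In practice each kernel width would have many filters and the convolution would run over all four channels, but the shape flow (convolve per scale, pool, concatenate, classify) is the same.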
6. The method of claim 1, further comprising:
performing the following iterative process:
inputting training text data into the two word vector models respectively to obtain two training word vector matrices;
inputting the two training word vector matrices into corresponding bidirectional long short-term memory (BiLSTM) artificial neural network models respectively to obtain multi-channel text feature training data;
inputting the multi-channel text feature training data into a multi-channel convolutional network model to obtain a prediction type;
determining a loss function according to the prediction type and the corresponding actual type;
and when the loss function converges, determining the BiLSTM artificial neural network models as the text information extraction models and the multi-channel convolutional network model as the text classification model; otherwise, updating the BiLSTM artificial neural network models and the multi-channel convolutional network model according to the loss function.
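The iterative process of claim 6 can be sketched as a generic training loop that stops once the loss has converged and otherwise updates the model parameters from the loss gradient; the quadratic toy loss below merely stands in for the BiLSTM extractors and the multi-channel CNN classifier, and all names are illustrative:

```python
def train(loss_fn, grad_fn, params, lr=0.1, tol=1e-6, max_iters=1000):
    """Iterate until the loss converges (successive losses differ by < tol),
    updating params by gradient descent otherwise."""
    prev = float("inf")
    loss = loss_fn(params)
    for _ in range(max_iters):
        loss = loss_fn(params)
        if abs(prev - loss) < tol:            # loss converged: freeze the models
            break
        params = params - lr * grad_fn(params)  # otherwise update from the loss
        prev = loss
    return params, loss

# Toy stand-in: minimise (p - 3)^2, whose minimiser is p = 3.
final_p, final_loss = train(lambda p: (p - 3.0) ** 2,
                            lambda p: 2.0 * (p - 3.0),
                            params=10.0)
```

In the claimed method `params` would be the joint weights of the two BiLSTMs and the convolutional classifier, and the loss would compare the predicted type against the actual type (e.g. cross-entropy).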
7. A text classification device based on convolutional neural network multi-channel feature representation, characterized by comprising:
a word vector matrix module, configured to input text data to be classified into two word vector models respectively to obtain two word vector matrices;
a multi-channel text feature data module, configured to input the two word vector matrices into corresponding text information extraction models respectively to obtain multi-channel text feature data;
and a text type module, configured to input the multi-channel text feature data into a text classification model to obtain a text type.
8. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the text classification method based on convolutional neural network multi-channel feature representation as claimed in any one of claims 1 to 6.
9. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the text classification method based on convolutional neural network multi-channel feature representation as claimed in any one of claims 1 to 6.
10. A computer program product comprising a computer program/instructions, characterized in that the computer program/instructions, when executed by a processor, implement the steps of the text classification method based on convolutional neural network multi-channel feature representation as claimed in any one of claims 1 to 6.
CN202210628402.0A 2022-06-06 2022-06-06 Text classification method and device based on convolutional neural network multi-channel feature representation Pending CN115034299A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210628402.0A CN115034299A (en) 2022-06-06 2022-06-06 Text classification method and device based on convolutional neural network multi-channel feature representation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210628402.0A CN115034299A (en) 2022-06-06 2022-06-06 Text classification method and device based on convolutional neural network multi-channel feature representation

Publications (1)

Publication Number Publication Date
CN115034299A true CN115034299A (en) 2022-09-09

Family

ID=83122948

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210628402.0A Pending CN115034299A (en) 2022-06-06 2022-06-06 Text classification method and device based on convolutional neural network multi-channel feature representation

Country Status (1)

Country Link
CN (1) CN115034299A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116342961A (en) * 2023-03-30 2023-06-27 重庆师范大学 Time sequence classification deep learning system based on mixed quantum neural network
CN116342961B (en) * 2023-03-30 2024-02-13 重庆师范大学 Time sequence classification deep learning system based on mixed quantum neural network

Similar Documents

Publication Publication Date Title
CN111368996B (en) Retraining projection network capable of transmitting natural language representation
Luan et al. Research on text classification based on CNN and LSTM
US11281976B2 (en) Generative adversarial network based modeling of text for natural language processing
US11481416B2 (en) Question Answering using trained generative adversarial network based modeling of text
CN109753566B (en) Model training method for cross-domain emotion analysis based on convolutional neural network
Zhang et al. A text sentiment classification modeling method based on coordinated CNN‐LSTM‐attention model
Yao et al. Bi-directional LSTM recurrent neural network for Chinese word segmentation
Bhuvaneshwari et al. Sentiment analysis for user reviews using Bi-LSTM self-attention based CNN model
CN110263325B (en) Chinese word segmentation system
Beysolow Applied natural language processing with python
Li et al. A method of emotional analysis of movie based on convolution neural network and bi-directional LSTM RNN
CN108170848B (en) Chinese mobile intelligent customer service-oriented conversation scene classification method
Kilimci et al. The evaluation of word embedding models and deep learning algorithms for Turkish text classification
US11694034B2 (en) Systems and methods for machine-learned prediction of semantic similarity between documents
CN112784602B (en) News emotion entity extraction method based on remote supervision
CN112749274A (en) Chinese text classification method based on attention mechanism and interference word deletion
WO2021129411A1 (en) Text processing method and device
Chen et al. Deep neural networks for multi-class sentiment classification
Sun et al. Multi-channel CNN based inner-attention for compound sentence relation classification
CN110781666B (en) Natural language processing text modeling based on generative antagonism network
Qiu et al. Chinese Microblog Sentiment Detection Based on CNN‐BiGRU and Multihead Attention Mechanism
Hidayatullah et al. Attention-based cnn-bilstm for dialect identification on javanese text
Verma et al. Semantic similarity between short paragraphs using Deep Learning
Troxler et al. Actuarial applications of natural language processing using transformers: Case studies for using text features in an actuarial context
Sajeevan et al. An enhanced approach for movie review analysis using deep learning techniques

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination