CN111078833B - Text classification method based on neural network - Google Patents

Text classification method based on neural network

Info

Publication number
CN111078833B
CN111078833B
Authority
CN
China
Prior art keywords
word
text
information
level
phrase
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911223541.XA
Other languages
Chinese (zh)
Other versions
CN111078833A (en)
Inventor
黄少滨
吴汉瑜
李熔盛
申林山
姜梦奇
范贺添
谷虹润
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Engineering University
Original Assignee
Harbin Engineering University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Engineering University filed Critical Harbin Engineering University
Priority to CN201911223541.XA priority Critical patent/CN111078833B/en
Publication of CN111078833A publication Critical patent/CN111078833A/en
Application granted granted Critical
Publication of CN111078833B publication Critical patent/CN111078833B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/3331 - Query processing
    • G06F16/334 - Query execution
    • G06F16/3344 - Query execution using natural language analysis
    • G06F16/35 - Clustering; Classification
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/044 - Recurrent networks, e.g. Hopfield networks
    • G06N3/045 - Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention belongs to the technical field of text classification, and particularly relates to a text classification method based on a neural network. The invention can extract semantic information and structural information at different levels of a text, including word-level semantic information, word-level structural information, phrase-level semantic information and phrase-level structural information. To obtain the final representation of the text, the invention further provides two methods for fusing the four kinds of information: static fusion and attention-based dynamic fusion. Based on a neural network, the invention comprehensively utilizes the semantic information and structural information at different levels of the text and improves the accuracy of text classification.

Description

Text classification method based on neural network
Technical Field
The invention belongs to the technical field of text classification, and particularly relates to a text classification method based on a neural network.
Background
Text classification is an important component of many natural language processing tasks; it can be applied to sentiment classification, question classification and web page retrieval, and text representation plays an important role in it. Early text classification techniques were mostly based on traditional machine learning algorithms such as naive Bayes and support vector machines. These methods usually require domain experts to manually design and extract features from the text, which is time-consuming and labor-intensive. In recent years, neural network models based on deep learning have demonstrated strong performance in many natural language processing tasks, such as machine translation, sentiment analysis and text classification. Most neural network models are based on CNNs, RNNs or attention mechanisms.
A convolutional neural network (CNN) can be used to model text: n-gram information of the text can be extracted through sliding windows, and the most discriminative words or phrases in the text can be selected through max pooling. However, choosing the window size is an important problem: structural information is lost when the window is too small, while too large a window introduces too many parameters and makes training difficult.
Recursive neural networks (RecursiveNN) model text with tree structures, can effectively capture the structural information of the text, and have proven effective for constructing text representations. However, the performance of a recursive neural network depends to a large extent on the quality of the constructed text tree, constructing the text tree is very time-consuming, and the relationships between sentences in a text are difficult to model with a tree structure, so it also cannot make good use of semantic information and structural information.
Unlike recursive neural networks, recurrent neural networks (RecurrentNN) are sequential models that naturally fit the modeling of text and can capture its structural information, but they are biased models in which later words are more dominant than earlier words in the text.
Attention mechanisms have been applied to many natural language processing tasks with great success and have proven effective in capturing text semantics. With a small number of parameters, an attention mechanism can learn the proportion that each part of the text contributes to its overall semantic information, assigning higher weights to important words or phrases; however, word order information is ignored, so the structural information of the text cannot be well utilized.
In recent years, neural network models based on deep learning have demonstrated strong performance in many natural language processing tasks, such as machine translation, sentiment analysis and text classification. Most neural network models are based on convolutional neural networks (CNN), recurrent neural networks (RNN) or attention mechanisms.
CNN-based models
Convolutional neural networks (CNNs) were introduced from the computer vision field into the natural language processing field with great success. Kim proposed extracting text features with several convolution kernels of different sizes for sentence classification, and Kalchbrenner et al. combined a dynamic k-max pooling mechanism with a CNN to achieve good results in sentence modeling. Zhang et al. proposed a character-level convolutional neural network model for text classification. Because shallow CNNs do not handle long-range dependencies in sentences well, some deep CNN models have been proposed, such as the Very Deep CNN (VDCNN) by Conneau et al. and the deep pyramid CNN by Johnson et al.
RNN-based models
Recurrent neural networks (RecurrentNN) are sequence models widely used in the field of natural language processing. Tang et al. used gated recurrent neural networks for sentiment classification. Some researchers have attempted to modify the structure of the RNN: Wang proposed using a Disconnected RNN for text classification, and Yu et al. similarly proposed modeling sentences with a Sliced RNN and achieved good results.
Model based on attention mechanism
Bahdanau et al. first applied the attention mechanism to machine translation. Yang et al. used a hierarchical attention network with bidirectional GRUs to model and classify documents. Vaswani et al. proposed the Transformer, a model based entirely on the self-attention mechanism, with significant success in machine translation. Lin et al. proposed a structured self-attentive sentence embedding.
Text classification is the basis of many natural language processing tasks, and text representation is the key to text classification. A text representation can be understood as a high-level feature of the text, and its quality directly influences the performance of text classification. Traditional text representation methods cannot represent the text well; the bag-of-words model, for example, represents each word as a high-dimensional sparse vector and ignores both the order of the words in the text and their semantic information. In recent years, with the development of deep learning, most good text classification models have been based on neural networks: they represent the text as a low-dimensional real-valued vector and then feed the vector into a softmax function to predict the probability of each category, but these models cannot make good use of the semantic information and structural information of the text.
Disclosure of Invention
Aiming at the problem that traditional neural network models cannot effectively utilize the semantic information and structural information of a text, the invention aims to provide a text classification method based on a neural network.
The purpose of the invention is achieved by the following technical solution, which comprises the following steps:
Step 1: input a text to be classified and preprocess it to obtain the word vector x_i corresponding to each word in the text;
Step 2: apply the attention mechanism directly to the word vectors x_i to obtain word-level semantic information I_wse, and apply a bidirectional LSTM network directly to the word vectors x_i to obtain word-level structural information I_wst;
Step 3: apply a convolutional neural network to the word vectors x_i to obtain phrase information D;
Step 4: apply the attention mechanism to the phrase information D to obtain phrase-level semantic information I_pse, and apply a bidirectional LSTM network to the phrase information D to obtain phrase-level structural information I_pst;
Step 5: fuse the word-level semantic information I_wse, the word-level structural information I_wst, the phrase-level semantic information I_pse and the phrase-level structural information I_pst to obtain the final text vector representation I_T;
Step 6: input the final text vector representation I_T into a softmax classifier to obtain the probability corresponding to each class, and take the class with the highest probability as the class to which the text belongs:
p = softmax(W_c · I_T + b_c)
where W_c is the weight of the softmax classifier and b_c is the corresponding bias.
The present invention may further comprise: the preprocessing of the text in step 1 specifically comprises the following steps:
Step 1.1: detect the length of the input text; if the length of the input text is greater than the specified length, truncate the text; if the length of the input text is less than the specified length, pad the text;
Step 1.2: segment the text into words, index the words by word frequency, and convert the text into the corresponding index sequence;
Step 1.3: convert each index in the index sequence into the word vector of its corresponding word, completing the preprocessing of the text.
Obtaining the word-level semantic information I_wse in step 2 specifically comprises: let the input sentence be w_1, w_2, w_3, ..., w_s, and let the corresponding word vectors be x_1, x_2, x_3, ..., x_s; since each word in a sentence contributes differently to the overall semantic information of the sentence, the attention mechanism acts directly on the word vectors to learn the proportion α_i that each word contributes to the word-level semantic information; the word vector x_i of each word is multiplied by its contribution proportion α_i and the products are accumulated to obtain the word-level semantic information I_wse:
I_wse = Σ_{i=1..s} α_i · x_i
where x_i ∈ R^d is the word vector of word w_i and d is the dimension of the vector;
α_i = exp(u_i^T · u_w) / Σ_{j=1..s} exp(u_j^T · u_w)
u_i = tanh(W_w · x_i + b_w)
where tanh is an activation function, u_i^T is the transpose of u_i, and W_w, b_w, u_w are parameters of the attention mechanism.
Obtaining the word-level structural information I_wst in step 2 specifically comprises: the word-level structural information I_wst is formed by concatenating the final state hf_s of the forward LSTM and the final state hb_1 of the backward LSTM:
hf_i = LSTM_fw(x_i), i = 1, ..., s
hb_i = LSTM_bw(x_i), i = s, ..., 1
I_wst = [hf_s ; hb_1]
Fusing the word-level semantic information I_wse, the word-level structural information I_wst, the phrase-level semantic information I_pse and the phrase-level structural information I_pst in step 5 to obtain the final text vector representation I_T specifically comprises: static fusion is adopted, i.e. the text representation is the average of the word-level semantic information, the word-level structural information, the phrase-level semantic information and the phrase-level structural information:
I_T = (I_wse + I_wst + I_pse + I_pst) / 4.
Fusing the word-level semantic information I_wse, the word-level structural information I_wst, the phrase-level semantic information I_pse and the phrase-level structural information I_pst in step 5 to obtain the final text vector representation I_T specifically comprises: dynamic fusion based on the attention mechanism is adopted, i.e. the attention mechanism is applied to the four kinds of information to automatically learn the contribution proportion γ_i of each part of information to the final text vector representation I_T; here I_wse, I_wst, I_pse, I_pst are denoted I_1, I_2, I_3, I_4 respectively:
I_T = Σ_{i=1..4} γ_i · I_i
γ_i = exp(u_i^T · u_t) / Σ_{j=1..4} exp(u_j^T · u_t)
u_i = tanh(W_t · I_i + b_t)
where tanh is an activation function, u_i^T is the transpose of u_i, and W_t, b_t, u_t are parameters of the attention mechanism.
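For illustration only, the following NumPy sketch shows how the static fusion and the attention-based dynamic fusion defined above could be computed for one text; the four information vectors and the attention parameters W_t, b_t, u_t are filled with random stand-in values rather than learned ones.

    import numpy as np

    rng = np.random.default_rng(0)
    I = rng.standard_normal((4, 300))              # I_1..I_4: the four 300-dimensional information vectors (toy values)

    # Static fusion: plain average of the four vectors.
    I_T_static = I.mean(axis=0)                    # shape (300,)

    # Dynamic fusion: u_i = tanh(W_t I_i + b_t), gamma_i = softmax(u_i^T u_t), I_T = sum_i gamma_i I_i.
    W_t = 0.01 * rng.standard_normal((300, 300))   # attention parameters (learned in practice)
    b_t = np.zeros(300)
    u_t = 0.01 * rng.standard_normal(300)

    u = np.tanh(I @ W_t + b_t)                     # (4, 300)
    scores = u @ u_t                               # (4,)
    gamma = np.exp(scores) / np.exp(scores).sum()  # contribution proportions, sum to 1
    I_T_dynamic = gamma @ I                        # weighted sum, shape (300,)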
The invention has the beneficial effects that:
the invention provides a text classification method based on a neural network, which aims to solve the problem that the traditional text classification method cannot simultaneously and effectively utilize semantic information and structural information of a text. In order to obtain the final representation of the text, the invention further provides two fusion methods to fuse four kinds of information, namely static fusion and dynamic fusion based on an attention mechanism. The invention is based on the neural network, comprehensively utilizes the semantic information and the structural information of different levels of the text, and improves the accuracy of text classification.
Drawings
Fig. 1 is an overall architecture diagram of the present invention.
FIG. 2 is a schematic diagram of the static fusion of the present invention.
FIG. 3 is a schematic diagram of dynamic fusion according to the present invention.
FIG. 4 is a visualization of the experimental results of obtaining the word-level semantic information I_wse with the attention mechanism.
FIG. 5 is a visualization of the experimental results of obtaining the phrase-level semantic information I_pse with the attention mechanism.
Fig. 6 is an overall flow chart of the present invention.
FIG. 7 is a table of experimental data in an example of the present invention.
FIG. 8 is a sample analysis table according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Text classification is the basis of many natural language processing tasks, and text representation is the key to text classification. A text representation can be understood as a high-level feature of the text, and its quality directly influences the performance of text classification. Traditional text representation methods cannot represent the text well; the bag-of-words model, for example, represents each word as a high-dimensional sparse vector and ignores both the order of the words in the text and their semantic information. In recent years, with the development of deep learning, most good text classification models have been based on neural networks: they represent the text as a low-dimensional real-valued vector and then feed the vector into a softmax function to predict the probability of each category, but these models cannot make good use of the semantic information and structural information of the text. The model provided by the invention is also based on a neural network, but it can comprehensively utilize semantic information and structural information at different levels of the text and thereby improve the accuracy of text classification.
Aiming at the problem that traditional neural network models cannot effectively utilize the semantic information and structural information of a text, the invention designs a novel text classification model based on a neural network. The model can extract semantic information and structural information at different levels of the text, including word-level semantic information, word-level structural information, phrase-level semantic information and phrase-level structural information; the four parts of information are then fused with the fusion methods provided by the invention to form the representation of the text, and the representation of the text is finally input into a softmax function for classification.
A text classification method based on a neural network comprises the following steps:
Step 1: input a text to be classified and preprocess it to obtain the word vector x_i corresponding to each word in the text;
Step 2: apply the attention mechanism directly to the word vectors x_i to obtain word-level semantic information I_wse, and apply a bidirectional LSTM network directly to the word vectors x_i to obtain word-level structural information I_wst;
Step 3: apply a convolutional neural network to the word vectors x_i to obtain phrase information D;
Step 4: apply the attention mechanism to the phrase information D to obtain phrase-level semantic information I_pse, and apply a bidirectional LSTM network to the phrase information D to obtain phrase-level structural information I_pst;
Step 5: fuse the word-level semantic information I_wse, the word-level structural information I_wst, the phrase-level semantic information I_pse and the phrase-level structural information I_pst to obtain the final text vector representation I_T;
Step 6: input the final text vector representation I_T into a softmax classifier to obtain the probability corresponding to each class, and take the class with the highest probability as the class to which the text belongs:
p = softmax(W_c · I_T + b_c)
where W_c is the weight of the softmax classifier and b_c is the corresponding bias.
The preprocessing of the text in step 1 specifically comprises the following steps:
Step 1.1: detect the length of the input text; if the length of the input text is greater than the specified length, truncate the text; if the length of the input text is less than the specified length, pad the text;
Step 1.2: segment the text into words, index the words by word frequency, and convert the text into the corresponding index sequence;
Step 1.3: convert each index in the index sequence into the word vector of its corresponding word, completing the preprocessing of the text.
Obtaining the word-level semantic information I_wse in step 2 specifically comprises: let the input sentence be w_1, w_2, w_3, ..., w_s, and let the corresponding word vectors be x_1, x_2, x_3, ..., x_s; since each word in a sentence contributes differently to the overall semantic information of the sentence, the attention mechanism acts directly on the word vectors to learn the proportion α_i that each word contributes to the word-level semantic information; the word vector x_i of each word is multiplied by its contribution proportion α_i and the products are accumulated to obtain the word-level semantic information I_wse:
I_wse = Σ_{i=1..s} α_i · x_i
where x_i ∈ R^d is the word vector of word w_i and d is the dimension of the vector;
α_i = exp(u_i^T · u_w) / Σ_{j=1..s} exp(u_j^T · u_w)
u_i = tanh(W_w · x_i + b_w)
where tanh is an activation function, u_i^T is the transpose of u_i, and W_w, b_w, u_w are parameters of the attention mechanism.
Obtaining the word-level structural information I_wst in step 2 specifically comprises: the word-level structural information I_wst is formed by concatenating the final state hf_s of the forward LSTM and the final state hb_1 of the backward LSTM:
hf_i = LSTM_fw(x_i), i = 1, ..., s
hb_i = LSTM_bw(x_i), i = s, ..., 1
I_wst = [hf_s ; hb_1]
Fusing the word-level semantic information I_wse, the word-level structural information I_wst, the phrase-level semantic information I_pse and the phrase-level structural information I_pst in step 5 to obtain the final text vector representation I_T specifically comprises: static fusion is adopted, i.e. the text representation is the average of the word-level semantic information, the word-level structural information, the phrase-level semantic information and the phrase-level structural information:
I_T = (I_wse + I_wst + I_pse + I_pst) / 4.
Fusing the word-level semantic information I_wse, the word-level structural information I_wst, the phrase-level semantic information I_pse and the phrase-level structural information I_pst in step 5 to obtain the final text vector representation I_T specifically comprises: dynamic fusion based on the attention mechanism is adopted, i.e. the attention mechanism is applied to the four kinds of information to automatically learn the contribution proportion γ_i of each part of information to the final text vector representation I_T; here I_wse, I_wst, I_pse, I_pst are denoted I_1, I_2, I_3, I_4 respectively:
I_T = Σ_{i=1..4} γ_i · I_i
γ_i = exp(u_i^T · u_t) / Σ_{j=1..4} exp(u_j^T · u_t)
u_i = tanh(W_t · I_i + b_t)
where tanh is an activation function, u_i^T is the transpose of u_i, and W_t, b_t, u_t are parameters of the attention mechanism.
The invention can be summarized as follows:
1) Preprocess the text corpus and acquire word-level semantic information and word-level structural information.
2) Acquire phrase-level semantic information and phrase-level structural information.
3) Fuse the word-level semantic information, word-level structural information, phrase-level semantic information and phrase-level structural information to obtain the vector representation of the final text for text classification.
For the acquisition of word-level semantic information, the attention mechanism acts directly on the input word vectors to obtain the proportion that each word contributes to the word-level semantic information, and the contribution proportions are then multiplied with the corresponding word vectors and accumulated to obtain the word-level semantic information; for the acquisition of word-level structural information, a bidirectional LSTM network acts directly on the word vectors, and the word-level structural information is formed by concatenating the final state of the forward LSTM with the final state of the backward LSTM.
For the acquisition of phrase-level semantic information, a convolutional neural network first acts on the word vectors to obtain phrase information, and the attention mechanism then acts on the phrase information to obtain the phrase-level semantic information; for the acquisition of phrase-level structural information, a bidirectional LSTM acts on the phrase information, and the phrase-level structural information is formed by concatenating the final state of the forward LSTM with the final state of the backward LSTM.
For the fusion of the word-level semantic information, word-level structural information, phrase-level semantic information and phrase-level structural information, the invention provides two fusion modes: static fusion (i.e. the average of the four kinds of information) and attention-based dynamic fusion (i.e. the attention mechanism is used to learn the proportion that each of the four kinds of information contributes to the overall text representation, and the information is then multiplied by these proportions and accumulated).
Example 1:
(1) The input of the invention is a text composed of a sequence of words; the word vector corresponding to each word in the input text is obtained by looking it up in the 300-dimensional GloVe pre-trained word vectors and serves as the input of the neural network.
(2) The attention mechanism acts on the word vectors to obtain the proportion that each word contributes to the word-level semantic information, and the contribution proportion of each word is then multiplied with the corresponding word vector and accumulated to obtain the word-level semantic information; a bidirectional LSTM acts on the word vectors, and the final state of the forward LSTM is concatenated with the final state of the backward LSTM to obtain the word-level structural information.
(3) A convolutional neural network acts on the word vectors to obtain hidden representations of the phrases; self-attention acts on the hidden representations of the phrases to obtain the proportion that each phrase contributes to the phrase-level semantic information, and the contribution proportion of each phrase is then multiplied with the corresponding phrase hidden representation and accumulated to obtain the phrase-level semantic information; the phrase-level structural information is derived by applying a bidirectional LSTM to the hidden representations of the phrases.
(4) The final text representation is obtained by applying the static fusion method or the attention-based dynamic fusion method to the word-level semantic information, word-level structural information, phrase-level semantic information and phrase-level structural information; the text representation, serving as the high-level feature of the text, is then fed into the softmax function to predict the category to which the text belongs.
1. Preprocessing text
First, the text is tokenized, using the NLTK tokenizer as the word segmentation tool. The words are then indexed by word frequency, starting from index 1, and the text is converted into the corresponding index sequence. The predefined model requires inputs of a fixed length, so the input text is processed accordingly: if the length of the input text is greater than the specified length, the text is truncated; if it is less than the specified length, the text is padded by prepending zeros. After the input text has been converted into an index sequence, each index is converted into the word vector of the corresponding word by looking it up in the 300-dimensional GloVe pre-trained word vectors; word vectors of words not in GloVe are initialized from a random uniform distribution. The converted word vectors serve as the input of the neural network.
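A minimal Python sketch of this preprocessing is given below for illustration; the helper names, the fixed length MAX_LEN = 50, the lowercasing and the uniform initialization range are assumptions not fixed by the invention, and the dictionary glove (word to 300-dimensional vector) is assumed to be loaded elsewhere.

    from collections import Counter

    import numpy as np
    from nltk.tokenize import word_tokenize   # requires the NLTK 'punkt' tokenizer data

    MAX_LEN = 50          # assumed fixed input length
    EMB_DIM = 300

    def build_vocab(corpus_texts):
        """Index words by frequency, starting from 1 (0 is reserved for padding)."""
        counts = Counter(w for t in corpus_texts for w in word_tokenize(t.lower()))
        return {w: i + 1 for i, (w, _) in enumerate(counts.most_common())}

    def to_indices(text, vocab):
        """Tokenize, map words to indices, truncate to MAX_LEN and pad with leading zeros."""
        idx = [vocab.get(w, 0) for w in word_tokenize(text.lower())][:MAX_LEN]   # unseen words map to 0 here (a simplification)
        return np.array([0] * (MAX_LEN - len(idx)) + idx, dtype="int32")

    def build_embedding_matrix(vocab, glove):
        """Row i holds the GloVe vector of the word with index i; words not in GloVe keep a uniform random row."""
        E = np.random.uniform(-0.05, 0.05, (len(vocab) + 1, EMB_DIM)).astype("float32")
        E[0] = 0.0                                  # padding row
        for w, i in vocab.items():
            if w in glove:
                E[i] = glove[w]
        return E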
2. Acquisition of word-level information
Let the input sentence of length s be w_1, w_2, w_3, ..., w_s, and let the corresponding word vectors be x_1, x_2, x_3, ..., x_s, where x_i ∈ R^d is the word vector of word w_i and d is the dimension of the vector. Because each word in the sentence contributes differently to the overall semantic information of the sentence, the attention mechanism acts directly on the word vectors to learn the proportion α_i that each word contributes to the word-level semantic information, and the word vector x_i of each word is then multiplied by its contribution proportion α_i and accumulated to obtain the word-level semantic information I_wse, namely:
u_i = tanh(W_w · x_i + b_w)
α_i = exp(u_i^T · u_w) / Σ_{j=1..s} exp(u_j^T · u_w)
I_wse = Σ_{i=1..s} α_i · x_i
where tanh is an activation function, u_i^T is the transpose of u_i, and W_w, b_w, u_w are parameters of the attention mechanism.
The word-level structural information I_wst is obtained with a bidirectional LSTM, namely:
hf_i = LSTM_fw(x_i), i = 1, ..., s
hb_i = LSTM_bw(x_i), i = s, ..., 1
I_wst = [hf_s ; hb_1]
i.e. the word-level structural information I_wst is formed by concatenating the final state hf_s of the forward LSTM with the final state hb_1 of the backward LSTM.
The word vector is 300 dimensions, the word-level semantic information is 300 dimensions, the hidden state dimensions of the forward LSTM and the reverse LSTM are both 150 dimensions, and the word-level structural information is the concatenation of the two states, so that the word-level structural information is 300 dimensions.
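As an illustrative reading of this word-level stage (not the authors' reference implementation), a tf.keras sketch could look as follows; vocab_size and the embedding matrix E are assumed to come from the preprocessing step above.

    import tensorflow as tf
    from tensorflow.keras import layers

    class AttentionPooling(layers.Layer):
        """u_i = tanh(W x_i + b); alpha_i = softmax(u_i^T u); output = sum_i alpha_i x_i."""
        def __init__(self, units, **kwargs):
            super().__init__(**kwargs)
            self.proj = layers.Dense(units, activation="tanh")    # W, b
            self.score = layers.Dense(1, use_bias=False)          # context vector u

        def call(self, x):                                         # x: (batch, steps, dim)
            a = tf.nn.softmax(self.score(self.proj(x)), axis=1)    # (batch, steps, 1)
            return tf.reduce_sum(a * x, axis=1)                    # (batch, dim)

    word_ids = layers.Input(shape=(None,), dtype="int32")          # index sequence from the preprocessing step
    x = layers.Embedding(vocab_size, 300,
                         embeddings_initializer=tf.keras.initializers.Constant(E))(word_ids)  # word vectors x_1..x_s

    I_wse = AttentionPooling(300)(x)                                # word-level semantic information, 300-d
    I_wst = layers.Bidirectional(layers.LSTM(150))(x)               # concatenated final forward/backward states, 300-d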
3. Phrase-level information acquisition
Since a convolutional neural network can extract n-gram features of a sentence, the window size of the convolutional neural network is set to n to extract phrase information of length n from the sentence. Phrase information of lengths 3, 4 and 5 in the input text is extracted using 100 convolution kernels with window sizes of 3, 4 and 5 respectively, and the outputs are then spliced to obtain the phrase information. Let the convolved outputs be d_1, d_2, d_3, ..., d_s. Since each phrase in the sentence contributes differently to the overall semantic information of the sentence, the attention mechanism is applied to the phrase-level representations to learn the proportion β_i that each phrase contributes to the phrase-level semantic information, and the hidden representation vector d_i of each phrase is then multiplied by its contribution proportion β_i and accumulated to obtain the phrase-level semantic information I_pse, analogous to the acquisition of the word-level semantic information:
I_pse = Σ_{i=1..s} β_i · d_i
The phrase-level structural information I_pst is obtained with a bidirectional LSTM, analogous to the acquisition of the word-level structural information.
Because 100 convolution kernels are used for each of the window sizes 3, 4 and 5, the dimension of the spliced phrase information is 300, and the dimension of the phrase-level semantic information extracted by the attention mechanism is also 300. For the phrase-level structural information, the same bidirectional LSTM structure as for the word-level structural information is used, in which the forward LSTM and the backward LSTM each have 150 dimensions, and the phrase-level structural information is the concatenation of their final states, so its dimension is 300.
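Continuing the sketch above, the phrase-level stage could be realized as follows; using 'same' padding is an assumption made here so that the three convolution outputs keep the same length and can be concatenated into 300-dimensional phrase representations d_1, ..., d_s.

    # Phrase information D: 100 kernels for each window size 3, 4 and 5, concatenated per position.
    convs = [layers.Conv1D(100, k, padding="same", activation="relu")(x) for k in (3, 4, 5)]
    D = layers.Concatenate(axis=-1)(convs)                           # d_1..d_s, 300-d each

    I_pse = AttentionPooling(300)(D)                                 # phrase-level semantic information, 300-d
    I_pst = layers.Bidirectional(layers.LSTM(150))(D)                # phrase-level structural information, 300-d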
4. Fusion method and classification
For the obtained word-level semantic information I_wse, word-level structural information I_wst, phrase-level semantic information I_pse and phrase-level structural information I_pst, the invention proposes two different fusion strategies to fuse them into the final text representation: static fusion and attention-based dynamic fusion.
For static fusion, as shown in FIG. 2, the text representation is the average of the word-level semantic information, word-level structural information, phrase-level semantic information and phrase-level structural information, i.e. the representation of text T is
I_T = (I_wse + I_wst + I_pse + I_pst) / 4
For dynamic fusion, as shown in FIG. 3, the attention mechanism is applied to the four different pieces of information to automatically learn the contribution proportion γ_i of each piece of information to the final text representation. Here I_wse, I_wst, I_pse, I_pst are denoted I_1, I_2, I_3, I_4 respectively, and the representation of text T is calculated as:
u_i = tanh(W_t · I_i + b_t)
γ_i = exp(u_i^T · u_t) / Σ_{j=1..4} exp(u_j^T · u_t)
I_T = Σ_{i=1..4} γ_i · I_i
This yields the text representation I_T. Since the word-level semantic information I_wse, the word-level structural information I_wst, the phrase-level semantic information I_pse and the phrase-level structural information I_pst all have 300 dimensions, the representation of the final text, i.e. the high-level feature of the text, also has 300 dimensions.
The text representation vector I_T is then fed into a softmax classifier to obtain the probability corresponding to each category:
p = softmax(W_c · I_T + b_c)
where W_c is the weight of the softmax classifier and b_c is the corresponding bias.
To obtain the parameters of the model, the following cross-entropy loss function is minimized:
L = − Σ_{i=1..N} Σ_{j=1..C} y_ij · log(p_ij)
where N is the number of samples in the dataset, C is the number of classes, y_ij is the true value of the i-th sample for the j-th class, and p_ij is the probability predicted by the neural network for the i-th sample and the j-th class. The model parameters are trained with the Adam optimizer, which combines the advantages of the AdaGrad and RMSProp optimization algorithms: it considers both the first-moment and second-moment estimates of the gradient when computing the update step, automatically adjusts the learning rate, and is simple and effective.
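A sketch of how the pieces above could be assembled and trained is shown below; the dynamic fusion reuses the same attention pooling, applied to the stack of the four 300-dimensional vectors. num_classes, the training arrays and the EarlyStopping patience are illustrative assumptions; the batch size, number of epochs and learning rate follow the experimental setup described later.

    # Dynamic fusion: stack I_wse, I_wst, I_pse, I_pst into (batch, 4, 300) and attend over the four vectors.
    stacked = layers.Lambda(lambda t: tf.stack(t, axis=1))([I_wse, I_wst, I_pse, I_pst])
    I_T = AttentionPooling(300)(stacked)                             # final text representation, 300-d
    probs = layers.Dense(num_classes, activation="softmax")(I_T)     # p = softmax(W_c I_T + b_c)

    model = tf.keras.Model(word_ids, probs)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                  loss="categorical_crossentropy",                   # the cross-entropy loss above
                  metrics=["accuracy"])
    model.fit(x_train, y_train, validation_data=(x_val, y_val),
              batch_size=32, epochs=20,
              callbacks=[tf.keras.callbacks.EarlyStopping(monitor="val_accuracy", patience=1)])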
After the model parameters are trained, the model is saved. When texts outside the corpus need to be classified, the text is first preprocessed, the model is then loaded, the word-level semantic information, word-level structural information, phrase-level semantic information and phrase-level structural information are computed, and the four kinds of information are fused with the static fusion method or the attention-based dynamic fusion method to obtain the final text representation. Finally, the text representation vector is fed into the softmax function to compute the probability of each category, and the category with the highest probability is the category to which the text belongs.
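As a usage illustration with hypothetical helper and label names, classifying a text outside the corpus with the trained model could then look like:

    import numpy as np

    def classify(text, model, vocab, labels):
        """Preprocess a raw text, run the trained model and return the most probable class label."""
        x = to_indices(text, vocab)                  # index sequence from the preprocessing sketch
        probs = model.predict(x[np.newaxis, :])      # shape (1, num_classes)
        return labels[int(np.argmax(probs))]

    # e.g. classify("a thoughtful and provocative film", model, vocab, ["negative", "positive"])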
5. Experiment of the invention
In order to prove that the model effect provided by the invention is superior to other models, the model is compared with other baseline models on a plurality of public text classification data sets, and the evaluation index is the classification accuracy.
The datasets used in the experiments are introduced below:
the MR data set is a two-category movie review data set published by Pang et al, consisting of 5331 positive samples and 5331 negative samples.
The SUBJ dataset is a binary dataset published by Pang et al.; all sentences in the dataset are labeled as either subjective or objective.
The TREC dataset is a six-class question classification dataset published by Li et al.; the sample labels are abbreviation, entity, description, location, numeric and human.
The CR dataset is a binary dataset published by Hu et al containing customer reviews, whose labels are positive and negative, respectively.
The Stanford Sentiment Treebank dataset is a five-class movie review dataset published by Socher et al.; its labels are very negative, negative, neutral, positive and very positive.
The AGNews dataset is a news classification dataset issued by Zhang et al, and labels of the AGNews dataset are World, Sports, Business, Sci/Tech respectively.
The experimental setup was as follows:
all experiments were performed on a Windows system using the deep learning framework Keras. For initialization of word vectors, the input to the neural network is initialized with 300 dimensional GloVe word vectors, and for words not in GloVe, their word vectors are initialized with a uniform distribution. The initialization of other weights of the model adopts Xavier uniform distribution, the initialization of bias is 0, the hidden state dimensions of the bidirectional LSTM are all 150, and 100 convolution kernels with window sizes of 3, 4 and 5 are used respectively. For the activation function, a Linear modified Units (Rectified Linear Units) ReLU activation function is applied to the convolutional layer, and the activation function of the fully-connected layer is tanh. For regularization, dropout is used to apply after the Embedding layer, after the convolution layer, and after the fully connected layer, respectively. In addition, no further regularization term is introduced. For model optimization, an Adam optimizer was used to minimize the loss, with the learning rate set at 1 e-4. For model training, set the size of each batch to 32, epoch (total round) to 20, and accuracy on the validation set begins to decline using EarlyStoping.
The results of the experiment are shown in FIG. 7:
All models are divided into six parts: the first part contains CNN-based models, the second part RNN-based models, the third part reinforcement-learning-based models, the fourth part models based on capsule neural networks, the fifth part attention-based models, and the last part the models proposed by the invention.
Compared with the other models, the dynamic model proposed by the invention achieves the highest performance on four of the six public text classification datasets; on the MR dataset (accuracy 83.4) and the CR dataset (accuracy 87.0) the improvement over the other models is substantial. The static model proposed by the invention also achieves competitive results. Compared with the CNN-based, RNN-based and attention-based models, the dynamic model clearly surpasses them on all six datasets. The reinforcement-learning-based model and the capsule-network-based model achieve the highest accuracy on the SST5 and AGNews datasets respectively, but the proposed model also achieves comparable results on these two datasets. This shows that the model can effectively extract the features of the text and has strong generalization capability.
The most important difference from the other models is that the proposed model can extract semantic information and structural information at different levels and fuse them to obtain the text representation, whereas the other models learn only part of the semantic information or only part of the structural information and cannot combine the two. The main reason the model achieves the best performance is that it can extract word-level semantic and structural information as well as phrase-level semantic and structural information of the text, and the attention-based dynamic combination method can dynamically adjust the weights of the four parts of information to form the final text representation.
To demonstrate that the proposed model can extract word-level semantic information and phrase-level semantic information, visualization experiments were carried out on some samples. For the word-level semantic information, the attention mechanism can learn the proportion that each word contributes to the word-level semantics. As shown in FIG. 4, the sample "a sample open human heart together by means of skewed em element" is taken from the MR dataset, and its class label is Positive. It can be seen that the key words "pleasant" and "killed" are assigned higher weights by the attention mechanism, i.e. the word-level semantic information is learned.
The phrase-level semantic information is similar to the word-level semantic information. As shown in FIG. 5, the sample "it's not differential to spot the custom early-on in this predicted able threshold" is taken from the MR dataset, with the class label Negative. It is difficult to find words with negative sentiment in the sentence, but the phrase-level semantic information still learns key phrases such as "this predictive threshold" and assigns them higher weights.
To investigate why the dynamic model proposed by the invention achieves the best performance on four of the six datasets, some samples were selected for analysis, as shown in FIG. 8, where Att_wse denotes the attention value of the word-level semantic information, Att_pse the attention value of the phrase-level semantic information, Att_wst the attention value of the word-level structural information, and Att_pst the attention value of the phrase-level structural information.
For the MR movie review "a thughtful, provocative and instant humanizing file", the model can extract the semantic information of words such as "thughtful", "provocative" and "humanizing" and assigns a higher weight to the word-level semantic information, so the sample is classified as positive.
For the MR movie review "i didn't laugh, i didn't smile, i survived", although the attention mechanism may focus on the word "didn't", the sentence also contains words such as "laugh" and "smile"; considering only the word-level semantic information could therefore cause misclassification. In this case the model extracts the semantic information of phrases such as "didn't laugh" and "didn't smile", and the attention mechanism assigns a higher weight to the phrase-level semantic information, so the sample is classified as negative.
For the "nice machines, but i connector user quality preliminary low now" in the CR dataset, the word-level structure information "nice … … but … … low" is learned and is therefore correctly classified as negative.
For the TREC question "What type of the curve is used in Australia?", focusing only on the semantic information could cause a classification error, because the word "Australia" may lead the model to give a higher weight to the location class; the model, however, can learn the phrase-level structural information of "what type of ...", so the question is classified as entity.
The invention provides a new neural network model for text classification. To address the problem that traditional text classification methods cannot simultaneously and effectively utilize the semantic information and structural information of a text, the proposed model can extract semantic information and structural information at different levels of the text, including word-level semantic information, word-level structural information, phrase-level semantic information and phrase-level structural information. The model takes a text as input and outputs the category to which the model predicts the text belongs. To obtain the final representation of the text, the invention further provides two methods for fusing the four kinds of information: static fusion and attention-based dynamic fusion. Compared with traditional methods, the proposed text classification model can utilize more information of the text, and experiments show that it achieves higher performance than traditional text classification models on several public text classification datasets.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (7)

1. A text classification method based on a neural network is characterized by comprising the following steps:
Step 1: input a text to be classified and preprocess it to obtain the word vector x_i corresponding to each word in the text;
Step 2: apply the attention mechanism directly to the word vectors x_i to obtain word-level semantic information I_wse, and apply a bidirectional LSTM network directly to the word vectors x_i to obtain word-level structural information I_wst;
Step 3: apply a convolutional neural network to the word vectors x_i to obtain phrase information D;
Step 4: apply the attention mechanism to the phrase information D to obtain phrase-level semantic information I_pse, and apply a bidirectional LSTM network to the phrase information D to obtain phrase-level structural information I_pst;
Step 5: fuse the word-level semantic information I_wse, the word-level structural information I_wst, the phrase-level semantic information I_pse and the phrase-level structural information I_pst to obtain the final text vector representation I_T;
Step 6: input the final text vector representation I_T into a softmax classifier to obtain the probability corresponding to each class, and take the class with the highest probability as the class to which the text belongs:
p = softmax(W_c · I_T + b_c)
where W_c is the weight of the softmax classifier and b_c is the corresponding bias.
2. The neural-network-based text classification method according to claim 1, characterized in that the preprocessing of the text in step 1 specifically comprises the following steps:
Step 1.1: detect the length of the input text; if the length of the input text is greater than the specified length, truncate the text; if the length of the input text is less than the specified length, pad the text;
Step 1.2: segment the text into words, index the words by word frequency, and convert the text into the corresponding index sequence;
Step 1.3: convert each index in the index sequence into the word vector of its corresponding word, completing the preprocessing of the text.
3. The neural-network-based text classification method according to claim 1 or 2, characterized in that obtaining the word-level semantic information I_wse in step 2 specifically comprises: let the input sentence be w_1, w_2, w_3, ..., w_s, and let the corresponding word vectors be x_1, x_2, x_3, ..., x_s; since each word in a sentence contributes differently to the overall semantic information of the sentence, the attention mechanism acts directly on the word vectors to learn the proportion α_i that each word contributes to the word-level semantic information; the word vector x_i of each word is multiplied by its contribution proportion α_i and the products are accumulated to obtain the word-level semantic information I_wse:
I_wse = Σ_{i=1..s} α_i · x_i
where x_i ∈ R^d is the word vector of word w_i and d is the dimension of the vector;
α_i = exp(u_i^T · u_w) / Σ_{j=1..s} exp(u_j^T · u_w)
u_i = tanh(W_w · x_i + b_w)
where tanh is an activation function, u_i^T is the transpose of u_i, and W_w, b_w, u_w are parameters of the attention mechanism;
obtaining the word-level structural information I_wst in step 2 specifically comprises: the word-level structural information I_wst is formed by concatenating the final state hf_s of the forward LSTM and the final state hb_1 of the backward LSTM:
hf_i = LSTM_fw(x_i), i = 1, ..., s
hb_i = LSTM_bw(x_i), i = s, ..., 1
I_wst = [hf_s ; hb_1]
4. The neural-network-based text classification method according to claim 1 or 2, characterized in that fusing the word-level semantic information I_wse, the word-level structural information I_wst, the phrase-level semantic information I_pse and the phrase-level structural information I_pst in step 5 to obtain the final text vector representation I_T specifically comprises: static fusion is adopted, i.e. the text representation is the average of the word-level semantic information, the word-level structural information, the phrase-level semantic information and the phrase-level structural information:
I_T = (I_wse + I_wst + I_pse + I_pst) / 4.
5. The neural-network-based text classification method according to claim 3, characterized in that fusing the word-level semantic information I_wse, the word-level structural information I_wst, the phrase-level semantic information I_pse and the phrase-level structural information I_pst in step 5 to obtain the final text vector representation I_T specifically comprises: static fusion is adopted, i.e. the text representation is the average of the word-level semantic information, the word-level structural information, the phrase-level semantic information and the phrase-level structural information:
I_T = (I_wse + I_wst + I_pse + I_pst) / 4.
6. The neural-network-based text classification method according to claim 1 or 2, characterized in that fusing the word-level semantic information I_wse, the word-level structural information I_wst, the phrase-level semantic information I_pse and the phrase-level structural information I_pst in step 5 to obtain the final text vector representation I_T specifically comprises: dynamic fusion based on the attention mechanism is adopted, i.e. the attention mechanism is applied to the four kinds of information to automatically learn the contribution proportion γ_i of each part of information to the final text vector representation I_T; here I_wse, I_wst, I_pse, I_pst are denoted I_1, I_2, I_3, I_4 respectively:
I_T = Σ_{i=1..4} γ_i · I_i
γ_i = exp(u_i^T · u_t) / Σ_{j=1..4} exp(u_j^T · u_t)
u_i = tanh(W_t · I_i + b_t)
where tanh is an activation function, u_i^T is the transpose of u_i, and W_t, b_t, u_t are parameters of the attention mechanism.
7. The neural-network-based text classification method according to claim 3, characterized in that fusing the word-level semantic information I_wse, the word-level structural information I_wst, the phrase-level semantic information I_pse and the phrase-level structural information I_pst in step 5 to obtain the final text vector representation I_T specifically comprises: dynamic fusion based on the attention mechanism is adopted, i.e. the attention mechanism is applied to the four kinds of information to automatically learn the contribution proportion γ_i of each part of information to the final text vector representation I_T; here I_wse, I_wst, I_pse, I_pst are denoted I_1, I_2, I_3, I_4 respectively:
I_T = Σ_{i=1..4} γ_i · I_i
γ_i = exp(u_i^T · u_t) / Σ_{j=1..4} exp(u_j^T · u_t)
u_i = tanh(W_t · I_i + b_t)
where tanh is an activation function, u_i^T is the transpose of u_i, and W_t, b_t, u_t are parameters of the attention mechanism.
CN201911223541.XA 2019-12-03 2019-12-03 Text classification method based on neural network Active CN111078833B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911223541.XA CN111078833B (en) 2019-12-03 2019-12-03 Text classification method based on neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911223541.XA CN111078833B (en) 2019-12-03 2019-12-03 Text classification method based on neural network

Publications (2)

Publication Number Publication Date
CN111078833A CN111078833A (en) 2020-04-28
CN111078833B true CN111078833B (en) 2022-05-20

Family

ID=70312658

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911223541.XA Active CN111078833B (en) 2019-12-03 2019-12-03 Text classification method based on neural network

Country Status (1)

Country Link
CN (1) CN111078833B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112231477B (en) * 2020-10-20 2023-09-22 淮阴工学院 Text classification method based on improved capsule network
CN112131391B (en) * 2020-11-25 2021-09-17 江苏电力信息技术有限公司 Power supply service client appeal text classification method based on capsule network
CN113157919B (en) * 2021-04-07 2023-04-25 山东师范大学 Sentence text aspect-level emotion classification method and sentence text aspect-level emotion classification system
CN113033218B (en) * 2021-04-16 2023-08-15 沈阳雅译网络技术有限公司 Machine translation quality evaluation method based on neural network structure search
CN113297364B (en) * 2021-06-07 2023-06-09 吉林大学 Natural language understanding method and device in dialogue-oriented system
CN113779192A (en) * 2021-08-23 2021-12-10 河海大学 Text classification algorithm of bidirectional dynamic route based on labeled constraint
CN113869065B (en) * 2021-10-15 2024-04-12 梧州学院 Emotion classification method and system based on 'word-phrase' attention mechanism
CN114579712B (en) * 2022-05-05 2022-07-15 中科雨辰科技有限公司 Text attribute extraction and matching method based on dynamic model

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10936862B2 (en) * 2016-11-14 2021-03-02 Kodak Alaris Inc. System and method of character recognition using fully convolutional neural networks

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107301246A (en) * 2017-07-14 2017-10-27 河北工业大学 Chinese Text Categorization based on ultra-deep convolutional neural networks structural model
CN107491541A (en) * 2017-08-24 2017-12-19 北京丁牛科技有限公司 File classification method and device
CN108460089A (en) * 2018-01-23 2018-08-28 哈尔滨理工大学 Diverse characteristics based on Attention neural networks merge Chinese Text Categorization
CN108363753A (en) * 2018-01-30 2018-08-03 南京邮电大学 Comment text sentiment classification model is trained and sensibility classification method, device and equipment
CN108446275A (en) * 2018-03-21 2018-08-24 北京理工大学 Long text emotional orientation analytical method based on attention bilayer LSTM
CN108595590A (en) * 2018-04-19 2018-09-28 中国科学院电子学研究所苏州研究院 A kind of Chinese Text Categorization based on fusion attention model
CN108717439A (en) * 2018-05-16 2018-10-30 哈尔滨理工大学 A kind of Chinese Text Categorization merged based on attention mechanism and characteristic strengthening
CN108829818A (en) * 2018-06-12 2018-11-16 中国科学院计算技术研究所 A kind of file classification method
CN108984745A (en) * 2018-07-16 2018-12-11 福州大学 A kind of neural network file classification method merging more knowledge mappings
CN109299268A (en) * 2018-10-24 2019-02-01 河南理工大学 A kind of text emotion analysis method based on dual channel model
CN109492108A (en) * 2018-11-22 2019-03-19 上海唯识律简信息科技有限公司 Multi-level fusion Document Classification Method and system based on deep learning
CN109840279A (en) * 2019-01-10 2019-06-04 山东亿云信息技术有限公司 File classification method based on convolution loop neural network
CN109858032A (en) * 2019-02-14 2019-06-07 程淑玉 Merge more granularity sentences interaction natural language inference model of Attention mechanism
CN110334210A (en) * 2019-05-30 2019-10-15 哈尔滨理工大学 A kind of Chinese sentiment analysis method merged based on BERT with LSTM, CNN

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
Bidirectional LSTM with attention mechanism and convolutional layer for text classification; Gang Liu; Neurocomputing; 2019-04-14; vol. 337; 325-338 *
Convolutional neural network text classification model based on the Attention mechanism (基于Attention机制的卷积神经网络文本分类模型); 赵云山; 《应用科学学报》; 2019-07-30; vol. 37, no. 4; 541-510 *
Question classification method for community question answering based on Bi-LSTM and CNN with an attention mechanism (基于Bi-LSTM和CNN并包含注意力机制的社区问答问句分类方法); 史梦飞等; 《计算机系统应用》; 2018-09-15; no. 09; 159-164 *
Aspect-level microblog sentiment classification based on convolutional memory networks (基于卷积记忆网络的视角级微博情感分类); 廖祥文等; 《模式识别与人工智能》; 2018-03-15; no. 03; 25-35 *
Chinese short text classification model based on a hybrid neural network (基于混合神经网络的中文短文本分类模型); 陈巧红; 《浙江理工大学学报(自然科学版)》; 2019-03-31; vol. 41, no. 4; 509-516 *
Text classification based on a phrase attention mechanism (基于短语注意机制的文本分类); 江伟等; 《中文信息学报》; 2018-02-15; no. 02; 106-113+123 *

Also Published As

Publication number Publication date
CN111078833A (en) 2020-04-28

Similar Documents

Publication Publication Date Title
CN111078833B (en) Text classification method based on neural network
CN110609897B (en) Multi-category Chinese text classification method integrating global and local features
Zulqarnain et al. Efficient processing of GRU based on word embedding for text classification
Zhang et al. A text sentiment classification modeling method based on coordinated CNN‐LSTM‐attention model
CN109325231B (en) Method for generating word vector by multitasking model
CN111401061A (en) Method for identifying news opinion involved in case based on BERT and Bi L STM-Attention
CN109977413A (en) A kind of sentiment analysis method based on improvement CNN-LDA
CN110321563B (en) Text emotion analysis method based on hybrid supervision model
CN110263325B (en) Chinese word segmentation system
CN111027595B (en) Double-stage semantic word vector generation method
CN112364638B (en) Personality identification method based on social text
CN112001186A (en) Emotion classification method using graph convolution neural network and Chinese syntax
CN111368088A (en) Text emotion classification method based on deep learning
CN112749274B (en) Chinese text classification method based on attention mechanism and interference word deletion
CN111522908A (en) Multi-label text classification method based on BiGRU and attention mechanism
CN110046223B (en) Film evaluation emotion analysis method based on improved convolutional neural network model
CN112163089B (en) High-technology text classification method and system integrating named entity recognition
CN113255320A (en) Entity relation extraction method and device based on syntax tree and graph attention machine mechanism
Liu et al. A multi-label text classification model based on ELMo and attention
CN110297986A (en) A kind of Sentiment orientation analysis method of hot microblog topic
Chen et al. Deep neural networks for multi-class sentiment classification
CN114417851A (en) Emotion analysis method based on keyword weighted information
CN114462420A (en) False news detection method based on feature fusion model
CN113326374A (en) Short text emotion classification method and system based on feature enhancement
CN112199503B (en) Feature-enhanced unbalanced Bi-LSTM-based Chinese text classification method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant