CN114357166A - Text classification method based on deep learning - Google Patents

Text classification method based on deep learning

Info

Publication number
CN114357166A
Authority
CN
China
Prior art keywords
training
input
layer
lstm
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111662807.8A
Other languages
Chinese (zh)
Other versions
CN114357166B (en)
Inventor
张丽
王月怡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN202111662807.8A priority Critical patent/CN114357166B/en
Publication of CN114357166A publication Critical patent/CN114357166A/en
Application granted granted Critical
Publication of CN114357166B publication Critical patent/CN114357166B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a text classification method based on deep learning. The method first removes noise, including punctuation marks and special characters; builds a dictionary and constructs a data set from the dictionary; performs word embedding and adversarial training; trains a bidirectional long short-term memory network layer; trains an attention mechanism layer; and computes the output result. The method brings adversarial training, which is widely used in the image field, into natural language processing: adversarial perturbations added inside the deep neural network steer training in the direction that increases the network's loss, and the derivative of the loss with respect to the input is used to update the parameters. This reduces the model's sensitivity to adversarial perturbations, effectively alleviates overfitting, and improves the text classification effect.

Description

Text classification method based on deep learning
Technical Field
The invention belongs to the field of natural language processing. Text classification is one of the most basic and essential technologies in natural language processing, and accurate and efficient text classification is of great significance for natural language processing tasks. The invention performs accurate text classification using a deep learning algorithm.
Background
Among the many fields in the development of artificial intelligence, natural language processing is one of the fastest growing and most widely applied. Natural language processing is the machine processing of human language; it aims to teach machines how to process and understand human language and thereby establish a simple communication channel between humans and machines. Text classification is one of the most basic and essential technologies in natural language processing: it converts text and then automatically assigns the converted text to one or more specified categories. In the era of big data, text classification technology based on deep learning algorithms can carry out classification tasks automatically and efficiently and greatly reduces cost. The text classification task plays an important role in many fields such as sentiment analysis, public opinion analysis, domain recognition and intent recognition.
The text classification task comprises two parts: text representation and text classification. Text representation has evolved from symbolic representation to implicit semantic representation and includes text preprocessing and text representation techniques. Text preprocessing refers to the fact that, in most cases, text contains a certain amount of noise and useless parts, so it must be preprocessed before classification; preprocessing usually includes noise removal, stop word removal, Chinese word segmentation, English case normalization and similar steps. Text representation techniques address the fact that raw natural language consists of characters that only humans can read and that a computer cannot directly understand or process, so text written in natural language must be converted into a numerical representation the computer can handle. Such techniques include representations based on one-hot encoding, vector space models, distributed word vectors, and so on.
Current deep-learning text classification models include, first, models based on convolutional neural networks. Second, there are classification models based on recurrent neural networks, which are designed to handle sequence information better: they take sequence data as input, recurse along the evolution direction of the sequence and chain all nodes together, so they can effectively identify sequential features and use earlier patterns to predict what may come next, solving the problem that traditional neural networks cannot capture the correlation between inputs. However, because of the RNN feedback loop, the gradient can quickly diverge to infinity or quickly shrink to zero, i.e. the problems of exploding and vanishing gradients; in both cases the network stops learning anything useful. Exploding gradients can be handled by gradient clipping, whereas vanishing gradients require more complex RNN basic units. Using such units leads to the long short-term memory network and the gated recurrent unit model, both of which use gate mechanisms to pass information selectively and to update or keep historical information, alleviating the gradient problem to some extent. There is also the attention mechanism, which can give different degrees of attention to important and secondary content; as an auxiliary technique commonly used in deep learning, it makes the neural network focus more on learning from certain specific neurons.
Disclosure of Invention
The invention addresses the problem that existing deep-learning text classification models introduce no noise during training, so their robustness needs to be strengthened.
The technical scheme adopted by the invention provides a text classification model based on deep learning that introduces noise data during model training. To achieve this purpose, the technical scheme comprises the following steps:
step 1, preprocessing the text.
Remove noise from the text, including punctuation marks and special characters. Build a dictionary and construct a data set from the dictionary.
Step 2, word embedding and adversarial training.
Step 2.1: use a word embedding scheme based on pre-trained word vectors, taking word + word context features as the pre-trained word vectors, and adapt to the current context through fine-tuning.
Step 2.2: represent the new sample input as X + δ, where X is the original input representation and δ is the perturbation superimposed on the input. The perturbation is calculated as δ = α · sign(g), where g denotes the gradient of the loss function Loss with respect to the input X. The perturbation δ superimposed on the sample X is obtained by passing the input through the neural network function f_θ(·), comparing the resulting loss with the label y, and finding the δ that maximizes the loss.
Step 2.3: with respect to the loss value obtained in the previous step, optimize the neural network by minimizing that loss.
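A minimal sketch of the perturbation in steps 2.2 and 2.3, assuming a PyTorch implementation; the function name fgsm_delta and the scale parameter alpha are illustrative and not specified by the patent:

```python
import torch

def fgsm_delta(X, loss, alpha=1.0):
    # X: embedded input with requires_grad=True; loss: scalar Loss of f_theta(X) compared with label y
    # g: gradient of the loss function Loss with respect to the input X
    g, = torch.autograd.grad(loss, X, retain_graph=True)
    # delta = alpha * sign(g): the perturbation that locally maximizes the loss
    return alpha * torch.sign(g)

# Step 2.3 then updates the network parameters by minimizing the loss on the perturbed input X + delta.
```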
Step 3, training the bidirectional long short-term memory network layer.
The word embedding result is fed into the bidirectional long short-term memory (Bi-LSTM) layer, which combines a forward LSTM and a backward LSTM so that bidirectional semantic dependencies are captured better. The i-th hidden state h_i of the Bi-LSTM is formed by concatenating h_i→ and h_i←, which carry the information of the forward and backward directions respectively. Each LSTM layer consists of a number of cells, and the output H_t at any time t is computed from H_{t-1}, C_{t-1} and X_t, where C_{t-1} is the candidate cell state at time t-1 and X_t is the input at time step t.
Step 4, training the attention mechanism layer.
The input of the attention layer is H = [h_1, h_2, ..., h_T], where T represents the length of the input sequence. The attention score M is computed as M = tanh(H), and the probability distribution α of the attention scores is computed as α = softmax(ω^T M), where ω^T is a trainable parameter.
The output r of the attention layer is obtained by matrix multiplication of H and α^T, i.e. r = H α^T.
Step 5, calculating the output result.
A fully connected layer maps the extracted features to the specific categories. The features extracted by the two LSTM layers are concatenated, the feature information is mapped to each category by multiplying it by a weight matrix and adding a bias term, and the probabilities are finally obtained through a Softmax function, calculated as Label = softmax(FC(A)), where A = [A_0, A_1, ..., A_i] is the input feature and i is the dimension of the input feature. C = [C_0, C_1, ..., C_n] is the score of each category obtained after the features pass through the fully connected layer, where n denotes the number of categories. The category scores C_0 to C_n are then passed through the Softmax function to obtain the probability distribution L over the categories.
The method brings adversarial training, which is widely used in the image field, into natural language processing: adversarial perturbations added inside the deep neural network steer training in the direction that increases the network's loss, and the derivative of the loss with respect to the input is used to update the parameters. This reduces the model's sensitivity to adversarial perturbations, effectively alleviates overfitting, and improves the text classification effect.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention.
Detailed Description
The flow chart of an embodiment is shown in fig. 1, and comprises the following steps:
(1) Text preprocessing
This includes cleaning up noise, i.e., removing punctuation marks, special characters and similar noise, and then building a dictionary and constructing a data set from the dictionary.
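A hedged sketch of this preprocessing step in Python; the cleaning regular expression, whitespace tokenization and the special tokens <pad>/<unk> are assumptions made for illustration, not details given in the patent:

```python
import re
from collections import Counter

def clean(text):
    # remove punctuation marks and special characters, keeping word characters and spaces
    return re.sub(r"[^\w\s]", "", text)

def build_vocab(corpus, min_freq=1):
    # dictionary construction: map every sufficiently frequent token to an integer id
    counter = Counter(tok for line in corpus for tok in clean(line).split())
    vocab = {"<pad>": 0, "<unk>": 1}
    for tok, freq in counter.items():
        if freq >= min_freq:
            vocab[tok] = len(vocab)
    return vocab

def encode(text, vocab):
    # data set construction: turn cleaned text into id sequences according to the dictionary
    return [vocab.get(tok, vocab["<unk>"]) for tok in clean(text).split()]
```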
(2) Word embedding and FGSM attack layer
Word embedding maps simple word IDs to dense vectors. The word is the basic unit a deep learning model processes, so text written in natural language must first be tokenized and converted into a numerical vector representation. For a given text of T words, the purpose of the word embedding layer is to represent each word as a vector of appropriate dimension.
On top of the word embedding, the FGSM method adds a perturbation along the gradient to generate adversarial samples. The adversarial samples are fed into the subsequent processing layers in the same form as the original samples, and the model is trained by optimizing the sum of the loss functions of the two kinds of samples. FGSM aligns the direction of the perturbation with the direction of gradient ascent; moving along the gradient means the increase in the loss is maximized.
After the perturbation is computed, it is added to the Embedding output to complete the adversarial training of the word embedding part.
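An illustrative training step that adds the FGSM perturbation to the embedding output and optimizes the sum of the clean and adversarial losses, assuming PyTorch; model.embedding, model.classify and the optimizer are placeholder names, not the patent's own interfaces:

```python
import torch
import torch.nn.functional as F

def adversarial_train_step(model, optimizer, token_ids, labels, alpha=1.0):
    optimizer.zero_grad()
    emb = model.embedding(token_ids)           # word embedding output X
    emb.retain_grad()                          # keep the gradient on this non-leaf tensor
    loss_clean = F.cross_entropy(model.classify(emb), labels)
    loss_clean.backward(retain_graph=True)     # populates emb.grad

    delta = alpha * torch.sign(emb.grad)       # FGSM perturbation added to the embedding
    loss_adv = F.cross_entropy(model.classify(emb + delta), labels)
    loss_adv.backward()                        # gradients of both losses accumulate

    optimizer.step()                           # optimize the sum of the two losses
    return loss_clean.item() + loss_adv.item()
```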
(3) Bidirectional LSTM layer
Since the semantic information carried by a word in a text is related not only to the preceding text but also to the following text, a unidirectional LSTM ignores important information from one of the two sides. Learning the text from front to back and from back to front at the same time extracts its semantic information better and takes the specific context into account. The bidirectional long short-term memory network combines a forward LSTM and a backward LSTM: after the word vectors are obtained, the bidirectional LSTM layer concatenates the forward and backward hidden layers, and the output matrix H = [h_1, h_2, ..., h_T] is obtained by multiplying the current cell state by the weight matrix of the output gate.
To prevent overfitting, i.e. high prediction accuracy on the training set but low accuracy on the test set, the bidirectional LSTM layer is trained in combination with Dropout and the parameter optimization algorithm: in each iteration, hidden-layer neurons are temporarily dropped with a certain probability, the resulting thinned network is trained, and the parameters of the retained neurons are updated.
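A sketch of the bidirectional LSTM layer with Dropout, assuming PyTorch; the embedding and hidden dimensions and the dropout rate are illustrative assumptions:

```python
import torch.nn as nn

class BiLSTMEncoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=128, dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # forward LSTM and backward LSTM combined into one bidirectional layer
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        # Dropout temporarily discards hidden units with a certain probability during training
        self.dropout = nn.Dropout(dropout)

    def forward(self, token_ids):
        emb = self.embedding(token_ids)   # (batch, T, embed_dim)
        H, _ = self.bilstm(emb)           # (batch, T, 2*hidden_dim): h_i = [h_i_forward ; h_i_backward]
        return self.dropout(H)
```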
(4) Attention layer
The main idea of the attention mechanism is to mimic the way humans observe things, i.e. a mechanism that aligns internal experience with external sensation to increase the precision of observation in particular regions. In text classification, certain sentences contain key words related to the category information, while the other words in those sentences are context words whose contribution is far smaller than that of the key words. The attention mechanism determines which words in the whole sentence deserve the most attention, allowing the model to extract more discriminative features from the key words.
After the output matrix H = [h_1, h_2, ..., h_T] of the bidirectional LSTM layer is obtained, the attention layer learns a weight distribution over the vector representations of the individual time steps, and then performs a weighted summation according to this distribution to obtain a vector representation h_i of the current time step i that carries richer key information.
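A sketch of the attention layer following the formulas above (M = tanh(H), α = softmax(ω^T M), r = weighted sum of the hidden states), assuming PyTorch and batch-first tensors; the dimension handling is an assumption:

```python
import torch
import torch.nn as nn

class AttentionPool(nn.Module):
    def __init__(self, hidden_dim):
        super().__init__()
        self.omega = nn.Parameter(torch.randn(hidden_dim))  # trainable parameter omega

    def forward(self, H):                             # H: (batch, T, hidden_dim)
        M = torch.tanh(H)                             # attention scores M
        alpha = torch.softmax(M @ self.omega, dim=1)  # (batch, T) probability distribution alpha
        r = torch.einsum("bth,bt->bh", H, alpha)      # weighted sum of the hidden states
        return r
```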
(5) Classification
The main role of the preceding bidirectional LSTM and attention layers is to extract features from the problem text data; the fully connected layer maps the extracted features to the specific categories. Its input is formed by concatenating the features extracted by two bidirectional LSTM layers of different depths. The feature information is mapped to each category by multiplying it by a weight matrix and adding a bias term, and the probability p of the data for each category is finally obtained through a Softmax function, giving the final classification result.
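A minimal sketch of the classification head (Label = softmax(FC(A))), assuming PyTorch; the feature dimension and class count are placeholders:

```python
import torch.nn as nn

class Classifier(nn.Module):
    def __init__(self, feature_dim, num_classes):
        super().__init__()
        # weight matrix plus bias term mapping the concatenated features to each category
        self.fc = nn.Linear(feature_dim, num_classes)

    def forward(self, A):
        C = self.fc(A)             # per-category scores C_0 ... C_n
        return C.softmax(dim=-1)   # probability distribution L over the categories
```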
The results of experiments using the present invention are given below.
Table 1 shows the test results of the method of the present invention on a data set of twenty thousand news headlines extracted from THUCNews; the evaluation metrics are accuracy, precision, recall and F1 value. As can be seen from the table, all four metrics of the method are higher than those of the Bi-LSTM-Attention method without adversarial training, which shows that the proposed method performs better than Bi-LSTM-Attention trained without adversarial training.
TABLE 1. Performance comparison between the method of the invention and the baseline model

Metric       Bi-LSTM-Attention   Method of the invention
Accuracy     90.47%              91.93%
Precision    90.6%               92.02%
Recall       90.4%               91.93%
F1 value     90.4%               91.95%

Claims (1)

1. A text classification method based on deep learning, characterized in that the method comprises the following steps:
step 1, preprocessing a text;
removing noise from the text, including punctuation marks and special characters; building a dictionary and constructing a data set from the dictionary;
step 2, word embedding and adversarial training;
step 2.1, using a word embedding scheme based on pre-trained word vectors, taking word + word context features as the pre-trained word vectors, and adapting to the current context by fine-tuning;
step 2.2, representing the new sample input as X + δ, where X is the original input representation and δ is the perturbation superimposed on the input, δ being calculated as δ = α · sign(g), where g denotes the gradient of the loss function Loss with respect to the input X; the perturbation δ superimposed on the sample X is obtained by passing the input through the neural network function f_θ(·), comparing the resulting loss with the label y, and finding the perturbation that maximizes the loss;
step 2.3, with respect to the loss value obtained in the previous step, optimizing the neural network by minimizing that loss;
step 3, training a bidirectional long short-term memory network layer;
the word embedding result is input into the bidirectional long short-term memory (Bi-LSTM) neural network layer, which is formed by combining a forward LSTM and a backward LSTM so that bidirectional semantic dependencies are captured better; the i-th hidden state h_i of the Bi-LSTM is formed by concatenating h_i→ and h_i←, which carry the information of the forward and backward directions respectively; each LSTM layer consists of several cells, and the output H_t at any time t is calculated from H_{t-1}, C_{t-1} and X_t, where C_{t-1} is the candidate cell state at time t-1 and X_t is the input at time step t;
step 4, training an attention mechanism layer;
the input of the attention mechanism layer is H = [h_1, h_2, ..., h_T], where T represents the length of the input sequence; the attention score M is calculated as M = tanh(H), and the probability distribution α of the attention scores is calculated as α = softmax(ω^T M), where ω^T is a trainable parameter;
the output r of the attention mechanism layer is obtained by matrix multiplication of H and α^T, i.e. r = H α^T;
step 5, calculating an output result;
mapping the extracted features to the specific categories using a fully connected layer: the features extracted by the two LSTM layers are concatenated, the feature information is mapped to each category by multiplying it by a weight matrix and adding a bias term, and the probabilities are finally obtained through a Softmax function, calculated as Label = softmax(FC(A)), where A = [A_0, A_1, ..., A_i] is the input feature and i is the dimension of the input feature;
C = [C_0, C_1, ..., C_n] is the score of each category obtained after the features pass through the fully connected layer, where n represents the number of categories; the category scores C_0 to C_n are then passed through the Softmax function to obtain the probability distribution L over the categories.
CN202111662807.8A 2021-12-31 2021-12-31 Text classification method based on deep learning Active CN114357166B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111662807.8A CN114357166B (en) 2021-12-31 2021-12-31 Text classification method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111662807.8A CN114357166B (en) 2021-12-31 2021-12-31 Text classification method based on deep learning

Publications (2)

Publication Number Publication Date
CN114357166A (en) 2022-04-15
CN114357166B CN114357166B (en) 2024-05-28

Family

ID=81104826

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111662807.8A Active CN114357166B (en) 2021-12-31 2021-12-31 Text classification method based on deep learning

Country Status (1)

Country Link
CN (1) CN114357166B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109740148A (en) * 2018-12-16 2019-05-10 北京工业大学 A kind of text emotion analysis method of BiLSTM combination Attention mechanism
CN109992780A (en) * 2019-03-29 2019-07-09 哈尔滨理工大学 One kind being based on deep neural network specific objective sensibility classification method
CN111274405A (en) * 2020-02-26 2020-06-12 北京工业大学 Text classification method based on GCN
CN111444346A (en) * 2020-03-31 2020-07-24 广州大学 Word vector confrontation sample generation method and device for text classification
CN113822328A (en) * 2021-08-05 2021-12-21 厦门市美亚柏科信息股份有限公司 Image classification method for defending against sample attack, terminal device and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114743081A (en) * 2022-05-10 2022-07-12 北京瑞莱智慧科技有限公司 Model training method, related device and storage medium
CN114743081B (en) * 2022-05-10 2023-06-20 北京瑞莱智慧科技有限公司 Model training method, related device and storage medium

Also Published As

Publication number Publication date
CN114357166B (en) 2024-05-28


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant