CN114398488A - BiLSTM multi-label text classification method based on attention mechanism - Google Patents
BiLSTM multi-label text classification method based on attention mechanism
- Publication number: CN114398488A
- Application number: CN202210047500.5A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F16/353 — Information retrieval; clustering/classification of unstructured textual data into predefined classes
- G06F18/214 — Pattern recognition; generating training patterns, e.g. bagging or boosting
- G06N3/044 — Neural networks; recurrent networks, e.g. Hopfield networks
- G06N3/045 — Neural networks; combinations of networks
- G06N3/047 — Probabilistic or stochastic networks
- G06N3/048 — Activation functions
- G06N3/082 — Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
Abstract
The invention belongs to the field of natural language processing and multi-label text classification, and particularly relates to a BiLSTM multi-label text classification method based on an attention mechanism. Words in the text data and label data are embedded through BERT and Word2vec respectively; a BiLSTM module extracts context information from the word-embedded text data and label data to obtain a text representation and a label representation; an attention mechanism module produces a label-based text representation; a multi-label text classification model is trained through a loss function; and real-time data is input into the trained model to obtain its label classification prediction result. The invention uses BERT for word embedding and BiLSTM to extract context dependencies, and makes full use of the text-text, text-label and label-label information, thereby improving the accuracy of multi-label text classification and the normalized discounted cumulative gain (NDCG).
Description
Technical Field
The invention belongs to the field of natural language processing and multi-label text classification, and particularly relates to a BILSTM multi-label text classification method based on an attention mechanism.
Background
Text is one of the important carriers of information. The topics and scale of text information vary widely, and how to process text efficiently is a research problem of great significance, which has driven the rapid development of automatic text classification technology. Text classification is an important and classical problem in natural language processing (NLP). In the traditional text classification problem, each sample has only one category label, the category labels are mutually independent, and the classification granularity is relatively coarse; this is called single-label text classification. As text information becomes richer, the classification granularity becomes finer: one sample is related to multiple category labels, and certain dependencies exist among the labels; this is called multi-label text classification.
Multi-label text classification is an important branch of multi-label classification, and its methods fall into two main categories: traditional machine learning methods and deep learning-based methods. Traditional machine learning methods include problem transformation methods and algorithm adaptation methods. Deep learning-based methods process the multi-label text classification problem with various neural network models and can be divided, according to the network structure, into methods based on convolutional neural networks, recurrent neural networks, and Transformer structures. Although multi-label text classification has been widely studied, several problems remain:
1. Correlation between labels. The labels in a multi-label text classification problem are inherently related, but conventional methods often ignore this relevance, so multi-label text classification efficiency is not high.
2. Relevance between document content and label content. Existing work fuses document content and label content poorly, which affects classification precision.
Disclosure of Invention
In order to solve the above problems, the invention provides a BiLSTM multi-label text classification method based on an attention mechanism. A multi-label text classification model is constructed, comprising a BERT model, a Word2vec model, a BiLSTM module and an attention mechanism module, and the method comprises the following steps:
S1, performing word embedding on text data through a BERT model, and performing word embedding on label data through a Word2vec model;
S2, extracting context information from the word-embedded text data and label data through a BiLSTM module to obtain a text representation and a label representation;
S3, processing the text representation and the label representation with an attention mechanism module to obtain a label-based text representation;
S4, calculating the loss of the label-based text representation through a loss function until convergence, obtaining a trained multi-label text classification model;
S5, inputting real-time data into the trained multi-label text classification model to obtain its label classification prediction result.
Further, step S2 uses the BiLSTM module to learn the word-embedded text data and label data, obtaining the text representation and the label representation, expressed as:
H = [H→; H←];  H′ = [H′→; H′←];
wherein H is the text representation, H→ is the forward text representation, H← is the reverse text representation, h→p is the forward text representation at time step p, h←p is the reverse text representation at time step p; H′ is the label representation, H′→ is the forward label representation, H′← is the reverse label representation, h′→p is the forward label representation at time step p, h′←p is the reverse label representation at time step p; R denotes the dimension range, H ∈ R^(2k×n), H′ ∈ R^(2k×l), wherein:
h→p = LSTM(Vt, h→(p-1));  h←p = LSTM(Vt, h←(p-1));
h′→p = LSTM(V′t, h′→(p-1));  h′←p = LSTM(V′t, h′←(p-1));
wherein h→p, h←p, h′→p and h′←p all belong to R^k, k denotes the size of the LSTM hidden layer, Vt is the embedded vector of the t-th word in the text data, h→(p-1) is the forward text representation at time step p-1, h←(p-1) is the reverse text representation at time step p-1, V′t is the embedded vector of the t-th word in the label data, h′→(p-1) is the forward label representation at time step p-1, and h′←(p-1) is the reverse label representation at time step p-1.
Further, the step S3 of obtaining the label-based text representation comprises:
S11, sending the text representation into a self-attention mechanism to obtain the label document representation under the self-attention mechanism;
S12, sending the word-embedded label data and the text representation into a label attention mechanism to obtain the document representation via all labels;
S13, fusing the label document representation under the self-attention mechanism obtained in step S11 with the document representation via all labels obtained in step S12 to obtain a fused document representation;
S14, sending the label text into a self-attention mechanism for processing, and fusing the processing result with the fused document representation of S13 to obtain the label-based text representation.
Further, step S11 uses the label attention score to obtain a linear combination of the context words for each label in the text data, and obtains the label document representation of the text under the self-attention mechanism from this linear combination. The label attention score and the linear combination of the context words for each label are respectively expressed as:
A(s)=softmax(W2tanh(W1H));
M(s)j = A(s)j·HT;
wherein A(s) is the label attention score, A(s) ∈ R^(l×n), R denotes the dimension range, W1 ∈ R^(da×2k) and W2 ∈ R^(l×da) are self-attention parameters, da is a hyper-parameter, H is the text representation, tanh(·) is the activation function, A(s)j represents the contribution of all words to the j-th label, M(s)j is the label document representation along the j-th label under the self-attention mechanism, and HT is the transpose matrix of the text representation H.
Further, the step S12 of obtaining the document representation via all labels comprises:
converting the word-embedded label data into a trainable matrix, constructing the semantic relation between the text representation and the trainable matrix by linearly combining the context words of the labels, and obtaining the document representation via all labels M(l) = [M(l)→; M(l)←] according to this semantic relation, wherein:
M(l)→ = Â→·(H→)T,  Â→ = C·H→;
M(l)← = Â←·(H←)T,  Â← = C·H←;
wherein C represents the trainable matrix of the word-embedded label data, R denotes the dimension range, C ∈ R^(l×k), H→ is the forward text representation, H← is the reverse text representation, Â→ is the forward representation of the linear combination of the labels' context words, Â← is the reverse representation of the linear combination of the labels' context words, M(l)→ is the forward part of the document representation via all labels, (H→)T is the transpose matrix of the forward text representation, M(l)← is the reverse part of the document representation via all labels, and (H←)T is the transpose matrix of the reverse text representation.
Further, the fusion process of step S13 comprises:
αj = sigmoid(Lα·M(s)j);  βj = sigmoid(Lβ·M(l)j);  with the constraint αj + βj = 1;
Mj = αj·M(s)j + βj·M(l)j;
wherein Mj is the first fused document representation along the j-th label, M(s)j is the label document representation along the j-th label, M(l)j is the document representation of the j-th label, αj is the self-attention weight, βj is the label attention weight, Lα is the first parameter, and Lβ is the second parameter.
Further, the process of obtaining the label-based text representation comprises:
S21, capturing the dependency relationship of each label in the label text through a self-attention mechanism to obtain the label word attention score of the label text;
S22, obtaining a linear combination of each label according to the label word attention score of the label text, and obtaining the label-specific label representation under the self-attention mechanism through this linear combination;
S23, fusing the label-specific label representation under the self-attention mechanism with the fused document representation to obtain the label-based text representation.
Further, before the fusion in step S23, the fused document representation is processed through a fully connected layer to obtain a first text, the label representation is processed through a fully connected layer to obtain a second text, and the first text and the second text are fused to obtain the label-based text representation. The processing formulas are:
a=sigmoid(W5M)
d=sigmoid(W6M′(s))
z=BN[a,d]
wherein a is the first text, d is the second text, M is the fused document representation, M′(s) is the label representation, BN[·] denotes batch normalization, z is the label-based text representation, and W5 and W6 are weights.
Further, the prediction probability ŷ of the classification is calculated from the label-based text representation through a sigmoid function, expressed as:
ŷ = sigmoid(W7·reshape(z)T + b);
wherein reshape(·) is the reshape function, b is the bias, W7 is the weight, sigmoid(·) is the sigmoid function, and reshape(z)T is the transpose of the reshaped label-based text representation.
Further, the loss function is expressed as:
L = −Σ(i=1..N) Σ(j=1..l) [ yij·log(ŷij) + (1−yij)·log(1−ŷij) ];
wherein N is the total number of text data, l is the total number of label data, ŷij is the prediction probability, and yij ∈ {0,1} indicates the true classification of the i-th document along the j-th label.
The beneficial effects of the invention are:
The invention embeds the words of the text data with the BERT model, embeds the words of the label data with the Word2vec model, and converts the word-embedded label data into a trainable matrix, thereby capturing the relation between text and labels, improving classification precision, and enhancing the sensitivity to labels within the text. Compared with the text data, the label data generally contains dozens to over a hundred labels; processing the label data with Word2vec reduces complexity and improves processing speed. After word embedding, BiLSTM extracts the context dependencies; the text and the labels are each processed with a self-attention mechanism, improving the correlation of text with text and of labels with labels. In addition, the text context extracted by BiLSTM and the word-embedded labels are processed with a label attention mechanism, improving the correlation between text and labels. The text under the self-attention mechanism and the text and labels under the label attention mechanism are fused, and the fusion result is fused again with the labels under the self-attention mechanism, thereby improving the accuracy of multi-label text classification and the normalized discounted cumulative gain (NDCG).
Drawings
FIG. 1 is a general flow diagram of the present invention;
FIG. 2 is a mechanism diagram of the LSTM of the present invention;
FIG. 3 is a diagram of the bi-directional LSTM model architecture of the present invention;
FIG. 4 is a block diagram of a BILSTM multi-label text classification method based on an attention mechanism according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
A BiLSTM multi-label text classification method based on an attention mechanism, as shown in FIG. 1, comprises the following steps:
S1, performing word embedding on text data through a BERT model, and performing word embedding on label data through a Word2vec model;
S2, extracting context information from the word-embedded text data and label data through a BiLSTM module to obtain a text representation and a label representation;
S3, processing the text representation and the label representation with an attention mechanism module to obtain a label-based text representation;
S4, calculating the loss of the label-based text representation through a loss function until convergence, obtaining a trained multi-label text classification model;
S5, inputting real-time data into the trained multi-label text classification model to obtain its label classification prediction result.
Specifically, a general block diagram of the BiLSTM text classification method based on the attention mechanism is shown in FIG. 4. The multi-label text classification model comprises a BERT model, a Word2vec model, a BiLSTM module, and an attention mechanism module, and the specific implementation flow comprises:
S11, inputting text data into the BERT model for word embedding, and performing word embedding on the label data through the Word2vec model;
S12, extracting context information from the word-embedded text data and label data with the BiLSTM model to obtain the text representation and the label representation;
S13, sending the text representation into a self-attention mechanism to obtain the label document representation under the self-attention mechanism;
S14, sending the word-embedded label data and the text representation into a label attention mechanism to obtain the document representation via all labels;
S15, fusing the label document representation under the self-attention mechanism obtained in S13 with the document representation via all labels obtained in S14 to obtain the fused document representation, namely A in FIG. 4;
S16, sending the label text into a self-attention mechanism for processing, and fusing the processing result with the fused document representation of S15 to obtain the label-based text representation, namely B in FIG. 4;
S17, processing the label-based text representation through a sigmoid function to obtain the final label classification prediction result.
In one embodiment, text data is input into the BERT model, which processes it through word embedding, sentence embedding, and position embedding in sequence to obtain a text output vector containing word, sentence, and position information, denoted {V1, V2, ..., Vp, ..., Vn}, where n represents the maximum embedded word length; in this embodiment n = 300, and the dimension of the BERT model is set to 768.
Word embedding is performed on the label data through Word2vec with embedding dimension k, and the label output vector is denoted {V′1, V′2, ..., V′p, ..., V′l}, where l denotes the number of embedded labels; in this embodiment k = 300.
In an embodiment, the BiLSTM model learns the word-embedded text data and label data to obtain the text representation and the label representation. The structure of the LSTM is shown in FIG. 2, and the operations in the LSTM structure are expressed as:
Df=sigmoid(Wf[xt,st-1]+bf);
Din=sigmoid(Win[xt,st-1]+bin);
C̃t=tanh(Wc[xt,st-1]+bc);
Ct=Df*Ct-1+Din*C̃t;
Do=sigmoid(Wo[xt,st-1]+bo);
st=Do*tanh(Ct);
wherein xt represents the input vector at time t; Wf, Win, Wc and Wo respectively represent the forget gate weight, the input gate weight, the input unit weight and the output gate weight at time t; bf, bin, bc and bo respectively represent the forget gate bias, the input gate bias, the input unit bias and the output gate bias; Ct-1 represents the cell state information at time t-1; Df and Din respectively represent the forget gate output and the input gate output; C̃t represents the candidate cell state at time t; Ct represents the updated cell state information; Do represents the output gate output; and st represents the hidden layer state at time t.
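The gate operations above can be sketched in NumPy as follows. This is an illustrative, non-authoritative sketch: the function name `lstm_step`, the parameter layout, and the toy dimensions are assumptions for demonstration, not part of the claimed method.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, s_prev, c_prev, params):
    """One LSTM step; returns (hidden state s_t, cell state C_t)."""
    W_f, b_f = params["f"]   # forget gate weight/bias
    W_i, b_i = params["i"]   # input gate weight/bias
    W_c, b_c = params["c"]   # input unit (candidate) weight/bias
    W_o, b_o = params["o"]   # output gate weight/bias
    z = np.concatenate([x_t, s_prev])      # [x_t, s_{t-1}]
    D_f = sigmoid(W_f @ z + b_f)           # forget gate output
    D_in = sigmoid(W_i @ z + b_i)          # input gate output
    C_tilde = np.tanh(W_c @ z + b_c)       # candidate cell state
    C_t = D_f * c_prev + D_in * C_tilde    # updated cell state
    D_o = sigmoid(W_o @ z + b_o)           # output gate output
    s_t = D_o * np.tanh(C_t)               # hidden layer state
    return s_t, C_t

# toy usage: input dim d=4, hidden size k=3
rng = np.random.default_rng(0)
d, k = 4, 3
params = {g: (rng.standard_normal((k, d + k)), np.zeros(k))
          for g in ("f", "i", "c", "o")}
s, c = np.zeros(k), np.zeros(k)
s, c = lstm_step(rng.standard_normal(d), s, c, params)
```

Because the hidden state is a product of a sigmoid and a tanh, each component of `s` stays within (-1, 1).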
The BiLSTM operates on the basis of the LSTM:
Preferably, the BiLSTM model learns the word-embedded text data and label data to obtain the text representation and the label representation, expressed as:
H = [H→; H←];  H′ = [H′→; H′←];
wherein H is the text representation, H→ is the forward text representation, H← is the reverse text representation, h→p is the forward text representation at time step p, h←p is the reverse text representation at time step p; H′ is the label representation, H′→ is the forward label representation, H′← is the reverse label representation, h′→p is the forward label representation at time step p, h′←p is the reverse label representation at time step p; R denotes the dimension range, H ∈ R^(2k×n), H′ ∈ R^(2k×l), wherein:
h→p = LSTM(Vt, h→(p-1));  h←p = LSTM(Vt, h←(p-1));
h′→p = LSTM(V′t, h′→(p-1));  h′←p = LSTM(V′t, h′←(p-1));
wherein h→p, h←p, h′→p and h′←p all belong to R^k, k denotes the size of the LSTM hidden layer, Vt is the embedded vector of the t-th word in the text data, h→(p-1) is the forward text representation at time step p-1, h←(p-1) is the reverse text representation at time step p-1, V′t is the embedded vector of the t-th word in the label data, h′→(p-1) is the forward label representation at time step p-1, and h′←(p-1) is the reverse label representation at time step p-1.
As shown in FIG. 3, the word-embedded text data and label data are learned by the BiLSTM; at time step p, the hidden state is updated from the current input and the state of step p-1, where k represents the size of the LSTM hidden layer and is set to 300 in this embodiment.
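The bidirectional concatenation H = [H→; H←] can be sketched as follows. To keep the example short, a simplified tanh recurrent cell stands in for the full LSTM gates; the function name, weights, and toy dimensions are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def run_direction(X, W_x, W_s):
    """Run a simplified recurrent cell over the columns of X; returns k x n states."""
    k, n = W_s.shape[0], X.shape[1]
    states = np.zeros((k, n))
    s = np.zeros(k)
    for p in range(n):
        s = np.tanh(W_x @ X[:, p] + W_s @ s)   # state update from input and previous state
        states[:, p] = s
    return states

rng = np.random.default_rng(1)
k, d, n = 3, 4, 5                   # hidden size k, embedding dim d, sequence length n
X = rng.standard_normal((d, n))     # embedded word vectors V_1..V_n as columns
Wx_f, Ws_f = rng.standard_normal((k, d)), rng.standard_normal((k, k))
Wx_b, Ws_b = rng.standard_normal((k, d)), rng.standard_normal((k, k))
H_fwd = run_direction(X, Wx_f, Ws_f)                     # forward direction
H_bwd = run_direction(X[:, ::-1], Wx_b, Ws_b)[:, ::-1]   # backward direction, re-aligned
H = np.vstack([H_fwd, H_bwd])       # H in R^{2k x n}
```

The same construction applied to the embedded label vectors yields H′ ∈ R^(2k×l).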
A multi-label document may be tagged with multiple labels. Each document should have the context most relevant to its corresponding labels; in other words, each document may contain multiple labels, and the words in one document contribute differently to each label.
In one embodiment, in order to obtain the different contribution to each label, a self-attention mechanism is adopted, specifically: the label attention score is used to obtain a linear combination of the context words for each label in the text data, and the label document representation of the text under the self-attention mechanism is obtained from this linear combination. The label attention score and the linear combination of the context words for each label are respectively expressed as:
A(s)=softmax(W2tanh(W1H));
M(s)j = A(s)j·HT;
wherein A(s) is the label attention score, A(s) ∈ R^(l×n), W1 ∈ R^(da×2k) and W2 ∈ R^(l×da) are the self-attention parameters to be trained, da is a hyper-parameter (da = 200 in this embodiment), H is the text representation, tanh(·) is the activation function, A(s)j represents the contribution of all words in the text data to the j-th label, M(s)j is the label document representation along the j-th label under the self-attention mechanism, and HT is the transpose matrix of the text representation. Finally, the label document representation of the text under the self-attention mechanism M(s) is obtained, M(s) ∈ R^(l×2k).
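The computation A(s) = softmax(W2·tanh(W1·H)) followed by the linear combination over the text states can be sketched in NumPy as follows; the toy dimensions are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable row-wise softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(2)
k, n, l, d_a = 3, 5, 4, 6          # hidden size, sequence length, labels, hyper-parameter d_a
H = rng.standard_normal((2 * k, n))            # text representation, 2k x n
W1 = rng.standard_normal((d_a, 2 * k))         # self-attention parameter
W2 = rng.standard_normal((l, d_a))             # self-attention parameter
A_s = softmax(W2 @ np.tanh(W1 @ H), axis=-1)   # label attention score A(s), l x n
M_s = A_s @ H.T                                # label document representation M(s), l x 2k
```

Each row of `A_s` sums to 1, so row j of `M_s` is a convex combination of the word states weighted by their contribution to the j-th label.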
In order to utilize the semantic information of the labels, after Word2vec preprocessing, the labels are expressed as a trainable matrix C ∈ R^(l×k), i.e. in the same latent k-dimensional space as the words. By combining the label embeddings with the text word embeddings produced by the BiLSTM, the semantic relationship between each pair of words and labels can be determined explicitly.
In one embodiment, the semantic relation between the text representation and the word-embedded label data is constructed by linearly combining the context words of the labels, and the document representation via all labels M(l) = [M(l)→; M(l)←] is obtained from this semantic relation, wherein:
M(l)→ = Â→·(H→)T,  Â→ = C·H→;
M(l)← = Â←·(H←)T,  Â← = C·H←;
wherein C represents the trainable matrix of the word-embedded label data, C ∈ R^(l×k), H→ is the forward text representation, H← is the reverse text representation, Â→ is the forward representation of the linear combination of the labels' context words, Â← is the reverse representation of the linear combination of the labels' context words, M(l)→ is the forward part of the document representation via all labels, (H→)T is the transpose matrix of the forward text representation, M(l)← is the reverse part of the document representation via all labels, and (H←)T is the transpose matrix of the reverse text representation.
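The label attention branch — scoring each label embedding against the BiLSTM text states and then linearly combining the context words — can be sketched as follows. The unnormalized score form (plain matrix products of C with the directional states) is an assumption for illustration, as are the toy dimensions.

```python
import numpy as np

rng = np.random.default_rng(3)
k, n, l = 3, 5, 4
H_fwd = rng.standard_normal((k, n))    # forward text representation H->
H_bwd = rng.standard_normal((k, n))    # reverse text representation H<-
C = rng.standard_normal((l, k))        # trainable label embedding matrix, l x k

A_fwd = C @ H_fwd                      # label-word semantic scores, l x n
A_bwd = C @ H_bwd
M_l_fwd = A_fwd @ H_fwd.T              # forward document representation via all labels, l x k
M_l_bwd = A_bwd @ H_bwd.T
M_l = np.hstack([M_l_fwd, M_l_bwd])    # M(l) in R^{l x 2k}
```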
M(s) and M(l) are both label-specific representations of the document, but they differ: M(s) emphasizes the document content, while M(l) leans toward the semantic association between the document content and the label text. Two weight vectors (α, β ∈ R^l) are introduced to determine the importance of the two parts; they are obtained by feeding M(s) and M(l) into fully connected layers:
αj = sigmoid(Lα·M(s)j);  βj = sigmoid(Lβ·M(l)j);
wherein M(s)j is the label document representation along the j-th label, M(l)j is the document representation of the j-th label, αj is the self-attention weight, βj is the label attention weight, Lα is the first parameter, and Lβ is the second parameter. A constraint αj + βj = 1 is added to the two weight parameters, and the first fused document representation along the j-th label is obtained according to the fusion weights, namely:
Mj = αj·M(s)j + βj·M(l)j;
Through the introduced self-attention weights and this fusion method, the fused first fused document representation M is obtained, M ∈ R^(l×2k).
In one embodiment, the label score of each label is obtained, so as to obtain the label-specific label representation under the self-attention mechanism and further the label-based text representation. The specific steps are:
S21, capturing the dependency relationship of each label in the label text through a self-attention mechanism to obtain the label word attention score of the label text;
S22, obtaining a linear combination of each label according to the label word attention score of the label text, and obtaining the label-specific label representation under the self-attention mechanism through this linear combination;
S23, fusing the label-specific label representation under the self-attention mechanism with the fused document representation to obtain the label-based text representation.
Specifically, the label word attention score of the label text is expressed as:
A′(s) = softmax(W4·tanh(W3·H′));
wherein A′(s) is the label word attention score, W3 ∈ R^(da×2k) and W4 ∈ R^(l×da) are self-attention parameters, da is a hyper-parameter, and H′ is the label representation; in this embodiment da = 200.
Specifically, step S22 comprises:
M′(s)j = A′(s)j·H′T;
wherein A′(s)j represents the contribution of all labels to the j-th label, M′(s)j represents the label representation specific to the j-th label under the self-attention mechanism, and the matrix M′(s) ∈ R^(l×2k) is the label-specific label representation under the self-attention mechanism.
In this embodiment, the label attention mechanism relates the various labels in the label data to the textual information in the text data, i.e. it captures the association between text and labels, hence the name. The self-attention mechanism is used twice in the invention: the first time based on the association of text with text (specifically, between the text content and the labels within the text), and the second time based on the association of labels with labels.
The adopted fusion mode has the advantage that, without changing the dimensionality, it accelerates training and reduces the dependency among parameters. The specific formulas are as follows:
a=sigmoid(W5M)
d=sigmoid(W6M`(s))
z=BN[a,d]
wherein W5 ∈ R^(1×l), W6 ∈ R^(1×l), a ∈ R^(1×2k), d ∈ R^(1×2k), z ∈ R^(1×4k), and z is the label-based text representation.
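The fully connected fusion a = sigmoid(W5·M), d = sigmoid(W6·M′(s)), z = BN[a, d] can be sketched as follows. A simple zero-mean, unit-variance normalization stands in for batch normalization here, since true BN statistics are computed over a mini-batch; dimensions are toy assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(5)
l, k = 4, 3
M = rng.standard_normal((l, 2 * k))        # fused document representation, l x 2k
M_label = rng.standard_normal((l, 2 * k))  # label-specific label representation M'(s)
W5 = rng.standard_normal((1, l))
W6 = rng.standard_normal((1, l))

a = sigmoid(W5 @ M)                        # first text, 1 x 2k
d = sigmoid(W6 @ M_label)                  # second text, 1 x 2k
z = np.hstack([a, d])                      # concatenation, 1 x 4k
# stand-in for batch normalization: normalize to zero mean, unit variance
z = (z - z.mean()) / (z.std() + 1e-5)
```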
Once the comprehensive label-specific document representation is available, the matrix can be reshaped through a reshape function into a vector of l rows and 1 column and then output through a final sigmoid function. Mathematically, the prediction probability of each label can be calculated as:
ŷ=sigmoid(W7reshape(zT)+b)
W7∈Rl×4k;
where reshape(·) is the reshape function, b is the bias, W7 is the weight, sigmoid() is the sigmoid function, and zT is the transpose of the label-based text representation. The output value is converted into a probability by the sigmoid function, and the cross-entropy loss can be used as the loss function:
L=-Σ(i=1..N)Σ(j=1..l)[yij log(ŷij)+(1-yij)log(1-ŷij)]
where N is the number of training documents, l is the number of labels, ŷij is the prediction probability, yij∈{0,1} indicates whether the ith document carries the jth label, W5 is the fully connected layer parameter, and W7 is the output layer parameter.
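The output layer and loss described above can be sketched as below. Z, W7, b and the ground-truth matrix Y are random toy stand-ins, not trained values; the point is the per-document sigmoid output in R^{l x 1} and the averaged binary cross-entropy over N documents and l labels.

```python
import numpy as np

l, k, N = 4, 3, 2
rng = np.random.default_rng(2)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

W7 = rng.standard_normal((l, 4 * k))   # output layer weights, W7 in R^{l x 4k}
b = rng.standard_normal((l, 1))        # bias

Z = rng.standard_normal((N, 1, 4 * k))              # one z per document, 1 x 4k
Y = rng.integers(0, 2, size=(N, l)).astype(float)   # toy ground-truth labels

# y_hat = sigmoid(W7 z^T + b): an l x 1 probability vector per document
Y_hat = np.stack([sigmoid(W7 @ z.T + b).ravel() for z in Z])

# cross-entropy loss averaged over N documents and l labels
loss = -np.mean(Y * np.log(Y_hat) + (1 - Y) * np.log(1 - Y_hat))
```

Since the sigmoid output is strictly between 0 and 1, both log terms stay finite and the loss is a positive scalar.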
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (10)
1. The BILSTM multi-label text classification method based on the attention mechanism is characterized by constructing a multi-label text classification model, wherein the multi-label text classification model comprises a bert model, a Word2vec model, a BILSTM module and an attention mechanism module, and the BILSTM multi-label text classification method based on the attention mechanism comprises the following steps:
s1, Word embedding is carried out on text data through a bert model, and Word embedding is carried out on label data through a Word2vec model;
s2, extracting context information from the text data and the label data after word embedding through a BILSTM module to obtain text representation and label representation;
s3, processing the text representation and the label representation by adopting an attention mechanism module to obtain a text representation based on a label;
s4, calculating the loss of the text representation based on the label through a loss function until convergence to obtain a trained multi-label text classification model;
and S5, inputting the real-time data into the trained multi-label text classification model to obtain a label classification prediction result of the real-time data.
2. The method of claim 1, wherein in step S2 the BILSTM module learns the text data and the label data after word embedding to obtain the text representation and label representation, which are expressed as:
H=[H→,H←]
H`=[H`→,H`←]
where H is the text representation, H→ is the forward text representation, H← is the reverse text representation, h→p denotes the forward text representation at time step p, h←p denotes the reverse text representation at time step p, H` is the label representation, H`→ is the forward label representation, H`← is the reverse label representation, h`→p denotes the forward label representation at time step p, h`←p denotes the reverse label representation at time step p, R denotes the dimension range, H belongs to R2k×n, and H` belongs to R2k×l, wherein:
h→p=LSTM(h→p-1,Vt)
h←p=LSTM(h←p-1,Vt)
h`→p=LSTM(h`→p-1,V`t)
h`←p=LSTM(h`←p-1,V`t)
where h→p, h←p, h`→p and h`←p all belong to Rk, k denotes the size of the LSTM hidden layer, Vt is the embedding vector of the t-th word in the text data, h→p-1 denotes the forward text representation at time step p-1, h←p-1 denotes the reverse text representation at time step p-1, V`t is the embedding vector of the t-th word in the label data, h`→p-1 denotes the forward label representation at time step p-1, and h`←p-1 denotes the reverse label representation at time step p-1.
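The bidirectional recurrence of claim 2 can be illustrated with a toy NumPy sketch. A simple tanh update stands in for the full gated LSTM cell to keep the example short; the point shown is the forward and backward passes over the sequence and the concatenation into a 2k x n representation.

```python
import numpy as np

k, n, emb = 3, 5, 4   # hidden size, sequence length, embedding dim (toy values)
rng = np.random.default_rng(3)
V = rng.standard_normal((n, emb))   # word embeddings V_1..V_n
W = rng.standard_normal((k, k))     # random stand-ins for recurrent weights
U = rng.standard_normal((k, emb))

def run(seq):
    """One directional pass; tanh update is a simplified stand-in for LSTM."""
    h = np.zeros(k)
    out = []
    for v in seq:
        h = np.tanh(W @ h + U @ v)  # h_p depends on h_{p-1} and the current word
        out.append(h)
    return np.array(out)

H_fwd = run(V)               # forward states h->_1 .. h->_n
H_bwd = run(V[::-1])[::-1]   # backward states h<-_1 .. h<-_n, realigned
H = np.concatenate([H_fwd, H_bwd], axis=1).T   # H in R^{2k x n}
print(H.shape)  # (6, 5)
```

The same two passes over the embedded label data would produce H` in R^{2k x l}.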
3. The method of claim 1, wherein the step S3 of obtaining the label-based text representation comprises:
s11, sending the text representation into a self-attention mechanism to obtain a label document representation under the self-attention mechanism;
s12, sending the label data and text representation after word embedding into a label attention mechanism to obtain document representation of all labels;
s13, fusing the label document representation under the self-attention mechanism obtained in step S11 with the document representation across all labels obtained in step S12 to obtain a fused document representation;
and S14, sending the label text into a self-attention mechanism for processing, and fusing the processing result with the fusion document representation of S13 to obtain a text representation based on the label.
4. The method according to claim 3, wherein step S11 obtains the linear combination of each tagged contextual word in the text data by using the tag attention score, obtains the tagged document representation of the text representation under the self-attention mechanism according to the linear combination of each tagged contextual word, and the tag attention score and the linear combination of each tagged contextual word are respectively expressed as:
A(s)=softmax(W2tanh(W1H));
wherein A is(s)In order to score the attention of the tag,W1、W2as a self-attention parameter, daFor a hyper-parameter, H is a text representation, tanh () is an activation function,representing the contribution of all words to the jth label,along the jth tagLabel document representation under the self-attention mechanism, HTA transpose matrix for the text representation H.
5. The method of claim 3, wherein the step S12 of obtaining the document representation across all labels comprises:
converting the label data after word embedding into a trainable matrix, constructing a semantic relation between the text representation and the trainable matrix by linearly combining the context words of the labels, and obtaining the document representation across all labels according to that semantic relation, wherein:
Â→=CH→
Â←=CH←
M→(t)=Â→H→T
M←(t)=Â←H←T
where C represents the trainable matrix of the label data after word embedding, C belongs to Rl×k, H→ is the forward text representation, H← is the reverse text representation, Â→ is the forward representation of the linear combination of the labels' context words, Â← is the reverse representation of the linear combination of the labels' context words, M→(t) is the forward representation of the document representation across all labels, H→T is the transpose matrix of the forward text representation, M←(t) is the reverse representation of the document representation across all labels, and H←T is the transpose matrix of the reverse text representation.
6. The method for classifying BILSTM multi-label text based on attention mechanism as claimed in claim 3, wherein the fusing procedure of step S13 includes:
7. The method of claim 3, wherein obtaining the label-based text representation comprises:
s21, capturing the dependency relationship of each label in the label text through a self-attention mechanism to obtain the attention score of the label word of the label text;
s22, acquiring a linear combination of each label according to the attention score of the label words of the label text, and obtaining label representation specific to the label under the self-attention mechanism through the linear combination of each label;
and S23, fusing the label representation specific to the label under the self-attention mechanism with the fused document representation to obtain a text representation based on the label.
8. The method of claim 7, wherein before the fusing in step S23, the fused document representation is processed through a fully connected layer to obtain a first text, the label representation is processed through a fully connected layer to obtain a second text, and the first text and the second text are fused to obtain the label-based text representation, according to the following processing formulas:
a=sigmoid(W5M)
d=sigmoid(W6M`(s))
z=BN[a,d]
wherein a is the first text, d is the second text, M is the fused document representation, M`(s) is the label representation, BN[·] is batch normalization, z is the label-based text representation, and W5, W6 are weights.
9. The attention-mechanism-based BILSTM multi-label text classification method of claim 8, wherein, on the basis of the label-based text representation, the prediction probability ŷ of the classification is calculated through a sigmoid function, expressed as:
ŷ=sigmoid(W7reshape(zT)+b)
where reshape(·) is the reshape function, b is the bias, W7 is the weight, sigmoid() is the sigmoid function, and zT is the transpose of the label-based text representation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210047500.5A CN114398488A (en) | 2022-01-17 | 2022-01-17 | Bilstm multi-label text classification method based on attention mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114398488A true CN114398488A (en) | 2022-04-26 |
Family
ID=81231064
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210047500.5A Pending CN114398488A (en) | 2022-01-17 | 2022-01-17 | Bilstm multi-label text classification method based on attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114398488A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109582789A (en) * | 2018-11-12 | 2019-04-05 | 北京大学 | Text multi-tag classification method based on semantic primitive information |
CN110019653A (en) * | 2019-04-08 | 2019-07-16 | 北京航空航天大学 | A kind of the social content characterizing method and system of fusing text and label network |
CN110209823A (en) * | 2019-06-12 | 2019-09-06 | 齐鲁工业大学 | A kind of multi-tag file classification method and system |
CN110442723A (en) * | 2019-08-14 | 2019-11-12 | 山东大学 | A method of multi-tag text classification is used for based on the Co-Attention model that multistep differentiates |
US20200327381A1 (en) * | 2019-04-10 | 2020-10-15 | International Business Machines Corporation | Evaluating text classification anomalies predicted by a text classification model |
CN113626589A (en) * | 2021-06-18 | 2021-11-09 | 电子科技大学 | Multi-label text classification method based on mixed attention mechanism |
Non-Patent Citations (3)
Title |
---|
YANRU DONG et al.: "A Fusion Model-Based Label Embedding and Self-Interaction Attention for Text Classification", IEEE Access, vol. 8, 21 November 2019 (2019-11-21), pages 30548, XP011772629, DOI: 10.1109/ACCESS.2019.2954985 * |
LIU Jie et al.: "Multi-label text classification with a fused attention mechanism", Microelectronics & Computer, 4 January 2024 (2024-01-04), pages 26 - 34 * |
SUN Wei: "Research on multi-label text classification based on attention and graph convolution", China Master's Theses Full-text Database, Information Science and Technology, no. 09, 15 September 2021 (2021-09-15), pages 138 - 714 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115905533A (en) * | 2022-11-24 | 2023-04-04 | Chongqing University of Posts and Telecommunications | Intelligent multi-label text classification method |
CN115905533B (en) * | 2022-11-24 | 2023-09-19 | 湖南光线空间信息科技有限公司 | Multi-label text intelligent classification method |
CN116562251A (en) * | 2023-05-19 | 2023-08-08 | China University of Mining and Technology (Beijing) | Form classification method for stock information disclosure long document |
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||