CN114398488A - Bilstm multi-label text classification method based on attention mechanism - Google Patents

Bilstm multi-label text classification method based on attention mechanism

Info

Publication number
CN114398488A
CN114398488A
Authority
CN
China
Prior art keywords
label
text
representation
data
document
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210047500.5A
Other languages
Chinese (zh)
Inventor
唐宏
刘杰
甘陈敏
彭金枝
刘蓓明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202210047500.5A priority Critical patent/CN114398488A/en
Publication of CN114398488A publication Critical patent/CN114398488A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the field of natural language processing and multi-label text classification, and particularly relates to a BILSTM multi-label text classification method based on an attention mechanism. Words in the text data and the label data are respectively embedded through bert and Word2vec; a BILSTM module is adopted to respectively extract context information from the word-embedded text data and label data to obtain a text representation and a label representation; a label-based text representation is obtained through an attention mechanism module; the multi-label text classification model is trained through a loss function; real-time data is input into the trained multi-label text classification model to obtain a label classification prediction result for the real-time data. The invention uses Bert for word embedding and BILSTM to extract context dependencies, and fully exploits the text-text, text-label and label-label information, thereby improving the accuracy of multi-label text classification and the normalized discounted cumulative gain (NDCG).

Description

Bilstm multi-label text classification method based on attention mechanism
Technical Field
The invention belongs to the field of natural language processing and multi-label text classification, and particularly relates to a BILSTM multi-label text classification method based on an attention mechanism.
Background
Text is one of the important carriers of information; the topics and scale of text information vary widely, and how to process text information efficiently is a problem of great research significance, which has driven the rapid development of automatic text classification technology. Text classification is an important and classical problem in Natural Language Processing (NLP). In the conventional text classification problem, each sample has only one category label, the category labels are independent of each other, and the classification granularity is relatively coarse; this is called single-label text classification. With the increasing richness of text information, the classification granularity becomes finer and finer, one sample is related to a plurality of category labels, and at the same time certain dependencies exist among the category labels; this is called multi-label text classification.
Multi-label text classification is an important branch of multi-label classification, and multi-label text classification methods fall into two main categories: conventional machine learning methods and deep learning based methods. The first category, conventional machine learning methods, includes problem transformation methods and algorithm adaptation methods; the second category, deep learning based methods, processes the multi-label text classification problem with various neural network models and, according to the structure of the network, is divided into multi-label text classification methods based on convolutional neural network structures, on recurrent neural network structures, and on Transformer structures. There is a large body of research on multi-label text classification, but several problems remain:
1. Research on the correlation between labels. The labels in the multi-label text classification problem are inherently related, but the existing methods for processing the multi-label text classification problem often do not consider the relevance among labels, so the multi-label text classification efficiency is not high.
2. Research on the relevance of the document content and the label content. The existing methods fuse the document content and the label content poorly, which affects the classification precision.
Disclosure of Invention
In order to solve the above problems, the invention provides a BILSTM multi-label text classification method based on an attention mechanism, which comprises constructing a multi-label text classification model, wherein the multi-label text classification model comprises a bert model, a Word2vec model, a BILSTM module and an attention mechanism module, and the following steps:
s1, Word embedding is carried out on text data through a bert model, and Word embedding is carried out on label data through a Word2vec model;
s2, extracting context information from the text data and the label data after word embedding through a BILSTM module to obtain text representation and label representation;
s3, processing the text representation and the label representation by adopting an attention mechanism module to obtain a text representation based on a label;
s4, calculating the loss of the text representation based on the label through a loss function until convergence to obtain a trained multi-label text classification model;
and S5, inputting the real-time data into the trained multi-label text classification model to obtain a label classification prediction result of the real-time data.
Further, in step S2 the BILSTM module learns the word-embedded text data and label data to obtain the text representation and the label representation, expressed as:
$\overrightarrow{H} = (\overrightarrow{H}_1, \overrightarrow{H}_2, \ldots, \overrightarrow{H}_n)$;
$\overleftarrow{H} = (\overleftarrow{H}_1, \overleftarrow{H}_2, \ldots, \overleftarrow{H}_n)$;
$H = [\overrightarrow{H}; \overleftarrow{H}]$;
$\overrightarrow{H'} = (\overrightarrow{H'}_1, \overrightarrow{H'}_2, \ldots, \overrightarrow{H'}_l)$;
$\overleftarrow{H'} = (\overleftarrow{H'}_1, \overleftarrow{H'}_2, \ldots, \overleftarrow{H'}_l)$;
$H' = [\overrightarrow{H'}; \overleftarrow{H'}]$;
wherein H is the text representation, $\overrightarrow{H}$ is the forward text representation, $\overleftarrow{H}$ is the reverse text representation, $\overrightarrow{H}_p$ denotes the forward text representation at time step p, $\overleftarrow{H}_p$ denotes the reverse text representation at time step p, H' is the label representation, $\overrightarrow{H'}$ is the forward label representation, $\overleftarrow{H'}$ is the reverse label representation, $\overrightarrow{H'}_p$ denotes the forward label representation at time step p, $\overleftarrow{H'}_p$ denotes the reverse label representation at time step p, R denotes the dimension range, $H \in R^{2k \times n}$ and $H' \in R^{2k \times l}$, wherein:
$\overrightarrow{H}_p = \overrightarrow{\mathrm{LSTM}}(\overrightarrow{H}_{p-1}, V_t)$;
$\overleftarrow{H}_p = \overleftarrow{\mathrm{LSTM}}(\overleftarrow{H}_{p-1}, V_t)$;
$\overrightarrow{H'}_p = \overrightarrow{\mathrm{LSTM}}(\overrightarrow{H'}_{p-1}, V'_t)$;
$\overleftarrow{H'}_p = \overleftarrow{\mathrm{LSTM}}(\overleftarrow{H'}_{p-1}, V'_t)$;
wherein $\overrightarrow{H}_p$, $\overleftarrow{H}_p$, $\overrightarrow{H'}_p$ and $\overleftarrow{H'}_p$ all belong to $R^k$, k denotes the size of the LSTM hidden layer, $V_t$ is the embedding vector of the t-th word in the text data, $\overrightarrow{H}_{p-1}$ denotes the forward text representation at time step p-1, $\overleftarrow{H}_{p-1}$ denotes the reverse text representation at time step p-1, $V'_t$ is the embedding vector of the t-th word in the label data, $\overrightarrow{H'}_{p-1}$ denotes the forward label representation at time step p-1, and $\overleftarrow{H'}_{p-1}$ denotes the reverse label representation at time step p-1.
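For illustration, a minimal PyTorch sketch of this BILSTM encoding step is given below. It is not the patented implementation; the module name BiLSTMEncoder, the batch-first tensor layout and the toy sizes (k = 300, l = 20 labels, two documents) are assumptions made for the example.

```python
# Minimal sketch (not the patented implementation): bidirectional LSTM encoders
# that turn the word embeddings of the text and of the labels into H and H'.
import torch
import torch.nn as nn

class BiLSTMEncoder(nn.Module):
    def __init__(self, emb_dim: int, k: int = 300):
        super().__init__()
        # bidirectional=True concatenates forward and backward states -> 2k features
        self.lstm = nn.LSTM(emb_dim, k, batch_first=True, bidirectional=True)

    def forward(self, x):            # x: (batch, seq_len, emb_dim)
        out, _ = self.lstm(x)        # out: (batch, seq_len, 2k)
        return out

text_encoder = BiLSTMEncoder(emb_dim=768, k=300)    # BERT embeddings for the text
label_encoder = BiLSTMEncoder(emb_dim=300, k=300)   # Word2vec embeddings for the labels

text_emb = torch.randn(2, 300, 768)   # n = 300 words per document (toy values)
label_emb = torch.randn(2, 20, 300)   # l = 20 labels in this toy example
H = text_encoder(text_emb)            # text representation, (batch, n, 2k)
H_prime = label_encoder(label_emb)    # label representation, (batch, l, 2k)
```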
Further, the step S3 of obtaining the tag-based text representation includes:
s11, sending the text representation into a self-attention mechanism to obtain a label document representation under the self-attention mechanism;
s12, sending the label data and text representation after word embedding into a label attention mechanism to obtain document representation of all labels;
s13, fusing the label document representation under the self-attention mechanism obtained in step S11 with the document representation via all labels obtained in step S12 to obtain a fused document representation;
and S14, sending the label text into a self-attention mechanism for processing, and fusing the processing result with the fusion document representation of S13 to obtain a text representation based on the label.
Further, step S11 obtains a linear combination of the context words for each label in the text data by using the label attention score, and obtains the label document representation of the text representation under the self-attention mechanism according to that linear combination; the label attention score and the linear combination of the context words for each label are respectively expressed as:
$A^{(s)} = \mathrm{softmax}(W_2 \tanh(W_1 H))$;
$M^{(s)}_j = A^{(s)}_j H^T$;
wherein $A^{(s)}$ is the label attention score, R denotes the dimension range, $A^{(s)} \in R^{l \times n}$, $W_1$ and $W_2$ are self-attention parameters, $d_a$ is a hyper-parameter, H is the text representation, tanh() is the activation function, $A^{(s)}_j$ represents the contribution of all words to the j-th label, $M^{(s)}_j$ is the label document representation along the j-th label under the self-attention mechanism, and $H^T$ is the transpose of the text representation H.
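A minimal sketch of this self-attention step follows, assuming the batch-first layout from the previous sketch; the two Linear modules stand in for W1 and W2, and d_a = 200 follows the value used in the embodiment below.

```python
# Sketch of A^(s) = softmax(W2 tanh(W1 H)) and M^(s)_j = A^(s)_j H^T, batch-first.
import torch
import torch.nn as nn

class LabelSelfAttention(nn.Module):
    def __init__(self, k: int, num_labels: int, d_a: int = 200):
        super().__init__()
        self.W1 = nn.Linear(2 * k, d_a, bias=False)        # plays the role of W1
        self.W2 = nn.Linear(d_a, num_labels, bias=False)   # plays the role of W2

    def forward(self, H):                                  # H: (batch, n, 2k)
        scores = self.W2(torch.tanh(self.W1(H)))           # (batch, n, l)
        A_s = torch.softmax(scores, dim=1)                 # attention over words, per label
        M_s = A_s.transpose(1, 2) @ H                      # (batch, l, 2k), row j = M^(s)_j
        return M_s

attn = LabelSelfAttention(k=300, num_labels=20)
M_s = attn(torch.randn(2, 300, 600))                       # label document representation M^(s)
```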
Further, the step S12 of obtaining the document representation via all labels comprises:
converting the word-embedded label data into a trainable matrix, constructing the semantic relation between the text representation and the trainable matrix by linearly combining the context words of the labels, and acquiring the document representation via all labels $M^{(l)}$ according to that semantic relation, wherein:
$\overrightarrow{A}^{(l)} = C\overrightarrow{H}$;
$\overleftarrow{A}^{(l)} = C\overleftarrow{H}$;
$\overrightarrow{M}^{(l)} = \overrightarrow{A}^{(l)}\overrightarrow{H}^T$;
$\overleftarrow{M}^{(l)} = \overleftarrow{A}^{(l)}\overleftarrow{H}^T$;
wherein C represents the trainable matrix of the word-embedded label data, R denotes the dimension range, $C \in R^{l \times k}$, $\overrightarrow{H}$ is the forward text representation, $\overleftarrow{H}$ is the reverse text representation, $\overrightarrow{A}^{(l)}$ is the forward representation of the linear combination of the label context words, $\overleftarrow{A}^{(l)}$ is the reverse representation of the linear combination of the label context words, $\overrightarrow{M}^{(l)}$ is the forward part of the document representation via all labels, $\overrightarrow{H}^T$ is the transpose of the forward text representation, $\overleftarrow{M}^{(l)}$ is the reverse part of the document representation via all labels, and $\overleftarrow{H}^T$ is the transpose of the reverse text representation.
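A minimal sketch of this label attention step under the same assumptions; splitting the BILSTM output into forward and backward halves along the feature axis, and applying no normalization to A^(l), are assumptions, since the text does not specify them.

```python
# Sketch: the trainable label matrix C (l x k) is matched against the forward/backward
# halves of the BiLSTM text output to build the document representation M^(l).
import torch
import torch.nn as nn

def label_attention(H: torch.Tensor, C: torch.Tensor) -> torch.Tensor:
    """H: (batch, n, 2k) BiLSTM text output; C: (l, k) trainable label embeddings."""
    k = C.shape[1]
    H_fwd, H_bwd = H[..., :k], H[..., k:]      # forward / backward halves, (batch, n, k)
    A_fwd = H_fwd @ C.T                        # (batch, n, l): word-label affinity (forward)
    A_bwd = H_bwd @ C.T                        # (batch, n, l): word-label affinity (backward)
    M_fwd = A_fwd.transpose(1, 2) @ H_fwd      # (batch, l, k)
    M_bwd = A_bwd.transpose(1, 2) @ H_bwd      # (batch, l, k)
    return torch.cat([M_fwd, M_bwd], dim=-1)   # M^(l): (batch, l, 2k)

C = nn.Parameter(torch.randn(20, 300))         # l = 20 labels, k = 300 (toy values)
M_l = label_attention(torch.randn(2, 300, 600), C)
```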
Further, the fusion process of step S13 comprises:
$\alpha_j = \mathrm{sigmoid}(M^{(s)}_j L_\alpha)$, $\beta_j = \mathrm{sigmoid}(M^{(l)}_j L_\beta)$;
$M_j = \alpha_j M^{(s)}_j + \beta_j M^{(l)}_j$;
wherein $M_j$ is the first fused document representation along the j-th label, $M^{(s)}_j$ is the label document representation along the j-th label, $M^{(l)}_j$ is the document representation via all labels along the j-th label, $\alpha_j$ and $\beta_j$ are the self-attention fusion weights, $L_\alpha$ is the first parameter, and $L_\beta$ is the second parameter.
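A minimal sketch of this fusion follows, assuming that L_alpha and L_beta act as per-label sigmoid gates and that the two weights are normalized to sum to 1; the exact form of that constraint is not spelled out in the text.

```python
# Sketch of the adaptive fusion of M^(s) and M^(l) with per-label weights alpha/beta.
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    def __init__(self, k: int):
        super().__init__()
        self.L_alpha = nn.Linear(2 * k, 1)   # produces alpha_j from M^(s)_j
        self.L_beta = nn.Linear(2 * k, 1)    # produces beta_j from M^(l)_j

    def forward(self, M_s, M_l):             # both (batch, l, 2k)
        alpha = torch.sigmoid(self.L_alpha(M_s))       # (batch, l, 1)
        beta = torch.sigmoid(self.L_beta(M_l))         # (batch, l, 1)
        total = alpha + beta
        alpha, beta = alpha / total, beta / total      # constrain alpha_j + beta_j = 1
        return alpha * M_s + beta * M_l                # fused M, (batch, l, 2k)

fusion = AdaptiveFusion(k=300)
M = fusion(torch.randn(2, 20, 600), torch.randn(2, 20, 600))
```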
Further, the process of obtaining a tag-based text representation includes:
s21, capturing the dependency relationship of each label in the label text through a self-attention mechanism to obtain the attention score of the label word of the label text;
s22, acquiring a linear combination of each label according to the attention score of the label words of the label text, and obtaining label representation specific to the label under the self-attention mechanism through the linear combination of each label;
and S23, fusing the label representation specific to the label under the self-attention mechanism with the fused document representation to obtain a text representation based on the label.
Further, before the fusion in step S23, the fused document representation is processed through a fully connected layer to obtain a first text, the label representation is processed through a fully connected layer to obtain a second text, and the first text and the second text are fused to obtain the label-based text representation, where the processing formulas are:
$a = \mathrm{sigmoid}(W_5 M)$
$d = \mathrm{sigmoid}(W_6 M'^{(s)})$
$z = \mathrm{BN}[a, d]$
wherein a is the first text, d is the second text, M is the fused document representation, $M'^{(s)}$ is the label representation, BN[·] is batch normalization, z is the label-based text representation, and $W_5$, $W_6$ are weights.
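A minimal sketch of this fully connected fusion follows; treating W5 and W6 as 1xl projections that collapse the label axis, and applying BatchNorm1d to the concatenated vector, are assumptions consistent with the dimensions given in the embodiment (W5, W6 in R^{1xl}, z in R^{1x4k}).

```python
# Sketch: fuse the fused document representation M and the label-specific label
# representation M'^(s) into the label-based text representation z.
import torch
import torch.nn as nn

class TextLabelFusion(nn.Module):
    def __init__(self, num_labels: int, k: int):
        super().__init__()
        self.W5 = nn.Linear(num_labels, 1, bias=False)   # collapses the label axis of M
        self.W6 = nn.Linear(num_labels, 1, bias=False)   # collapses the label axis of M'^(s)
        self.bn = nn.BatchNorm1d(4 * k)                  # BN over the concatenated features

    def forward(self, M, M_label):                       # both (batch, l, 2k)
        a = torch.sigmoid(self.W5(M.transpose(1, 2)).squeeze(-1))        # (batch, 2k)
        d = torch.sigmoid(self.W6(M_label.transpose(1, 2)).squeeze(-1))  # (batch, 2k)
        z = self.bn(torch.cat([a, d], dim=-1))                           # (batch, 4k)
        return z

fuse = TextLabelFusion(num_labels=20, k=300)
z = fuse(torch.randn(2, 20, 600), torch.randn(2, 20, 600))
```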
Further, based on the label-based text representation, the prediction probability $\hat{y}$ of the classification is calculated by a sigmoid function, expressed as:
$\hat{y} = \mathrm{sigmoid}(\mathrm{reshape}(W_7 z^T) + b)$
where reshape(·) is the reshape function, b is the bias, $W_7$ is the weight, sigmoid() is the sigmoid function, and $z^T$ is the transpose of the label-based text representation.
Further, the loss function is expressed as:
$L = -\sum_{i=1}^{N}\sum_{j=1}^{l}\left[y_{ij}\log(\hat{y}_{ij}) + (1-y_{ij})\log(1-\hat{y}_{ij})\right]$
wherein N is the total number of text data, l is the total number of label data, $\hat{y}_{ij}$ is the prediction probability, and $y_{ij} \in \{0,1\}$ indicates whether the i-th document truly carries the j-th label.
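A minimal sketch of the output layer and this loss follows, assuming W7 and the bias b are realized as a single linear layer and the binary cross-entropy is summed over documents and labels.

```python
# Sketch: W7 maps the 4k-dimensional z to one logit per label; BCE over all labels.
import torch
import torch.nn as nn

num_labels, k = 20, 300
W7 = nn.Linear(4 * k, num_labels)            # plays the role of W7 (its bias acts as b)

z = torch.randn(8, 4 * k)                    # label-based text representations, batch of 8
y_true = torch.randint(0, 2, (8, num_labels)).float()   # ground-truth label indicators y_ij

y_hat = torch.sigmoid(W7(z))                 # predicted probability per label
loss = nn.functional.binary_cross_entropy(y_hat, y_true, reduction="sum")
loss.backward()                              # train the model until the loss converges
```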
The invention has the beneficial effects that:
the invention utilizes the Bert model to embed words in text data, and uses the Word2vec model to embed words in label data, and converts the label data after Word embedding into a trainable matrix, thereby solving the relation between text and label, better improving classification precision, enhancing the sensitivity of label in text, the label data generally has dozens or more than 100 labels compared with the text data, the complexity is reduced by processing label data with Word2vec, the processing speed is improved, the BILSTM is used for extracting context dependency after Word embedding, the text and label are respectively processed by using the self-attention machine, the correlation between text and label is improved, in addition, the text context relation extracted by BILSTM and the label after Word embedding are processed by using the label attention machine, the correlation between text and label is improved, the text under the self-attention machine is processed, And the text and the label under the label attention mechanism are fused, and the fusion result is fused with the label under the self-attention mechanism again, so that the accuracy of multi-label text classification and the normalized breaking cumulative gain are improved.
Drawings
FIG. 1 is a general flow diagram of the present invention;
FIG. 2 is a mechanism diagram of the LSTM of the present invention;
FIG. 3 is a diagram of the bi-directional LSTM model architecture of the present invention;
FIG. 4 is a block diagram of a BILSTM multi-label text classification method based on an attention mechanism according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
A BILSTM multi-label text classification method based on attention mechanism, as shown in fig. 1, comprising the following steps:
s1, Word embedding is carried out on text data through a bert model, and Word embedding is carried out on label data through a Word2vec model;
s2, extracting context information from the text data and the label data after word embedding through a BILSTM module to obtain text representation and label representation;
s3, processing the text representation and the label representation by adopting an attention mechanism module to obtain a text representation based on a label;
s4, calculating the loss of the text representation based on the label through a loss function until convergence to obtain a trained multi-label text classification model;
and S5, inputting the real-time data into the trained multi-label text classification model to obtain a label classification prediction result of the real-time data.
Specifically, a general block diagram of the BILSTM multi-label text classification method based on the attention mechanism is shown in fig. 4; the multi-label text classification model comprises a bert model, a Word2vec model, a BILSTM module and an attention mechanism module, and the specific implementation flow comprises:
s11, inputting text data into a bert model for Word embedding, and performing Word embedding on tag data through a Word2vec model;
s12, extracting context information from the text data and the label data which are subjected to word embedding by adopting a BILSTM model to obtain text representation and label representation;
s13, sending the text representation into a self-attention mechanism to obtain a label document representation under the self-attention mechanism;
s14, sending the label data and the text representation after word embedding into a label attention mechanism to obtain document representations of all labels;
s15, fusing the label document representation under the self-attention mechanism obtained in the S13 and the document representation with all labels obtained in the S14 to obtain a fused document representation, namely A in FIG. 4;
s16, sending the label text into a self-attention mechanism for processing, and fusing the processing result with the fusion document representation of S15 to obtain a text representation based on the label, namely B in FIG. 4;
and S17, processing the text representation based on the label through a sigmoid function to obtain a final label classification prediction result.
In one embodiment, text data is input into the bert model, which processes the text data through word embedding, sentence embedding and position embedding in sequence to obtain a text output vector containing word, sentence and position information, denoted as $\{V_1, V_2, \ldots, V_p, \ldots, V_n\}$, where n represents the maximum embedded word length; in this embodiment n = 300, and the dimension of the bert model is set to 768;
word embedding is carried out on the label data through Word2vec, the embedding dimension is set to k, the label output vector is denoted as $\{V'_1, V'_2, \ldots, V'_p, \ldots, V'_l\}$, and l denotes the number of embedded labels; in this embodiment k = 300.
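A minimal sketch of the two embedding paths follows, assuming the transformers and gensim libraries; the checkpoint name bert-base-uncased and the toy label corpus are illustrative assumptions, not part of the patent.

```python
# Sketch: BERT (768-d) embeds the document words, Word2vec (k = 300) embeds the label words.
import torch
from transformers import BertTokenizer, BertModel
from gensim.models import Word2Vec

# Text side: BERT token embeddings (word + sentence + position embeddings internally)
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")   # assumed checkpoint
bert = BertModel.from_pretrained("bert-base-uncased")
enc = tokenizer("an example document", return_tensors="pt",
                padding="max_length", truncation=True, max_length=300)  # n = 300
with torch.no_grad():
    text_emb = bert(**enc).last_hidden_state          # (1, 300, 768)

# Label side: Word2vec embeddings of the label vocabulary
label_corpus = [["sports"], ["politics"], ["finance"]]            # toy label texts
w2v = Word2Vec(sentences=label_corpus, vector_size=300, min_count=1)
label_emb = torch.tensor(w2v.wv["sports"])            # (300,) embedding of one label word
```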
In an embodiment, the BILSTM model learns the word-embedded text data and label data to obtain the text representation and the label representation. The structure of the LSTM is shown in fig. 2, and the operations in the LSTM structure are expressed as:
$D_f = \mathrm{sigmoid}(W_f[x_i, s_{t-1}] + b_f)$;
$D_{in} = \mathrm{sigmoid}(W_{in}[x_i, s_{t-1}] + b_{in})$;
$\tilde{C}_t = \tanh(W_c[x_i, s_{t-1}] + b_c)$;
$C_t = D_f * C_{t-1} + D_{in} * \tilde{C}_t$;
$D_o = \mathrm{sigmoid}(W_o[x_i, s_{t-1}] + b_o)$;
$s_t = D_o * \tanh(C_t)$;
wherein $x_i$ represents the input vector; $W_f$, $W_{in}$, $W_c$ and $W_o$ respectively represent the weights of the forget gate, the input gate, the input unit and the output gate at time t; $b_f$, $b_{in}$, $b_c$ and $b_o$ respectively represent the forget gate bias, the input gate bias, the input unit bias and the output gate bias; $C_{t-1}$ represents the cell state information at time t-1; $D_f$ and $D_{in}$ respectively represent the output of the forget gate and the output of the input gate; $\tilde{C}_t$ represents the candidate cell state at time t; $C_t$ represents the updated cell state information; $D_o$ represents the output of the output gate; and $s_t$ represents the hidden layer state at time t.
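A minimal sketch of one LSTM step written out gate by gate, mirroring the equations above; packing the four weight matrices into a single matrix W is a compact convention of the example, not something stated in the text.

```python
# Sketch of a single LSTM step: forget gate D_f, input gate D_in, candidate cell
# state, output gate D_o, then the updated cell state C_t and hidden state s_t.
import torch

def lstm_step(x_i, s_prev, C_prev, W, b, k):
    """x_i: (emb,), s_prev/C_prev: (k,), W: (4k, emb + k), b: (4k,)."""
    concat = torch.cat([x_i, s_prev])                 # [x_i, s_{t-1}]
    gates = W @ concat + b
    D_f = torch.sigmoid(gates[0:k])                   # forget gate
    D_in = torch.sigmoid(gates[k:2 * k])              # input gate
    C_tilde = torch.tanh(gates[2 * k:3 * k])          # candidate cell state
    D_o = torch.sigmoid(gates[3 * k:4 * k])           # output gate
    C_t = D_f * C_prev + D_in * C_tilde               # updated cell state
    s_t = D_o * torch.tanh(C_t)                       # hidden layer state
    return s_t, C_t

k, emb = 300, 768
s, C = torch.zeros(k), torch.zeros(k)
s, C = lstm_step(torch.randn(emb), s, C, torch.randn(4 * k, emb + k), torch.zeros(4 * k), k)
```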
BILSTM operation is carried out on the basis of LSTM:
Preferably, the BILSTM model learns the word-embedded text data and label data to obtain the text representation and the label representation, expressed as:
$\overrightarrow{H} = (\overrightarrow{H}_1, \overrightarrow{H}_2, \ldots, \overrightarrow{H}_n)$;
$\overleftarrow{H} = (\overleftarrow{H}_1, \overleftarrow{H}_2, \ldots, \overleftarrow{H}_n)$;
$H = [\overrightarrow{H}; \overleftarrow{H}]$;
$\overrightarrow{H'} = (\overrightarrow{H'}_1, \overrightarrow{H'}_2, \ldots, \overrightarrow{H'}_l)$;
$\overleftarrow{H'} = (\overleftarrow{H'}_1, \overleftarrow{H'}_2, \ldots, \overleftarrow{H'}_l)$;
$H' = [\overrightarrow{H'}; \overleftarrow{H'}]$;
wherein H is the text representation, $\overrightarrow{H}$ is the forward text representation, $\overleftarrow{H}$ is the reverse text representation, $\overrightarrow{H}_p$ denotes the forward text representation at time step p, $\overleftarrow{H}_p$ denotes the reverse text representation at time step p, H' is the label representation, $\overrightarrow{H'}$ is the forward label representation, $\overleftarrow{H'}$ is the reverse label representation, $\overrightarrow{H'}_p$ denotes the forward label representation at time step p, $\overleftarrow{H'}_p$ denotes the reverse label representation at time step p, R denotes the dimension range, $H \in R^{2k \times n}$ and $H' \in R^{2k \times l}$, wherein:
$\overrightarrow{H}_p = \overrightarrow{\mathrm{LSTM}}(\overrightarrow{H}_{p-1}, V_t)$;
$\overleftarrow{H}_p = \overleftarrow{\mathrm{LSTM}}(\overleftarrow{H}_{p-1}, V_t)$;
$\overrightarrow{H'}_p = \overrightarrow{\mathrm{LSTM}}(\overrightarrow{H'}_{p-1}, V'_t)$;
$\overleftarrow{H'}_p = \overleftarrow{\mathrm{LSTM}}(\overleftarrow{H'}_{p-1}, V'_t)$;
wherein $\overrightarrow{H}_p$, $\overleftarrow{H}_p$, $\overrightarrow{H'}_p$ and $\overleftarrow{H'}_p$ all belong to $R^k$, k denotes the size of the LSTM hidden layer, $V_t$ is the embedding vector of the t-th word in the text data, $\overrightarrow{H}_{p-1}$ denotes the forward text representation at time step p-1, $\overleftarrow{H}_{p-1}$ denotes the reverse text representation at time step p-1, $V'_t$ is the embedding vector of the t-th word in the label data, $\overrightarrow{H'}_{p-1}$ denotes the forward label representation at time step p-1, and $\overleftarrow{H'}_{p-1}$ denotes the reverse label representation at time step p-1;
as shown in fig. 3, the word-embedded text data and label data are learned by using the BILSTM; at time step p, the hidden state is updated from the input at step p and the output of step p-1, where k represents the size of the LSTM hidden layer and is set to 300 in this embodiment.
A multi-label document may be tagged with multiple labels, and each document should have the context most relevant to its corresponding labels; in other words, each document may contain multiple labels, and the words in one document contribute differently to each label.
In one embodiment, in order to obtain the different contribution of the words to each label, a self-attention mechanism is adopted, which specifically comprises: obtaining a linear combination of the context words for each label in the text data by using the label attention score, and obtaining the label document representation of the text representation under the self-attention mechanism according to that linear combination; the label attention score and the linear combination are respectively expressed as:
$A^{(s)} = \mathrm{softmax}(W_2 \tanh(W_1 H))$;
$M^{(s)}_j = A^{(s)}_j H^T$;
wherein $A^{(s)}$ is the label attention score, $A^{(s)} \in R^{l \times n}$, $W_1$ and $W_2$ represent the self-attention parameters to be trained, $d_a$ is a hyper-parameter (in this embodiment $d_a$ = 200), H is the text representation, tanh() is the activation function, $A^{(s)}_j$ represents the contribution of all words in the text data to the j-th label, $M^{(s)}_j$ is the label document representation along the j-th label under the self-attention mechanism, and $H^T$ is the transpose of the text representation; finally, the label document representation of the text representation under the self-attention mechanism $M^{(s)}$ is obtained, $M^{(s)} \in R^{l \times 2k}$.
In order to utilize the semantic information of the labels, after Word2vec preprocessing, the labels are expressed as a trainable matrix $C \in R^{l \times k}$, i.e. they live in the same latent k-dimensional space as the words; by combining the label embedding with the text word embedding passed through the BILSTM, the semantic relationship between each pair of words and labels can be determined explicitly.
In one embodiment, the semantic relation between the text representation and the word-embedded label data is constructed by linearly combining the context words of the labels, and the document representation via all labels $M^{(l)}$ is acquired from that semantic relation, wherein:
$\overrightarrow{A}^{(l)} = C\overrightarrow{H}$;
$\overleftarrow{A}^{(l)} = C\overleftarrow{H}$;
$\overrightarrow{M}^{(l)} = \overrightarrow{A}^{(l)}\overrightarrow{H}^T$;
$\overleftarrow{M}^{(l)} = \overleftarrow{A}^{(l)}\overleftarrow{H}^T$;
wherein C represents the trainable matrix of the word-embedded label data, $C \in R^{l \times k}$, $\overrightarrow{H}$ is the forward text representation, $\overleftarrow{H}$ is the reverse text representation, $\overrightarrow{A}^{(l)}$ is the forward representation of the linear combination of the label context words, $\overleftarrow{A}^{(l)}$ is the reverse representation of the linear combination of the label context words, $\overrightarrow{M}^{(l)}$ is the forward part of the document representation via all labels, $\overrightarrow{H}^T$ is the transpose of the forward text representation, $\overleftarrow{M}^{(l)}$ is the reverse part of the document representation via all labels, and $\overleftarrow{H}^T$ is the transpose of the reverse text representation.
$M^{(s)}$ and $M^{(l)}$ are both label-specific representations of the document, but they differ: $M^{(s)}$ emphasizes the document content, while $M^{(l)}$ leans more toward the semantic association between the document content and the label text. Two weight vectors $\alpha, \beta \in R^l$ are introduced to determine the importance of the two parts; they are obtained by feeding $M^{(s)}$ and $M^{(l)}$ into a fully connected layer:
$\alpha_j = \mathrm{sigmoid}(M^{(s)}_j L_\alpha)$;
$\beta_j = \mathrm{sigmoid}(M^{(l)}_j L_\beta)$;
wherein $M^{(s)}_j$ is the label document representation along the j-th label, $M^{(l)}_j$ is the document representation via all labels along the j-th label, $\alpha_j$ and $\beta_j$ are the self-attention fusion weights, $L_\alpha$ is the first parameter and $L_\beta$ is the second parameter; a constraint is added to the two weight parameters, and the first fused document representation along the j-th label is obtained according to the fusion weights, namely
$M_j = \alpha_j M^{(s)}_j + \beta_j M^{(l)}_j$;
through the introduced self-attention weights and this fusion-weight method, the fused first fused document representation M is obtained, $M \in R^{l \times 2k}$.
In one embodiment, the tag score of each tag is obtained, so as to obtain a tag-specific tag representation based on the self-attention mechanism, and further obtain a tag-based text representation, and the specific steps are as follows:
s21, capturing the dependency relationship of each label in the label text through a self-attention mechanism to obtain the attention score of the label word of the label text;
s22, acquiring a linear combination of each label according to the attention score of the label words of the label text, and obtaining label representation specific to the label under the self-attention mechanism through the linear combination of each label;
s23, fusing the label-specific label representation under the self-attention mechanism with the fused document representation to obtain the label-based text representation.
Specifically, the label word attention score of the label text is expressed as:
$A'^{(s)} = \mathrm{softmax}(W_4 \tanh(W_3 H'))$;
wherein $A'^{(s)}$ is the label word attention score, $W_3$ and $W_4$ are self-attention parameters, $d_a$ is a hyper-parameter (set to $d_a$ = 200 in this embodiment), and H' is the label representation;
Specifically, step S22 comprises:
$M'^{(s)}_j = A'^{(s)}_j H'^T$;
wherein $A'^{(s)}_j$ represents the contribution of all labels to the j-th label, $M'^{(s)}_j$ represents the label representation specific to the j-th label under the self-attention mechanism, and the matrix $M'^{(s)} \in R^{l \times 2k}$ is the label-specific label representation under the self-attention mechanism;
in this embodiment, the label attention mechanism applies the various labels in the label data to the text information in the text data, i.e. it models the association between the text and the labels, hence the name label attention mechanism. The self-attention mechanism is used twice in the invention: the first time is based on the association between text and text, specifically the association between the text content and the labels implied in the text, and the second time is based on the association between labels.
The adopted fusion mode has the advantages that under the condition that dimensionality is not changed, training speed can be accelerated better, and the dependence relationship among parameters is reduced. The specific formula is as follows:
$a = \mathrm{sigmoid}(W_5 M)$
$d = \mathrm{sigmoid}(W_6 M'^{(s)})$
$z = \mathrm{BN}[a, d]$
wherein $W_5 \in R^{1 \times l}$, $W_6 \in R^{1 \times l}$, $a \in R^{1 \times 2k}$, $d \in R^{1 \times 2k}$, $z \in R^{1 \times 4k}$, and z is the label-based text representation.
Once the comprehensive label-specific document representation is available, matrix reshaping can be performed through the reshape function to obtain a vector of l rows and 1 column, and the output is then produced through a final sigmoid function; mathematically, the prediction probability of each label can be calculated as:
$\hat{y} = \mathrm{sigmoid}(\mathrm{reshape}(W_7 z^T) + b)$, with $W_7 \in R^{l \times 4k}$;
where reshape(·) is the reshape function, b is the bias, $W_7$ is the weight, sigmoid() is the sigmoid function, and $z^T$ is the transpose of the label-based text representation; the output value is converted into a probability by the sigmoid function, and the cross-entropy loss can be used as the loss function:
$L = -\sum_{i=1}^{N}\sum_{j=1}^{l}\left[y_{ij}\log(\hat{y}_{ij}) + (1-y_{ij})\log(1-\hat{y}_{ij})\right]$
wherein N is the number of training documents, l is the number of labels, $\hat{y}_{ij}$ is the prediction probability, $y_{ij} \in \{0,1\}$ indicates whether the i-th document truly carries the j-th label, $W_5$ is the fully connected layer parameter, $W_7$ is the output layer parameter, and sigmoid() is the sigmoid function.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (10)

1. The BILSTM multi-label text classification method based on the attention mechanism is characterized by constructing a multi-label text classification model, wherein the multi-label text classification model comprises a bert model, a Word2vec model, a BILSTM module and an attention mechanism module, and the BILSTM multi-label text classification method based on the attention mechanism comprises the following steps:
s1, Word embedding is carried out on text data through a bert model, and Word embedding is carried out on label data through a Word2vec model;
s2, extracting context information from the text data and the label data after word embedding through a BILSTM module to obtain text representation and label representation;
s3, processing the text representation and the label representation by adopting an attention mechanism module to obtain a text representation based on a label;
s4, calculating the loss of the text representation based on the label through a loss function until convergence to obtain a trained multi-label text classification model;
and S5, inputting the real-time data into the trained multi-label text classification model to obtain a label classification prediction result of the real-time data.
2. The method of claim 1, wherein in step S2 the BILSTM module learns the word-embedded text data and label data to obtain the text representation and the label representation, expressed as:
$\overrightarrow{H} = (\overrightarrow{H}_1, \overrightarrow{H}_2, \ldots, \overrightarrow{H}_n)$;
$\overleftarrow{H} = (\overleftarrow{H}_1, \overleftarrow{H}_2, \ldots, \overleftarrow{H}_n)$;
$H = [\overrightarrow{H}; \overleftarrow{H}]$;
$\overrightarrow{H'} = (\overrightarrow{H'}_1, \overrightarrow{H'}_2, \ldots, \overrightarrow{H'}_l)$;
$\overleftarrow{H'} = (\overleftarrow{H'}_1, \overleftarrow{H'}_2, \ldots, \overleftarrow{H'}_l)$;
$H' = [\overrightarrow{H'}; \overleftarrow{H'}]$;
wherein H is the text representation, $\overrightarrow{H}$ is the forward text representation, $\overleftarrow{H}$ is the reverse text representation, $\overrightarrow{H}_p$ denotes the forward text representation at time step p, $\overleftarrow{H}_p$ denotes the reverse text representation at time step p, H' is the label representation, $\overrightarrow{H'}$ is the forward label representation, $\overleftarrow{H'}$ is the reverse label representation, $\overrightarrow{H'}_p$ denotes the forward label representation at time step p, $\overleftarrow{H'}_p$ denotes the reverse label representation at time step p, R denotes the dimension range, $H \in R^{2k \times n}$ and $H' \in R^{2k \times l}$, wherein:
$\overrightarrow{H}_p = \overrightarrow{\mathrm{LSTM}}(\overrightarrow{H}_{p-1}, V_t)$;
$\overleftarrow{H}_p = \overleftarrow{\mathrm{LSTM}}(\overleftarrow{H}_{p-1}, V_t)$;
$\overrightarrow{H'}_p = \overrightarrow{\mathrm{LSTM}}(\overrightarrow{H'}_{p-1}, V'_t)$;
$\overleftarrow{H'}_p = \overleftarrow{\mathrm{LSTM}}(\overleftarrow{H'}_{p-1}, V'_t)$;
wherein $\overrightarrow{H}_p$, $\overleftarrow{H}_p$, $\overrightarrow{H'}_p$ and $\overleftarrow{H'}_p$ all belong to $R^k$, k denotes the size of the LSTM hidden layer, $V_t$ is the embedding vector of the t-th word in the text data, $\overrightarrow{H}_{p-1}$ denotes the forward text representation at time step p-1, $\overleftarrow{H}_{p-1}$ denotes the reverse text representation at time step p-1, $V'_t$ is the embedding vector of the t-th word in the label data, $\overrightarrow{H'}_{p-1}$ denotes the forward label representation at time step p-1, and $\overleftarrow{H'}_{p-1}$ denotes the reverse label representation at time step p-1.
3. The method of claim 1, wherein the step S3 of obtaining the label-based text representation comprises:
s11, sending the text representation into a self-attention mechanism to obtain a label document representation under the self-attention mechanism;
s12, sending the label data and text representation after word embedding into a label attention mechanism to obtain document representation of all labels;
s13, fusing the label document representation under the self-attention mechanism obtained in step S11 with the document representation via all labels obtained in step S12 to obtain a fused document representation;
and S14, sending the label text into a self-attention mechanism for processing, and fusing the processing result with the fusion document representation of S13 to obtain a text representation based on the label.
4. The method according to claim 3, wherein step S11 obtains a linear combination of the context words for each label in the text data by using the label attention score, and obtains the label document representation of the text representation under the self-attention mechanism according to that linear combination; the label attention score and the linear combination of the context words for each label are respectively expressed as:
$A^{(s)} = \mathrm{softmax}(W_2 \tanh(W_1 H))$;
$M^{(s)}_j = A^{(s)}_j H^T$;
wherein $A^{(s)}$ is the label attention score, $A^{(s)} \in R^{l \times n}$, $W_1$ and $W_2$ are self-attention parameters, $d_a$ is a hyper-parameter, H is the text representation, tanh() is the activation function, $A^{(s)}_j$ represents the contribution of all words to the j-th label, $M^{(s)}_j$ is the label document representation along the j-th label under the self-attention mechanism, and $H^T$ is the transpose of the text representation H.
5. The method of claim 3, wherein the step S12 of obtaining the document representation via all labels comprises:
converting the word-embedded label data into a trainable matrix, constructing the semantic relation between the text representation and the trainable matrix by linearly combining the context words of the labels, and acquiring the document representation via all labels $M^{(l)}$ according to that semantic relation, wherein:
$\overrightarrow{A}^{(l)} = C\overrightarrow{H}$;
$\overleftarrow{A}^{(l)} = C\overleftarrow{H}$;
$\overrightarrow{M}^{(l)} = \overrightarrow{A}^{(l)}\overrightarrow{H}^T$;
$\overleftarrow{M}^{(l)} = \overleftarrow{A}^{(l)}\overleftarrow{H}^T$;
wherein C represents the trainable matrix of the word-embedded label data, $C \in R^{l \times k}$, $\overrightarrow{H}$ is the forward text representation, $\overleftarrow{H}$ is the reverse text representation, $\overrightarrow{A}^{(l)}$ is the forward representation of the linear combination of the label context words, $\overleftarrow{A}^{(l)}$ is the reverse representation of the linear combination of the label context words, $\overrightarrow{M}^{(l)}$ is the forward part of the document representation via all labels, $\overrightarrow{H}^T$ is the transpose of the forward text representation, $\overleftarrow{M}^{(l)}$ is the reverse part of the document representation via all labels, and $\overleftarrow{H}^T$ is the transpose of the reverse text representation.
6. The method for classifying BILSTM multi-label text based on attention mechanism as claimed in claim 3, wherein the fusion process of step S13 comprises:
$\alpha_j = \mathrm{sigmoid}(M^{(s)}_j L_\alpha)$, $\beta_j = \mathrm{sigmoid}(M^{(l)}_j L_\beta)$;
$M_j = \alpha_j M^{(s)}_j + \beta_j M^{(l)}_j$;
wherein $M_j$ is the first fused document representation along the j-th label, $M^{(s)}_j$ is the label document representation along the j-th label, $M^{(l)}_j$ is the document representation via all labels along the j-th label, $\alpha_j$ and $\beta_j$ are the self-attention fusion weights, $L_\alpha$ is the first parameter, and $L_\beta$ is the second parameter.
7. The method of claim 3, wherein obtaining the label-based text representation comprises:
s21, capturing the dependency relationship of each label in the label text through a self-attention mechanism to obtain the attention score of the label word of the label text;
s22, acquiring a linear combination of each label according to the attention score of the label words of the label text, and obtaining label representation specific to the label under the self-attention mechanism through the linear combination of each label;
and S23, fusing the label representation specific to the label under the self-attention mechanism with the fused document representation to obtain a text representation based on the label.
8. The method of claim 7, wherein before the fusion in step S23, the fused document representation is processed through a fully connected layer to obtain the first text, the label representation is processed through a fully connected layer to obtain the second text, and the first text and the second text are fused to obtain the label-based text representation, according to the following processing formulas:
$a = \mathrm{sigmoid}(W_5 M)$
$d = \mathrm{sigmoid}(W_6 M'^{(s)})$
$z = \mathrm{BN}[a, d]$
wherein a is the first text, d is the second text, M is the fused document representation, $M'^{(s)}$ is the label representation, BN[·] is batch normalization, z is the label-based text representation, and $W_5$, $W_6$ are weights.
9. The BILSTM multi-label text classification method based on the attention mechanism of claim 8, wherein, based on the label-based text representation, the prediction probability $\hat{y}$ of the classification is calculated by a sigmoid function, expressed as:
$\hat{y} = \mathrm{sigmoid}(\mathrm{reshape}(W_7 z^T) + b)$
where reshape(·) is the reshape function, b is the bias, $W_7$ is the weight, sigmoid() is the sigmoid function, and $z^T$ is the transpose of the label-based text representation.
10. The method of claim 1, wherein the loss function is expressed as:
$L = -\sum_{i=1}^{N}\sum_{j=1}^{l}\left[y_{ij}\log(\hat{y}_{ij}) + (1-y_{ij})\log(1-\hat{y}_{ij})\right]$
wherein N is the total number of text data, l is the total number of label data, $\hat{y}_{ij}$ is the prediction probability, and $y_{ij} \in \{0,1\}$ indicates whether the i-th document truly carries the j-th label.
CN202210047500.5A 2022-01-17 2022-01-17 Bilstm multi-label text classification method based on attention mechanism Pending CN114398488A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210047500.5A CN114398488A (en) 2022-01-17 2022-01-17 Bilstm multi-label text classification method based on attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210047500.5A CN114398488A (en) 2022-01-17 2022-01-17 Bilstm multi-label text classification method based on attention mechanism

Publications (1)

Publication Number Publication Date
CN114398488A true CN114398488A (en) 2022-04-26

Family

ID=81231064

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210047500.5A Pending CN114398488A (en) 2022-01-17 2022-01-17 Bilstm multi-label text classification method based on attention mechanism

Country Status (1)

Country Link
CN (1) CN114398488A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115905533A (en) * 2022-11-24 2023-04-04 重庆邮电大学 Intelligent multi-label text classification method
CN116562251A (en) * 2023-05-19 2023-08-08 中国矿业大学(北京) Form classification method for stock information disclosure long document

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109582789A (en) * 2018-11-12 2019-04-05 北京大学 Text multi-tag classification method based on semantic primitive information
CN110019653A (en) * 2019-04-08 2019-07-16 北京航空航天大学 A kind of the social content characterizing method and system of fusing text and label network
CN110209823A (en) * 2019-06-12 2019-09-06 齐鲁工业大学 A kind of multi-tag file classification method and system
CN110442723A (en) * 2019-08-14 2019-11-12 山东大学 A method of multi-tag text classification is used for based on the Co-Attention model that multistep differentiates
US20200327381A1 (en) * 2019-04-10 2020-10-15 International Business Machines Corporation Evaluating text classification anomalies predicted by a text classification model
CN113626589A (en) * 2021-06-18 2021-11-09 电子科技大学 Multi-label text classification method based on mixed attention mechanism

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109582789A (en) * 2018-11-12 2019-04-05 北京大学 Text multi-tag classification method based on semantic primitive information
CN110019653A (en) * 2019-04-08 2019-07-16 北京航空航天大学 A kind of the social content characterizing method and system of fusing text and label network
US20200327381A1 (en) * 2019-04-10 2020-10-15 International Business Machines Corporation Evaluating text classification anomalies predicted by a text classification model
CN110209823A (en) * 2019-06-12 2019-09-06 齐鲁工业大学 A kind of multi-tag file classification method and system
CN110442723A (en) * 2019-08-14 2019-11-12 山东大学 A method of multi-tag text classification is used for based on the Co-Attention model that multistep differentiates
CN113626589A (en) * 2021-06-18 2021-11-09 电子科技大学 Multi-label text classification method based on mixed attention mechanism

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
YANRU DONG et al.: "A Fusion Model-Based Label Embedding and Self-Interaction Attention for Text Classification", IEEE Access, vol. 8, 21 November 2019 (2019-11-21), pages 30548, XP011772629, DOI: 10.1109/ACCESS.2019.2954985 *
LIU JIE et al.: "Multi-label text classification with fused attention mechanism" (融合注意力机制的多标签文本分类), Microelectronics & Computer (微电子学与计算机), 4 January 2024 (2024-01-04), pages 26-34 *
SUN WEI: "Research on multi-label text classification based on attention and graph convolution" (基于注意力和图卷积的多标签文本分类研究), China Master's Theses Full-text Database, Information Science and Technology (中国优秀硕士学位论文全文数据库 信息科技辑), no. 09, 15 September 2021 (2021-09-15), pages 138-714 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115905533A (en) * 2022-11-24 2023-04-04 重庆邮电大学 Intelligent multi-label text classification method
CN115905533B (en) * 2022-11-24 2023-09-19 湖南光线空间信息科技有限公司 Multi-label text intelligent classification method
CN116562251A (en) * 2023-05-19 2023-08-08 中国矿业大学(北京) Form classification method for stock information disclosure long document

Similar Documents

Publication Publication Date Title
CN111783462B (en) Chinese named entity recognition model and method based on double neural network fusion
CN110609897B (en) Multi-category Chinese text classification method integrating global and local features
CN110298037B (en) Convolutional neural network matching text recognition method based on enhanced attention mechanism
Sivakumar et al. Review on word2vec word embedding neural net
CN112733541A (en) Named entity identification method of BERT-BiGRU-IDCNN-CRF based on attention mechanism
CN113626589B (en) Multi-label text classification method based on mixed attention mechanism
CN111078833B (en) Text classification method based on neural network
CN113255294B (en) Named entity recognition model training method, recognition method and device
CN112749274B (en) Chinese text classification method based on attention mechanism and interference word deletion
CN111309918A (en) Multi-label text classification method based on label relevance
CN114398488A (en) Bilstm multi-label text classification method based on attention mechanism
CN113657115A (en) Multi-modal Mongolian emotion analysis method based on ironic recognition and fine-grained feature fusion
CN115130591A (en) Cross supervision-based multi-mode data classification method and device
CN115238693A (en) Chinese named entity recognition method based on multi-word segmentation and multi-layer bidirectional long-short term memory
CN115098673A (en) Business document information extraction method based on variant attention and hierarchical structure
CN111325036A (en) Emerging technology prediction-oriented evidence fact extraction method and system
Tarride et al. A comparative study of information extraction strategies using an attention-based neural network
CN114048314A (en) Natural language steganalysis method
CN116956228A (en) Text mining method for technical transaction platform
CN116562291A (en) Chinese nested named entity recognition method based on boundary detection
CN115906846A (en) Document-level named entity identification method based on double-graph hierarchical feature fusion
CN113590819B (en) Large-scale category hierarchical text classification method
Zhang et al. Hierarchical attention networks for grid text classification
CN114722818A (en) Named entity recognition model based on anti-migration learning
CN114239584A (en) Named entity identification method based on self-supervision learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination