CN114896388A - Hierarchical multi-label text classification method based on mixed attention - Google Patents
Hierarchical multi-label text classification method based on mixed attention
- Publication number
- CN114896388A CN202210216140.7A CN202210216140A
- Authority
- CN
- China
- Prior art keywords
- label
- text
- node
- hierarchical
- attention
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/126—Character encoding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/151—Transformation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
- G06F40/183—Tabulation, i.e. one-dimensional positioning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Probability & Statistics with Applications (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a hierarchical multi-label text classification method based on mixed attention. Pre-trained word vectors serve as word embeddings, and a Bi-GRU performs primary feature extraction on the embedded input. A graph convolutional neural network models the hierarchical label structure and generates label representations that encode label relevance. Several convolutional neural networks with different kernel sizes then extract local features of different granularities from the Bi-GRU output; their max-pooled outputs are concatenated into a text feature, which is further refined by an attention mechanism based on the label representations. In parallel, a self-attention mechanism extracts global features from the Bi-GRU output. The label-attention-based and self-attention-based text features are adaptively fused into a mixed-attention text representation, inter-label information is extracted through a relation network, and the final classification result is obtained by a multilayer perceptron.
Description
Technical Field
The invention relates to the technical field of computer information and the field of natural language processing, in particular to a hierarchical multi-label text classification method based on mixed attention.
Background
With the advent of the internet era, people can access all kinds of information more conveniently, and various media data are generated continuously, which provides the basic conditions for mining valuable data on the internet. Without efficient management methods and knowledge-acquisition means, such massive data would undoubtedly go to waste. Within data mining, text classification is one of the core problems.
The task of multi-label text classification is to select, from a given label set, the subset most relevant to the text content. In real scenarios, a piece of data is often related to several labels in the label set, and these labels concisely summarize its content, which allows people to manage and further analyze massive data more effectively. Hierarchical multi-label text classification is a special case of multi-label text classification in which the label system has a hierarchical structure. General multi-label classification algorithms neither consider the influence of this hierarchical label structure on classification performance nor make full use of the association information among labels, so their label predictions are not accurate enough; in particular, there is still considerable room for improvement on data with a long-tailed distribution. Meanwhile, most existing models focus on either the local or the global features of a text rather than considering both, so important classification-relevant features are not captured sufficiently.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a hierarchical multi-label text classification method based on mixed attention, which improves classification performance by using the label hierarchy to build label semantic representations and by fully exploiting the global and local semantic information of the text.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows: a hierarchical multi-label text classification method based on mixed attention comprises the following steps:
S1, preprocessing multi-label text data; the text data is used for training a model and consists of text content and a corresponding label set; all label categories of the whole data set are organized in a tree with hierarchical relations, where each node represents one label category, and the labels of each sample text in the data set come from nodes of this label tree;
S2, for the text labels, obtaining prior hierarchy information of the hierarchical classification system, where the prior hierarchy information refers to the prior probability of mutual dependence between labels and can be obtained by calculating the transition probabilities between parent and child labels;
S3, constructing a deep learning hierarchical multi-label text classification model;
the deep learning multi-label text classification model comprises a word embedding module, a text encoding module, a label encoding module, a text representation module based on a label attention mechanism, a text representation module based on a self-attention mechanism, a feature fusion module, a vector regression layer, a relation network module and a label probability prediction layer;
S4, inputting the preprocessed text data of the data set into the model for training; after training is finished, classifying multi-label texts with the trained model.
In the above technical solution, step S1 includes performing data preprocessing on the samples in the data set D, specifically: step 1.1, performing word segmentation, removing stop words and removing punctuation marks; step 1.2, counting the word frequency word_frequency of the text in the data set D, deleting words that occur fewer than X1 times, recording the retained words, and constructing a word list. After preprocessing, the data set D is divided into a training set, a validation set and a test set according to a certain proportion.
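The preprocessing can be sketched as follows; this is a minimal illustration, assuming a whitespace tokenizer, an externally supplied stop-word list, and a `min_freq` argument standing in for the threshold X1 (none of these concrete choices are fixed by the patent).

```python
from collections import Counter

def build_vocab(texts, stop_words, min_freq):
    """Tokenize, drop stop words and punctuation, keep words occurring at least min_freq times."""
    tokenized = []
    for text in texts:
        tokens = [w for w in text.lower().split()            # placeholder word segmentation
                  if w.isalpha() and w not in stop_words]     # drops punctuation tokens and stop words
        tokenized.append(tokens)
    freq = Counter(w for tokens in tokenized for w in tokens)
    vocab = {"<pad>": 0, "<unk>": 1}
    for word, count in freq.items():
        if count >= min_freq:                                 # X1 threshold from step 1.2
            vocab[word] = len(vocab)
    return tokenized, vocab
```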
In the above technical solution, step S2 includes: for the data of the training set in data set D, assume that there is a hierarchical path $e_{i,j}$ between a parent node $v_i$ and a child node $v_j$. The feature $f(e_{i,j})$ of the edge formed by this parent-child path is represented by the prior probabilities $p(U_j|U_i)$ and $p(U_i|U_j)$:

$$p(U_j|U_i) = \frac{P(U_j \cap U_i)}{P(U_i)} = \frac{N_j}{N_i}, \qquad p(U_i|U_j) = \frac{P(U_j \cap U_i)}{P(U_j)}$$

$f(e_{i,j})$ expresses the relation between the two nodes, described by their transition probability or co-occurrence probability. The transition probabilities comprise the parent-to-child transition probability $p(U_j|U_i)$ and the child-to-parent transition probability $p(U_i|U_j)$. If a parent label node has only one child node, this value is 1; if there are several child labels, each value is smaller than 1 but they sum to 1. In the formulas, $U_j$ and $U_i$ denote the text samples labeled with node $v_j$ and node $v_i$, respectively; $p(U_j|U_i)$ is the conditional probability of being labeled $v_j$ given the label $v_i$; $P(U_j \cap U_i)$ is the probability of being labeled $\{v_j, v_i\}$ simultaneously; and $N_j$ and $N_i$ denote the numbers of $v_j$ and $v_i$ node labels in the training set.
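A small sketch of how these prior probabilities could be estimated from training-set counts; `label_sets` and `parent_of` are hypothetical data structures, and the code assumes each sample's label set already contains the ancestors of its labels.

```python
from collections import Counter

def edge_priors(label_sets, parent_of):
    """Estimate f(e_ij) = (p(child|parent), p(parent|child)) from training label counts.

    label_sets: list of label sets, one per training sample (assumed to include ancestor labels).
    parent_of:  dict mapping each child label to its parent label (hypothetical structure).
    """
    counts = Counter(label for labels in label_sets for label in labels)   # N_i per label
    joint = Counter()                                                      # N_ij per (parent, child) pair
    for labels in label_sets:
        for child in labels:
            parent = parent_of.get(child)
            if parent is not None and parent in labels:
                joint[(parent, child)] += 1
    priors = {}
    for (parent, child), n_both in joint.items():
        priors[(parent, child)] = n_both / counts[parent]   # p(U_j|U_i): sums to 1 over the children
        priors[(child, parent)] = n_both / counts[child]    # p(U_i|U_j): 1 when ancestors are always included
    return priors
```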
In the above technical solution, step S3 further includes performing word embedding processing on the input text and its labels through the word embedding module, specifically:
Step 2.1, obtaining the preprocessed text sequence, and converting the words $\{x_1, x_2, \ldots, x_n\}$ in the text into the word vector representation $\{w_1, w_2, \ldots, w_n\}$ by looking them up in the word embedding dictionary table, where n is the number of words in the preprocessed text.

Step 2.2, obtaining the label set $\{l_1, l_2, \ldots, l_C\}$ of the hierarchical multi-label text classification, and converting the label set into a d-dimensional label embedding set $\{c_1, c_2, \ldots, c_C\}$ by Kaiming initialization.
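A sketch of the embedding module, assuming GloVe-style pre-trained word vectors and Kaiming-initialized label embeddings; class and argument names are illustrative, not from the patent.

```python
import torch
import torch.nn as nn

class EmbeddingModule(nn.Module):
    def __init__(self, pretrained_vectors, num_labels, label_dim):
        super().__init__()
        # word embeddings initialized from pre-trained vectors (a FloatTensor, e.g. GloVe-300d)
        self.word_emb = nn.Embedding.from_pretrained(pretrained_vectors, freeze=False)
        # label embeddings {c_1, ..., c_C} with Kaiming initialization
        self.label_emb = nn.Parameter(torch.empty(num_labels, label_dim))
        nn.init.kaiming_uniform_(self.label_emb)

    def forward(self, token_ids):               # token_ids: (batch, n)
        return self.word_emb(token_ids)         # (batch, n, word_dim)
```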
In the above technical solution, step S3 further includes encoding the word vector representation $\{w_1, w_2, \ldots, w_n\}$ through the text encoding module, specifically:

The word vector representation $\{w_1, w_2, \ldots, w_n\}$ of the text is encoded with a Bi-GRU network to generate an implicit representation $\{h_1, h_2, \ldots, h_n\}$ carrying context semantic information. The implicit representation $\{h_1, h_2, \ldots, h_n\}$ is then fed into three convolutions with different kernel sizes to obtain semantic vectors under three different receptive fields, and finally the three semantic vectors are concatenated into a new semantic representation vector $S = \{s_1, s_2, \ldots, s_n\}$.
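A sketch of the text encoder, under the assumption that the three convolution outputs are concatenated along the channel dimension so that every token position keeps a feature $s_t$; kernel sizes 2/3/4 and 100 channels follow the embodiment described later, and the exact pooling arrangement in the patent figures may differ.

```python
import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    def __init__(self, emb_dim, hidden=100, kernel_sizes=(2, 3, 4), channels=100):
        super().__init__()
        self.bigru = nn.GRU(emb_dim, hidden, batch_first=True, bidirectional=True)
        # three 1-D convolutions with different receptive fields over the Bi-GRU states
        self.convs = nn.ModuleList(
            nn.Conv1d(2 * hidden, channels, k, padding=k // 2) for k in kernel_sizes
        )

    def forward(self, w):                       # w: (batch, n, emb_dim) word embeddings
        h, _ = self.bigru(w)                    # H: (batch, n, 2*hidden) contextual states
        x = h.transpose(1, 2)                   # (batch, 2*hidden, n) layout for Conv1d
        n = x.size(-1)
        feats = [torch.relu(conv(x))[:, :, :n] for conv in self.convs]  # trim padding to length n
        s = torch.cat(feats, dim=1)             # concatenate channels: per-token 3*channels features
        return h, s.transpose(1, 2)             # H for self-attention, S = {s_1, ..., s_n} for label attention
```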
In the above technical solution, step S3 further includes encoding the label vector representation $\{c_1, c_2, \ldots, c_C\}$ through the label encoding module, specifically:

A single-layer GCN encodes the label vector representation $\{c_1, c_2, \ldots, c_C\}$ to generate an implicit representation $M = \{m_1, m_2, \ldots, m_C\}$ carrying label-hierarchy association information. The implementation process is as follows:

The hierarchical GCN aggregates data flows along top-down, bottom-up and self-loop edges. In the hierarchical GCN, each directed edge represents a pair of label-correlation features, and these data flows are node-transformed with an edge-wise linear transformation.

To realize the node transformation, the invention represents the linear transformation with a weighted adjacency matrix whose initial values come from the prior hierarchy information of the hierarchical classification system in step S2. Formally, the hierarchical GCN encodes the hidden state of a node k according to its relevant neighborhood $N(k) = \{n_k, child(k), parent(k)\}$, where $n_k$ denotes the k-th label node in the hierarchical label tree, $child(k)$ its child label nodes and $parent(k)$ its parent label node. The hidden state of node k is calculated as:

$$u_{k,j} = a_{k,j}\, v_j + b_l, \qquad g_{k,j} = \sigma\big(a_{k,j}\, v_j + b_g\big), \qquad h_k = \sigma\Big(\sum_{j \in N(k)} g_{k,j} \odot u_{k,j}\Big)$$

In the above formulas, $v_j$ and $v_k$ are trainable parameters, and $b_l$ and $b_g$ are trainable bias parameters. $u_{k,j}$ can be understood as the information passed between nodes k and j, and $g_{k,j}$ as a gate value that controls how much $u_{k,j}$ finally influences node k. $\sigma$ is an activation function in deep learning and can be taken as the sigmoid function; $b_l \in R^{N \times dim}$ and $b_g \in R^{N}$, where dim is the vector dimension and is a predefined hyperparameter. $d(j,k)$ denotes the hierarchical direction from node j to node k, i.e. top-down, bottom-up or self-loop. $a_{k,j} \in R$ denotes the hierarchical probability $f_{d(k,j)}(e_{k,j})$, i.e. the transition probability from the k-th node to the j-th label node, obtained from $f(e_{i,j})$ above; self-loop edges use $a_{k,k} = 1$, top-down edges use the prior probability $f_c(e_{k,j})$, and bottom-up edges use $f_p(e_{j,k}) = 1$. The feature matrix of these edges, $F = \{a_{0,0}, a_{0,1}, \ldots, a_{C-1,C-1}\}$, is the weighted adjacency matrix of the directed label hierarchy graph. Finally, the output hidden state $h_k$ of node k corresponds to the hierarchy-aware representation of label k.
In the above technical solution, step S3 further includes the text representation module based on the label attention mechanism: for the text representation $S \in R^{n \times d_c}$ from the text encoding layer and the label representation $M \in R^{C \times d_c}$ from the label encoding layer, where $d_c$ denotes the dimension of the text encoding vector and is a predetermined fixed value, the label-attention-based text representation is calculated as:

$$\alpha_{kj} = \frac{\exp(m_k s_j^{\top})}{\sum_{j'} \exp(m_k s_{j'}^{\top})}, \qquad v_k = \sum_{j} \alpha_{kj}\, s_j$$

where $\alpha_{kj}$ represents the amount of information of the j-th text feature vector for the k-th label, and $v_k$ is the text representation based on label attention.
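A sketch of the label-attention computation, assuming dot-product scores between each label representation $m_k$ and the token features $s_j$.

```python
import torch
import torch.nn.functional as F

def label_attention(S, M):
    """S: (batch, n, d) token features; M: (C, d) hierarchy-aware label representations.
    Returns V: (batch, C, d) with one label-aware text vector v_k per label."""
    scores = torch.einsum('bnd,cd->bcn', S, M)      # m_k . s_j for every label k and position j
    alpha = F.softmax(scores, dim=-1)               # alpha_{kj}: share of the j-th token for label k
    return torch.einsum('bcn,bnd->bcd', alpha, S)   # v_k = sum_j alpha_{kj} s_j
```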
In the above technical solution, step S3 further includes the text representation module based on the self-attention mechanism: for the hidden-layer text representation $H = \{h_1, h_2, \ldots, h_n\}$ output by the Bi-GRU of the text encoding layer, a self-attention-based text representation is calculated as:

$$\alpha_{kt} = \frac{\exp\big(w_2 \tanh(w_1 h_t^{\top})\big)}{\sum_{t'} \exp\big(w_2 \tanh(w_1 h_{t'}^{\top})\big)}, \qquad u_k = \sum_{t} \alpha_{kt}\, h_t$$

where $w_1$ and $w_2$ are parameters, $H$ is the text representation, $\alpha_{kt}$ is the weight of the t-th vector in the text representation, and $u_k$ is the text representation based on self-attention.
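A sketch of the self-attention text representation, assuming the additive (tanh) attention form implied by the parameters $w_1$ and $w_2$; the number of attention heads is assumed to equal the number of labels C so that $u_k$ later aligns with $v_k$ for fusion.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """Additive self-attention; `heads` is assumed to equal the number of labels C."""
    def __init__(self, dim, attn_dim=128, heads=1):
        super().__init__()
        self.w1 = nn.Linear(dim, attn_dim, bias=False)
        self.w2 = nn.Linear(attn_dim, heads, bias=False)

    def forward(self, H):                                    # H: (batch, n, dim) Bi-GRU hidden states
        scores = self.w2(torch.tanh(self.w1(H)))             # (batch, n, heads)
        alpha = F.softmax(scores, dim=1)                     # alpha_{kt}: weight of the t-th token
        return torch.einsum('bnk,bnd->bkd', alpha, H)        # u_k = sum_t alpha_{kt} h_t
```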
In the above technical solution, step S3 further includes the feature fusion module: the text features based on the label attention mechanism and those based on the self-attention mechanism are adaptively fused to obtain the final text feature $d_{ik\text{-}fusion}$, calculated as:

$$\beta_k = \sigma(w_1 v_k + w_2 u_k), \qquad d_{ik\text{-}fusion} = \beta_k v_k + (1 - \beta_k) u_k$$

where $w_1$ and $w_2$ are parameters, $v_k$ is the label-attention-based text representation, $u_k$ is the self-attention-based text representation, and $\beta_k$ is the weight assigned to $v_k$.
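A sketch of the adaptive fusion, assuming the weight $\beta_k$ is produced by a sigmoid over a learned linear combination of $v_k$ and $u_k$.

```python
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.w1 = nn.Linear(dim, 1, bias=False)
        self.w2 = nn.Linear(dim, 1, bias=False)

    def forward(self, V, U):                             # V, U: (batch, C, dim)
        beta = torch.sigmoid(self.w1(V) + self.w2(U))    # beta_k: weight assigned to v_k
        return beta * V + (1 - beta) * U                 # fused feature d_{k-fusion}
```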
In the above technical solution, step S3 further includes the relation network module, which further mines the association information among labels: the fused text feature $d_{ik\text{-}fusion}$ generated by the feature fusion module is input into a fully connected layer to obtain the logits vector $O = \{o_1, o_2, \ldots, o_C\}$ corresponding to the labels; the vector $O$ is input into the relation network module to obtain the prediction vector $y = \{y_1, y_2, \ldots, y_C\}$; finally, the prediction vector $y$ is input into a multilayer perceptron to obtain the prediction probability of each label. The relation network is in essence a residual network.
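A sketch of the final prediction path, assuming the relation network is a small residual block over the per-label logits followed by a perceptron with sigmoid outputs; layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class RelationNetwork(nn.Module):
    """Fully connected layer -> residual relation block -> perceptron with sigmoid outputs."""
    def __init__(self, feat_dim, num_labels, hidden=256):
        super().__init__()
        self.fc = nn.Linear(feat_dim, 1)                  # per-label logit o_k from d_{k-fusion}
        self.res = nn.Sequential(nn.Linear(num_labels, hidden), nn.ReLU(),
                                 nn.Linear(hidden, num_labels))
        self.mlp = nn.Sequential(nn.Linear(num_labels, hidden), nn.ReLU(),
                                 nn.Linear(hidden, num_labels))

    def forward(self, D):                       # D: (batch, C, feat_dim) fused per-label features
        o = self.fc(D).squeeze(-1)              # logits vector O = {o_1, ..., o_C}
        y = o + self.res(o)                     # residual block captures label correlations
        return torch.sigmoid(self.mlp(y))       # label probabilities from the perceptron
```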
In the above technical solution, step S4 includes training with a cross-entropy loss function and the Adam optimizer, where the cross-entropy loss function for multi-label text classification is:

$$Loss = -\sum_{i=1}^{N}\sum_{j=1}^{L} \Big[ y_{ij} \log \hat{y}_{ij} + (1 - y_{ij}) \log(1 - \hat{y}_{ij}) \Big]$$

where $y_{ij}$ is the actual probability of the i-th sample for the j-th label, $\hat{y}_{ij}$ is the predicted probability of the i-th sample for the j-th label, L is the number of label categories, and N is the number of sample texts. Training finally yields the trained deep learning multi-label text classification model.
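A sketch of the training loop, assuming the model maps token ids to per-label probabilities, so the multi-label cross-entropy reduces to binary cross-entropy optimized with Adam; hyperparameters are illustrative.

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=10, lr=1e-3, device="cpu"):
    """Train the classifier; `loader` yields (token_ids, targets) with targets of shape (batch, L)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.BCELoss()                     # multi-label cross-entropy over sigmoid outputs
    model.train()
    for _ in range(epochs):
        for tokens, targets in loader:
            tokens, targets = tokens.to(device), targets.to(device)
            probs = model(tokens)                # predicted probability per label
            loss = criterion(probs, targets.float())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```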
The invention has the following advantages and beneficial effects:
the invention uses Bi-GRU combined with CNN to extract the semantic representation of the text, and can more fully obtain the local semantic information of the text. The hierarchical information of hierarchical multi-label classification is characterized through the graph neural network, and label representation with hierarchical association information can be obtained. The method and the device use a self-attention mechanism to extract the semantic representation of the text, and can obtain the semantic representation of the global association of the text. The invention uses the self-adaptive fusion of the text features based on the label representation and the text features based on the self-attention representation, and can obtain the text representation of the global, local text and the label information. The invention uses the relation network in the last layer of the model, so that the original label prediction vector can further obtain the label relevance.
The invention comprises four aspects: firstly, extracting a label representation containing a hierarchical relation by using a graph convolutional neural network; secondly, extracting local features by using a plurality of convolutions with different granularities; thirdly, text features are further extracted and adaptively Fused (FA) by using a label-based attention mechanism and a self-attention-based mechanism. Fourthly, the relationship network is used for further extracting the tag relevance. According to the hierarchical multi-label classification method based on mixed attention, the text features of the input text to be classified are extracted, then the text is classified through the multilayer perceptron, one or more labels can be marked on the text, and the method can be widely applied to the fields of E-commerce, news, scientific and technical papers and the like.
Drawings
FIG. 1 is a flow chart of a hierarchical multi-label text classification method based on mixed attention according to the present invention;
FIG. 2 is a network structure diagram of a hierarchical multi-label text classification model based on mixed attention according to the present invention;
FIG. 3 is a schematic diagram of a hierarchical structure of hierarchical multi-label text classification labels according to the present invention;
FIG. 4 is a schematic diagram of the hierarchical multi-label text classification graph convolutional neural network calculation according to the present invention;
FIG. 5 is a diagram of a relational network according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows: a hierarchical multi-label text classification method based on mixed attention comprises the following steps:
step S1, preprocessing the multi-label text data in the data set D;
step S2, for the text labels, acquiring prior hierarchy information of the hierarchical classification system, where the prior hierarchy information refers to the prior probability of mutual dependence between labels and can be obtained by calculating the transition probabilities between parent and child labels;
step S3, constructing a deep learning hierarchical multi-label text classification model;
the deep learning multi-label text classification model comprises a word embedding module, a text encoding module, a label encoding module, a text representation module based on a label attention mechanism, a text representation module based on a self-attention mechanism, a feature fusion module, a vector regression layer, a relation network module and a label probability prediction layer;
and step S4, inputting the preprocessed text data of the data set into the model for training; after training is finished, classifying multi-label texts with the trained model.
Preferably, step S1 includes the steps of:
the method for preprocessing the data of the samples in the data set D specifically comprises the following steps:
step 1-1, performing word segmentation, stop word removal and punctuation removal on a text in a data set D;
step 1-2, counting the word frequency word_frequency of the text in the data set D, deleting words that occur fewer than X1 times, recording the retained words, and constructing a word list.
Step 1-3, after preprocessing the data set D, dividing it into a training set, a validation set and a test set in the ratio 3:1:1.
Preferably, step S2 includes the steps of:
For the data of the training set in data set D, assume that there is a hierarchical path $e_{i,j}$ between a parent node $v_i$ and a child node $v_j$. The feature $f(e_{i,j})$ of the edge formed by this parent-child path is represented by the prior probabilities $p(U_j|U_i)$ and $p(U_i|U_j)$:

$$p(U_j|U_i) = \frac{P(U_j \cap U_i)}{P(U_i)} = \frac{N_j}{N_i}, \qquad p(U_i|U_j) = \frac{P(U_j \cap U_i)}{P(U_j)}$$

$f(e_{i,j})$ expresses the relation between the two nodes, described by their transition probability or co-occurrence probability; the transition probabilities comprise the parent-to-child transition probability $p(U_j|U_i)$ and the child-to-parent transition probability $p(U_i|U_j)$. In the formulas, $U_j$ and $U_i$ denote the text data labeled with node $v_j$ and node $v_i$, respectively; $p(U_j|U_i)$ is the conditional probability of being labeled $v_j$ given the label $v_i$; $P(U_j \cap U_i)$ is the probability of being labeled $\{v_j, v_i\}$ simultaneously; and $N_j$ and $N_i$ denote the numbers of $v_j$ and $v_i$ node labels in the training set.
Preferably, step S3 further includes the steps of:
the method comprises the following steps of carrying out word embedding processing on an input text and a label thereof through a word embedding module, wherein the word embedding processing method specifically comprises the following steps:
step 2-1, obtaining the preprocessed text sequence, and embedding words { x ] in the text through a query word embedding table (Glove-300d) 1 ,x 2 ,...,x n Convert to word vector representation w 1 ,w 2 ,...,w n }。
Step 2-2, obtaining a label set { l ] of the hierarchical multi-label text classification 1 ,l 2 ,...,l n Converting the label set into a label embedded set with 300 dimensionality by means of kaiming coding { c } 1 ,c 2 ,...,c n N refers to the number of words of the preprocessed text.
Preferably, step S3 further includes the steps of:
The word vector representation $\{w_1, w_2, \ldots, w_n\}$ is encoded through the text encoding module, specifically:

The word vector representation $\{w_1, w_2, \ldots, w_n\}$ of the text is encoded with a Bi-GRU network to generate an implicit representation $\{h_1, h_2, \ldots, h_n\}$ carrying context semantic information. The implicit representation $\{h_1, h_2, \ldots, h_n\}$ is then fed into three convolutions with kernel sizes 2, 3 and 4 and 100 hidden channels each, obtaining semantic vectors under three different receptive fields; after max pooling, the three semantic vectors are concatenated into a new 300-dimensional semantic representation vector $S = \{s_1, s_2, \ldots, s_n\}$.
Preferably, step S3 further includes the steps of:
The label vector representation $\{c_1, c_2, \ldots, c_C\}$ is encoded through the label encoding module; specifically, a single-layer GCN encodes the label vector representation $\{c_1, c_2, \ldots, c_C\}$ to generate an implicit representation $M = \{m_1, m_2, \ldots, m_C\}$ carrying label-hierarchy association information. The implementation process is as follows:

The hierarchical GCN aggregates data flows along top-down, bottom-up and self-loop edges. In the hierarchical GCN, each directed edge represents a pair of label-correlation features, and these data flows are node-transformed with an edge-wise linear transformation.

To realize the node transformation, the invention represents the linear transformation with a weighted adjacency matrix whose initial values come from the prior hierarchy information of the hierarchical classification system in step S2. Formally, the hierarchical GCN encodes the hidden state of a node k according to its relevant neighborhood $N(k) = \{n_k, child(k), parent(k)\}$, where $n_k$ denotes the k-th label node in the hierarchical label tree, $child(k)$ its child label nodes and $parent(k)$ its parent label node. The hidden state of node k is calculated as:

$$u_{k,j} = a_{k,j}\, v_j + b_l, \qquad g_{k,j} = \sigma\big(a_{k,j}\, v_j + b_g\big), \qquad h_k = \sigma\Big(\sum_{j \in N(k)} g_{k,j} \odot u_{k,j}\Big)$$

In the above formulas, $v_j$ and $v_k$ are trainable parameters, and $b_l$ and $b_g$ are trainable bias parameters; $u_{k,j}$ can be understood as the information passed between nodes k and j, and $g_{k,j}$ as a gate value that controls how much $u_{k,j}$ finally influences node k; $\sigma$ is an activation function in deep learning and can be taken as the sigmoid function; $b_l \in R^{N \times dim}$ and $b_g \in R^{N}$, where dim is the vector dimension and is a predefined hyperparameter; $d(j,k)$ denotes the hierarchical direction from node j to node k, i.e. top-down, bottom-up or self-loop; $a_{k,j} \in R$ denotes the hierarchical probability $f_{d(k,j)}(e_{k,j})$, i.e. the transition probability from the k-th node to the j-th label node, obtained from $f(e_{i,j})$; self-loop edges use $a_{k,k} = 1$, top-down edges use the prior probability $f_c(e_{k,j})$, and bottom-up edges use $f_p(e_{j,k}) = 1$; the feature matrix of these edges, $F = \{a_{0,0}, a_{0,1}, \ldots, a_{C-1,C-1}\}$, is the weighted adjacency matrix of the directed label hierarchy graph. Finally, the output hidden state $h_k$ of node k corresponds to the hierarchy-aware representation of label k.
Preferably, step S3 further includes the steps of:
The extraction method of the text representation module based on the label attention mechanism is as follows: for the text representation $S \in R^{n \times d_c}$ from the text encoding layer and the label representation $M \in R^{C \times d_c}$ from the label encoding layer, the label-attention-based text representation is calculated as:

$$\alpha_{kj} = \frac{\exp(m_k s_j^{\top})}{\sum_{j'} \exp(m_k s_{j'}^{\top})}, \qquad v_k = \sum_{j} \alpha_{kj}\, s_j$$

where $\alpha_{kj}$ represents the amount of information of the j-th text feature vector for the k-th label, and $v_k$ is the text representation based on label attention.
Preferably, step S3 further includes the steps of:
The extraction method of the text representation module based on the self-attention mechanism is as follows: for the hidden-layer text representation $H = \{h_1, h_2, \ldots, h_n\}$ output by the Bi-GRU of the text encoding layer, a self-attention-based text representation is calculated as:

$$\alpha_{kt} = \frac{\exp\big(w_2 \tanh(w_1 h_t^{\top})\big)}{\sum_{t'} \exp\big(w_2 \tanh(w_1 h_{t'}^{\top})\big)}, \qquad u_k = \sum_{t} \alpha_{kt}\, h_t$$

where $w_1$ and $w_2$ are parameters, $H$ is the text representation, $\alpha_{kt}$ is the weight of the t-th vector in the text representation, and $u_k$ is the text representation based on self-attention.
Preferably, step S3 further includes the steps of:
The feature fusion module is as follows: the text features based on the label attention mechanism and those based on the self-attention mechanism are adaptively fused to obtain the final text feature $d_{ik\text{-}fusion}$, calculated as:

$$\beta_k = \sigma(w_1 v_k + w_2 u_k), \qquad d_{ik\text{-}fusion} = \beta_k v_k + (1 - \beta_k) u_k$$

where $w_1$ and $w_2$ are parameters, $v_k$ is the label-attention-based text representation, $u_k$ is the self-attention-based text representation, and $\beta_k$ is the weight assigned to $v_k$.
Preferably, step S3 further includes the steps of:
pairing tags using relational network modulesFurther mining the associated information between: the mining method is to fuse the text features d generated by the feature fusion module ik-fusion Inputting the data into a full connection layer to obtain a logits vector O ═ O corresponding to each label 1 ,o 2 ,...,o n And inputting the vector O into a relational network module to obtain a prediction vector y ═ y 1 ,y 2 ,...,y n And finally, inputting the prediction vector y into a multilayer perceptron to obtain the label prediction probability, wherein the essence of the relationship network is a residual error network.
Preferably, step S4 includes the steps of:
In the training process, a cross-entropy loss function is used and training is performed with the Adam optimizer; the cross-entropy loss function for multi-label text classification is:

$$Loss = -\sum_{i=1}^{N}\sum_{j=1}^{L} \Big[ y_{ij} \log \hat{y}_{ij} + (1 - y_{ij}) \log(1 - \hat{y}_{ij}) \Big]$$

where $y_{ij}$ is the actual probability of the i-th sample for the j-th label, $\hat{y}_{ij}$ is the predicted probability of the i-th sample for the j-th label, L is the number of label categories, and N is the number of text samples. Training finally yields the trained deep learning multi-label text classification model.
The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.
Claims (10)
1. A hierarchical multi-label text classification method based on mixed attention is characterized by comprising the following steps:
step S1, preprocessing the multi-label text data in the data set D;
step S2, for the text labels, acquiring prior hierarchy information of the hierarchical classification system, where the prior hierarchy information refers to the prior probability of mutual dependence between labels and can be obtained by calculating the transition probabilities between parent and child labels;
step S3, constructing a deep learning hierarchical multi-label text classification model;
the deep learning multi-label text classification model comprises a word embedding module, a text encoding module, a label encoding module, a text representation module based on a label attention mechanism, a text representation module based on a self-attention mechanism, a feature fusion module, a vector regression layer, a relation network module and a label probability prediction layer;
and step S4, inputting the preprocessed text data of the data set into the model for training; after training is finished, classifying multi-label texts with the trained model.
2. The hierarchical multi-label text classification method based on mixed attention according to claim 1, characterized in that preprocessing the text data in the data set D in step S1 comprises the following steps:
step 1.1, performing word segmentation, removing stop words and removing punctuation marks;
step 1.2, counting the word frequency word_frequency of the text in the data set D, deleting words that occur fewer than X1 times, recording the retained words, and constructing a word list.
After the data set D is preprocessed, the data set D is divided into a training set, a verification set and a test set according to a certain proportion.
3. The hierarchical multi-label text classification method based on mixed attention according to claim 1, characterized in that the specific implementation of step S2 comprises:
for the data in data set D, assume that there is a hierarchical path $e_{i,j}$ between a parent node $v_i$ and a child node $v_j$; the feature $f(e_{i,j})$ of the edge formed by this parent-child path is represented by the prior probabilities $p(U_j|U_i)$ and $p(U_i|U_j)$:

$$p(U_j|U_i) = \frac{P(U_j \cap U_i)}{P(U_i)} = \frac{N_j}{N_i}, \qquad p(U_i|U_j) = \frac{P(U_j \cap U_i)}{P(U_j)}$$

$f(e_{i,j})$ expresses the relation between the two nodes, described by their transition probability or co-occurrence probability; the transition probabilities comprise the parent-to-child transition probability $p(U_j|U_i)$ and the child-to-parent transition probability $p(U_i|U_j)$; in the formulas, $U_j$ and $U_i$ denote the text data labeled with node $v_j$ and node $v_i$, respectively; $p(U_j|U_i)$ is the conditional probability of being labeled $v_j$ given the label $v_i$; $P(U_j \cap U_i)$ is the probability of being labeled $\{v_j, v_i\}$ simultaneously; and $N_j$ and $N_i$ denote the numbers of $v_j$ and $v_i$ node labels in the training set.
4. The hierarchical multi-label text classification method based on mixed attention according to claim 3, characterized in that in step S3, word embedding processing is performed on the input text and its labels through the word embedding module, specifically:
step 2.1, obtaining the preprocessed text sequence, and converting the words $\{x_1, x_2, \ldots, x_n\}$ in the text into the word vector representation $\{w_1, w_2, \ldots, w_n\}$ by looking them up in the word embedding dictionary table;

step 2.2, obtaining the label set $\{l_1, l_2, \ldots, l_C\}$ of the hierarchical multi-label text classification, and converting the label set into a d-dimensional label embedding set $\{c_1, c_2, \ldots, c_C\}$ by Kaiming initialization, where n refers to the number of words in the preprocessed text.
5. The hierarchical multi-label text classification method based on mixed attention according to claim 4, characterized in that in step S3 the word vector representation $\{w_1, w_2, \ldots, w_n\}$ is encoded through the text encoding module, specifically:

the word vector representation $\{w_1, w_2, \ldots, w_n\}$ of the text is encoded with a Bi-GRU network to generate an implicit representation $\{h_1, h_2, \ldots, h_n\}$ carrying context semantic information; the implicit representation $\{h_1, h_2, \ldots, h_n\}$ is then fed into three convolutions with different kernel sizes to obtain semantic vectors under three different receptive fields, and finally the three semantic vectors are concatenated into a new semantic representation vector $S = \{s_1, s_2, \ldots, s_n\}$.
In step S3, the label vector representation $\{c_1, c_2, \ldots, c_C\}$ is encoded through the label encoding module, specifically:

a single-layer GCN encodes the label vector representation $\{c_1, c_2, \ldots, c_C\}$ to generate an implicit representation $M = \{m_1, m_2, \ldots, m_C\}$ carrying label-hierarchy association information; the implementation process is as follows:

the hierarchical GCN aggregates data flows along top-down, bottom-up and self-loop edges; in the hierarchical GCN, each directed edge represents a pair of label-correlation features, and these data flows are node-transformed with an edge-wise linear transformation;

to realize the node transformation, a weighted adjacency matrix is used to represent the linear transformation, and its initial values come from the prior hierarchy information of the hierarchical classification system in step S2; formally, the hierarchical GCN encodes the hidden state of a node k according to its relevant neighborhood $N(k) = \{n_k, child(k), parent(k)\}$, where $n_k$ denotes the k-th label node in the hierarchical label tree, $child(k)$ its child label nodes and $parent(k)$ its parent label node; the hidden state of node k is calculated as:

$$u_{k,j} = a_{k,j}\, v_j + b_l, \qquad g_{k,j} = \sigma\big(a_{k,j}\, v_j + b_g\big), \qquad h_k = \sigma\Big(\sum_{j \in N(k)} g_{k,j} \odot u_{k,j}\Big)$$

in the above formulas, $v_j$ and $v_k$ are trainable parameters, and $b_l$ and $b_g$ are trainable bias parameters; $u_{k,j}$ can be understood as the information passed between nodes k and j, and $g_{k,j}$ as a gate value that controls how much $u_{k,j}$ finally influences node k; $\sigma$ is an activation function in deep learning and can be taken as the sigmoid function; $b_l \in R^{N \times dim}$ and $b_g \in R^{N}$, where dim is the vector dimension and is a predefined hyperparameter; $d(j,k)$ denotes the hierarchical direction from node j to node k, i.e. top-down, bottom-up or self-loop; $a_{k,j} \in R$ denotes the hierarchical probability $f_{d(k,j)}(e_{k,j})$, i.e. the transition probability from the k-th node to the j-th label node, obtained from $f(e_{i,j})$; self-loop edges use $a_{k,k} = 1$, top-down edges use the prior probability $f_c(e_{k,j})$, and bottom-up edges use $f_p(e_{j,k}) = 1$; the feature matrix of these edges, $F = \{a_{0,0}, a_{0,1}, \ldots, a_{C-1,C-1}\}$, is the weighted adjacency matrix of the directed label hierarchy graph; finally, the output hidden state $h_k$ of node k corresponds to the hierarchy-aware representation of label k.
6. The hierarchical multi-label text classification method based on mixed attention according to claim 5, characterized in that the extraction method of the text representation module based on the label attention mechanism in step S3 is: for the text representation $S \in R^{n \times d_c}$ from the text encoding layer and the label representation $M \in R^{C \times d_c}$ from the label encoding layer, where $d_c$ denotes the dimension of the text encoding vector and is a predetermined fixed value, the label-attention-based text representation is calculated as:

$$\alpha_{kj} = \frac{\exp(m_k s_j^{\top})}{\sum_{j'} \exp(m_k s_{j'}^{\top})}, \qquad v_k = \sum_{j} \alpha_{kj}\, s_j$$

where $\alpha_{kj}$ represents the amount of information of the j-th text feature vector for the k-th label, and $v_k$ is the text representation based on label attention.
7. The hierarchical multi-label text classification method based on mixed attention according to claim 6, characterized in that the extraction method of the text representation module based on the self-attention mechanism in step S3 is: for the hidden-layer text representation $H = \{h_1, h_2, \ldots, h_n\}$ output by the Bi-GRU of the text encoding layer, a self-attention-based text representation is calculated as:

$$\alpha_{kt} = \frac{\exp\big(w_2 \tanh(w_1 h_t^{\top})\big)}{\sum_{t'} \exp\big(w_2 \tanh(w_1 h_{t'}^{\top})\big)}, \qquad u_k = \sum_{t} \alpha_{kt}\, h_t$$

where $w_1$ and $w_2$ are parameters, $H$ is the text representation, $\alpha_{kt}$ is the weight of the t-th vector in the text representation, and $u_k$ is the text representation based on self-attention.
8. The hierarchical multi-label text classification method based on mixed attention according to claim 7, characterized in that the feature fusion module in step S3 is: the text features based on the label attention mechanism and those based on the self-attention mechanism are adaptively fused to obtain the final text feature $d_{ik\text{-}fusion}$, calculated as:

$$\beta_k = \sigma(w_1 v_k + w_2 u_k), \qquad d_{ik\text{-}fusion} = \beta_k v_k + (1 - \beta_k) u_k$$

where $w_1$ and $w_2$ are parameters, $v_k$ is the label-attention-based text representation, $u_k$ is the self-attention-based text representation, and $\beta_k$ is the weight assigned to $v_k$.
9. The hierarchical multi-label text classification method based on mixed attention according to claim 8, characterized in that the relation network module in step S3 further mines the association information among labels: the fused text feature $d_{ik\text{-}fusion}$ generated by the feature fusion module is input into a fully connected layer to obtain the logits vector $O = \{o_1, o_2, \ldots, o_C\}$ corresponding to the labels; the vector $O$ is input into the relation network module to obtain the prediction vector $y = \{y_1, y_2, \ldots, y_C\}$; finally, the prediction vector $y$ is input into a multilayer perceptron to obtain the label prediction probabilities; the relation network is in essence a residual network.
10. The hierarchical multi-label text classification method based on mixed attention according to claim 1, characterized in that a cross-entropy loss function is used in the training process of step S4 and training is performed with the Adam optimizer, the cross-entropy loss function for multi-label text classification being:

$$Loss = -\sum_{i=1}^{N}\sum_{j=1}^{L} \Big[ y_{ij} \log \hat{y}_{ij} + (1 - y_{ij}) \log(1 - \hat{y}_{ij}) \Big]$$

where $y_{ij}$ is the actual probability of the i-th sample for the j-th label, $\hat{y}_{ij}$ is the predicted probability of the i-th sample for the j-th label, L is the number of label categories, and N is the number of text samples; training finally yields the trained deep learning multi-label text classification model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210216140.7A CN114896388A (en) | 2022-03-07 | 2022-03-07 | Hierarchical multi-label text classification method based on mixed attention |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210216140.7A CN114896388A (en) | 2022-03-07 | 2022-03-07 | Hierarchical multi-label text classification method based on mixed attention |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114896388A true CN114896388A (en) | 2022-08-12 |
Family
ID=82714905
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210216140.7A Pending CN114896388A (en) | 2022-03-07 | 2022-03-07 | Hierarchical multi-label text classification method based on mixed attention |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114896388A (en) |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115374285A (en) * | 2022-10-26 | 2022-11-22 | 思创数码科技股份有限公司 | Government affair resource catalog theme classification method and system |
CN115374285B (en) * | 2022-10-26 | 2023-02-07 | 思创数码科技股份有限公司 | Government affair resource catalog theme classification method and system |
CN115757823A (en) * | 2022-11-10 | 2023-03-07 | 魔方医药科技(苏州)有限公司 | Data processing method and device, electronic equipment and storage medium |
CN115757823B (en) * | 2022-11-10 | 2024-03-05 | 魔方医药科技(苏州)有限公司 | Data processing method, device, electronic equipment and storage medium |
CN116089618A (en) * | 2023-04-04 | 2023-05-09 | 江西师范大学 | Drawing meaning network text classification model integrating ternary loss and label embedding |
CN116089618B (en) * | 2023-04-04 | 2023-06-27 | 江西师范大学 | Drawing meaning network text classification model integrating ternary loss and label embedding |
CN116187419A (en) * | 2023-04-25 | 2023-05-30 | 中国科学技术大学 | Automatic hierarchical system construction method based on text chunks |
CN116187419B (en) * | 2023-04-25 | 2023-08-29 | 中国科学技术大学 | Automatic hierarchical system construction method based on text chunks |
CN116304845B (en) * | 2023-05-23 | 2023-08-18 | 云筑信息科技(成都)有限公司 | Hierarchical classification and identification method for building materials |
CN116304845A (en) * | 2023-05-23 | 2023-06-23 | 云筑信息科技(成都)有限公司 | Hierarchical classification and identification method for building materials |
CN116542252A (en) * | 2023-07-07 | 2023-08-04 | 北京营加品牌管理有限公司 | Financial text checking method and system |
CN116542252B (en) * | 2023-07-07 | 2023-09-29 | 北京营加品牌管理有限公司 | Financial text checking method and system |
CN116932765A (en) * | 2023-09-15 | 2023-10-24 | 中汽信息科技(天津)有限公司 | Patent text multi-stage classification method and equipment based on graphic neural network |
CN116932765B (en) * | 2023-09-15 | 2023-12-08 | 中汽信息科技(天津)有限公司 | Patent text multi-stage classification method and equipment based on graphic neural network |
CN117453921A (en) * | 2023-12-22 | 2024-01-26 | 南京华飞数据技术有限公司 | Data information label processing method of large language model |
CN117453921B (en) * | 2023-12-22 | 2024-02-23 | 南京华飞数据技术有限公司 | Data information label processing method of large language model |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114896388A (en) | Hierarchical multi-label text classification method based on mixed attention | |
CN111914558B (en) | Course knowledge relation extraction method and system based on sentence bag attention remote supervision | |
CN110020438B (en) | Sequence identification based enterprise or organization Chinese name entity disambiguation method and device | |
CN109783818B (en) | Enterprise industry classification method | |
CN112732916B (en) | BERT-based multi-feature fusion fuzzy text classification system | |
Zhang et al. | Aspect-based sentiment analysis for user reviews | |
CN113516198B (en) | Cultural resource text classification method based on memory network and graphic neural network | |
CN113806547B (en) | Deep learning multi-label text classification method based on graph model | |
CN112749274A (en) | Chinese text classification method based on attention mechanism and interference word deletion | |
CN113515632A (en) | Text classification method based on graph path knowledge extraction | |
CN112163089B (en) | High-technology text classification method and system integrating named entity recognition | |
CN113987187A (en) | Multi-label embedding-based public opinion text classification method, system, terminal and medium | |
CN114925205B (en) | GCN-GRU text classification method based on contrast learning | |
CN116304066A (en) | Heterogeneous information network node classification method based on prompt learning | |
CN115952794A (en) | Chinese-Tai cross-language sensitive information recognition method fusing bilingual sensitive dictionary and heterogeneous graph | |
CN112732872A (en) | Biomedical text-oriented multi-label classification method based on subject attention mechanism | |
CN111651597A (en) | Multi-source heterogeneous commodity information classification method based on Doc2Vec and convolutional neural network | |
CN115292490A (en) | Analysis algorithm for policy interpretation semantics | |
CN113590827B (en) | Scientific research project text classification device and method based on multiple angles | |
CN111709225A (en) | Event cause and effect relationship judging method and device and computer readable storage medium | |
CN113051886B (en) | Test question duplicate checking method, device, storage medium and equipment | |
CN117787283A (en) | Small sample fine granularity text named entity classification method based on prototype comparison learning | |
CN115795037B (en) | Multi-label text classification method based on label perception | |
CN116756605A (en) | ERNIE-CN-GRU-based automatic speech step recognition method, system, equipment and medium | |
CN116956228A (en) | Text mining method for technical transaction platform |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||