CN114896388A - Hierarchical multi-label text classification method based on mixed attention - Google Patents

Hierarchical multi-label text classification method based on mixed attention

Info

Publication number
CN114896388A
Authority
CN
China
Prior art keywords
label
text
node
hierarchical
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210216140.7A
Other languages
Chinese (zh)
Inventor
马小林
钟港
旷海兰
刘新华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University of Technology WUT filed Critical Wuhan University of Technology WUT
Priority to CN202210216140.7A
Publication of CN114896388A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 - Clustering; Classification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G06F 18/243 - Classification techniques relating to the number of classes
    • G06F 18/24323 - Tree-organised classifiers
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/25 - Fusion techniques
    • G06F 18/253 - Fusion techniques of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/10 - Text processing
    • G06F 40/12 - Use of codes for handling textual entities
    • G06F 40/126 - Character encoding
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/10 - Text processing
    • G06F 40/12 - Use of codes for handling textual entities
    • G06F 40/151 - Transformation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/10 - Text processing
    • G06F 40/166 - Editing, e.g. inserting or deleting
    • G06F 40/183 - Tabulation, i.e. one-dimensional positioning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/20 - Natural language analysis
    • G06F 40/205 - Parsing
    • G06F 40/216 - Parsing using statistical methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/20 - Natural language analysis
    • G06F 40/237 - Lexical tools
    • G06F 40/242 - Dictionaries
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/20 - Natural language analysis
    • G06F 40/279 - Recognition of textual entities
    • G06F 40/289 - Phrasal analysis, e.g. finite state techniques or chunking
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/30 - Semantic analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a hierarchical multi-label text classification method based on mixed attention. Pre-trained word vectors are used as word embeddings, and a Bi-GRU performs preliminary feature extraction on the input embeddings. A graph convolutional neural network models the hierarchical label structure and generates label representations that encode label relevance. Several convolutional neural networks with different kernel sizes then extract local features of different granularities from the Bi-GRU output; after max pooling, these are concatenated into a text feature, from which further features are extracted with label-based attention. In parallel, a self-attention mechanism extracts global features from the Bi-GRU output. The label-based text features and the self-attention-based text features are adaptively fused to obtain a mixed-attention text representation, inter-label information is extracted through a relation network, and the final classification result is obtained through a multilayer perceptron.

Description

Hierarchical multi-label text classification method based on mixed attention
Technical Field
The invention relates to the technical field of computer information and the field of natural language processing, in particular to a hierarchical multi-label text classification method based on mixed attention.
Background
With the advent of the Internet era, people can access all kinds of information far more easily, and media data of every kind are generated continuously. This provides the basic conditions for mining valuable data on the Internet; without efficient management and knowledge-acquisition means, such massive data would undoubtedly go to waste. Within data mining, text classification is one of the core problems.
The task of multi-label text classification is to select, from a given label set, the subset of labels most relevant to the text content. In real-world scenarios, much data is associated with several labels from the label set, and these labels concisely describe the specific content of the data, allowing people to manage and further analyze massive data more conveniently and effectively. Hierarchical multi-label text classification is a special case of multi-label text classification in which the label system has a hierarchical structure. General multi-label text classification algorithms neither consider the influence of the hierarchical label structure on classification performance nor make full use of the association information among text labels, so their label assignments are not accurate enough; in particular, the classification of data with long-tailed distributions still has considerable room for improvement. Meanwhile, most existing models focus on either the local features or the global features of the text without considering both together, so important classification-relevant features are not captured sufficiently.
Disclosure of Invention
In view of the defects of the prior art, the invention provides a hierarchical multi-label text classification method based on mixed attention, which aims to improve the performance of hierarchical multi-label text classification by using the label hierarchy for label semantic representation and by making full use of the global and local semantic information of the text.
In order to achieve this purpose, the technical scheme adopted by the invention is as follows: a hierarchical multi-label text classification method based on mixed attention comprises the following steps:
S1, preprocessing multi-label text data; the text data is used for training the model and consists of text content and a corresponding label set; the label categories of the whole data set are organized in a tree graph with a hierarchical relation, in which each node represents one label category, and the labels of each sample text in the data set come from nodes of this label tree;
S2, for the text labels, obtaining the prior hierarchy information of the hierarchical classification system, where the prior hierarchy information refers to the prior probability of dependence between labels and can be obtained by calculating the transition probabilities between parent labels and child labels;
S3, constructing a deep learning hierarchical multi-label text classification model;
the deep learning multi-label text classification model comprises a word embedding module, a text encoding module, a label encoding module, a text representation module based on a label attention mechanism, a text representation module based on a self-attention mechanism, a feature fusion module, a vector regression layer, a relation network module and a label probability prediction layer;
S4, inputting the preprocessed text data of the data set into the model for training; after model training is finished, classifying multi-label texts with the trained model.
In the above technical solution, step S1 comprises performing data preprocessing on the samples in data set D, specifically: step 1.1, performing word segmentation on the text, removing stop words and removing punctuation marks; step 1.2, counting the word frequency word_frequency of the text in data set D, deleting words whose occurrence frequency is less than X1, recording the remaining words, and constructing a vocabulary. After data set D is preprocessed, it is divided into a training set, a validation set and a test set according to a certain proportion.
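By way of illustration only, a minimal Python sketch of this preprocessing step could look as follows; the tokenizer, the frequency threshold X1 = 5 and the 3:1:1 split ratio of the preferred embodiment are assumptions of the sketch, not fixed by the method itself:

import re
import random
from collections import Counter

def preprocess(samples, stopwords, min_freq=5):
    """samples: list of (text, label_set). Returns tokenized samples and a vocabulary."""
    tokenized = []
    for text, labels in samples:
        # step 1.1: crude word segmentation, stop-word and punctuation removal
        tokens = [t for t in re.findall(r"\w+", text.lower()) if t not in stopwords]
        tokenized.append((tokens, labels))

    # step 1.2: count word frequencies and drop words occurring fewer than min_freq times
    freq = Counter(t for tokens, _ in tokenized for t in tokens)
    vocab = {"<pad>": 0, "<unk>": 1}
    for word, count in freq.items():
        if count >= min_freq:
            vocab[word] = len(vocab)
    return tokenized, vocab

def split_dataset(samples, ratios=(3, 1, 1), seed=0):
    """Divide the preprocessed data set into training, validation and test sets."""
    random.Random(seed).shuffle(samples)
    total = sum(ratios)
    n_train = len(samples) * ratios[0] // total
    n_val = len(samples) * ratios[1] // total
    return samples[:n_train], samples[n_train:n_train + n_val], samples[n_train + n_val:]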
In the above technical solution, step S2 comprises: for the data of the training set in data set D, assume that a hierarchical path e_{i,j} exists between a parent node v_i and a child node v_j; the feature f(e_{i,j}) of the edge formed by the parent-child path is then represented by the prior probabilities p(U_j|U_i) and p(U_i|U_j):

f(e_{i,j}) = p(U_j | U_i) = P(U_j ∩ U_i) / P(U_i) = N_j / N_i   (parent to child)
f(e_{j,i}) = p(U_i | U_j) = P(U_i ∩ U_j) / P(U_j) = 1           (child to parent)

f(e_{i,j}) expresses the relation between the two nodes, described by their transition probability or co-occurrence probability. The transition probabilities comprise the transition probability p(U_j|U_i) from the parent node to one of its child nodes and the transition probability p(U_i|U_j) from the child node to the parent node. If the parent label node has only one child node, p(U_j|U_i) equals 1; if there are several child labels, each value is less than 1, but their sum is 1. In the formula, U_j and U_i denote the text samples labeled with node v_j and node v_i respectively, p(U_j|U_i) is the conditional probability of carrying the v_j node label given the v_i node label, P(U_j ∩ U_i) is the probability of carrying {v_j, v_i} simultaneously, and N_j and N_i denote the numbers of v_j node labels and v_i node labels in the training set respectively.
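For illustration, these prior transition probabilities can be estimated from training-set label counts roughly as in the following sketch; representing the label tree through a parent map is an assumption of the sketch:

from collections import Counter, defaultdict

def prior_hierarchy_probabilities(train_labels, parent_of):
    """train_labels: list of label sets, one per training sample.
    parent_of: dict mapping each child label to its parent label.
    Returns f[(i, j)] = p(U_j | U_i) for every directed edge of the label tree."""
    counts = Counter(label for labels in train_labels for label in labels)

    f = defaultdict(float)
    for child, parent in parent_of.items():
        n_child, n_parent = counts[child], counts[parent]
        # top-down edge: p(child | parent) = N_j / N_i
        # (per the description, the values over the children of one parent sum to 1)
        f[(parent, child)] = n_child / n_parent if n_parent else 0.0
        # bottom-up edge: every sample carrying the child label also carries the parent label
        f[(child, parent)] = 1.0
    return f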
In the above technical solution, step S3 further comprises performing word embedding on the input text and its labels through the word embedding module, specifically:
step 2.1, obtaining the preprocessed text sequence and converting the words {x_1, x_2, ..., x_n} in the text into the word vector representation {w_1, w_2, ..., w_n} by looking them up in the word embedding dictionary, where n is the number of words in the preprocessed text;
step 2.2, obtaining the label set {l_1, l_2, ..., l_C} of the hierarchical multi-label text classification and converting it into a label embedding set {c_1, c_2, ..., c_C} of dimension d by means of Kaiming initialization, where C is the number of label categories.
In the above technical solution, step S3 further comprises encoding the word vector representation {w_1, w_2, ..., w_n} through the text encoding module, specifically:
the word vector representation {w_1, w_2, ..., w_n} of the text is encoded with a Bi-GRU network to generate an implicit representation {h_1, h_2, ..., h_n} carrying contextual semantic information; the implicit representation {h_1, h_2, ..., h_n} is then fed into three convolutions with different kernel sizes to obtain semantic vectors under three different receptive fields, and the three semantic vectors are finally concatenated into a new semantic representation vector S = {s_1, s_2, ..., s_n}.
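A PyTorch-style sketch of such a text encoder is given below purely as an illustration; the kernel sizes 2/3/4 and 100 channels are taken from the preferred embodiment further on, the remaining names and defaults are assumptions, and the sketch keeps a per-position sequence S so that the label attention described next can attend over it:

import torch
import torch.nn as nn
import torch.nn.functional as F

class TextEncoder(nn.Module):
    """Bi-GRU followed by parallel convolutions with different kernel sizes."""
    def __init__(self, emb_dim=300, hidden=150, kernel_sizes=(2, 3, 4), channels=100):
        super().__init__()
        self.gru = nn.GRU(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.convs = nn.ModuleList(
            nn.Conv1d(2 * hidden, channels, k, padding=k // 2) for k in kernel_sizes
        )

    def forward(self, w):                      # w: (batch, n, emb_dim) word vectors
        h, _ = self.gru(w)                     # h: (batch, n, 2*hidden) contextual states
        x = h.transpose(1, 2)                  # (batch, 2*hidden, n) for Conv1d
        # local features under three different receptive fields, concatenated per position
        s = torch.cat([F.relu(conv(x))[:, :, : h.size(1)] for conv in self.convs], dim=1)
        return h, s.transpose(1, 2)            # h feeds self-attention, s feeds label attention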
In the above technical solution, step S3 further comprises encoding the label vector representation {c_1, c_2, ..., c_C} through the label encoding module, specifically:
a single-layer GCN is used to encode the label vector representation {c_1, c_2, ..., c_C} and generate an implicit representation M = {m_1, m_2, ..., m_C} carrying label hierarchy association information. The implementation process is as follows:
the hierarchical GCN aggregates data flows along top-down, bottom-up and self-loop edges. In the hierarchical GCN, each directed edge represents a pair of label-related features, and these data flows are transformed per node with an edge-wise linear transformation.
To realize the node transformation, the invention represents the linear transformation with a weighted adjacency matrix whose initial values come from the prior hierarchy information of the hierarchical classification system obtained in the second step. Formally, the hierarchical GCN encodes the hidden state of a node k according to its associated neighborhood N(k) = {n_k, child(k), parent(k)}, where n_k is the k-th label node itself, child(k) its child label nodes and parent(k) its parent label node. The hidden state of node k is computed as:

u_{k,j} = a_{k,j} · (W_l^{d(j,k)} h_j) + b_{l,k}^{d(j,k)}
g_{k,j} = σ(w_g^{d(j,k)} · h_j + b_{g,k}^{d(j,k)})
h_k = ReLU( Σ_{j ∈ N(k)} g_{k,j} · u_{k,j} )

In the above formulas, W_l and w_g are trainable weight parameters, and b_l ∈ R^{C×dim} and b_g ∈ R^C are trainable bias parameters; u_{k,j} can be understood as the information passed from node j to node k, and g_{k,j} as a gate value that controls how much u_{k,j} finally influences node k; σ denotes an activation function of deep learning and can be taken as the sigmoid function; dim is the dimension of the vectors and is a predefined hyperparameter. d(j,k) denotes the hierarchical direction from node j to node k, which may be a top-down, bottom-up or self-loop edge. Here a_{k,j} ∈ R denotes the hierarchical probability f_{d(k,j)}(e_{k,j}), i.e. the transition probability from the k-th node to the j-th label node, obtained from f(e_{i,j}) above; self-loop edges take a_{k,k} = 1, top-down edges use f_c(e_{k,j}) = p(U_j|U_k), and bottom-up edges use f_p(e_{j,k}) = 1. The feature matrix of these edges, F = {a_{0,0}, a_{0,1}, ..., a_{C-1,C-1}}, is the weighted adjacency matrix of the directed hierarchical graph of text labels. Finally, the output hidden state h_k of node k is its label representation containing the hierarchy information.
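A minimal sketch of this gated, hierarchy-aware graph convolution is shown below; the parameter shapes follow the description above, while using one shared weight matrix for all edge directions (instead of separate top-down, bottom-up and self-loop weights) and treating the gate as a scalar per edge are simplifying assumptions of the sketch:

import torch
import torch.nn as nn

class HierarchyGCN(nn.Module):
    """Single-layer GCN over the label tree with top-down, bottom-up and self-loop edges."""
    def __init__(self, num_labels, dim, adjacency):
        super().__init__()
        # adjacency: (num_labels, num_labels) weighted matrix a_{k,j} built from the
        # prior transition probabilities of step S2 (self-loop entries set to 1).
        self.register_buffer("a", adjacency)
        self.w_l = nn.Linear(dim, dim, bias=False)         # W_l: edge-wise linear transform
        self.b_l = nn.Parameter(torch.zeros(num_labels, dim))
        self.w_g = nn.Linear(dim, 1, bias=False)           # w_g: produces a scalar gate
        self.b_g = nn.Parameter(torch.zeros(num_labels, 1))

    def forward(self, c):                                  # c: (num_labels, dim) label embeddings
        # messages[k, j] = a_{k,j} * W_l c_j + b_{l,k}
        messages = self.a.unsqueeze(-1) * self.w_l(c).unsqueeze(0) + self.b_l.unsqueeze(1)
        # gates[k, j] = sigma(w_g . c_j + b_{g,k})
        gates = torch.sigmoid(self.w_g(c).unsqueeze(0) + self.b_g.unsqueeze(1))
        mask = (self.a != 0).float().unsqueeze(-1)         # restrict the sum to the neighborhood N(k)
        m = torch.relu((mask * gates * messages).sum(dim=1))
        return m                                           # (num_labels, dim) label representations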
In the above technical solution, step S3 further comprises the text representation module based on the label attention mechanism: given the text representation S ∈ R^{n×d_c} from the text encoding layer and the label representation M ∈ R^{C×d_c} from the label encoding layer, where d_c denotes the dimension of the text encoding vectors and is a predetermined fixed value, the label-attention-based text representation is computed as:

α_{kj} = exp(m_k · s_j) / Σ_{j'=1}^{n} exp(m_k · s_{j'})
v_k = Σ_{j=1}^{n} α_{kj} s_j

where α_{kj} represents the amount of information the j-th text feature vector carries for the k-th label, and v_k is the text representation based on label attention.
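In code, this label-wise attention reduces to a softmax over token positions for every label; the following sketch assumes the shapes S ∈ R^{n×d_c} and M ∈ R^{C×d_c} defined above and adds a batch dimension:

import torch

def label_attention(s, m):
    """s: (batch, n, d_c) encoded text; m: (C, d_c) label representations.
    Returns v: (batch, C, d_c), one attended text vector per label."""
    scores = torch.einsum("bnd,cd->bcn", s, m)     # m_k . s_j for every label k and position j
    alpha = torch.softmax(scores, dim=-1)          # alpha_{kj}: weight of position j for label k
    v = torch.einsum("bcn,bnd->bcd", alpha, s)     # v_k = sum_j alpha_{kj} * s_j
    return v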
In the above technical solution, step S3 further comprises the text representation module based on the self-attention mechanism: given the hidden-layer text representation H output by the Bi-GRU of the text encoding layer, the self-attention-based text representation is computed as:

A = softmax(w_2 tanh(w_1 H^T)), where the softmax is taken over the text positions t and α_{kt} is the (k, t)-th entry of A
u_k = Σ_{t=1}^{n} α_{kt} h_t

where w_1 and w_2 are parameters, H is the text representation, α_{kt} is the weight of the t-th vector in the text representation, and u_k is the text representation based on the self-attention mechanism.
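A sketch of this structured self-attention is given below; producing one attention row per label (so that the result can later be fused label-wise with the label-attention output) and the inner size d_a are assumptions of the sketch:

import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """Self-attention over Bi-GRU states, producing one text vector per label."""
    def __init__(self, hidden_dim, num_labels, d_a=200):
        super().__init__()
        self.w1 = nn.Linear(hidden_dim, d_a, bias=False)
        self.w2 = nn.Linear(d_a, num_labels, bias=False)

    def forward(self, h):                              # h: (batch, n, hidden_dim)
        scores = self.w2(torch.tanh(self.w1(h)))       # (batch, n, C)
        alpha = torch.softmax(scores, dim=1)           # alpha_{kt}: weight over positions t
        u = torch.einsum("bnc,bnd->bcd", alpha, h)     # u_k = sum_t alpha_{kt} * h_t
        return u                                       # (batch, C, hidden_dim)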
In the above technical solution, step S3 further comprises the feature fusion module: the text features based on the label attention mechanism and the text features based on the self-attention mechanism are adaptively fused to obtain the final text feature d_{ik-fusion}, computed as:

β_k = σ(w_1 v_k + w_2 u_k)
d_{ik-fusion} = β_k v_k + (1 - β_k) u_k

where w_1 and w_2 are parameters, v_k is the text representation based on label attention, u_k is the text representation based on self-attention, and β_k is the weight given to v_k.
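A minimal sketch of this adaptive fusion, under the scalar-gate reading reconstructed above:

import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    """Gated fusion of label-attention features v and self-attention features u."""
    def __init__(self, dim):
        super().__init__()
        self.w1 = nn.Linear(dim, 1, bias=False)
        self.w2 = nn.Linear(dim, 1, bias=False)

    def forward(self, v, u):                             # v, u: (batch, C, dim)
        beta = torch.sigmoid(self.w1(v) + self.w2(u))    # beta_k: weight given to v_k
        return beta * v + (1.0 - beta) * u               # fused features: (batch, C, dim)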
In the above technical solution, step S3 further comprises the relation network module, which further mines the association information between labels: the fused text features d_{ik-fusion} produced by the feature fusion module are fed into a fully connected layer to obtain the logits vector O = {o_1, o_2, ..., o_C} corresponding to the labels; the vector O is input into the relation network module to obtain the prediction vector y = {y_1, y_2, ..., y_C}; finally, the prediction vector y is input into a multilayer perceptron to obtain the prediction probability of each label. The relation network is in essence a residual network.
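The patent characterizes the relation network only as being, in essence, a residual network over the label logits; the following sketch is one plausible minimal reading of that description (the hidden size is an assumption):

import torch
import torch.nn as nn

class RelationNetwork(nn.Module):
    """Residual transformation over the label logits, followed by an MLP predictor."""
    def __init__(self, num_labels, hidden=512):
        super().__init__()
        self.relation = nn.Sequential(
            nn.Linear(num_labels, hidden), nn.ReLU(), nn.Linear(hidden, num_labels)
        )
        self.mlp = nn.Sequential(
            nn.Linear(num_labels, hidden), nn.ReLU(), nn.Linear(hidden, num_labels)
        )

    def forward(self, o):                    # o: (batch, C) label logits from the FC layer
        y = o + self.relation(o)             # residual connection injects label correlations
        return torch.sigmoid(self.mlp(y))    # per-label prediction probabilities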
In the above technical solution, step S4 comprises using a cross-entropy loss function in the training process and training with an Adam optimizer, where the cross-entropy loss of multi-label text classification is:

Loss = - Σ_{i=1}^{N} Σ_{j=1}^{L} [ y_{ij} log(ŷ_{ij}) + (1 - y_{ij}) log(1 - ŷ_{ij}) ]

where y_{ij} is the actual probability of the i-th sample for the j-th label, ŷ_{ij} is the predicted probability of the i-th sample for the j-th label, L is the number of label categories and N is the number of sample texts. The trained deep learning multi-label text classification model is finally obtained.
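For illustration, the multi-label cross-entropy objective and Adam optimizer described here correspond to a training loop of roughly the following form; the learning rate, epoch count and model interface are assumptions:

import torch
import torch.nn as nn

def train(model, loader, epochs=10, lr=1e-3, device="cpu"):
    """loader yields (word_ids, targets) with multi-hot targets of shape (batch, L)."""
    model.to(device)
    criterion = nn.BCELoss()                       # multi-label cross-entropy over sigmoid outputs
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    for epoch in range(epochs):
        total = 0.0
        for words, targets in loader:
            words, targets = words.to(device), targets.to(device).float()
            probs = model(words)                   # (batch, L) predicted label probabilities
            loss = criterion(probs, targets)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total += loss.item()
        print(f"epoch {epoch}: loss {total / len(loader):.4f}")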
The invention has the following advantages and beneficial effects:
the invention uses Bi-GRU combined with CNN to extract the semantic representation of the text, and can more fully obtain the local semantic information of the text. The hierarchical information of hierarchical multi-label classification is characterized through the graph neural network, and label representation with hierarchical association information can be obtained. The method and the device use a self-attention mechanism to extract the semantic representation of the text, and can obtain the semantic representation of the global association of the text. The invention uses the self-adaptive fusion of the text features based on the label representation and the text features based on the self-attention representation, and can obtain the text representation of the global, local text and the label information. The invention uses the relation network in the last layer of the model, so that the original label prediction vector can further obtain the label relevance.
The invention comprises four aspects: firstly, extracting a label representation containing a hierarchical relation by using a graph convolutional neural network; secondly, extracting local features by using a plurality of convolutions with different granularities; thirdly, text features are further extracted and adaptively Fused (FA) by using a label-based attention mechanism and a self-attention-based mechanism. Fourthly, the relationship network is used for further extracting the tag relevance. According to the hierarchical multi-label classification method based on mixed attention, the text features of the input text to be classified are extracted, then the text is classified through the multilayer perceptron, one or more labels can be marked on the text, and the method can be widely applied to the fields of E-commerce, news, scientific and technical papers and the like.
Drawings
FIG. 1 is a flow chart of a hierarchical multi-label text classification method based on mixed attention according to the present invention;
FIG. 2 is a network structure diagram of a hierarchical multi-label text classification model based on mixed attention according to the present invention;
FIG. 3 is a schematic diagram of a hierarchical structure of hierarchical multi-label text classification labels according to the present invention;
FIG. 4 is a schematic diagram of the hierarchical multi-label text classification graph convolutional neural network calculation according to the present invention;
FIG. 5 is a diagram of a relational network according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
In order to achieve this purpose, the technical scheme adopted by the invention is as follows: a hierarchical multi-label text classification method based on mixed attention comprises the following steps:
step S1, preprocessing the multi-label text data in the data set D;
step S2, for the text labels, acquiring the prior hierarchy information of the hierarchical classification system, where the prior hierarchy information refers to the prior probability of dependence between labels and can be obtained by calculating the transition probabilities between parent labels and child labels;
step S3, constructing a deep learning hierarchical multi-label text classification model;
the deep learning multi-label text classification model comprises a word embedding module, a text encoding module, a label encoding module, a text representation module based on a label attention mechanism, a text representation module based on a self-attention mechanism, a feature fusion module, a vector regression layer, a relation network module and a label probability prediction layer;
and step S4, inputting the text data after the data set preprocessing to model training, and after the model training is finished, classifying the multi-label text by using the trained model.
Preferably, step S1 includes the steps of:
the method for preprocessing the data of the samples in the data set D specifically comprises the following steps:
step 1-1, performing word segmentation, stop word removal and punctuation removal on a text in a data set D;
step 1-2, counting word frequency word _ frequency in the text in the data set D, deleting words with the occurrence frequency less than X1, recording the filtered words, and constructing a word list.
1-3, after data set D is preprocessed, dividing it into a training set, a validation set and a test set in a ratio of 3:1:1.
Preferably, step S2 includes the steps of:
for the data of the training set in data set D, assume that a hierarchical path e_{i,j} exists between a parent node v_i and a child node v_j; the feature f(e_{i,j}) of the edge formed by the parent-child path is then represented by the prior probabilities p(U_j|U_i) and p(U_i|U_j):

f(e_{i,j}) = p(U_j | U_i) = P(U_j ∩ U_i) / P(U_i) = N_j / N_i   (parent to child)
f(e_{j,i}) = p(U_i | U_j) = P(U_i ∩ U_j) / P(U_j) = 1           (child to parent)

f(e_{i,j}) expresses the relation between the two nodes, described by their transition probability or co-occurrence probability; the transition probabilities comprise the transition probability p(U_j|U_i) from the parent node to a child node and the transition probability p(U_i|U_j) from the child node to the parent node; in the formula, U_j and U_i denote the text data labeled with node v_j and node v_i respectively, p(U_j|U_i) is the conditional probability of carrying the v_j node label given the v_i node label, P(U_j ∩ U_i) is the probability of carrying {v_j, v_i} simultaneously, and N_j and N_i denote the numbers of v_j node labels and v_i node labels in the training set respectively.
Preferably, step S3 further includes the steps of:
word embedding is performed on the input text and its labels through the word embedding module, specifically:
step 2-1, obtaining the preprocessed text sequence and converting the words {x_1, x_2, ..., x_n} in the text into the word vector representation {w_1, w_2, ..., w_n} by looking them up in a word embedding table (GloVe-300d), where n is the number of words in the preprocessed text;
step 2-2, obtaining the label set {l_1, l_2, ..., l_C} of the hierarchical multi-label text classification and converting it into a 300-dimensional label embedding set {c_1, c_2, ..., c_C} by means of Kaiming initialization.
Preferably, step S3 further includes the steps of:
the word vector representation {w_1, w_2, ..., w_n} is encoded by the text encoding module, specifically:
the word vector representation {w_1, w_2, ..., w_n} of the text is encoded with a Bi-GRU network to generate an implicit representation {h_1, h_2, ..., h_n} carrying contextual semantic information; the implicit representation {h_1, h_2, ..., h_n} is then fed into convolutions with kernel sizes of 2, 3 and 4 and 100 hidden channels each to obtain semantic vectors under three different receptive fields, and after max pooling the three semantic vectors are concatenated into a new 300-dimensional semantic representation vector S = {s_1, s_2, ..., s_n}.
Preferably, step S3 further includes the steps of:
the label vector representation {c_1, c_2, ..., c_C} is encoded by the label encoding module; specifically, a single-layer GCN is used to encode the label vector representation {c_1, c_2, ..., c_C} and generate an implicit representation M = {m_1, m_2, ..., m_C} carrying label hierarchy association information. The implementation process is as follows:
the hierarchical GCN aggregates data flows along top-down, bottom-up and self-loop edges. In the hierarchical GCN, each directed edge represents a pair of label-related features, and these data flows are transformed per node with an edge-wise linear transformation.
To realize the node transformation, the invention represents the linear transformation with a weighted adjacency matrix whose initial values come from the prior hierarchy information of the hierarchical classification system in step S2. Formally, the hierarchical GCN encodes the hidden state of a node k according to its associated neighborhood N(k) = {n_k, child(k), parent(k)}, where n_k refers to the k-th label node in the hierarchical label tree, child(k) refers to the child label nodes of the k-th node and parent(k) refers to the parent label node of the k-th node. The hidden state of node k is computed as:

u_{k,j} = a_{k,j} · (W_l^{d(j,k)} h_j) + b_{l,k}^{d(j,k)}
g_{k,j} = σ(w_g^{d(j,k)} · h_j + b_{g,k}^{d(j,k)})
h_k = ReLU( Σ_{j ∈ N(k)} g_{k,j} · u_{k,j} )

In the above formulas, W_l and w_g are trainable weight parameters, and b_l ∈ R^{C×dim} and b_g ∈ R^C are trainable bias parameters; u_{k,j} can be understood as the information passed from node j to node k, and g_{k,j} as a gate value that controls how much u_{k,j} finally influences node k; σ denotes an activation function of deep learning and can be taken as the sigmoid function; dim is the dimension of the vectors and is a predefined hyperparameter; d(j,k) denotes the hierarchical direction from node j to node k, which may be a top-down, bottom-up or self-loop edge. Here a_{k,j} ∈ R denotes the hierarchical probability f_{d(k,j)}(e_{k,j}), i.e. the transition probability from the k-th node to the j-th label node, obtained from f(e_{i,j}); self-loop edges take a_{k,k} = 1, top-down edges use f_c(e_{k,j}) = p(U_j|U_k), and bottom-up edges use f_p(e_{j,k}) = 1. The feature matrix of these edges, F = {a_{0,0}, a_{0,1}, ..., a_{C-1,C-1}}, is the weighted adjacency matrix of the directed hierarchical graph of text labels. Finally, the output hidden state h_k of node k is its label representation containing the hierarchy information.
Preferably, step S3 further includes the steps of:
the extraction method of the text representation module based on the label attention mechanism is: given the text representation S ∈ R^{n×d_c} from the text encoding layer and the label representation M ∈ R^{C×d_c} from the label encoding layer, the label-attention-based text representation is computed as:

α_{kj} = exp(m_k · s_j) / Σ_{j'=1}^{n} exp(m_k · s_{j'})
v_k = Σ_{j=1}^{n} α_{kj} s_j

where α_{kj} represents the amount of information the j-th text feature vector carries for the k-th label, and v_k is the text representation based on label attention.
Preferably, step S3 further includes the steps of:
the extraction method of the text representation module based on the self-attention mechanism is: given the hidden-layer text representation H output by the Bi-GRU of the text encoding layer, the self-attention-based text representation is computed as:

A = softmax(w_2 tanh(w_1 H^T)), where the softmax is taken over the text positions t and α_{kt} is the (k, t)-th entry of A
u_k = Σ_{t=1}^{n} α_{kt} h_t

where w_1 and w_2 are parameters, H is the text representation, α_{kt} is the weight of the t-th vector in the text representation, and u_k is the text representation based on the self-attention mechanism.
Preferably, step S3 further includes the steps of:
the feature fusion module: the text features based on the label attention mechanism and the text features based on the self-attention mechanism are adaptively fused to obtain the final text feature d_{ik-fusion}, computed as:

β_k = σ(w_1 v_k + w_2 u_k)
d_{ik-fusion} = β_k v_k + (1 - β_k) u_k

where w_1 and w_2 are parameters, v_k is the text representation based on label attention, u_k is the text representation based on self-attention, and β_k is the weight given to v_k.
Preferably, step S3 further includes the steps of:
the relation network module is used to further mine the association information between labels: the fused text features d_{ik-fusion} produced by the feature fusion module are fed into a fully connected layer to obtain the logits vector O = {o_1, o_2, ..., o_C} corresponding to the labels; the vector O is input into the relation network module to obtain the prediction vector y = {y_1, y_2, ..., y_C}; finally, the prediction vector y is input into a multilayer perceptron to obtain the label prediction probabilities. The relation network is in essence a residual network.
Preferably, step S4 includes the steps of:
in the training process, a cross-entropy loss function is used and an Adam optimizer is used for training; the cross-entropy loss of multi-label text classification is:

Loss = - Σ_{i=1}^{N} Σ_{j=1}^{L} [ y_{ij} log(ŷ_{ij}) + (1 - y_{ij}) log(1 - ŷ_{ij}) ]

where y_{ij} is the actual probability of the i-th sample for the j-th label, ŷ_{ij} is the predicted probability of the i-th sample for the j-th label, L is the number of label categories and N is the number of text samples. The trained deep learning multi-label text classification model is finally obtained.
The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.

Claims (10)

1. A hierarchical multi-label text classification method based on mixed attention is characterized by comprising the following steps:
step S1, preprocessing the multi-label text data in the data set D;
step S2, for the text labels, acquiring the prior hierarchy information of the hierarchical classification system, where the prior hierarchy information refers to the prior probability of dependence between labels and can be obtained by calculating the transition probabilities between parent labels and child labels;
step S3, constructing a deep learning hierarchical multi-label text classification model;
the deep learning multi-label text classification model comprises a word embedding module, a text encoding module, a label encoding module, a text representation module based on a label attention mechanism, a text representation module based on a self-attention mechanism, a feature fusion module, a vector regression layer, a relation network module and a label probability prediction layer;
and step S4, inputting the text data after the data set preprocessing to model training, and after the model training is finished, classifying the multi-label text by using the trained model.
2. The hierarchical multi-label text classification method based on mixed attention according to claim 1, characterized in that: in step S1, preprocessing the text data in the data set D comprises the following steps:
step 1.1, performing word segmentation, removing stop words and removing punctuation marks;
step 1.2, counting word frequency word _ frequency in the text in the data set D, deleting words with the occurrence frequency less than X1, recording the filtered words, and constructing a word list.
After the data set D is preprocessed, the data set D is divided into a training set, a verification set and a test set according to a certain proportion.
3. The hierarchical multi-label text classification method based on mixed attention according to claim 1, characterized in that: the specific implementation of step S2 comprises:
for the data in data set D, assume that a hierarchical path e_{i,j} exists between a parent node v_i and a child node v_j; the feature f(e_{i,j}) of the edge formed by the parent-child path is then represented by the prior probabilities p(U_j|U_i) and p(U_i|U_j):

f(e_{i,j}) = p(U_j | U_i) = P(U_j ∩ U_i) / P(U_i) = N_j / N_i   (parent to child)
f(e_{j,i}) = p(U_i | U_j) = P(U_i ∩ U_j) / P(U_j) = 1           (child to parent)

f(e_{i,j}) expresses the relation between the two nodes, described by their transition probability or co-occurrence probability; the transition probabilities comprise the transition probability p(U_j|U_i) from the parent node to a child node and the transition probability p(U_i|U_j) from the child node to the parent node; in the formula, U_j and U_i denote the text data labeled with node v_j and node v_i respectively, p(U_j|U_i) is the conditional probability of carrying the v_j node label given the v_i node label, P(U_j ∩ U_i) is the probability of carrying {v_j, v_i} simultaneously, and N_j and N_i denote the numbers of v_j node labels and v_i node labels in the training set respectively.
4. The hierarchical multi-label text classification method based on mixed attention according to claim 3, characterized in that: in step S3, word embedding is performed on the input text and its labels through the word embedding module, specifically:
step 2.1, obtaining the preprocessed text sequence and converting the words {x_1, x_2, ..., x_n} in the text into the word vector representation {w_1, w_2, ..., w_n} by looking them up in the word embedding dictionary, where n is the number of words in the preprocessed text;
step 2.2, obtaining the label set {l_1, l_2, ..., l_C} of the hierarchical multi-label text classification and converting it into a label embedding set {c_1, c_2, ..., c_C} of dimension d by means of Kaiming initialization.
5. The hierarchical multi-label text classification method based on mixed attention according to claim 4, characterized in that: in step S3, the word vector representation {w_1, w_2, ..., w_n} is encoded by the text encoding module, specifically:
the word vector representation {w_1, w_2, ..., w_n} of the text is encoded with a Bi-GRU network to generate an implicit representation {h_1, h_2, ..., h_n} carrying contextual semantic information; the implicit representation {h_1, h_2, ..., h_n} is then fed into three convolutions with different kernel sizes to obtain semantic vectors under three different receptive fields, and the three semantic vectors are finally concatenated into a new semantic representation vector S = {s_1, s_2, ..., s_n};
in step S3, the label vector representation {c_1, c_2, ..., c_C} is encoded by the label encoding module, specifically:
a single-layer GCN is used to encode the label vector representation {c_1, c_2, ..., c_C} and generate an implicit representation M = {m_1, m_2, ..., m_C} carrying label hierarchy association information, implemented as follows:
the hierarchical GCN aggregates data flows along top-down, bottom-up and self-loop edges; in the hierarchical GCN, each directed edge represents a pair of label-related features, and these data flows are transformed per node with an edge-wise linear transformation;
to realize the node transformation, the linear transformation is represented with a weighted adjacency matrix whose initial values come from the prior hierarchy information of the hierarchical classification system in step S2; formally, the hierarchical GCN encodes the hidden state of a node k according to its associated neighborhood N(k) = {n_k, child(k), parent(k)}, where n_k refers to the k-th label node in the hierarchical label tree, child(k) refers to the child label nodes of the k-th node and parent(k) refers to the parent label node of the k-th node; the hidden state of node k is computed as:

u_{k,j} = a_{k,j} · (W_l^{d(j,k)} h_j) + b_{l,k}^{d(j,k)}
g_{k,j} = σ(w_g^{d(j,k)} · h_j + b_{g,k}^{d(j,k)})
h_k = ReLU( Σ_{j ∈ N(k)} g_{k,j} · u_{k,j} )

in the above formulas, W_l and w_g are trainable weight parameters, and b_l ∈ R^{C×dim} and b_g ∈ R^C are trainable bias parameters; u_{k,j} can be understood as the information passed from node j to node k, and g_{k,j} as a gate value that controls how much u_{k,j} finally influences node k; σ denotes an activation function of deep learning and can be taken as the sigmoid function; dim is the dimension of the vectors and is a predefined hyperparameter; d(j,k) denotes the hierarchical direction from node j to node k, which may be a top-down, bottom-up or self-loop edge; a_{k,j} ∈ R denotes the hierarchical probability f_{d(k,j)}(e_{k,j}), i.e. the transition probability from the k-th node to the j-th label node, obtained from f(e_{i,j}); self-loop edges take a_{k,k} = 1, top-down edges use f_c(e_{k,j}) = p(U_j|U_k), and bottom-up edges use f_p(e_{j,k}) = 1; the feature matrix of these edges, F = {a_{0,0}, a_{0,1}, ..., a_{C-1,C-1}}, is the weighted adjacency matrix of the directed hierarchical graph of text labels; finally, the output hidden state h_k of node k is its label representation containing the hierarchy information.
6. The hierarchical multi-label text classification method based on mixed attention according to claim 5, characterized in that: the extraction method of the text representation module based on the label attention mechanism in step S3 is: given the text representation S ∈ R^{n×d_c} from the text encoding layer and the label representation M ∈ R^{C×d_c} from the label encoding layer, where d_c denotes the dimension of the text encoding vectors and is a predetermined fixed value, the label-attention-based text representation is computed as:

α_{kj} = exp(m_k · s_j) / Σ_{j'=1}^{n} exp(m_k · s_{j'})
v_k = Σ_{j=1}^{n} α_{kj} s_j

where α_{kj} represents the amount of information the j-th text feature vector carries for the k-th label, and v_k is the text representation based on label attention.
7. The hierarchical multi-label text classification method based on mixed attention according to claim 6, characterized in that: the extraction method of the text representation module based on the self-attention mechanism in step S3 is: given the hidden-layer text representation H output by the Bi-GRU of the text encoding layer, the self-attention-based text representation is computed as:

A = softmax(w_2 tanh(w_1 H^T)), where the softmax is taken over the text positions t and α_{kt} is the (k, t)-th entry of A
u_k = Σ_{t=1}^{n} α_{kt} h_t

where w_1 and w_2 are parameters, H is the text representation, α_{kt} is the weight of the t-th vector in the text representation, and u_k is the text representation based on the self-attention mechanism.
8. The hierarchical multi-label text classification method based on mixed attention according to claim 7, characterized in that: the feature fusion module in step S3 is: the text features based on the label attention mechanism and the text features based on the self-attention mechanism are adaptively fused to obtain the final text feature d_{ik-fusion}, computed as:

β_k = σ(w_1 v_k + w_2 u_k)
d_{ik-fusion} = β_k v_k + (1 - β_k) u_k

where w_1 and w_2 are parameters, v_k is the text representation based on label attention, u_k is the text representation based on self-attention, and β_k is the weight given to v_k.
9. The hierarchical multi-label text classification method based on mixed attention according to claim 8, characterized in that: the relation network module in step S3 further mines the association information between labels: the fused text features d_{ik-fusion} produced by the feature fusion module are fed into a fully connected layer to obtain the logits vector O = {o_1, o_2, ..., o_C} corresponding to the labels; the vector O is input into the relation network module to obtain the prediction vector y = {y_1, y_2, ..., y_C}; finally, the prediction vector y is input into a multilayer perceptron to obtain the label prediction probabilities, the relation network being in essence a residual network.
10. The hierarchical multi-label text classification method based on mixed attention according to claim 1, characterized in that: in the training process of step S4, a cross-entropy loss function is used and an Adam optimizer is used for training, where the cross-entropy loss of multi-label text classification is:

Loss = - Σ_{i=1}^{N} Σ_{j=1}^{L} [ y_{ij} log(ŷ_{ij}) + (1 - y_{ij}) log(1 - ŷ_{ij}) ]

where y_{ij} is the actual probability of the i-th sample for the j-th label, ŷ_{ij} is the predicted probability of the i-th sample for the j-th label, L is the number of label categories and N is the number of text samples; the trained deep learning multi-label text classification model is finally obtained.
CN202210216140.7A 2022-03-07 2022-03-07 Hierarchical multi-label text classification method based on mixed attention Pending CN114896388A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210216140.7A CN114896388A (en) 2022-03-07 2022-03-07 Hierarchical multi-label text classification method based on mixed attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210216140.7A CN114896388A (en) 2022-03-07 2022-03-07 Hierarchical multi-label text classification method based on mixed attention

Publications (1)

Publication Number Publication Date
CN114896388A true CN114896388A (en) 2022-08-12

Family

ID=82714905

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210216140.7A Pending CN114896388A (en) 2022-03-07 2022-03-07 Hierarchical multi-label text classification method based on mixed attention

Country Status (1)

Country Link
CN (1) CN114896388A (en)


Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115374285A (en) * 2022-10-26 2022-11-22 思创数码科技股份有限公司 Government affair resource catalog theme classification method and system
CN115374285B (en) * 2022-10-26 2023-02-07 思创数码科技股份有限公司 Government affair resource catalog theme classification method and system
CN115757823A (en) * 2022-11-10 2023-03-07 魔方医药科技(苏州)有限公司 Data processing method and device, electronic equipment and storage medium
CN115757823B (en) * 2022-11-10 2024-03-05 魔方医药科技(苏州)有限公司 Data processing method, device, electronic equipment and storage medium
CN116089618A (en) * 2023-04-04 2023-05-09 江西师范大学 Drawing meaning network text classification model integrating ternary loss and label embedding
CN116089618B (en) * 2023-04-04 2023-06-27 江西师范大学 Drawing meaning network text classification model integrating ternary loss and label embedding
CN116187419A (en) * 2023-04-25 2023-05-30 中国科学技术大学 Automatic hierarchical system construction method based on text chunks
CN116187419B (en) * 2023-04-25 2023-08-29 中国科学技术大学 Automatic hierarchical system construction method based on text chunks
CN116304845B (en) * 2023-05-23 2023-08-18 云筑信息科技(成都)有限公司 Hierarchical classification and identification method for building materials
CN116304845A (en) * 2023-05-23 2023-06-23 云筑信息科技(成都)有限公司 Hierarchical classification and identification method for building materials
CN116542252A (en) * 2023-07-07 2023-08-04 北京营加品牌管理有限公司 Financial text checking method and system
CN116542252B (en) * 2023-07-07 2023-09-29 北京营加品牌管理有限公司 Financial text checking method and system
CN116932765A (en) * 2023-09-15 2023-10-24 中汽信息科技(天津)有限公司 Patent text multi-stage classification method and equipment based on graphic neural network
CN116932765B (en) * 2023-09-15 2023-12-08 中汽信息科技(天津)有限公司 Patent text multi-stage classification method and equipment based on graphic neural network
CN117453921A (en) * 2023-12-22 2024-01-26 南京华飞数据技术有限公司 Data information label processing method of large language model
CN117453921B (en) * 2023-12-22 2024-02-23 南京华飞数据技术有限公司 Data information label processing method of large language model

Similar Documents

Publication Publication Date Title
CN114896388A (en) Hierarchical multi-label text classification method based on mixed attention
CN111914558B (en) Course knowledge relation extraction method and system based on sentence bag attention remote supervision
CN110020438B (en) Sequence identification based enterprise or organization Chinese name entity disambiguation method and device
CN109783818B (en) Enterprise industry classification method
CN112732916B (en) BERT-based multi-feature fusion fuzzy text classification system
Zhang et al. Aspect-based sentiment analysis for user reviews
CN113516198B (en) Cultural resource text classification method based on memory network and graphic neural network
CN113806547B (en) Deep learning multi-label text classification method based on graph model
CN112749274A (en) Chinese text classification method based on attention mechanism and interference word deletion
CN113515632A (en) Text classification method based on graph path knowledge extraction
CN112163089B (en) High-technology text classification method and system integrating named entity recognition
CN113987187A (en) Multi-label embedding-based public opinion text classification method, system, terminal and medium
CN114925205B (en) GCN-GRU text classification method based on contrast learning
CN116304066A (en) Heterogeneous information network node classification method based on prompt learning
CN115952794A (en) Chinese-Tai cross-language sensitive information recognition method fusing bilingual sensitive dictionary and heterogeneous graph
CN112732872A (en) Biomedical text-oriented multi-label classification method based on subject attention mechanism
CN111651597A (en) Multi-source heterogeneous commodity information classification method based on Doc2Vec and convolutional neural network
CN115292490A (en) Analysis algorithm for policy interpretation semantics
CN113590827B (en) Scientific research project text classification device and method based on multiple angles
CN111709225A (en) Event cause and effect relationship judging method and device and computer readable storage medium
CN113051886B (en) Test question duplicate checking method, device, storage medium and equipment
CN117787283A (en) Small sample fine granularity text named entity classification method based on prototype comparison learning
CN115795037B (en) Multi-label text classification method based on label perception
CN116756605A (en) ERNIE-CN-GRU-based automatic speech step recognition method, system, equipment and medium
CN116956228A (en) Text mining method for technical transaction platform

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination