CN114896388A - A Hierarchical Multi-Label Text Classification Method Based on Mixed Attention - Google Patents
A Hierarchical Multi-Label Text Classification Method Based on Mixed Attention
- Publication number
- CN114896388A (application number CN202210216140.7A)
- Authority
- CN
- China
- Prior art keywords
- label
- text
- node
- hierarchical
- attention
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/126—Character encoding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/151—Transformation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
- G06F40/183—Tabulation, i.e. one-dimensional positioning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Probability & Statistics with Applications (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Biomedical Technology (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Technical Field
The present invention relates to the fields of computer information technology and natural language processing, and more particularly to a hierarchical multi-label text classification method based on mixed attention.
Background Art
The advent of the Internet era has made it much easier for people to access all kinds of information. At the same time, media data of every kind are generated continuously, which provides the basic conditions for mining valuable data on the Internet; without efficient ways of managing such massive data and extracting knowledge from it, much of its value is wasted. Within data mining, text classification is one of the core problems.
The task of multi-label text classification is to select, from a given label set, the subset of labels most relevant to the text content. In practical scenarios much of the data is associated with several labels from the label set; these labels summarize the content concisely, allowing massive data collections to be managed more conveniently and effectively and analyzed further. Hierarchical multi-label text classification is a special case of multi-label text classification in which the label system has a hierarchical structure. General multi-label text classification algorithms do not account for the influence of the hierarchical label structure on classification performance and do not make full use of the correlations between labels, so the categories a text belongs to are identified less accurately; for data with long-tailed label distributions in particular, there is still considerable room for improvement. At the same time, most existing models focus on either the local features or the global features of the text and lack a combined treatment of both, so important classification-relevant features are captured insufficiently.
Summary of the Invention
In view of the deficiencies of the prior art, the present invention provides a hierarchical multi-label text classification method based on mixed attention, which improves the performance of hierarchical multi-label text classification by using the label hierarchy to build label semantic representations and by making full use of the global and local semantic information of the text.
To achieve the above object, the technical solution adopted by the present invention is a hierarchical multi-label text classification method based on mixed attention, comprising the following steps:
S1, preprocessing the multi-label text data. The text data used to train the model consist of text content and the corresponding label sets. The label categories of the whole dataset form a tree with hierarchical relations: the tree consists of many nodes, each node represents a label category, and the labels of every sample text in the dataset come from nodes of this label tree.
S2, for the text labels, obtaining the prior hierarchy information of the hierarchical classification system. The prior hierarchy information refers to the prior probabilities of dependence between labels and can be obtained by computing the transition probabilities between parent labels and child labels.
S3, building a deep-learning hierarchical multi-label text classification model.
The deep-learning multi-label text classification model comprises a word embedding module, a text encoding module, a label encoding module, a text representation module based on a label attention mechanism, a text representation module based on a self-attention mechanism, a feature fusion module, a vector regression layer, a relation network module, and a label probability prediction layer.
S4, feeding the preprocessed text data of the dataset into model training; after training is completed, using the trained model to classify multi-label text.
In the above technical solution, step S1 comprises preprocessing the samples in dataset D, specifically: step 1.1, performing word segmentation and removing stop words and punctuation; step 1.2, counting the word frequency word_frequence of the texts in dataset D, deleting words that occur fewer than X1 times, recording the remaining words, and building the vocabulary. After preprocessing, dataset D is divided into a training set, a validation set, and a test set in a given proportion.
In the above technical solution, step S2 comprises: for the data of the training set in dataset D, assuming there is a hierarchical path e_{i,j} between parent node v_i and child node v_j, the feature f(e_{i,j}) of the edge formed by the parent-child path is represented by the prior probabilities p(U_j|U_i) and p(U_i|U_j):

f(e_{i,j}) = ( p(U_j|U_i), p(U_i|U_j) ),
p(U_j|U_i) = P(U_j ∩ U_i) / P(U_i) = N_j / N_i,
p(U_i|U_j) = P(U_j ∩ U_i) / P(U_j) = 1,

where the last equality holds because, under the hierarchy constraint, every sample labeled with the child node v_j is also labeled with its parent node v_i.
f(e_{i,j}) describes the relationship between the two nodes, expressed by their transition (or co-occurrence) probabilities: the transition probability p(U_j|U_i) from the parent node to a child node and the transition probability p(U_i|U_j) from the child node to the parent node. A parent label node may have several child label nodes, and the transition probabilities from a parent node to its children sum to 1; if the parent has a single child the value is 1, and if there are several children each value is less than 1 while their sum remains 1. Here U_j and U_i denote that a text sample is labeled with node v_j and with node v_i respectively, p(U_j|U_i) is the conditional probability of being labeled v_j given v_i, P(U_j ∩ U_i) is the probability that {v_j, v_i} are labeled simultaneously, and N_j and N_i are the numbers of samples labeled v_j and v_i in the training set.
In the above technical solution, step S3 further comprises performing word embedding on the input text and its labels through the word embedding module, specifically:
Step 2.1, obtaining the preprocessed text sequence and converting the words {x_1, x_2, ..., x_n} of the text into word vector representations {w_1, w_2, ..., w_n} by looking up the word embedding dictionary, where n is the number of words in the preprocessed text.
Step 2.2, obtaining the label set {l_1, l_2, ..., l_n} of hierarchical multi-label text classification and converting it, with Kaiming initialization, into a label embedding set {c_1, c_2, ..., c_n} of dimension d.
In the above technical solution, step S3 further comprises encoding the word vector representations {w_1, w_2, ..., w_n} through the text encoding module, specifically:
A Bi-GRU network encodes the word vectors {w_1, w_2, ..., w_n} of the text into hidden representations {h_1, h_2, ..., h_n} that carry contextual semantic information. The hidden representations {h_1, h_2, ..., h_n} are then fed into three convolutions with different kernel sizes to obtain semantic vectors under three different receptive fields, and the three semantic vectors are finally concatenated into a new semantic representation vector S = {s_1, s_2, ..., s_n}.
In the above technical solution, step S3 further comprises encoding the label vector representations {c_1, c_2, ..., c_n} through the label encoding module, specifically:
A single-layer GCN encodes the label vectors {c_1, c_2, ..., c_n} into hidden representations M = {m_1, m_2, ..., m_n} that carry hierarchical label association information. The process is as follows:
The hierarchy GCN aggregates data flows along top-down, bottom-up, and self-loop edges. In the hierarchy GCN each directed edge represents a pairwise label correlation feature, and these data flows perform node transformations through linear transformations along the edges.
To implement the node transformation, the invention uses a weighted adjacency matrix to represent this linear transformation, with the initial values of the weighted adjacency matrix taken from the prior hierarchy information of the hierarchical classification system obtained in step S2. Formally, the hierarchy GCN encodes the hidden state of node k from its associated neighborhood N(k) = {n_k, child(k), parent(k)}, where n_k is the k-th label node of the hierarchical label tree, child(k) denotes the child label nodes of node k, and parent(k) denotes the parent label node of node k. The hidden state of node k is computed as follows:

u_{k,j} = a_{k,j} · v_j + b_l,
g_{k,j} = σ( w_{d(j,k)} · v_j + b_g ),
h_k = ReLU( Σ_{j ∈ N(k)} g_{k,j} ⊙ u_{k,j} ),

In the above formulas, v_j and v_k are trainable parameters, and b_l ∈ R^{N×dim} and b_g ∈ R^N are trainable bias parameters. u_{k,j} can be understood as the information passed between nodes k and j, and g_{k,j} as a gating value that controls how much u_{k,j} ultimately influences node k. σ is an activation function in deep learning and can be taken as the sigmoid function; dim is the vector dimension, a predefined hyperparameter. d(j,k) denotes the hierarchical direction from node j to node k, covering top-down, bottom-up, and self-loop edges. a_{k,j} ∈ R is the hierarchy probability f_{d(k,j)}(e_{kj}), i.e., the transition probability from the k-th label node to the j-th label node, obtained from f(e_{i,j}) above: self-loop edges use a_{k,k} = 1, top-down edges use the child-given-parent prior probability p(U_child|U_parent) computed above, and bottom-up edges use f_p(e_{j,k}) = 1. The feature matrix of these edges, F = {a_{0,0}, a_{0,1}, ..., a_{c-1,c-1}}, is the weighted adjacency matrix of the directed hierarchical graph of text labels. Finally, the output hidden state h_k of node k is its label representation carrying the hierarchy information.
In the above technical solution, step S3 further comprises the text representation module based on the label attention mechanism: given the text representation S = {s_1, s_2, ..., s_n} from the text encoding layer and the label representation M = {m_1, m_2, ..., m_n} from the label encoding layer, where d_c denotes the dimension of the text encoding vectors and is a predetermined fixed value, the label-attention-based text representation is computed by the following formulas:

α_{kj} = exp(m_k · s_j) / Σ_{j'} exp(m_k · s_{j'}),
v_k = Σ_j α_{kj} s_j,
where α_{kj} denotes the amount of information that the j-th text feature vector carries for the k-th label, and v_k is the label-attention-based text representation.
In the above technical solution, step S3 further comprises the text representation module based on the self-attention mechanism: given the hidden text representation {h_1, h_2, ..., h_n} output by the Bi-GRU in the text encoding layer, the self-attention-based text representation is computed by the following formulas:

α_{kt} = exp( (w_2 tanh(w_1 h_t))_k ) / Σ_{t'} exp( (w_2 tanh(w_1 h_{t'}))_k ),
u_k = Σ_t α_{kt} h_t,
where w_1 and w_2 are parameters, h is the text representation, α_{kt} is the weight of the t-th vector of the text representation, and u_k is the self-attention-based text representation.
In the above technical solution, step S3 further comprises the feature fusion module: the text features based on the label attention mechanism and the text features based on the self-attention mechanism are adaptively fused to obtain the final text feature d_{ik-fusion}, computed as follows:

β_k = σ( w_1 v_k + w_2 u_k ),
d_{ik-fusion} = β_k v_k + (1 − β_k) u_k,
where w_1 and w_2 are parameters, v_k is the label-attention-based text representation, u_k is the self-attention-based text representation, and β_k is the weight assigned to v_k.
In the above technical solution, step S3 further comprises the relation network module, which further mines the association information between labels: the text feature d_{ik-fusion} produced by the feature fusion module is fed into a fully connected layer to obtain the logits vector O = {o_1, o_2, ..., o_n} corresponding to the labels; the vector O is then fed into the relation network module to obtain the prediction vector y = {y_1, y_2, ..., y_n}; finally, the prediction vector y is fed into a multi-layer perceptron to obtain the label prediction probabilities. The relation network is essentially a residual network.
In the above technical solution, step S4 comprises training with a cross-entropy loss function and the Adam optimizer; the cross-entropy loss for multi-label text classification is as follows:

Loss = − (1/N) Σ_{i=1}^{N} Σ_{j=1}^{L} [ y_{ij} log(ŷ_{ij}) + (1 − y_{ij}) log(1 − ŷ_{ij}) ],
where y_{ij} is the actual probability of the j-th label for the i-th sample and ŷ_{ij} is the predicted probability of the j-th label for the i-th sample; L is the number of label categories and N is the number of sample texts. Training yields the trained deep-learning multi-label text classification model.
The advantages and beneficial effects of the present invention are as follows:
The invention uses a Bi-GRU combined with a CNN to extract the semantic representation of the text, so the local semantic information of the text is captured relatively fully. The hierarchy information of hierarchical multi-label classification is represented by a graph neural network, giving label representations that carry hierarchical association information. A self-attention mechanism is used to extract the semantic representation of the text, giving a semantic representation of its global associations. The text features based on label representations and the text features based on self-attention are adaptively fused, giving a text representation that links global text information, local text information, and label information. A relation network is used in the last layer of the model, so that the initial label prediction vector further captures label correlations.
The invention comprises four aspects: first, a graph convolutional neural network is used to extract label representations that encode the hierarchical relations; second, convolutions at several granularities are used to extract local features; third, a label-based attention mechanism and a self-attention mechanism are used to further extract text features, which are then adaptively fused (FA); fourth, a relation network is used to further extract label correlations. The hierarchical multi-label classification method based on mixed attention provided by the invention extracts text features from the input text to be classified and then classifies it with a multi-layer perceptron, assigning one or more labels to the text; it can be widely applied to e-commerce, news, scientific papers, and other fields.
Brief Description of the Drawings
Figure 1 is a flow chart of the hierarchical multi-label text classification method based on mixed attention of the present invention;
Figure 2 is a network structure diagram of the hierarchical multi-label text classification model based on mixed attention of the present invention;
Figure 3 is a schematic diagram of the label hierarchy for hierarchical multi-label text classification according to the present invention;
Figure 4 is a schematic diagram of the graph convolutional neural network computation for hierarchical multi-label text classification according to the present invention;
Figure 5 is a schematic diagram of the relation network of the present invention.
Detailed Description of the Embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are intended only to explain the present invention, not to limit it. In addition, the technical features involved in the various embodiments of the present invention described below may be combined with one another as long as they do not conflict.
To achieve the above object, the technical solution adopted by the present invention is a hierarchical multi-label text classification method based on mixed attention, comprising the following steps:
Step S1, preprocessing the multi-label text data in dataset D;
Step S2, for the text labels, obtaining the prior hierarchy information of the hierarchical classification system, where the prior hierarchy information refers to the prior probabilities of dependence between labels and can be obtained by computing the transition probabilities between parent labels and child labels;
Step S3, building the deep-learning hierarchical multi-label text classification model;
the deep-learning multi-label text classification model comprises a word embedding module, a text encoding module, a label encoding module, a text representation module based on a label attention mechanism, a text representation module based on a self-attention mechanism, a feature fusion module, a vector regression layer, a relation network module, and a label probability prediction layer;
Step S4, feeding the preprocessed text data of the dataset into model training; after training is completed, using the trained model to classify multi-label text.
Preferably, step S1 comprises the following steps:
Preprocessing the samples in dataset D, specifically:
Step 1-1, performing word segmentation on the texts in dataset D and removing stop words and punctuation;
Step 1-2, counting the word frequency word_frequence of the texts in dataset D, deleting words that occur fewer than X1 times, recording the remaining words, and building the vocabulary;
Step 1-3, after preprocessing, dividing dataset D into a training set, a validation set, and a test set at a ratio of 3:1:1.
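A minimal sketch of this preprocessing under stated assumptions: English tokenization by a simple regular expression, an illustrative stop-word subset, an assumed threshold X1 = 5, and the 3:1:1 split; none of these concrete choices are mandated by the method.

```python
import re
import random
from collections import Counter

STOP_WORDS = {"a", "an", "the", "of", "and", "to", "in", "is", "for"}  # illustrative subset

def tokenize(text, stop_words=STOP_WORDS):
    # Step 1-1: word segmentation, stop-word and punctuation removal.
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in stop_words]

def build_vocab(token_lists, min_freq=5):   # min_freq plays the role of X1 (assumed value)
    # Step 1-2: count word frequencies and keep words occurring at least min_freq times.
    freq = Counter(t for tokens in token_lists for t in tokens)
    kept = [w for w, c in freq.items() if c >= min_freq]
    return {w: i + 2 for i, w in enumerate(kept)}   # 0 = <pad>, 1 = <unk>

def split_dataset(samples, seed=0):
    # Step 1-3: shuffle and split into training/validation/test sets at 3:1:1.
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    n = len(samples)
    n_train, n_val = 3 * n // 5, n // 5
    return samples[:n_train], samples[n_train:n_train + n_val], samples[n_train + n_val:]
```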
Preferably, step S2 comprises the following steps:
For the data of the training set in dataset D, suppose there is a hierarchical path e_{i,j} between parent node v_i and child node v_j. The feature f(e_{i,j}) of the edge formed by the parent-child path is then represented by the prior probabilities p(U_j|U_i) and p(U_i|U_j):

f(e_{i,j}) = ( p(U_j|U_i), p(U_i|U_j) ),
p(U_j|U_i) = P(U_j ∩ U_i) / P(U_i) = N_j / N_i,
p(U_i|U_j) = P(U_j ∩ U_i) / P(U_j) = 1.
f(e_{i,j}) describes the relationship between the two nodes, expressed by their transition (or co-occurrence) probabilities: the transition probability p(U_j|U_i) from the parent node to a child node and the transition probability p(U_i|U_j) from the child node to the parent node. Here U_j and U_i denote that the text data are labeled with node v_j and with node v_i respectively, p(U_j|U_i) is the conditional probability of being labeled v_j given v_i, P(U_j ∩ U_i) is the probability that {v_j, v_i} are labeled simultaneously, and N_j and N_i are the numbers of samples labeled v_j and v_i in the training set.
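A sketch of how these edge priors could be estimated from training-set label counts. It assumes each sample's label set already contains every ancestor of its labels (the usual hierarchy constraint); the function and variable names are illustrative, not taken from the patent.

```python
from collections import Counter

def edge_priors(train_label_sets, parent_child_edges):
    """Return {(parent, child): (p_child_given_parent, p_parent_given_child)} for every edge."""
    label_count = Counter()   # N_i: number of training samples carrying label v_i
    pair_count = Counter()    # number of training samples carrying both v_i and v_j
    for labels in train_label_sets:
        labels = set(labels)
        label_count.update(labels)
        for edge in parent_child_edges:
            if edge[0] in labels and edge[1] in labels:
                pair_count[edge] += 1
    priors = {}
    for (parent, child) in parent_child_edges:
        both = pair_count[(parent, child)]
        p_down = both / label_count[parent] if label_count[parent] else 0.0  # p(U_child | U_parent)
        p_up = both / label_count[child] if label_count[child] else 0.0      # p(U_parent | U_child), 1 under the constraint
        priors[(parent, child)] = (p_down, p_up)
    return priors
```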
Preferably, step S3 further comprises the following steps:
Performing word embedding on the input text and its labels through the word embedding module, specifically:
Step 2-1, obtaining the preprocessed text sequence and converting the words {x_1, x_2, ..., x_n} of the text into word vector representations {w_1, w_2, ..., w_n} by looking up the word embedding table (GloVe-300d);
Step 2-2, obtaining the label set {l_1, l_2, ..., l_n} of hierarchical multi-label text classification and converting it, with Kaiming initialization, into a label embedding set {c_1, c_2, ..., c_n} of dimension 300; n is the number of words in the preprocessed text.
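A sketch of the embedding module: word vectors looked up from a pre-trained GloVe-300d matrix and label embeddings created with Kaiming initialization. The class name, the random stand-in for the GloVe weights, and the label count are assumptions for illustration.

```python
import torch
import torch.nn as nn

class EmbeddingModule(nn.Module):
    def __init__(self, glove_weights, num_labels, label_dim=300):
        super().__init__()
        # Step 2-1: word embeddings looked up from the pre-trained GloVe-300d table.
        self.word_emb = nn.Embedding.from_pretrained(glove_weights, freeze=False, padding_idx=0)
        # Step 2-2: label embeddings {c_1..c_n} initialized with Kaiming initialization.
        self.label_emb = nn.Parameter(torch.empty(num_labels, label_dim))
        nn.init.kaiming_uniform_(self.label_emb)

    def forward(self, token_ids):
        return self.word_emb(token_ids), self.label_emb   # {w_1..w_n}, {c_1..c_n}

# Illustrative usage with a random stand-in for the GloVe matrix:
glove = torch.randn(10000, 300)
embedder = EmbeddingModule(glove, num_labels=55)
word_vectors, label_vectors = embedder(torch.randint(0, 10000, (2, 128)))
```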
Preferably, step S3 further comprises the following steps:
Encoding the word vector representations {w_1, w_2, ..., w_n} through the encoding module, specifically:
A Bi-GRU network encodes the word vectors {w_1, w_2, ..., w_n} of the text into hidden representations {h_1, h_2, ..., h_n} that carry contextual semantic information. The hidden representations {h_1, h_2, ..., h_n} are then fed into three convolutions with kernel sizes 2, 3, and 4 and 100 hidden channels each, yielding semantic vectors under three different receptive fields; each is max-pooled, and the three semantic vectors are concatenated into a new 300-dimensional semantic representation vector S = {s_1, s_2, ..., s_n}.
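A sketch of the text encoder: a Bi-GRU followed by three 1-D convolutions with kernel sizes 2/3/4 and 100 channels each. Returning per-position features {s_1..s_n} alongside the max-pooled 300-dimensional summary is my reading of the description, and the hidden sizes beyond those stated above are assumptions.

```python
import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    def __init__(self, emb_dim=300, gru_hidden=150, channels=100, kernels=(2, 3, 4)):
        super().__init__()
        self.bigru = nn.GRU(emb_dim, gru_hidden, batch_first=True, bidirectional=True)
        self.convs = nn.ModuleList(
            nn.Conv1d(2 * gru_hidden, channels, k, padding=k // 2) for k in kernels
        )

    def forward(self, w):                                   # w: (batch, n, emb_dim) word vectors
        h, _ = self.bigru(w)                                # h: (batch, n, 2*gru_hidden), {h_1..h_n}
        x = h.transpose(1, 2)                               # (batch, 2*gru_hidden, n)
        feats = [torch.relu(conv(x))[..., :w.size(1)] for conv in self.convs]
        s = torch.cat(feats, dim=1).transpose(1, 2)         # (batch, n, 300): per-position features {s_1..s_n}
        pooled = torch.cat([f.max(dim=-1).values for f in feats], dim=1)  # (batch, 300): max-pooled summary
        return h, s, pooled
```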
Preferably, step S3 further comprises the following steps:
Encoding the label vector representations {c_1, c_2, ..., c_n} through the label encoding module: a single-layer GCN encodes the label vectors {c_1, c_2, ..., c_n} into hidden representations M = {m_1, m_2, ..., m_n} that carry hierarchical label association information. The process is as follows:
The hierarchy GCN aggregates data flows along top-down, bottom-up, and self-loop edges. In the hierarchy GCN each directed edge represents a pairwise label correlation feature, and these data flows perform node transformations through linear transformations along the edges.
To implement the node transformation, the invention uses a weighted adjacency matrix to represent this linear transformation, with the initial values of the weighted adjacency matrix taken from the prior hierarchy information of the hierarchical classification system obtained in step S2. Formally, the hierarchy GCN encodes the hidden state of node k from its associated neighborhood N(k) = {n_k, child(k), parent(k)}, where n_k is the k-th label node of the hierarchical label tree, child(k) denotes the child label nodes of node k, and parent(k) denotes the parent label node of node k. The hidden state of node k is computed as follows:

u_{k,j} = a_{k,j} · v_j + b_l,
g_{k,j} = σ( w_{d(j,k)} · v_j + b_g ),
h_k = ReLU( Σ_{j ∈ N(k)} g_{k,j} ⊙ u_{k,j} ),

where v_j and v_k are trainable parameters, and b_l ∈ R^{N×dim} and b_g ∈ R^N are trainable bias parameters; u_{k,j} can be understood as the information passed between nodes k and j, and g_{k,j} as a gating value that controls how much u_{k,j} ultimately influences node k; σ is an activation function in deep learning and can be taken as the sigmoid function; dim is the vector dimension, a predefined hyperparameter; d(j,k) denotes the hierarchical direction from node j to node k, covering top-down, bottom-up, and self-loop edges; a_{k,j} ∈ R is the hierarchy probability f_{d(k,j)}(e_{kj}), i.e., the transition probability from the k-th label node to the j-th label node, obtained from f(e_{i,j}): self-loop edges use a_{k,k} = 1, top-down edges use the child-given-parent prior probability p(U_child|U_parent) computed above, and bottom-up edges use f_p(e_{j,k}) = 1; the feature matrix of these edges, F = {a_{0,0}, a_{0,1}, ..., a_{c-1,c-1}}, is the weighted adjacency matrix of the directed hierarchical graph of text labels; finally, the output hidden state h_k of node k is its label representation carrying the hierarchy information.
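A sketch of a single gated propagation step over the label hierarchy, consistent with the gating described above. The direction-specific gate weights, the dictionary-of-adjacency-matrices interface, and all names are assumptions; this is not claimed to be the exact parameterization of the patent.

```python
import torch
import torch.nn as nn

class HierarchyGCN(nn.Module):
    """Single-layer gated GCN over top-down, bottom-up and self-loop edges (sketch)."""
    def __init__(self, num_labels, dim):
        super().__init__()
        directions = ("top_down", "bottom_up", "self_loop")
        self.gate_w = nn.ParameterDict({d: nn.Parameter(torch.randn(dim, 1) * 0.02) for d in directions})
        self.bias_l = nn.Parameter(torch.zeros(num_labels, dim))   # b_l in R^{N x dim}
        self.bias_g = nn.Parameter(torch.zeros(num_labels, 1))     # b_g in R^{N}

    def forward(self, c, adj):
        # c:   (N, dim) label embeddings {c_1..c_n}
        # adj: {"top_down": child-given-parent priors, "bottom_up": 1s, "self_loop": identity}, each (N, N)
        out = torch.zeros_like(c)
        for d, a in adj.items():
            gate = torch.sigmoid(c @ self.gate_w[d] + self.bias_g)  # per-node gate g
            msg = gate * c + self.bias_l                            # gated node message u
            out = out + a @ msg                                     # weight by a_{k,j} and sum over neighbors j
        return torch.relu(out)                                      # M = {m_1..m_n}
```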
Preferably, step S3 further comprises the following steps:
The text representation module based on the label attention mechanism: given the text representation S = {s_1, s_2, ..., s_n} from the text encoding layer and the label representation M = {m_1, m_2, ..., m_n} from the label encoding layer, the label-attention-based text representation is computed by the following formulas:

α_{kj} = exp(m_k · s_j) / Σ_{j'} exp(m_k · s_{j'}),
v_k = Σ_j α_{kj} s_j,
where α_{kj} denotes the amount of information that the j-th text feature vector carries for the k-th label, and v_k is the label-attention-based text representation.
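A sketch of the label attention: dot-product scores between each label representation m_k and each text feature s_j, softmax over positions, then a weighted sum. The dot-product scoring function is an assumption consistent with the formulas above.

```python
import torch

def label_attention(S, M):
    # S: (batch, n, d_c) text features {s_1..s_n}; M: (num_labels, d_c) label representations {m_1..m_n}
    scores = torch.einsum("bnd,kd->bkn", S, M)       # score of position j for label k
    alpha = torch.softmax(scores, dim=-1)            # alpha_{kj}: information of s_j for label k
    V = torch.einsum("bkn,bnd->bkd", alpha, S)       # v_k: label-attention text representation
    return V, alpha
```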
Preferably, step S3 further comprises the following steps:
The text representation module based on the self-attention mechanism: given the hidden text representation {h_1, h_2, ..., h_n} output by the Bi-GRU in the text encoding layer, the self-attention-based text representation is computed by the following formulas:

α_{kt} = exp( (w_2 tanh(w_1 h_t))_k ) / Σ_{t'} exp( (w_2 tanh(w_1 h_{t'}))_k ),
u_k = Σ_t α_{kt} h_t,
where w_1 and w_2 are parameters, h is the text representation, α_{kt} is the weight of the t-th vector of the text representation, and u_k is the self-attention-based text representation.
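A sketch of the self-attention module in the structured-self-attention style, where w_1 and w_2 are the trainable parameters mentioned above; the tanh scoring form and layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, hidden_dim, attn_dim, num_labels):
        super().__init__()
        self.w1 = nn.Linear(hidden_dim, attn_dim, bias=False)
        self.w2 = nn.Linear(attn_dim, num_labels, bias=False)

    def forward(self, H):                             # H: (batch, n, hidden_dim), Bi-GRU outputs {h_1..h_n}
        scores = self.w2(torch.tanh(self.w1(H)))      # (batch, n, num_labels)
        alpha = torch.softmax(scores, dim=1)          # alpha_{kt}: weight of h_t for label k
        U = torch.einsum("bnk,bnd->bkd", alpha, H)    # u_k: self-attention text representation
        return U, alpha
```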
Preferably, step S3 further comprises the following steps:
The feature fusion module: the text features based on the label attention mechanism and the text features based on the self-attention mechanism are adaptively fused to obtain the final text feature d_{ik-fusion}, computed as follows:

β_k = σ( w_1 v_k + w_2 u_k ),
d_{ik-fusion} = β_k v_k + (1 − β_k) u_k,
where w_1 and w_2 are parameters, v_k is the label-attention-based text representation, u_k is the self-attention-based text representation, and β_k is the weight assigned to v_k.
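A sketch of the adaptive fusion as a learned convex combination of the two representations; the sigmoid gate over w_1·v_k + w_2·u_k is an assumption consistent with β_k being the weight of v_k.

```python
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.w1 = nn.Linear(dim, 1, bias=False)
        self.w2 = nn.Linear(dim, 1, bias=False)

    def forward(self, V, U):                             # V, U: (batch, num_labels, dim)
        beta = torch.sigmoid(self.w1(V) + self.w2(U))    # beta_k: weight of the label-attention feature v_k
        return beta * V + (1 - beta) * U                 # d_{ik-fusion}
```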
Preferably, step S3 further comprises the following steps:
Using the relation network module to further mine the association information between labels: the text feature d_{ik-fusion} produced by the feature fusion module is fed into a fully connected layer to obtain the logits vector O = {o_1, o_2, ..., o_n} corresponding to the labels; the vector O is then fed into the relation network module to obtain the prediction vector y = {y_1, y_2, ..., y_n}; finally, the prediction vector y is fed into a multi-layer perceptron to obtain the label prediction probabilities. The relation network is essentially a residual network.
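A sketch of the prediction head: a fully connected layer produces per-label logits O, a residual relation network refines them into y, and a multi-layer perceptron outputs the label probabilities. The hidden width and the exact layer composition are assumptions.

```python
import torch
import torch.nn as nn

class RelationHead(nn.Module):
    def __init__(self, feat_dim, num_labels, hidden=512):
        super().__init__()
        self.to_logits = nn.Linear(feat_dim * num_labels, num_labels)    # logits vector O = {o_1..o_n}
        self.relation = nn.Sequential(                                   # relation network (residual branch)
            nn.Linear(num_labels, hidden), nn.ReLU(), nn.Linear(hidden, num_labels)
        )
        self.mlp = nn.Linear(num_labels, num_labels)                     # final multi-layer perceptron stage

    def forward(self, d_fusion):                   # d_fusion: (batch, num_labels, feat_dim)
        o = self.to_logits(d_fusion.flatten(1))    # per-label logits
        y = o + self.relation(o)                   # residual connection mines label correlations
        return torch.sigmoid(self.mlp(y))          # label prediction probabilities
```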
Preferably, step S4 comprises the following steps:
Training uses a cross-entropy loss function and the Adam optimizer; the cross-entropy loss for multi-label text classification is as follows:

Loss = − (1/N) Σ_{i=1}^{N} Σ_{j=1}^{L} [ y_{ij} log(ŷ_{ij}) + (1 − y_{ij}) log(1 − ŷ_{ij}) ],
where y_{ij} is the actual probability of the j-th label for the i-th sample and ŷ_{ij} is the predicted probability of the j-th label for the i-th sample; L is the number of label categories and N is the number of text samples. Training yields the trained deep-learning multi-label text classification model.
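A sketch of one training epoch with the multi-label (binary) cross-entropy loss and the Adam optimizer; the model, data loader, and learning rate are placeholders.

```python
import torch
import torch.nn as nn

def train_one_epoch(model, loader, optimizer, device="cpu"):
    criterion = nn.BCELoss()                       # multi-label cross-entropy over sigmoid probabilities
    model.train()
    for token_ids, targets in loader:              # targets: (batch, L) indicators y_ij
        token_ids, targets = token_ids.to(device), targets.to(device).float()
        probs = model(token_ids)                   # predicted probabilities, shape (batch, L)
        loss = criterion(probs, targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Example: optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```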
The specific embodiments described herein merely illustrate the spirit of the present invention. Those skilled in the art to which the present invention pertains may make various modifications or additions to the described embodiments or substitute them in similar ways, without departing from the spirit of the present invention or exceeding the scope defined by the appended claims.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210216140.7A CN114896388B (en) | 2022-03-07 | 2022-03-07 | A hierarchical multi-label text classification method based on hybrid attention |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210216140.7A CN114896388B (en) | 2022-03-07 | 2022-03-07 | A hierarchical multi-label text classification method based on hybrid attention |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114896388A true CN114896388A (en) | 2022-08-12 |
CN114896388B CN114896388B (en) | 2024-12-20 |
Family
ID=82714905
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210216140.7A Active CN114896388B (en) | 2022-03-07 | 2022-03-07 | A hierarchical multi-label text classification method based on hybrid attention |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114896388B (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108073677A (en) * | 2017-11-02 | 2018-05-25 | 中国科学院信息工程研究所 | A kind of multistage text multi-tag sorting technique and system based on artificial intelligence |
US20200356851A1 (en) * | 2019-05-10 | 2020-11-12 | Baidu Usa Llc | Systems and methods for large scale semantic indexing with deep level-wise extreme multi-label learning |
CN113626589A (en) * | 2021-06-18 | 2021-11-09 | 电子科技大学 | Multi-label text classification method based on mixed attention mechanism |
CN113806547A (en) * | 2021-10-15 | 2021-12-17 | 南京大学 | Deep learning multi-label text classification method based on graph model |
Non-Patent Citations (1)
Title |
---|
肖琳;陈博理;黄鑫;刘华锋;景丽萍;于剑;: "基于标签语义注意力的多标签文本分类", 软件学报, no. 04, 30 April 2020 (2020-04-30) * |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115374285A (en) * | 2022-10-26 | 2022-11-22 | 思创数码科技股份有限公司 | Government affair resource catalog theme classification method and system |
CN115374285B (en) * | 2022-10-26 | 2023-02-07 | 思创数码科技股份有限公司 | Government affair resource catalog theme classification method and system |
CN115757823A (en) * | 2022-11-10 | 2023-03-07 | 魔方医药科技(苏州)有限公司 | Data processing method and device, electronic equipment and storage medium |
CN115757823B (en) * | 2022-11-10 | 2024-03-05 | 魔方医药科技(苏州)有限公司 | Data processing method, device, electronic equipment and storage medium |
CN116089618A (en) * | 2023-04-04 | 2023-05-09 | 江西师范大学 | Drawing meaning network text classification model integrating ternary loss and label embedding |
CN116089618B (en) * | 2023-04-04 | 2023-06-27 | 江西师范大学 | A Graph Attention Network Text Classification Model Fusing Triple Loss and Label Embeddings |
CN116187419A (en) * | 2023-04-25 | 2023-05-30 | 中国科学技术大学 | Automatic hierarchical system construction method based on text chunks |
CN116187419B (en) * | 2023-04-25 | 2023-08-29 | 中国科学技术大学 | A Method for Automatically Constructing Hierarchy Based on Text Chunking |
CN116304845B (en) * | 2023-05-23 | 2023-08-18 | 云筑信息科技(成都)有限公司 | Hierarchical classification and identification method for building materials |
CN116304845A (en) * | 2023-05-23 | 2023-06-23 | 云筑信息科技(成都)有限公司 | Hierarchical classification and identification method for building materials |
CN116542252A (en) * | 2023-07-07 | 2023-08-04 | 北京营加品牌管理有限公司 | Financial text checking method and system |
CN116542252B (en) * | 2023-07-07 | 2023-09-29 | 北京营加品牌管理有限公司 | Financial text checking method and system |
CN116932765A (en) * | 2023-09-15 | 2023-10-24 | 中汽信息科技(天津)有限公司 | Patent text multi-stage classification method and equipment based on graphic neural network |
CN116932765B (en) * | 2023-09-15 | 2023-12-08 | 中汽信息科技(天津)有限公司 | Patent text multi-stage classification method and equipment based on graphic neural network |
CN117453921A (en) * | 2023-12-22 | 2024-01-26 | 南京华飞数据技术有限公司 | Data information label processing method of large language model |
CN117453921B (en) * | 2023-12-22 | 2024-02-23 | 南京华飞数据技术有限公司 | Data information label processing method of large language model |
Also Published As
Publication number | Publication date |
---|---|
CN114896388B (en) | 2024-12-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114896388B (en) | A hierarchical multi-label text classification method based on hybrid attention | |
CN111914558B (en) | Course knowledge relation extraction method and system based on sentence bag attention remote supervision | |
CN109783818B (en) | Enterprise industry classification method | |
CN110020438B (en) | Sequence identification based enterprise or organization Chinese name entity disambiguation method and device | |
CN108897857B (en) | Chinese text subject sentence generating method facing field | |
CN106980683B (en) | Blog text abstract generating method based on deep learning | |
CN111382575A (en) | An event extraction method based on joint annotation and entity semantic information | |
CN113806547B (en) | Deep learning multi-label text classification method based on graph model | |
CN107391565B (en) | Matching method of cross-language hierarchical classification system based on topic model | |
CN113626589A (en) | Multi-label text classification method based on mixed attention mechanism | |
CN113516198A (en) | Cultural resource text classification method based on memory network and graph neural network | |
CN113515632A (en) | Text classification method based on graph path knowledge extraction | |
CN113191148A (en) | Rail transit entity identification method based on semi-supervised learning and clustering | |
CN114722835A (en) | Text emotion recognition method based on LDA and BERT fusion improved model | |
CN113836896A (en) | Patent text abstract generation method and device based on deep learning | |
CN113051886A (en) | Test question duplicate checking method and device, storage medium and equipment | |
CN116821371A (en) | Method for generating scientific abstracts of multiple documents by combining and enhancing topic knowledge graphs | |
CN115795037B (en) | Multi-label text classification method based on label perception | |
CN116663539A (en) | Chinese entity and relationship joint extraction method and system based on RoBERTa and pointer network | |
CN118761416B (en) | Method for chemical production named entity recognition by pre-training BERT | |
CN118013038A (en) | Text increment relation extraction method based on prototype clustering | |
CN114860920B (en) | Method for generating single language theme abstract based on different composition | |
CN117271701A (en) | Method and system for extracting system operation abnormal event relation based on TGGAT and CNN | |
CN116383672A (en) | Cigarette relevance analysis method and system based on graph neural network | |
CN113821618B (en) | Method and system for extracting class items of electronic medical record |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |