CN114490951A - Multi-label text classification method and model - Google Patents

Multi-label text classification method and model

Info

Publication number
CN114490951A
Authority
CN
China
Prior art keywords
label
classification
text
task
representation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210384987.6A
Other languages
Chinese (zh)
Other versions
CN114490951B (en)
Inventor
李芳芳
苏朴真
黄惟
康占英
王青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changsha Zhiwei Information Technology Co ltd
Original Assignee
Changsha Zhiwei Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changsha Zhiwei Information Technology Co ltd filed Critical Changsha Zhiwei Information Technology Co ltd
Priority to CN202210384987.6A priority Critical patent/CN114490951B/en
Publication of CN114490951A publication Critical patent/CN114490951A/en
Application granted granted Critical
Publication of CN114490951B publication Critical patent/CN114490951B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a multi-label text classification method and model. The classification method comprises a label pre-adaptation task, in which a pre-adaptation embedded feature representation is obtained from the input data of multi-label text classification and used for similarity matching; shared feature acquisition, in which the weights of the pre-training language model from the label pre-adaptation task are loaded and a shared feature representation is obtained from the input data of multi-label text classification; and parallel classification tasks, in which the shared feature representation serves as the input of the parallel tasks, namely a chapter-label classification task, a keyword-label classification task and a label-label correlation judgment task. The classification model comprises a label pre-adaptation module, a shared feature acquisition module, a keyword extraction module, a label sampling module, a chapter-label classification module, a keyword-label classification module and a label-label correlation judgment module. By adding the parallel tasks, the invention improves the performance of the model.

Description

Multi-label text classification method and model
Technical Field
The invention relates to the technical field of text classification based on labels, in particular to a multi-label text classification method and a multi-label text classification model.
Background
Text is one of the most important information carriers today and enters the network through various social platforms, news media, and the like. Text information varies in format, theme, content, and length, and how to reasonably apply and process it has become a very urgent need. Text classification is an important task in NLP with wide application scenarios, such as case analysis and classification in smart justice to assist judges in adjudication.
Multi-label text classification is a difficult task within text classification and is now widely applied in the fields of information retrieval, sentiment analysis, tag recommendation, intent recognition, and the like. Unlike the traditional single-label classification task, the correspondence between texts and labels in the multi-label classification task is complex, with both one-to-one and one-to-many cases, and this complexity makes multi-label text classification a very challenging task in NLP. Processing methods for multi-label text classification fall into two types: traditional machine learning methods and methods based on deep learning; owing to the appearance of pre-training language models, the deep learning methods further divide into non-pre-training language model methods and methods based on pre-training language models. Traditional machine learning methods are simpler to apply than deep learning methods, but their classification effect is usually poor. Among the deep learning methods, non-pre-training language model methods such as CNN, RNN and LSTM lack feature extraction capability, so their classification effect is weaker than that of pre-training language model methods such as BERT and RoBERTa.
However, methods based on pre-training language models, such as BERT + Softmax and RoBERTa + Softmax, still have the following problems:
(1) Missing introduction of label information. When modeling a multi-label text classification task, people often combine a plurality of single-label classification tasks; this overly crude task migration ignores the interaction between labels and texts and between labels themselves, which affects the classification effect. Meanwhile, the pre-training language model is limited in embedding length, so the label information cannot be utilized, or cannot be utilized effectively, for classification.
(2) The task is single and cannot fully exploit the text information. Because of their huge number of parameters, methods based on pre-training language models generally cannot use all of the feature representations in application; classification on a single chapter feature representation is often adopted instead, so a large amount of fine-grained text features are discarded.
(3) The label distribution has a long-tail phenomenon. In the multi-label text classification task, each given text is statistically and semantically related to a set of labels. Real-world classification problems often exhibit a long-tail label distribution, in which low-frequency labels are related to only a few examples and are difficult for the model to learn.
Disclosure of Invention
Therefore, the technical problem to be solved by the present invention is to overcome the defects in the prior art that the interaction between labels and texts and between labels is ignored, that the pre-training language model is limited in embedding length, that the classification task is single, and that the labels exhibit a long-tail phenomenon, so as to provide a multi-label text classification method based on label pre-adaptation and multi-task learning.
The invention provides a multi-label text classification method, which comprises the following steps:
a label pre-adaptation task;
s1: expanding input data of multi-label text classification, wherein the input data comprises texts and labels;
s2: embedding the text and part of the tags by adopting an embedding method according to the expanded input data to obtain a pre-adaptive embedded representation;
s3: inputting the pre-adaptive embedded representation into a pre-training language model to obtain a pre-adaptive embedded feature representation, and then performing similarity matching through a feature representation fusion layer and a full connection layer to enable the pre-training language model to learn the unique mapping between the label and the pre-adaptive embedded representation;
acquiring a sharing characteristic;
s4: embedding the text and the full amount of labels by adopting an embedding method according to input data classified by the multi-label text to obtain a shared embedded representation;
s5: carrying out weight loading on a pre-training language model in the label pre-adaptation task to obtain a loaded pre-training language model, and inputting the shared embedded representation into the loaded pre-training language model to obtain a shared characteristic representation;
parallel classification tasks;
s6: the shared feature representation is used as the input of the parallel tasks, which comprise a chapter-label classification task, a keyword-label classification task and a label-label correlation judgment task; the chapter-label classification task and the keyword-label classification task classify the multi-label text, and the label-label correlation judgment task assists them in making better use of the label information.
Preferably, the embedding method embeds the labels into the input data, separated by a separator token, to obtain the embedded representation.
Preferably, the embedding method includes a mapping method: the mapping is in units of words in the input data, each value in the mapping has a uniquely corresponding word in the vocabulary, and each label is uniquely mapped and spliced into the embedded representation.
Preferably, in S1, the input data of the multi-label text classification is expanded as follows: the labels Label_i_1, Label_i_2, ..., Label_i_n (n being the number of labels corresponding to the text data) corresponding to each text data Text_i are denoted Label_i+, and the remaining labels are denoted Label_i-; one positive sample [Text_i, Label_i+, 1] and a plurality of negative samples [Text_i, Label_i_k, 0] (Label_i_k ∈ Label_i-) are generated for each multi-label datum.
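For illustration, a minimal Python sketch of this expansion follows; the label inventory, the sample tuple layout, and the default of seven negative samples (the count used in the embodiment described later) are illustrative assumptions:

```python
# Hedged sketch of the S1 expansion: one positive sample per text with its
# gold label set Label_i+, plus negative samples drawn from the remaining
# labels Label_i-. All names below are illustrative assumptions.
import random

ALL_LABELS = [f"Label_{k}" for k in range(54)]  # hypothetical 54-label inventory

def expand(text, gold_labels, num_neg=7):
    samples = [(text, list(gold_labels), 1)]           # [Text_i, Label_i+, 1]
    negatives = [l for l in ALL_LABELS if l not in gold_labels]
    for neg in random.sample(negatives, num_neg):      # [Text_i, Label_i_k, 0]
        samples.append((text, neg, 0))
    return samples
```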
Preferably, S5 further includes a feature representation generated after the weights of the pre-trained language model in the tag pre-adaptation task are loaded, where the feature representation includes a text feature representation and a tag feature representation, the text feature representation is processed by the keyword extraction module, the tag feature representation is processed by the tag sampling module, and finally the shared feature representation is obtained.
Preferably, in S6, the chapter-label classification task includes the steps of:
step one: the current pre-training language model loads the weights of the pre-training language model from the label pre-adaptation task;
step two: embedding the text and the full amount of labels together to obtain a shared embedded representation, and then inputting the shared embedded representation into the current pre-training language model to obtain a shared feature representation, wherein the shared feature representation comprises chapter features, keyword features and label features;
step three: the chapter features and each label feature form chapter-label feature pairs in sequence, and the relevance and importance ratio of each label in the chapter-label feature pair vector is obtained through an Attention structure;
step four: multi-label classification is performed according to the obtained relevance and importance ratio.
Preferably, in S6, the keyword-label classification task includes the steps of:
step one: the current pre-training language model loads the weights of the pre-training language model from the label pre-adaptation task;
step two: embedding the text and the full amount of labels together to obtain a shared embedded representation, and then inputting the shared embedded representation into the current pre-training language model to obtain a shared feature representation, wherein the shared feature representation comprises chapter features, keyword features and label features;
step three: the keyword features obtained by the keyword extraction module and each label feature form keyword-label feature pairs in sequence, and then the relevance and importance ratio of each label in the keyword-label feature pair vector is obtained through an Attention structure;
step four: multi-label classification is performed according to the obtained relevance and importance ratios.
Preferably, in S6, the label-label correlation judgment task includes the following steps:
step one: statistical analysis is performed on all the data to obtain, for each label, the proportion with which each of the other labels appears at the same time;
step two: the labels are divided according to the input data of the multi-label text classification; in each piece of data Text_i, the corresponding related labels are put into a related label set, denoted Relation_i+, and the remaining unrelated labels into an unrelated label set, denoted Relation_i-;
step three: a data subset for the label correlation judgment task is generated according to the proportions obtained in step one and the label sets obtained in step two; for each label Y+j (j = 0, 1, ..., Num(Relation_i+)) in Relation_i+, positive samples [Y+j, Y+k, 1] (j ≠ k, and ratio(Y+j : Y+k) > set threshold) and negative samples [Y+j, Y-k, 0] (ratio(Y+j : Y-k) < set threshold) are generated;
step four: the label-label correlation judgment task is performed according to the data subset, and the label feature representation is optimized.
Preferably, the data includes training set data, validation set data, and test set data.
The invention also provides a multi-label text classification model, which comprises the following components: the system comprises a label pre-adaptation module, a shared characteristic acquisition module, a keyword extraction module, a label sampling module, a chapter-label classification module, a keyword-label classification module and a label-label correlation judgment module;
the label pre-adaptation module is used for obtaining the similarity of labels and texts according to input data of multi-label text classification; the shared characteristic acquisition module is used for acquiring shared characteristic representation according to input data of multi-label text classification, and the shared characteristic representation comprises chapter characteristics, keyword characteristics and label characteristics;
the keyword extraction module is used for processing the text feature representation in the feature representation generated after the weight of the pre-training language model in the label pre-adaptation task is loaded;
the label sampling module is used for processing label feature representation in feature representation generated after the weight of a pre-training language model in the label pre-adaptation task is loaded;
the chapter-label classification module is used for obtaining chapter-label characteristic pairs according to input data of multi-label text classification and further performing multi-label classification;
the keyword-label classification module is used for obtaining keyword-label characteristic pairs according to input data of multi-label text classification, and further performing multi-label classification;
and the label-label correlation judgment module is used for performing statistical analysis on all data according to the input data classified by the multi-label text to obtain a data subset, and further optimizing the label characteristic representation.
The technical scheme of the invention has the following advantages:
1. In order to solve the technical problems that the interaction between labels and texts and between labels is ignored and that the pre-training language model is limited in embedding length, the invention uses the embedding method to obtain the embedded representation, so that the label information can be utilized by the pre-training language model; through the label pre-adaptation task, the pre-training language model can adapt to the label mapping relation and thus generate label feature representations with richer semantic information;
2. In order to solve the technical problems of the single classification task and the long-tail phenomenon of the labels, the invention introduces a parallel multi-task learning method that learns from the three dimensions of chapter-label, keyword-label and label-label, and classifies using the interaction between the semantically rich label feature representations obtained from the label pre-adaptation task and the chapter and keyword feature representations; meanwhile, the dependency and correlation among labels provide implicit information supplement for the low-frequency labels, further enhancing and enriching the label feature representation.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flow chart of a classification method in the practice of the present invention;
FIG. 2 is a schematic diagram of an improved embedding method in the practice of the present invention;
FIG. 3 is a flow chart of tag pre-adaptation in the practice of the present invention;
FIG. 4 is a diagram of a tag pre-adaptation architecture in the practice of the present invention;
FIG. 5 is a flow chart of shared feature acquisition in an implementation of the present invention;
FIG. 6 is a block diagram illustrating shared feature acquisition in an embodiment of the present invention;
FIG. 7 is a structural diagram of a conventional [CLS] classification method;
FIG. 8 is a block diagram of a chapter-label classification task in accordance with the present invention;
FIG. 9 is a block diagram of a keyword extraction module in accordance with an embodiment of the present invention;
FIG. 10 is a diagram of a keyword-tag classification task architecture in the practice of the present invention;
FIG. 11 is a diagram illustrating a task structure for determining tag-tag correlation in accordance with an embodiment of the present invention;
FIG. 12 is a histogram of the relative proportion of the simultaneous occurrence of label 1 and the remaining labels in the practice of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, e.g., as meaning either a fixed connection, a removable connection, or an integral connection; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
In addition, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Text classification is one of the most fundamental tasks in natural language processing and, as an effective information retrieval and mining technique, plays a crucial role in managing text data. Multi-label text classification is a difficult task within text classification: it must fully consider the semantic relationships between labels, while the embedding and encoding of the model is a lossy compression process. How to reduce the loss of hierarchical semantics during training and how to preserve rich and complex document semantic information therefore remain urgent problems.
In order to solve the above problems, the method provided in this embodiment introduces a parallel multi-task learning method that learns from the three dimensions of chapter-label, keyword-label and label-label, and classifies using the interaction between the semantically rich label feature representations obtained from the label pre-adaptation task and the chapter and keyword feature representations. Meanwhile, the dependency and correlation among labels provide implicit information supplement for the low-frequency labels, further enhancing and enriching the label feature representation. Owing to the structural innovation of each parallel task and the innovation of combining the serial and parallel tasks in one procedure, the method shows remarkable advantages and excellent performance on the multi-label text classification task.
As shown in fig. 1, the present embodiment provides a multi-label text classification method that includes an embedding method: the labels are embedded into the input data, separated by separator tokens, to obtain an embedded representation.
For multi-label text classification tasks, and indeed for NLP tasks in general, obtaining a feature representation of the text has long been a difficult problem, but the advent of pre-training language models such as BERT has enabled NLP researchers to obtain text feature representations with rich semantic information. Because pre-training language models such as BERT are limited to a certain length when embedding text, labels cannot be effectively embedded into the input sequence when processing a multi-label text classification task with a large number of labels, and the information carried by the labels therefore cannot help multi-label text classification. Based on the above considerations, this embodiment improves the embedding method of the BERT pre-training language model.
The improvement of the Embedding method is shown in fig. 2. The left side of the figure is the common Embedding method, which does not incorporate label information. In the improved Embedding method, label Tokens are embedded into the input data, separated by the separator ([SEP]). Token Embedding denotes the embedding mapping in units of words in the input: each value in Token Embedding has a uniquely corresponding word in the vocabulary. In particular, because of the limitation on the embedding length, each Label_i cannot be embedded in full, so a mapping method must be adopted; that is, the Token Embedding corresponding to each Label_i is a special Token in the vocabulary, [Ununsed_i]. [CLS] denotes the Classification Token used for discrimination, and [SEP] denotes the Separation Token used as a separator between input texts. Segment Embedding denotes the paragraph information of the text in the input: as shown in the figure, "T_1" to "T_5" belong to the text paragraph, so their Segment Embedding is 0, while "L_1" to "L_5" belong to the label paragraph, so their Segment Embedding is 1. Position Embedding denotes the position vector of the input sequence; in this embodiment the position vector is computed with sin and cos functions, using the formulas:

PE_{(pos, 2i)} = \sin\left( pos / 10000^{2i/d_{model}} \right)

PE_{(pos, 2i+1)} = \cos\left( pos / 10000^{2i/d_{model}} \right)

where d_{model} is the length of the position vector, pos is the position of the word, and i indexes the dimension. In essence, a word at position pos in a sentence is converted into a d_{model}-dimensional position vector whose i-th value is PE.
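For reference, the following self-contained sketch computes these sinusoidal position embeddings (the standard sin/cos construction given by the formulas above; the use of PyTorch and an even d_model are assumptions not fixed by the embodiment):

```python
import torch

def position_embedding(max_len, d_model):
    # pos runs over word positions, i over even embedding dimensions
    pos = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)  # (max_len, 1)
    i = torch.arange(0, d_model, 2, dtype=torch.float32)           # (d_model/2,)
    angle = pos / torch.pow(10000.0, i / d_model)
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(angle)  # PE(pos, 2i)
    pe[:, 1::2] = torch.cos(angle)  # PE(pos, 2i+1)
    return pe
```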
However, the [Ununsed_i] Tokens have no practical meaning and were never seen during language model pre-training, so merely adding label Embeddings cannot effectively utilize the label information. The method therefore introduces a serial label pre-adaptation task, so that the pre-training model can learn the mapping relation between Label_i and [Ununsed_i], enabling the subsequent parallel tasks to make better use of the information carried by the labels.
Compared with a common embedding method, the embedding method provided by the embodiment introduces the unique label mapping Token during embedding, and adds a serial task for label pre-adaptation.
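As a concrete illustration, the sketch below builds such an input with the Hugging Face transformers library, using BERT's reserved [unusedN] vocabulary entries to play the role of the [Ununsed_i] label Tokens; the model name, label count, and index layout are illustrative assumptions rather than the patent's implementation:

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")

# Hypothetical unique label mapping: label index i -> reserved token [unused{i+1}]
label_to_token = {i: f"[unused{i + 1}]" for i in range(54)}

def build_input(text, label_ids, max_len=512):
    label_tokens = [label_to_token[i] for i in label_ids]
    text_tokens = tokenizer.tokenize(text)[: max_len - len(label_tokens) - 3]
    tokens = ["[CLS]"] + text_tokens + ["[SEP]"] + label_tokens + ["[SEP]"]
    input_ids = tokenizer.convert_tokens_to_ids(tokens)
    # Segment Embedding: 0 for the text paragraph, 1 for the label paragraph
    token_type_ids = [0] * (len(text_tokens) + 2) + [1] * (len(label_tokens) + 1)
    return input_ids, token_type_ids
```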
As shown in fig. 3 and 4, the classification method further includes a tag pre-adaptation task, and the specific steps of the tag pre-adaptation task are as follows:
step one: the text data of the multi-label text classification is expanded: the labels Label_i_1, Label_i_2, ..., Label_i_n (n being the number of labels corresponding to the text data) in each text data Text_i are denoted Label_i+, and the remaining labels are denoted Label_i-; one positive sample [Text_i, Label_i+, 1] and seven negative samples [Text_i, Label_i_k, 0] (Label_i_k ∈ Label_i-) are generated;
step two: according to the data expanded in step one, the text and the partial labels are embedded with the above embedding method to obtain a pre-adaptation embedded representation; the pre-adaptation embedded representation is input into the pre-training language model BERT to obtain a pre-adaptation embedded feature representation, and similarity matching is performed through a feature representation fusion layer and a fully connected layer, so that the pre-training language model learns the unique mapping between the labels and the pre-adaptation embedded representation.
Unique mapping refers to the unique mapping of a label during embedding. For example, the label "sports news" will, when embedded, have only one embedded representation, Ununsed_1, corresponding to it. The main reason is that labels differ from texts: each word in a text has its own embedded representation, and if every word were still embedded when the labels are introduced, the embedding would become too long to input into the pre-training model. Therefore, a unique mapping embedded representation Ununsed_n is introduced for each label.
In this embodiment, the partial labels are the related labels corresponding to one piece of data in the multi-label classification data; their number averages 3-4 and is at most 8 (out of 54 labels in total).
The effect of this task is as follows. Because the pre-training language model never encountered the Ununsed embedded representations during pre-training, the label pre-adaptation task is used so that the model can better adapt to these embedded representations and thereby generate suitable label feature representations. It makes full use of the semantic information carried by the labels to provide rich feature representations for the subsequent parallel tasks; at the same time, through the label similarity matching task, the pre-training language model BERT can pre-adapt to the labels' unique mapping Tokens (embedded representations) and learn the mapping relation between each label and its Token, which effectively improves the effect of the multi-label text classification model.
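A minimal sketch of this pre-adaptation matching head is given below; the mean pooling standing in for the feature representation fusion layer and the two-way output are assumptions, since the embodiment does not fix their concrete form:

```python
import torch.nn as nn
from transformers import BertModel

class LabelPreAdapter(nn.Module):
    def __init__(self, name="bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(name)
        self.fc = nn.Linear(self.bert.config.hidden_size, 2)  # match / no-match

    def forward(self, input_ids, token_type_ids, attention_mask):
        out = self.bert(input_ids=input_ids,
                        token_type_ids=token_type_ids,
                        attention_mask=attention_mask)
        # Assumed fusion layer: mean-pool the pre-adaptation feature representation
        fused = out.last_hidden_state.mean(dim=1)
        return self.fc(fused)  # similarity-matching logits against the 0/1 samples
```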
As shown in fig. 5, the classification method further includes shared feature acquisition: on the basis of the label pre-adaptation task, the weights of the pre-training language model BERT from the label pre-adaptation task are loaded, and the shared features required by the three subtasks are generated from the input text and the full amount of labels. The specific steps are as follows:
step one: according to the text data, the text and the full amount of labels are embedded through the coding layer to obtain a shared embedded representation;
step two: weight loading is performed on the pre-training language model BERT from the label pre-adaptation task to obtain the loaded pre-training language model BERT, and the shared embedded representation is input into the loaded pre-training language model BERT to obtain the shared feature representation, which serves as the input of the three parallel tasks.
In step two, among the feature representations generated after the weights of the pre-training language model BERT from the label pre-adaptation task are loaded, the text feature representation is processed by the keyword extraction module and the label feature representation by the label sampling module.
As shown in fig. 6, the embedding of text and labels during shared feature acquisition differs from that in the label pre-adaptation task: in the pre-adaptation task only the partial labels related to the text are embedded, whereas during shared feature acquisition the full amount of labels is embedded. In addition, among the feature representations generated after the weights of the pre-training language model BERT from the label pre-adaptation task are loaded, the text feature representation is processed by the keyword extraction module and the label feature representation by the label sampling module, finally yielding the shared feature representation that serves as the input of the three parallel tasks.
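The following sketch shows one way this acquisition could look, assuming the input layout of the embedding sketch above (text tokens, then the full amount of labels, then a closing [SEP]) and unbatched simplifications elsewhere; the index arithmetic is an assumption about the concrete implementation:

```python
def shared_features(adapter, input_ids, token_type_ids, attention_mask,
                    num_labels=54):
    # Weights come from the label pre-adaptation task via the adapter's BERT
    out = adapter.bert(input_ids=input_ids, token_type_ids=token_type_ids,
                       attention_mask=attention_mask).last_hidden_state
    chapter = out[:, 0]                     # [CLS] chapter feature
    words = out[:, 1:-(num_labels + 1)]     # word-level features (keyword source)
    labels = out[:, -(num_labels + 1):-1]   # label features at the [unusedN] slots
    return chapter, words, labels
```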
Compared with generating a separate feature representation for each parallel task, sharing the feature representation greatly reduces the risk of overfitting on each task: intuitively, the more tasks the model learns at the same time, the harder it is to find a shared feature representation that fits all of them, and the less likely the model is to overfit any single task. Meanwhile, because training several tasks simultaneously lets the tasks influence one another, and this influence is reflected in the shared features, once all tasks converge the current structure is equivalent to a fusion of all the tasks, which greatly improves the performance of the model.
Considering the various conventional methods for the multi-label text classification task, directly classifying with [CLS] is far superior to the conventional methods not based on a pre-training language model, so this embodiment adopts a classification method based on [CLS], with improvements, for the parallel subtasks. As shown in fig. 7, the conventional [CLS] classification method does not utilize label information but classifies with the [CLS] information alone; the chapter-label classification method adopted in this embodiment makes full use of the semantic information carried in the labels and, combined with an Attention structure, obtains the association and importance ratios after interaction between chapters and labels, thereby obtaining the classification result. As shown in fig. 8, the chapter-label classification task includes the following steps:
step one: the current pre-training language model BERT loads the weights of the identically structured pre-training language model BERT from the pre-adaptation task;
step two: the input text and the full amount of labels are embedded together to obtain a shared embedded representation, which is input into the current pre-training language model BERT to obtain a shared feature representation comprising chapter features, keyword features and label features;
step three: the chapter feature and each label feature form chapter-label feature pairs in sequence, and the relevance and importance ratio of each label in the chapter-label feature pair vector is obtained through the Attention structure;
step four: multi-label classification is performed according to the obtained relevance and importance ratios.
Compared with the conventional [CLS] classification method, the chapter-label classification method in this embodiment relies on the label pre-adaptation task and can effectively utilize the semantic information carried in the labels, while the Attention structure makes the interaction between chapter features and label features more efficient, greatly improving the model's effect on multi-label classification.
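A hedged sketch of such a chapter-label head follows; using one multi-head attention layer with the label features as queries against the chapter feature is an assumed instantiation of the embodiment's Attention structure, not its confirmed form:

```python
import torch
import torch.nn as nn

class ChapterLabelHead(nn.Module):
    def __init__(self, hidden=768, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.score = nn.Linear(hidden, 1)

    def forward(self, chapter, label_feats):
        # chapter: (B, H); label_feats: (B, L, H), one query per label
        kv = chapter.unsqueeze(1)                       # chapter feature as key/value
        interacted, _ = self.attn(label_feats, kv, kv)  # chapter-label interaction
        logits = self.score(interacted).squeeze(-1)     # relevance per label: (B, L)
        return torch.sigmoid(logits)                    # multi-label probabilities
```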
Generally speaking, when people classify multiple labels manually, they repeatedly scan whether certain keywords in the text are related to the labels, determining the relevant labels one by one. In addition, in the output features of the pre-training language model BERT, the word-level Tokens carry a certain amount of noise and occupy excessive resources during operation, which greatly reduces the efficiency and effect of model training and prediction. For these reasons, as shown in fig. 9, this embodiment employs a keyword-label classification task: a chapter feature representation is generated by the pre-training language model BERT, word-level feature representations are extracted for K-gram words/phrases, and finally the Num words/phrases most similar to the whole chapter are selected by cosine similarity as the keyword feature representation.
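A minimal sketch of this extraction step follows; mean-pooling each K-gram span and the particular K and Num values are illustrative assumptions, and the tensors are per-example (unbatched) for simplicity:

```python
import torch
import torch.nn.functional as F

def extract_keywords(chapter, words, k=3, num=5):
    # chapter: (H,) whole-chapter feature; words: (T, H) word-level features
    grams = words.unfold(0, k, 1).mean(dim=-1)   # (T-k+1, H) K-gram span features
    sims = F.cosine_similarity(grams, chapter.unsqueeze(0), dim=-1)
    top = sims.topk(min(num, sims.numel())).indices
    return grams[top]                            # Num keyword feature representations
```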
The Num keyword feature representations obtained by the keyword extraction module are further combined with the label feature representations for classification. As shown in fig. 10, this embodiment reduces and fuses the keyword feature representations through a convolutional neural network combined with max pooling, and the keyword features and labels interact one by one through an Attention structure to obtain the correlation and importance between each label and the keywords, yielding the final classification result. The specific steps of the keyword-label classification task are as follows:
step one: the current pre-training language model BERT loads the weights of the identically structured pre-training language model BERT from the pre-adaptation task;
step two: the input text and the full amount of labels are embedded together to obtain a shared embedded representation, which is input into the current pre-training language model BERT to obtain a shared feature representation comprising chapter features, keyword features and label features;
step three: the keyword features obtained by the keyword extraction module and each label feature form keyword-label feature pairs in sequence, and the relevance and importance ratio of each label in the keyword-label feature pair vector is obtained through the Attention structure;
step four: multi-label classification is performed according to the obtained relevance and importance ratios.
This method derives from the way people classify multiple labels manually in daily life. The keyword feature representations are extracted based on cosine similarity, and combined with the Attention structure the interaction between keyword and label feature representations becomes more efficient, greatly improving the model's effect on multi-label classification. Meanwhile, compared with classifying entirely on word-level Tokens, this method saves resource overhead, speeds up prediction and training, and improves the overall performance of the model.
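A sketch of the keyword-label head under the same assumptions is shown below; the kernel size of the convolution and the reuse of a multi-head attention layer as the Attention structure are illustrative choices:

```python
import torch
import torch.nn as nn

class KeywordLabelHead(nn.Module):
    def __init__(self, hidden=768, heads=8):
        super().__init__()
        # Convolution + max pooling reduce and fuse the Num keyword features
        self.conv = nn.Conv1d(hidden, hidden, kernel_size=3, padding=1)
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.score = nn.Linear(hidden, 1)

    def forward(self, keywords, label_feats):
        # keywords: (B, Num, H); label_feats: (B, L, H)
        fused = self.conv(keywords.transpose(1, 2)).max(dim=-1).values  # (B, H)
        kv = fused.unsqueeze(1)                          # fused keyword feature
        interacted, _ = self.attn(label_feats, kv, kv)   # keyword-label interaction
        return torch.sigmoid(self.score(interacted).squeeze(-1))  # (B, L)
```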
In a multi-label text classification task, learning the dependencies between labels helps to model low-frequency labels, since real-world classification problems tend to exhibit long-tail label distributions in which low-frequency labels are associated with only a few instances and are very difficult for the model to learn. Meanwhile, analysis of a large amount of multi-label text classification data shows that the dependency relationships among labels can assist label prediction. Based on this, the label-label correlation judgment task is adopted to learn the dependency relationships between labels, assisting the multi-label classification tasks in better utilizing label information to improve the classification effect. As shown in fig. 11, the label-label correlation judgment task includes the following steps:
step one: statistical analysis is performed on all data (training set, validation set and test set) to obtain the proportion with which each label and each of the other labels appear at the same time, for example the ratio (label 1 : label i) shown in fig. 12;
step two: the labels are divided according to the input data of the multi-label text classification; in each piece of data Text_i, the corresponding related labels are put into a related label set, denoted Relation_i+, and the remaining unrelated labels into an unrelated label set, denoted Relation_i-;
step three: the data subset of the label correlation judgment task is generated according to the proportions obtained in step one and the label sets obtained in step two; for each label Y+j (j = 0, 1, ..., Num(Relation_i+)) in Relation_i+, positive samples [Y+j, Y+k, 1] (j ≠ k, and ratio(Y+j : Y+k) > set threshold) and negative samples [Y+j, Y-k, 0] (ratio(Y+j : Y-k) < set threshold) are generated; in this embodiment, the threshold should be determined according to the distribution of the labels in the actual data and treated as a hyper-parameter, adjusted before each model training until experiments yield the threshold best suited to the actual data;
step four: the label-label correlation judgment task is performed according to the data subset, and the label feature representation is optimized.
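For illustration, the sketch below tallies the co-occurrence proportions of step one and generates the positive/negative label pairs of step three against the threshold hyper-parameter; the data layout and the default threshold value are illustrative assumptions:

```python
from collections import Counter
from itertools import permutations

def cooccurrence_ratio(all_label_sets):
    # Proportion with which label b appears alongside label a, over all data
    single, joint = Counter(), Counter()
    for labels in all_label_sets:
        single.update(labels)
        joint.update(permutations(labels, 2))
    return {(a, b): joint[(a, b)] / single[a] for (a, b) in joint}

def make_pairs(all_label_sets, all_labels, ratio, threshold=0.1):
    pairs = []
    for relation_pos in all_label_sets:          # Relation_i+ for each datum
        relation_neg = [l for l in all_labels if l not in relation_pos]
        for a in relation_pos:
            pairs += [(a, b, 1) for b in relation_pos
                      if b != a and ratio.get((a, b), 0) > threshold]
            pairs += [(a, b, 0) for b in relation_neg
                      if ratio.get((a, b), 0) < threshold]
    return pairs
```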
Compared with other traditional deep learning models (such as RNN, LSTM and CNN), the method has essential differences and advantages. In terms of feature extraction, the BERT pre-training language model with its self-attention mechanism obtains context-dependent bidirectional feature representations well. In terms of downstream tasks, BERT pre-trained on large-scale data can be merged into downstream tasks more conveniently. In terms of model performance, methods adopting the BERT pre-training language model exceed traditional deep learning models and machine learning methods on all indexes. Compared with common multi-label text classification models based on a pre-training language model (such as BERT + Softmax and RoBERTa + Softmax), the method is also more effective. The method adopts a multi-task learning approach combining serial and parallel tasks. On the serial side, the label pre-adaptation task lets the BERT pre-training language model generate label feature representations with richer semantic information, and the deep information contained in the learned labels further helps the multi-label text classification task. On the parallel side, the chapter-label classification task, keyword-label classification task and label-label correlation judgment task let the model learn the correlation between texts and labels and the dependency between labels in multiple dimensions. Furthermore, thanks to the parallel multi-task learning method, different tasks can share parameters and features and share what they learn during training, achieving better generalization, reducing the risk of overfitting, and greatly improving the performance of the model.
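For illustration, a joint objective over the three parallel tasks could be combined as sketched below; the equal weighting is an assumption, since the patent does not specify how the task losses are balanced:

```python
import torch.nn as nn

bce = nn.BCELoss()

def multitask_loss(chapter_pred, keyword_pred, pair_pred, labels, pair_labels):
    l_chapter = bce(chapter_pred, labels)       # chapter-label classification
    l_keyword = bce(keyword_pred, labels)       # keyword-label classification
    l_pair = bce(pair_pred, pair_labels)        # label-label correlation judgment
    return l_chapter + l_keyword + l_pair       # assumed equal weighting
```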
The embodiment further provides a multi-label text classification model, which includes: the system comprises a label pre-adaptation module, a shared characteristic acquisition module, a keyword extraction module, a label sampling module, a chapter-label classification module, a keyword-label classification module and a label-label correlation judgment module;
the label pre-adaptation module is used for obtaining the similarity of labels and texts according to input data of multi-label text classification; the shared characteristic acquisition module is used for acquiring shared characteristic representation according to input data of multi-label text classification, and the shared characteristic representation comprises chapter characteristics, keyword characteristics and label characteristics;
the keyword extraction module is used for processing the text feature representation in the feature representation generated after the weight of the pre-training language model in the label pre-adaptation task is loaded;
the label sampling module is used for processing label feature representation in feature representation generated after the weight of a pre-training language model in the label pre-adaptation task is loaded;
the chapter-label classification module is used for obtaining chapter-label characteristic pairs according to input data of multi-label text classification and further performing multi-label classification;
the keyword-label classification module is used for obtaining keyword-label characteristic pairs according to input data of multi-label text classification, and further performing multi-label classification;
and the label-label correlation judgment module is used for performing statistical analysis on all data according to the input data classified by the multi-label text to obtain a data subset, and further optimizing the label characteristic representation.
The multi-label text classification method and the multi-label text classification model provided by the embodiment have the following beneficial effects:
the method improves an embedding method based on a pre-training language model, and enables label information to be utilized by the model by uniquely mapping and splicing labels to an embedded representation. Further, in order to enable the label information to be more effectively utilized by the model, the method enables the BERT pre-training language model to adapt to label mapping embedding by adding a serial task (label pre-adaptation task) so as to generate label feature representation with richer semantic information. In addition, by adding parallel tasks (chapter-label classification task, keyword-label classification task, label-label correlation judgment task), the method not only enables the model to learn the correlation between the text and the label and the dependency between the label from multiple dimensions and multiple levels, specifically: the chapter-label classification task aims at learning the relevant relation on the text structure and the full text idea, the keyword-label classification task aims at learning the relevant relation of the keyword dimension according to the habit of human beings in multi-label classification, and the label-label relevance judgment task aims at learning the dependency relation among labels on the label level, continuously optimizes the label characteristic representation, and further assists the effect promotion of the two classification tasks on the whole. Moreover, since sharing feature representations among parallel tasks can greatly reduce the risk of overfitting of each task, intuitively, the more tasks a model learns at the same time, the more difficult it is to find a feature representation containing all tasks, i.e., the less likely it is to overfit the original task. Meanwhile, due to the fact that a plurality of tasks are mutually influenced by simultaneously training a plurality of tasks, the influence is reflected on the shared characteristics, when all the tasks are converged, the current structure is equivalent to the fusion of all the tasks, and therefore the performance of the model is greatly improved. Due to the serial and parallel multi-task learning method, the model has excellent effect.
It should be understood that the above examples are given only for clarity of illustration and are not intended to limit the embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to exhaust all embodiments here, and obvious variations or modifications derived therefrom remain within the protection scope of the invention.

Claims (10)

1. A multi-label text classification method is characterized by comprising the following steps:
a label pre-adaptation task;
s1: expanding input data of a multi-tag text classification, wherein the input data comprises texts and tags;
s2: embedding the text and part of the tags by adopting an embedding method according to the expanded input data to obtain a pre-adaptive embedded representation;
s3: inputting the pre-adaptive embedded representation into a pre-training language model to obtain a pre-adaptive embedded feature representation, and then performing similarity matching through a feature representation fusion layer and a full connection layer to enable the pre-training language model to learn the unique mapping between the label and the pre-adaptive embedded representation;
obtaining a sharing characteristic;
s4: embedding the text and the full amount of labels by adopting an embedding method according to input data classified by the multi-label text to obtain a shared embedded representation;
s5: carrying out weight loading on a pre-training language model in the label pre-adaptation task to obtain a loaded pre-training language model, and inputting the shared embedded representation into the loaded pre-training language model to obtain a shared characteristic representation;
parallel classification tasks;
s6: and utilizing the shared characteristic representation as an input of parallel tasks, wherein the parallel tasks comprise a chapter-label classification task, a keyword-label classification task and a label-label correlation judgment task, the chapter-label classification task and the keyword-label classification task are used for classifying multi-label texts, and the label-label correlation judgment task is used for assisting the chapter-label classification task and the keyword-label classification task to better utilize label information.
2. The method for classifying multi-label text according to claim 1, wherein in S1, the embedding method embeds the label into the input data by using the separator as a separator to obtain the embedded representation.
3. The method of claim 2, wherein the embedding method comprises a mapping method, the mapping method is a word-by-word mapping in the input data, each bit value in the mapping method has a word uniquely corresponding to it in the vocabulary, and the tags are uniquely mapped and spliced into the embedded representation.
4. The method of claim 1, wherein in S1, the input data of the multi-label text classification is expanded: labels Label_i_1, Label_i_2, ..., Label_i_n corresponding to each text data Text_i are denoted Label_i+, n being the number of labels corresponding to the text data; the remaining labels are denoted Label_i-; a positive sample [Text_i, Label_i+, 1] and negative samples [Text_i, Label_i_k, 0] are generated for each multi-label datum, Label_i_k ∈ Label_i-.
5. The method according to claim 1, wherein S5 further includes a feature representation generated after the weights of the pre-trained language model in the tag pre-adaptation task are loaded, where the feature representation includes a text feature representation and a tag feature representation, the text feature representation is processed by a keyword extraction module, and the tag feature representation is processed by a tag sampling module, so as to obtain the shared feature representation.
6. The method for classifying multi-label text according to claim 1, wherein in S6, the chapter-label classification task comprises the steps of:
step one: the current pre-training language model loads the weights of the pre-training language model from the label pre-adaptation task;
step two: embedding the text and the full amount of labels together to obtain a shared embedded representation, and then inputting the shared embedded representation into the current pre-training language model to obtain a shared feature representation, wherein the shared feature representation comprises chapter features, keyword features and label features;
step three: the chapter features and the label features form chapter-label feature pairs in sequence, and the relevance and importance ratio of each label in the chapter-label feature pair vector is obtained through the attention structure;
step four: and performing multi-label classification according to the obtained relevance and importance ratios.
7. The method for multi-label text classification according to claim 5, wherein the keyword-label classification task comprises the steps of:
step one: the current pre-training language model loads the weights of the pre-training language model from the label pre-adaptation task;
step two: embedding the text and the full amount of labels together to obtain a shared embedded representation, and then inputting the shared embedded representation into the current pre-training language model to obtain a shared feature representation, wherein the shared feature representation comprises chapter features, keyword features and label features;
step three: the keyword features obtained by the keyword extraction module and each label feature form keyword-label feature pairs in sequence, and then the relevance and importance ratio of each label in the keyword-label feature pair vector is obtained through the attention structure;
step four: and performing multi-label classification according to the obtained relevance and importance ratios.
8. The method for classifying multi-label texts according to claim 5, wherein in S6, the task of determining label-label correlation comprises the steps of:
step one: performing statistical analysis on all the data to obtain the proportion with which each label and each of the other labels appear at the same time;
step two: dividing the labels according to the input data of the multi-label text classification, dividing the corresponding related labels in each piece of data Text_i into a related label set, denoted Relation_i+, and dividing the remaining unrelated labels into an unrelated label set, denoted Relation_i-;
step three: generating a data subset of the label correlation judgment task according to the proportion obtained in step one and the label sets obtained in step two; for each label Y+j (j = 0, 1, ..., Num(Relation_i+)) in Relation_i+, generating a positive sample [Y+j, Y+k, 1], where j ≠ k and ratio(Y+j : Y+k) > set threshold, and a negative sample [Y+j, Y-k, 0], where ratio(Y+j : Y-k) < set threshold;
step four: and performing a tag-tag correlation judgment task according to the data subset, and optimizing tag feature representation.
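A minimal sketch of the co-occurrence statistics (step one) and the threshold-gated pair generation (step three). Defining the ratio as the conditional frequency P(b appears | a appears) and the 0.1 threshold are illustrative assumptions; the patent only requires some co-occurrence ratio and a set threshold.

```python
from collections import Counter
from itertools import permutations

def cooccurrence_ratio(label_sets):
    """ratio[(a, b)] ~ fraction of a's occurrences in which b also appears."""
    pair_counts, label_counts = Counter(), Counter()
    for labels in label_sets:
        label_counts.update(labels)
        pair_counts.update(permutations(labels, 2))  # ordered label pairs
    return {(a, b): c / label_counts[a] for (a, b), c in pair_counts.items()}

def label_pair_samples(related, unrelated, ratio, threshold=0.1):
    samples = []
    for a in related:
        for b in related:
            if a != b and ratio.get((a, b), 0.0) > threshold:
                samples.append((a, b, 1))            # [Y+_j, Y+_k, 1]
        for b in unrelated:
            if ratio.get((a, b), 0.0) < threshold:
                samples.append((a, b, 0))            # [Y+_j, Y-_k, 0]
    return samples

ratio = cooccurrence_ratio([["sports", "finance"], ["sports", "tech"],
                            ["sports", "finance"]])
print(label_pair_samples(["sports", "finance"], ["politics"], ratio))
# [('sports', 'finance', 1), ('sports', 'politics', 0),
#  ('finance', 'sports', 1), ('finance', 'politics', 0)]
```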
9. The method of claim 8, wherein the data comprises training set data, validation set data, and test set data.
10. A multi-label text classification model, comprising: a label pre-adaptation module, a shared feature acquisition module, a keyword extraction module, a label sampling module, a chapter-label classification module, a keyword-label classification module, and a label-label correlation judgment module;
the label pre-adaptation module is used for obtaining the similarity between labels and texts from the input data of the multi-label text classification;
the shared feature acquisition module is used for obtaining a shared feature representation from the input data of the multi-label text classification, the shared feature representation comprising chapter features, keyword features, and label features;
the keyword extraction module is used for processing the text feature representation within the feature representation generated after the weights of the pre-trained language model from the label pre-adaptation task are loaded;
the label sampling module is used for processing the label feature representation within that same feature representation;
the chapter-label classification module is used for obtaining chapter-label feature pairs from the input data of the multi-label text classification and then performing multi-label classification;
the keyword-label classification module is used for obtaining keyword-label feature pairs from the input data of the multi-label text classification and then performing multi-label classification;
the label-label correlation judgment module is used for performing statistical analysis on all data from the input data of the multi-label text classification to obtain a data subset, and then optimizing the label feature representation (a wiring sketch of these modules follows this claim).
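To make the module list concrete, a hypothetical wiring sketch of how the claim-10 modules could compose at inference time, reusing the ChapterLabelHead and extract_keyword_feature sketches above. The encoder interface, the bilinear correlation head, and the averaging of the two classification heads are all assumptions, not the patented design.

```python
import torch.nn as nn

class MultiLabelTextClassifier(nn.Module):
    """Hypothetical composition of the claim-10 modules (not the patented code)."""
    def __init__(self, encoder, dim):
        super().__init__()
        self.encoder = encoder                     # shared feature acquisition module
        self.chapter_head = ChapterLabelHead(dim)  # chapter-label classification module
        self.keyword_head = ChapterLabelHead(dim)  # keyword-label classification module
        self.corr_head = nn.Bilinear(dim, dim, 1)  # label-label correlation module

    def forward(self, inputs):
        # One pass over text + full label set; the shared representation is
        # split into chapter, token (keyword source), and label features.
        chapter, tokens, labels = self.encoder(inputs)
        keywords = extract_keyword_feature(tokens, chapter)
        chapter_probs, _ = self.chapter_head(chapter, labels)
        keyword_probs, _ = self.keyword_head(keywords, labels)
        return (chapter_probs + keyword_probs) / 2  # simple average of the two heads
```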
CN202210384987.6A 2022-04-13 2022-04-13 Multi-label text classification method and model Active CN114490951B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210384987.6A CN114490951B (en) 2022-04-13 2022-04-13 Multi-label text classification method and model

Publications (2)

Publication Number Publication Date
CN114490951A true CN114490951A (en) 2022-05-13
CN114490951B CN114490951B (en) 2022-07-08

Family

ID=81487502

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210384987.6A Active CN114490951B (en) 2022-04-13 2022-04-13 Multi-label text classification method and model

Country Status (1)

Country Link
CN (1) CN114490951B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106980868A * 2016-01-15 2017-07-25 Adobe Inc. Embedding space for images with multiple text labels
US20180121533A1 * 2016-10-31 2018-05-03 Wal-Mart Stores, Inc. Systems, method, and non-transitory computer-readable storage media for multi-modal product classification
CN109543031A * 2018-10-16 2019-03-29 South China University of Technology Text classification method based on multi-task adversarial learning
US20200356851A1 * 2019-05-10 2020-11-12 Baidu USA LLC Systems and methods for large scale semantic indexing with deep level-wise extreme multi-label learning
CN110347839A * 2019-07-18 2019-10-18 Hunan Shuding Intelligent Technology Co., Ltd. Text classification method based on a generative multi-task learning model
CN112800222A * 2021-01-26 2021-05-14 Tianjin University of Science and Technology Multi-task-assisted extreme multi-label short-text classification method using co-occurrence information
CN113626589A * 2021-06-18 2021-11-09 University of Electronic Science and Technology of China Multi-label text classification method based on a hybrid attention mechanism

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HONGLUN ZHANG et al., "Multi-Task Label Embedding for Text Classification", https://arxiv.org/abs/1710.07210 *
TONG XINYU et al., "Multi-label Patent Classification Based on Pre-trained Models", Data Analysis and Knowledge Discovery *
WU QIANKUN et al., "MTL-BERT: A Multi-Task Learning Model for Chinese Text Combining BERT", Journal of Chinese Computer Systems *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115510074A * 2022-11-09 2022-12-23 Chengdu Liaoliao Technology Co., Ltd. Table-based distributed data management and application platform
CN115510074B * 2022-11-09 2023-03-03 Chengdu Liaoliao Technology Co., Ltd. Table-based distributed data management and application system

Also Published As

Publication number Publication date
CN114490951B (en) 2022-07-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant