CN113076426A - Multi-label text classification and model training method, device, equipment and storage medium


Info

Publication number: CN113076426A
Application number: CN202110630360.XA
Authority: CN (China)
Prior art keywords: label, text, sample, labels, sequence
Other languages: Chinese (zh)
Other versions: CN113076426B (en)
Inventors: 张倩汶, 闫昭, 曹云波
Current and original assignee: Tencent Technology (Shenzhen) Co., Ltd.
Legal status: Granted; currently active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Events: application filed by Tencent Technology (Shenzhen) Co., Ltd.; priority to CN202110630360.XA; publication of CN113076426A; application granted; publication of CN113076426B

Classifications

    • G06F 16/35: Information retrieval; database structures therefor; of unstructured textual data; clustering; classification
    • G06F 18/214: Pattern recognition; analysing; design or setup of recognition systems or techniques; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/241: Pattern recognition; classification techniques; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N 3/04: Computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology


Abstract

The application provides a multi-label text classification and model training method, apparatus, device, and storage medium. In the process of training a multi-label text classification model, a classifier capable of capturing the correlation among labels is trained based on the label prediction features of each label output by the multi-label text classification model, and the classifier and the multi-label text classification model are trained synchronously. The trained multi-label text classification model can thus capture the correlation among labels more accurately; this label correlation provides an additional information basis for determining the labels related to a text, so the model can identify the labels related to a text more accurately, improving the accuracy of determining the labels related to a text by using the multi-label text classification model.

Description

Multi-label text classification and model training method, device, equipment and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a multi-label text classification and model training method, apparatus, device, and storage medium.
Background
Text classification is widely applied in fields such as information retrieval and sentiment analysis. Text classification is the task of assigning the correct label or labels to a given text.
Among text classification tasks, Multi-Label Text Classification (MLTC) is a common one. In multi-label text classification, each given text may be associated with, i.e., assigned, multiple labels. For example, a news article is often semantically rich, so that it may belong to both the "sports" and "economy" categories and needs to be labeled with both the "sports" and "economy" labels.
Currently, multi-label text classification is applied more and more widely. However, its accuracy is generally low; how to improve the accuracy of multi-label text classification is therefore a technical problem to be solved by those skilled in the art.
Disclosure of Invention
In view of the above, to solve the above problems, the present invention provides a multi-label text classification and model training method, apparatus, device, and storage medium, so as to improve the accuracy of multi-label text classification.
In order to achieve the above object, in one aspect, the present application provides a text classification model training method, including:
obtaining a plurality of text samples and a label set labeled by the text samples, wherein the label set of the text samples comprises: a plurality of labels labeled with a relevance to the text sample, the text sample comprising a sequence of characters comprised of at least one character;
for each text sample, determining, based on the character sequence of the text sample and a label sequence formed by the labels in the label set of the text sample, and by using a network model to be trained, a relevance prediction feature of the text sample and a label prediction feature of each label in the label sequence, wherein the relevance prediction feature of the text sample is used for characterizing the predicted relevance between each label in the label sequence of the text sample and the text sample;
for each text sample, selecting at least one label sample group from a label set of the text sample, and determining a prediction related class of the label sample group based on the label prediction characteristics of each label in the label sample group and by using a classifier to be trained, wherein the label sample group comprises at least two labels, and the prediction related class is used for representing whether the correlation of each label in the label sample group is the same or not;
determining a first loss function value of the network model based on the relevance prediction feature of each text sample and the actually labeled relevance of each label in the label sequence of each text sample;
determining a second loss function value of the classifier based on the actual relevant category and the prediction relevant category of the label sample group of each text sample, wherein the actual relevant category of the label sample group of the text sample represents whether the relevance of the actual label in the label sample group of the text sample is the same or not;
and if the training ending condition is determined not to be reached based on the first loss function value and the second loss function value, adjusting internal parameters of the network model and the classifier, continuing training until the training ending condition is reached, and determining the trained network model as the multi-label text classification model.
In a possible implementation manner, the determining a first loss function value of the network model based on the correlation prediction feature of each text sample and the correlation actually labeled by each label in the label sequence of each text sample includes:
for each text sample, inputting the relevance prediction feature of the text sample into a fully-connected network layer to be trained, to obtain the predicted relevance, predicted by the fully-connected network layer, between each label in the label sequence of the text sample and the text sample;
and determining a first loss function value of the network model based on the predicted relevance and the actually marked relevance of each label in the label sequence of each text sample.
In a possible implementation manner, the determining, based on the character sequence of the text sample and the tag sequence formed by each tag in the tag set of the text sample and by using a network model to be trained, a relevance prediction feature of the text sample and a tag prediction feature of each tag in the tag sequence includes:
constructing an input characteristic sequence based on the character sequence of the text sample and a label sequence formed by all labels in a label set of the text sample; the input feature sequence comprises a character vector sequence, a label vector sequence and a separator arranged in front of the character vector sequence, wherein the character vector sequence consists of character vectors of all characters in the character sequence, and the label vector sequence consists of label vectors of all labels in the label sequence;
and inputting the input feature sequence into the network model to be trained to obtain an output feature sequence output by the network model, wherein the output feature sequence comprises the output feature of the separator and the label prediction features of the labels in the label sequence, the output feature of the separator is used for representing the relevance prediction feature of the text sample, and the network model is a Bidirectional Encoder Representations from Transformers (BERT) model.
In another aspect, the present application further provides a multi-label text classification method, including:
obtaining a text to be processed and a preset label set, wherein the text comprises a character sequence composed of at least one character, and the label set comprises a plurality of labels;
based on the character sequence of the text and the label sequence formed by the labels in the label set, determining the relevance prediction characteristics of the text by utilizing a multi-label text classification model, wherein the relevance prediction characteristics of the text are used for representing the relevance of the labels in the label sequence and the text;
determining a plurality of labels related to the text from the label set based on relevance prediction features of the text;
the multi-label text classification model comprises a network model obtained through multi-task synchronous training;
the multi-task synchronous training comprises: training the network model by using training character sequences corresponding to a plurality of text samples and training label sequences corresponding to the training label sets labeled for the text samples, with predicting the relevance between the text sample and each label in the training label sequence as the training target; and, in the process of training the network model, synchronously training a classifier whose training target is to predict, based on the label prediction features of the labels in at least one label sample group of the text sample, whether the relevances of the labels in the label sample group are the same;
wherein the training label sequence consists of each label in a training label set; the label sample group of the text sample comprises at least two labels selected from a training label set of the text sample, and the label prediction features of the labels are the label features of the labels predicted by the network model.
In another aspect, the present application further provides a text classification model training apparatus, including:
a sample obtaining unit, configured to obtain a plurality of text samples and a label set labeled by the text samples, where the label set of the text samples includes: a plurality of labels labeled with a relevance to the text sample, the text sample comprising a sequence of characters comprised of at least one character;
a first training unit, configured to determine, for each text sample, a correlation prediction feature of the text sample and a label prediction feature of each label in a label sequence of the text sample based on the character sequence of the text sample and the label sequence formed by each label in the label set of the text sample, and by using a network model to be trained, where the correlation prediction feature of the text sample is used to characterize the correlation between each label in the predicted label sequence of the text sample and the text sample;
the second training unit is used for selecting at least one label sample group from a label set of the text sample for each text sample, determining a prediction related class of the label sample group based on the label prediction characteristics of each label in the label sample group and by using a classifier to be trained, wherein the label sample group comprises at least two labels, and the prediction related class is used for representing whether the correlation of each label in the label sample group is the same or not;
the first loss determining unit is used for determining a first loss function value of the network model based on the relevance prediction features of the text samples and the actually labeled relevance of each label in the label sequence of each text sample;
a second loss determining unit, configured to determine a second loss function value of the classifier based on an actual relevant category and a prediction relevant category of the tag sample group of each text sample, where the actual relevant category of the tag sample group of the text sample represents whether correlations actually labeled by each tag in the tag sample group of the text sample are the same or not;
and the training control unit is used for adjusting the internal parameters of the network model and the classifier if the training ending condition is not met based on the first loss function value and the second loss function value, continuing training until the training ending condition is met, and determining the trained network model as the multi-label text classification model.
In another aspect, the present application further provides a multi-label text classification apparatus, including:
the information acquisition unit is used for acquiring a text to be processed and a set label set, wherein the text comprises a character sequence consisting of at least one character, and the label set comprises a plurality of labels;
the feature determination unit is used for determining a relevance prediction feature of the text based on a character sequence of the text and a label sequence formed by labels in the label set and by using a multi-label text classification model, wherein the relevance prediction feature of the text is used for representing the relevance of each label in the label sequence and the text; the multi-label text classification model comprises a network model obtained through multi-task synchronous training;
the multi-task synchronous training comprises: training the network model by using training character sequences corresponding to a plurality of text samples and training label sequences corresponding to the training label sets labeled for the text samples, with predicting the relevance between the text sample and each label in the training label sequence as the training target; and, in the process of training the network model, synchronously training a classifier whose training target is to predict, based on the label prediction features of the labels in at least one label sample group of the text sample, whether the relevances of the labels in the label sample group are the same;
wherein the training label sequence consists of the labels in the training label set; the label sample group of a text sample comprises at least two labels selected from the training label set of the text sample, and the label prediction feature of a label is the label feature of the label predicted by the network model;
And the label determining unit is used for determining a plurality of labels related to the text from the label set based on the relevance prediction characteristics of the text.
In yet another aspect, the present application further provides a computer device, comprising: a processor and a memory, connected through a communication bus;
the processor is used for calling and executing the program stored in the memory;
the memory is configured to store a program for implementing the text classification model training method as described in any one of the above items or the multi-label text classification method as described above.
In yet another aspect, the present application further provides a computer-readable storage medium, on which a computer program is stored, the computer program being loaded and executed by a processor to implement the text classification model training method as described in any one of the above items, or the multi-label text classification method as described above.
According to the present application, the multi-label text classification model is trained based on the character sequences of the text samples and the label sequences corresponding to the label sets labeled for the text samples. In the process of training the multi-label text classification model, therefore, not only the relationships among the characters in a text sample but also the relationships between characters and labels and among labels can be considered, which improves the classification performance of the trained multi-label text classification model, allows the multiple relevant labels of a text to be determined more accurately, and improves the accuracy of multi-label text classification.
Meanwhile, in the process of training the multi-label text classification model, a classifier capable of capturing the correlation among labels is trained based on the label prediction features of the labels output by the multi-label text classification model, and the classifier and the multi-label text classification model are trained synchronously. The trained multi-label text classification model can therefore capture the correlation among labels more accurately; this label correlation provides an additional information basis for determining the labels related to a text, so the model can identify the labels related to a text more accurately, improving the accuracy of determining the labels related to a text by using the multi-label text classification model.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the composition architecture of an application scenario to which multi-label classification is applicable according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram illustrating an application principle of the solution according to the embodiment of the present application in an application scenario;
FIG. 3 is a flow chart illustrating a text classification model training method provided by the present application;
FIG. 4 is a schematic framework diagram of an implementation of the text classification model training method provided in the present application;
FIG. 5 is a schematic flow chart illustrating a method for training a text classification model provided by the present application;
FIG. 6 is a flow chart illustrating a multi-label text classification method provided herein;
FIG. 7 is a schematic flow chart diagram illustrating a multi-label text classification method provided by the present application;
FIG. 8 is a schematic diagram illustrating a structure of a text classification model training apparatus provided in the present application;
fig. 9 is a schematic diagram illustrating a structure of a multi-label text classification apparatus provided in the present application;
fig. 10 shows a schematic structural diagram of a computer device provided in the present application.
Detailed Description
The solution of the present application is applicable to any computing platform involving multi-label text classification or training of text classification models for multi-label text classification, which may include one or more computer devices.
Multi-label text classification means that the classification of one text corresponds to a plurality of labels. A label of a text is a word that can express the content, semantics, or characteristics of the text; for example, a label may be the category to which the text content belongs or a related attribute. For instance, for an article about the Olympic Games, the labels of the text may include both "sports" and "economy", because the Olympic Games relate not only to sports but also to the economy and the like.
By marking the text with the label, a user can know the text content more quickly, the retrieval efficiency of the text in a search system is improved, or the accuracy of recommending the related text is improved.
Multi-label text classification can be applied to scenarios such as text recommendation and text classification; by determining the multiple labels of a text, text recommendation, text classification, and the like can be performed more accurately.
The technical solutions in the embodiments of the present invention will be described below clearly and completely with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments herein without creative effort shall fall within the protection scope of the present invention.
For ease of understanding, an application scenario to which the scheme of the present application is applicable will be described first.
Fig. 1 is a schematic diagram showing a composition structure of an application scenario to which the solution of the present application is applicable.
The application scenario may include a computing platform 110 and a text classification application platform 120.
The computing platform 110 may comprise at least one computer device 111. For example, the computing platform may be a stand-alone computer device, a server system composed of a plurality of servers, or a cloud computing platform.
The text classification application platform 120 may include one or more servers 121.
The text classification application platform may be any platform that runs applications based on the multiple labels of a text; for example, it may be a text recommendation platform, an information extraction platform for a knowledge graph, a text classification platform, and the like.
Take the application scenario of knowledge graph information extraction as an example. Fig. 2 shows an example of applying multi-label classification to knowledge graph construction in this scenario.
As shown in fig. 2, take the text "user A's wife is user B and their child is user C" as an example.
After the computing platform obtains the text, a plurality of labels related to the text can be determined from a predefined label set by multi-label classification. As shown in fig. 2, the label set may include: wife, husband, child, sister, brother, grandfather, grandmother, secretary, driver, boss, and so on.
By multi-label text classification, the labels associated with the text "user A's wife is user B and their child is user C" can be determined from the label set; assume the associated labels include "wife", "husband", and "child".
On this basis, the text classification application platform performs entity extraction and triple generation based on these three labels of the text, and a knowledge graph can finally be constructed.
It should be noted that fig. 1 illustrates a computing platform for multi-label text classification and a text classification application platform for applying multi-label text classification results as two platforms, but it is understood that in practical applications, the computing platform and the text classification application platform may be the same platform. For example, the text classification platform may classify text based on the plurality of labels the text has after determining the plurality of labels the text has based on the multi-label text classification. Of course, scenes such as text recommendation and knowledge graph construction are also similar, and are not described again.
It is understood that the solution of the present application relates to artificial intelligence and natural language processing. For example, in the process of training the multi-label text classification model, not only natural language processing of text samples but also techniques related to model training, such as machine learning in artificial intelligence, may be involved.
Among them, Artificial Intelligence (AI) is a theory, method, technique and application system that simulates, extends and expands human Intelligence using a digital computer or a machine controlled by a digital computer, senses the environment, acquires knowledge and uses the knowledge to obtain the best result. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers using natural language. Natural language processing is a science integrating linguistics, computer science, and mathematics; research in this field involves natural language, i.e., the language people use every day, so it is closely related to linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graphs, and the like.
The following describes an artificial intelligence technique and a natural language processing technique according to the embodiments of the present application with reference to flowcharts.
For ease of understanding, the text classification model training method in the present application is described first.
As shown in fig. 3, which shows a flowchart of the text classification model training method provided in the present application, the method of the present embodiment can be applied to the aforementioned computer device. The method of the embodiment may include:
s301, obtaining a plurality of text samples and a label set labeled by each text sample.
It is to be understood that the text samples are text used to train a text classification model that enables multi-label text classification. The text sample includes a sequence of characters comprised of at least one character. For example, when a text sample is a document or a paragraph in a document, the text sample is a sequence of words in the document or the paragraph.
Wherein the labelset of the text sample comprises: a plurality of labels labeled with relevance to the text sample.
The relevance of the label to the text sample may reflect how relevant the label is to the text.
In one implementation, in the model training phase, the relevance of each label in the label set of a text sample can be divided into two types: related and unrelated. For example, the labeled set of a text sample includes: a first label set related to the text sample and a second label set unrelated to the text sample, the first label set comprising a plurality of first labels labeled as related to the text sample, and the second label set comprising a plurality of second labels labeled as unrelated to the text sample.
It should be noted that, for the purpose of distinguishing from the label set in the subsequent multi-label text classification, the label set of the text sample is also referred to as a training label set.
S302, for each text sample, based on the character sequence of the text sample and the label sequence formed by the labels in the label set of the text sample, and by using the network model to be trained, the relevance prediction characteristics of the text sample and the label prediction characteristics of the labels in the label sequence are determined.
The correlation prediction feature of a text sample is predicted by the network model and can characterize the correlation between each label in the label sequence of the text sample and the text sample; it can therefore be used to characterize the predicted correlation between each label in the label sequence and the text sample.
The label prediction characteristic of the label is a characteristic representation of the label output by the network model.
The network model may be any neural network model, and the like, which is not limited in the present application.
In one possible implementation, the network model may be a language model, e.g., a Bidirectional Encoder Representations from Transformers (BERT) model.
Correspondingly, for each text sample, based on the character sequence and the label sequence, and by using the BERT model to be trained, the correlation prediction feature of the text sample and the label prediction feature of each label in the label sequence can be obtained.
It can be understood that the character sequence and the label sequence serve as the input text of the BERT model, and the BERT model employs multi-layer Transformers to learn the input (here, the character sequence and the label sequence) bidirectionally, so that the context relationships within the input can be learned more accurately. That is, the correlations between characters, between characters and labels, and between labels can be fully learned, so that the feature representation of each character and label can be determined more accurately.
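As an illustrative aid, and not part of the patent disclosure, the following minimal sketch shows how such a joint character-and-label input could be encoded with a BERT model; it assumes the HuggingFace transformers API and the bert-base-chinese checkpoint, and the text and label strings are invented for illustration.

```python
# Minimal sketch: encode a character sequence and a label sequence together
# so that one BERT forward pass yields both the correlation prediction
# feature (the [CLS] output) and per-label prediction features.
# Assumes the HuggingFace `transformers` API; checkpoint and strings are
# illustrative, not taken from the patent.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
encoder = BertModel.from_pretrained("bert-base-chinese")

text = "用户A的老婆是用户B"                    # character sequence (hypothetical)
labels = ["妻子", "丈夫", "儿子", "父亲"]      # label sequence (hypothetical)

# tokenizer(text, text_pair) produces [CLS] chars [SEP] labels [SEP],
# matching the input feature sequence described above.
enc = tokenizer(text, " ".join(labels), return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**enc).last_hidden_state  # (1, seq_len, 768)

h_cls = hidden[0, 0]  # correlation prediction feature of the text sample
# The positions after the first [SEP] hold the label tokens; multi-token
# labels would need pooling (e.g., averaging) to give one feature per label.
```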
S303, aiming at each text sample, at least one label sample group is selected from the label set of the text sample, and the prediction related category of the label sample group is determined by utilizing the classifier to be trained on the basis of the label prediction characteristics of each label in the label sample group.
Wherein each label exemplar set includes at least two labels.
It is to be understood that the at least two labels in a label sample group may all have the same relevance to the text sample, or may have different relevances to it. In order to improve the classification accuracy of the classifier, a plurality of label sample groups can be selected, including both groups whose labels have the same relevance to the text sample and groups whose labels have different relevances.
For ease of understanding, the case where the label set of a text sample includes a first label set related to the text sample and a second label set unrelated to the text sample is used for illustration.
In one possible scenario, the label sample groups may be selected as follows: at least one first label sample group and at least one second label sample group are selected from the label set of the text sample, the first label sample group comprising two first labels from the first label set, and the second label sample group comprising a first label from the first label set and a second label from the second label set.
It can be understood that, since the first label set of the text sample contains the labels related to the text sample, both first labels selected from the first label set are related to the text sample. The two labels in a first label sample group therefore have the same correlation with the text sample: they form a related label pair and are co-occurring labels that appear together in the first label set.
Accordingly, since the two labels in a second label sample group come from the first label set and the second label set respectively, one label in the group is correlated with the text sample and the other is not. The two labels in a second label sample group thus have different correlations with the text sample: they form an unrelated label pair and naturally are not co-occurring labels.
In the above case of determining the label sample group, for each label sample group of the text sample, the label prediction features of the labels in the label sample group determined in step S302 may be obtained, and a classifier may be trained based on the label prediction features of the labels in the label sample groups of the text sample.
In yet another possible scenario, for each text sample, the present application may select a subset of tags from a first set of tags for the text sample, the subset of tags including at least two first tags in the first set of tags. On the basis, at least one label is selected from the labels out of the label subset in the label set, and each selected label and the label subset form a label sample group respectively to obtain at least one label sample group.
For example, after the label subset is selected from the first label set, each label in the first label set that does not belong to the subset, and each label in the second label set, may be combined with the label subset to form a label sample group, yielding a plurality of label sample groups. For instance, assume the first label set of a text sample includes label 1, label 2, and label 3, and its second label set includes label 4 and label 5. If label 1 and label 2 are selected as the label subset, the subset may form a label sample group with label 3, another with label 4, and another with label 5.
In this possible implementation, since the label subset includes a plurality of labels, the average of the label prediction features of the first labels in the subset may be taken as the label prediction feature of the subset. Based on the label prediction feature of the label subset in a label sample group and the label prediction feature of the label outside the subset in that group, and by using the classifier to be trained, the prediction related category of the label sample group can be determined.
It is to be understood that, in practical applications, the present application may adopt any one or two combinations of the above two ways of selecting the label sample group as required, and this is not limited.
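Purely as an illustration of the second selection manner above (the patent specifies no implementation), the following sketch shows how the classifier input for such a conditional label sample group could be formed; tensor shapes are assumptions.

```python
# Sketch: the label subset's feature is the mean of its members' label
# prediction features; concatenating it with the feature of the label
# combined with the subset gives one classifier input. Shapes assumed.
import torch

def subset_group_input(subset_feats: torch.Tensor,
                       outside_feat: torch.Tensor) -> torch.Tensor:
    """subset_feats: (k, d) features of the k labels in the subset;
    outside_feat: (d,) feature of the label combined with the subset."""
    subset_feat = subset_feats.mean(dim=0)         # average -> subset feature
    return torch.cat([subset_feat, outside_feat])  # (2d,) classifier input
```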
In the present application, the related category of a label sample group is used to characterize whether the labels in the group have the same relevance. It can be understood that there are two cases: the labels all have the same relevance to the text sample, or their relevances differ. The prediction related category is the related category predicted by the classifier, used to characterize whether the relevances of the labels in the label sample group are the same.
The classifier can be a two-classifier, and the specific form of the classifier can be various possibilities, which is not limited in the present application.
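For illustration, a minimal sketch of one possible binary classifier is given below; the concrete architecture (concatenation followed by a single linear layer) is an assumption, since the patent leaves the classifier form open.

```python
# A minimal pair classifier: given the prediction features of two labels,
# predict whether their relevance to the text sample is the same.
# Architecture and hidden size are assumptions, not from the patent.
import torch
import torch.nn as nn

class LabelPairClassifier(nn.Module):
    def __init__(self, hidden_size: int = 768):
        super().__init__()
        # two output classes: "same relevance" vs. "different relevance"
        self.fc = nn.Linear(2 * hidden_size, 2)

    def forward(self, h_i: torch.Tensor, h_j: torch.Tensor) -> torch.Tensor:
        return self.fc(torch.cat([h_i, h_j], dim=-1))
```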
S304, determining a first loss function value of the network model based on the relevance prediction features of the text samples and the actually labeled relevance of each label in the label sequence of the text samples.
It is understood that the correlation prediction feature of the text sample may be a correlation between each tag in the predicted tag sequence of the text sample and the text sample, and on this basis, in the case that the correlation between each tag actually labeled in the tag sequence of the text sample and the text sample is known, the loss function of the network model may be determined based on the set loss function.
The loss function value of the network model is used for representing the accuracy of the network model prediction correlation prediction characteristics, and is also a basis for measuring whether the network model reaches the training target or not. For ease of distinction, the loss function values of the network model are referred to herein as first loss function values.
In one possible implementation, in order to reflect more intuitively the correlation between each label in the label sequence predicted by the network model and the text sample, the correlation prediction feature of each text sample may be further input into a fully-connected network layer to be trained, to obtain the predicted correlation, predicted by the fully-connected network layer, between each label in the label sequence of the text sample and the text sample. Based on the predicted correlation and the actually labeled correlation of each label in the label sequence of each text sample, the first loss function value of the network model can be determined.
It will be appreciated that there are many possibilities for determining the specific form of the loss function of the network model, which is not limited by the present application.
S305, determining a second loss function value of the classifier based on the actual related class and the predicted related class of the tag sample group of each text sample.
And the actual relevant category of the label sample group of the text sample represents whether the actual labeled relevance of each label in the label sample group of the text sample is the same.
For example, when the label set of each text sample includes the aforementioned first label set and second label set, if all labels in a label sample group come from the first label set of the text sample, then, since every label in the first label set is correlated with the text sample, the actual related category of the label sample group is "same relevance", i.e., the actually labeled relevances of the labels in the group are the same.
According to the prediction related category of the label sample group of the text sample predicted by the classifier and the actual related category of the label sample group, a loss function value used for reflecting the prediction accuracy of the classifier can be determined. For ease of distinction, the loss function value of this classifier is referred to as the second loss function value.
The loss function value of the classifier can be calculated according to the loss function set by the classifier, and the specific form of the loss function of the classifier is not limited in the application.
Of course, the loss function value used to characterize the prediction accuracy of the classifier can also be determined in other ways.
S306, if the training end condition is determined not to be reached based on the first loss function value and the second loss function value, adjusting internal parameters of the network model and the classifier, continuing training until the training end condition is reached, and determining the trained network model as the multi-label text classification model.
If the first loss function value and the second loss function value converge, it is determined that the training end condition is reached.
Alternatively, even if the first loss function value and the second loss function value have not converged, when the number of training iterations reaches a set number, the set training end condition may also be considered to be reached.
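For illustration only, a hedged sketch of the joint training step implied by S304 to S306 follows; the optimizer, thresholds, and the helpers multi_label_loss and pair_loss are hypothetical names standing in for the first and second loss computations.

```python
# Sketch of synchronous training: one optimizer updates the network model
# and the classifier together, and training stops on convergence or after
# a set number of iterations. Helper functions are hypothetical.
import torch

def train(model, classifier, batches, max_steps=10000, tol=1e-4):
    params = list(model.parameters()) + list(classifier.parameters())
    opt = torch.optim.Adam(params, lr=2e-5)
    prev = float("inf")
    for step, batch in enumerate(batches):
        loss1 = multi_label_loss(model, batch)       # first loss function value
        loss2 = pair_loss(classifier, model, batch)  # second loss function value
        loss = loss1 + loss2                         # joint objective
        opt.zero_grad()
        loss.backward()
        opt.step()
        if abs(prev - loss.item()) < tol or step + 1 >= max_steps:
            break                                    # training end condition
        prev = loss.item()
```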
According to the present application, the multi-label text classification model is trained based on the character sequence of each text sample and the label sequence corresponding to the label set labeled for the text sample. In the process of training the multi-label text classification model, therefore, not only the relationships among the characters in the text sample but also the relationships between characters and labels and among labels can be considered, which helps improve the classification performance of the trained multi-label text classification model, allows the multiple relevant labels of a text to be determined more accurately, and improves the accuracy of multi-label text classification.
Meanwhile, in the process of training the multi-label text classification model, a classifier capable of capturing the correlation among labels is trained based on the label prediction features of the labels output by the multi-label text classification model, and the classifier and the multi-label text classification model are trained synchronously. The trained multi-label text classification model can therefore capture the correlation among labels more accurately; this label correlation provides an additional information basis for determining the labels related to a text, so the model can identify the labels related to a text more accurately, improving the accuracy of determining the labels related to a text by using the multi-label text classification model.
For the convenience of understanding, the scheme of the present application is described below by taking a network model as a BERT model, that is, a multi-label text classification model obtained by training based on the BERT model as an example.
Fig. 4 is a schematic diagram of a framework of an implementation principle of training a multi-label text classification model in the present application.
In fig. 4, the input text of the BERT model is composed of a character sequence and a label sequence.
Because the input of the BERT model consists of feature vectors, the present application first converts each character in the character sequence into a character vector to obtain the character vector sequence, and converts each label in the label sequence corresponding to the text sample into a label vector to obtain the label vector sequence.
On this basis, the input feature sequence of the BERT model includes:
the character vector sequence corresponding to the text sample, $\{w_1, w_2, \ldots, w_m\}$, where $m$ is the total number of characters in the character sequence; and
the label vector sequence corresponding to the label set labeled for the text sample, $\{l_1, l_2, \ldots, l_n\}$, where $n$ is the total number of labels in the label set of the text sample.
Meanwhile, a separator [CLS] is inserted at the front end of the character sequence, and, in order to separate the character sequence from the label sequence, the input feature sequence also includes separators [SEP] arranged at the two ends of the label sequence. Thus, the input feature sequence is:

$$[\mathrm{CLS}],\, w_1, \ldots, w_m,\, [\mathrm{SEP}],\, l_1, \ldots, l_n,\, [\mathrm{SEP}]$$

Based on this, the output feature $h_{\mathrm{CLS}}$ at the [CLS] position output by the BERT model can be input into the multi-label text classification task to predict whether each label in the label sequence is related to the text sample.
From the label prediction features $h_{l_1}, \ldots, h_{l_n}$ of the individual labels output by the BERT model, label pairs can be formed, and label-pair co-occurrence prediction is performed based on the label prediction features of each pair to judge whether the two labels of the pair have the same correlation with the text sample; this corresponds to the label-pair co-occurrence prediction task in fig. 4.
Meanwhile, from the label prediction features of the labels output by the BERT model, the present application also selects a label subset $S$ from the plurality of labels, forms label sample groups from the subset $S$ and each of the remaining labels, and predicts, based on the label prediction features of the labels in the subset $S$ and of the label combined with it, whether the subset $S$ and the combined label belong together to the first label set or to the second label set, i.e., performs the conditional label co-occurrence prediction in fig. 4.
The multi-label classification task is combined with the label-pair co-occurrence prediction task and the conditional label co-occurrence prediction task, which effectively improves the BERT model's ability to identify the correlations of labels.
For ease of understanding, the text classification model training method of the present application is described below with reference to fig. 4, and taking an implementation case as an example.
As shown in fig. 5, which shows another flow chart of the text classification model training method of the present application, the method of this embodiment may include:
s501, obtaining a plurality of text samples and label sets labeled by the text samples.
The labelset of the text sample comprises: a first set of tags and a second set of tags, the first set of tags comprising: a plurality of first labels labeled as being related to the text sample, the second set of labels comprising: a plurality of second labels labeled as irrelevant to the text sample.
S502, aiming at each text sample, constructing an input feature sequence based on the character sequence of the text sample and the label sequence formed by the labels in the label set of the text sample.
The input characteristic sequence comprises a character vector sequence corresponding to the character sequence and a label vector sequence corresponding to the label sequence, and the input characteristic sequence comprises separators arranged in front of the character vector sequence and separators arranged at two ends of the label vector sequence.
The character vector sequence is a sequence formed by character vectors of all characters in the character sequence, and the label vector sequence is a sequence formed by label vectors of all labels in the label sequence.
Here, the separator is a symbol having no explicit semantic information, and the separator is used to separate two sequences (sentences) before and after.
For example, take the text "user A's wife is user B" (in the original Chinese, a sequence of 10 characters) as an example. Each character in the character sequence is converted into a character vector; the character vector of each character can be represented as $w_i$ in fig. 4, where $i$ is a natural number from 1 to $m$, and in this example $m$ is 10. Correspondingly, the character vectors form a sequence in order, yielding the character vector sequence.
Assume that the label sequence corresponding to the label set of the text sample is: wife, husband, son, father, mother, colleague, and boss. Each label in the label sequence is sequentially converted into a label vector, represented as $l_j$ in fig. 4, where $j$ is a natural number from 1 to $n$; since there are 7 labels in this example, $n$ is 7. Correspondingly, the label vectors form a sequence in order, yielding the label vector sequence.
On this basis, a separator [CLS] is inserted at the very front of the two vector sequences, and separators [SEP] are inserted at the front and rear ends of the label vector sequence, forming the input feature sequence, as shown in fig. 4.
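As an illustration of this layout (not part of the patent text; the Chinese character and label strings below are reconstructed assumptions, since the scraped translation renders them character by character):

```python
# Illustrative layout of the input feature sequence for the example above:
# 10 text characters and 7 labels. Placeholder strings stand in for the
# trained character/label embedding vectors.
chars = list("用户A的老婆是用户B")  # w1..w10 ("user A's wife is user B")
labels = ["妻子", "丈夫", "儿子", "父亲", "母亲", "同事", "老板"]  # l1..l7

input_sequence = ["[CLS]"] + chars + ["[SEP]"] + labels + ["[SEP]"]
print(input_sequence)
# ['[CLS]', '用', '户', 'A', ..., 'B', '[SEP]', '妻子', ..., '老板', '[SEP]']
```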
And S503, inputting the input characteristic sequence into a BERT model to be trained to obtain an output characteristic sequence output by the BERT model.
Wherein the output feature sequence comprises the output features of the separators and the label prediction features of the labels in the label sequence. Of course, the output feature sequence also includes the character prediction features of the characters in the character sequence and the output features corresponding to each separator.
As shown in fig. 4, the output feature sequence output by the BERT model includes: the output feature (i.e., feature vector representation) $h_{\mathrm{CLS}}$ corresponding to the separator [CLS]; the character prediction feature $h_{w_i}$ corresponding to each character $w_i$ (the output vector at the position of the character vector $w_i$); the label prediction feature $h_{l_j}$ corresponding to each label $l_j$ (the output vector at the position of the label vector $l_j$); and the output features corresponding to the separators [SEP].
It can be understood that the BERT model can more comprehensively analyze the relationship between characters and labels in the text sample based on the context relationship between objects (such as vectors of characters and vectors of labels) in the input feature sequence, so that each vector in the output feature sequence can more accurately express the feature corresponding to the corresponding position symbol.
It can be understood that, since the separator is a symbol without explicit semantic information, it can fuse the semantic information of each object (character/label) in the input feature sequence relatively fairly, so the output vector corresponding to the separator output by the BERT model can be used as the semantic representation of the input feature sequence. Based on this, the output feature $h_{\mathrm{CLS}}$ of the separator is used in the present application to characterize the correlation prediction feature of the text sample, i.e., the output feature of the separator represents the correlation between each label in the label sequence and the text sample.
S504, for each text sample, the correlation prediction feature $h_{\mathrm{CLS}}$ of the text sample is input into the fully-connected network layer to be trained, to obtain the predicted correlation, predicted by the fully-connected network layer, between each label in the label sequence of the text sample and the text sample.
The predicted correlation refers to the correlation between each label in the label sequence and the text sample, as predicted by the BERT model and the fully-connected network layer. It will be appreciated that the predicted correlation may be the probability that the label is related to the text sample.
For example, the fully-connected network layer may output a predicted correlation probability vector that includes, for each label in the label sequence, the probability that the label is correlated with the text sample. The fully-connected network layer may determine the correlation probability vector $P$ by the following formula one:

$$P = \sigma\!\left(W \cdot h_{\mathrm{CLS}} + b\right) \quad \text{(formula one)}$$

where $b$ is a bias vector in the fully-connected network layer, determined through training; $W$ is a parameter matrix in the fully-connected network layer that is continuously adjusted during training, with dimensions $n \times d$, where $n$ is the total number of labels in the label set of the text sample and $d$ is the dimension of $h_{\mathrm{CLS}}$; and $\sigma$ denotes the sigmoid activation, so that each entry of $P$ is the probability that the corresponding label is related to the text sample. $W$ and $b$ are adjusted continuously along with training.
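A minimal sketch of formula one follows, assuming a sigmoid activation (natural for multi-label classification, though the activation is not legible in the scraped formula) and illustrative sizes:

```python
# The fully-connected layer holds W (n x d) and the bias b; applying it to
# the correlation prediction feature h_cls and squashing with a sigmoid
# yields the correlation probability vector P (one probability per label).
import torch
import torch.nn as nn

n_labels, d = 7, 768                 # assumed sizes (7 labels as in fig. 4)
fc = nn.Linear(d, n_labels)          # parameters W and b of formula one
h_cls = torch.randn(d)               # correlation prediction feature (dummy)
P = torch.sigmoid(fc(h_cls))         # correlation probability vector P
```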
Of course, the prediction correlation may also directly represent the correlation category: for example, if a label is predicted to be in the related category, the prediction correlation takes the value 1; otherwise, it takes the value 0.
It should be noted that, in this embodiment, the prediction correlation between each label in the label sequence and the text sample is determined by inputting the correlation prediction features corresponding to the text samples output by the BERT model into the fully-connected network model, but it may be understood that, in practical applications, the prediction correlation between each label and the text sample may also be determined by using a classifier or other manners based on the correlation prediction features output by the BERT model, which is not limited to this.
S505, based on the prediction correlation and the actual labeling correlation of each label in the label sequence of each text sample, determining a first loss function value of the BERT model, and jumping to step S512.
In this embodiment, the first loss function value of the BERT model is actually a loss function value of a model composed of the BERT model and a fully-connected network layer, or a loss function value of a multi-label text classification.
For example, the first loss function value $L_1$ may be determined by the cross-entropy loss function shown in formula two below:

$$L_1 = -\sum_{i=1}^{n}\left[\, y_i \log p_i + (1 - y_i)\log(1 - p_i) \,\right] \quad \text{(formula two)}$$

where $p_i$ is the predicted relevance of label $l_i$ to the text sample, and $y_i$ is the actually labeled relevance of label $l_i$ to the text sample; $y_i$ is 0 or 1: when label $l_i$ is labeled as related to the text sample, $y_i$ is 1; otherwise, $y_i$ is 0.
It should be noted that formula two is only one possible loss function for measuring the accuracy of the multi-label text classification; in practical applications, the loss function may take other forms, and the application is not limited in this respect.
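As a concrete illustration of formula two, the short sketch below computes the first loss function value with PyTorch's built-in binary cross-entropy; the probabilities and labels are invented for demonstration.

    import torch
    import torch.nn.functional as F

    # p: predicted correlations output by the fully-connected network layer
    # y: actually labeled correlations (1 = correlated with the text sample, 0 = not)
    p = torch.tensor([0.9, 0.2, 0.7])
    y = torch.tensor([1.0, 0.0, 1.0])

    # formula two: L1 = -sum_i [ y_i * log(p_i) + (1 - y_i) * log(1 - p_i) ]
    first_loss = F.binary_cross_entropy(p, y, reduction="sum")
    print(float(first_loss))  # about 0.69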
S506, at least one first label sample group and at least one second label sample group are selected from the label set of the text samples.
Wherein the first label exemplar set comprises: two first labels from the first set of labels, the second sample set of labels comprising: a first tag in a first set of tags and a second tag in the second set of tags.
Each label sample group selected in this step S506 is in fact a label pair composed of two labels. The first label sample group includes two labels both labeled as related to the text sample, i.e., the first label sample group is a related label pair; the second label sample group includes one label labeled as related to the text sample and one label labeled as unrelated to it, i.e., an unrelated label pair.
It is understood that, in practical applications, the numbers of first label sample groups and second label sample groups can be set as required; for example, in an alternative manner, the ratio of the number of first label sample groups to the number of second label sample groups may be 1:2.
S507, for each label sample group among the first label sample groups and the second label sample groups, determining a first prediction related category of the label sample group based on the label prediction features of the labels in the label sample group and by using a first classifier to be trained.
The first predictive relevance class is used for characterizing whether the relevance of each label in the label sample group is the same. For the convenience of distinction, the prediction related class predicted based on the first classifier by determining the label sample group in the manner of steps S506 to S507 is referred to as a first prediction related class.
For example, since the first label sample group is a related label pair and the second label sample group is an unrelated label pair, the first prediction related category may include: related (or related label pair) and unrelated (or unrelated label pair).
The above steps S506 to S507 correspond to the label-pair co-occurrence prediction task in fig. 4. For example, any one of the first label sample groups and the second label sample groups may be composed of a label $l_i$ and a label $l_j$, wherein $i$ and $j$ are any natural numbers from 1 to $n$, and $i$ and $j$ are different.

Accordingly, the label prediction feature $h_i$ of label $l_i$ and the label prediction feature $h_j$ of label $l_j$ can be input into the first classifier, so that the first classifier predicts whether label $l_i$ and label $l_j$ belong to a related label pair, e.g., the probability $p_{ij}$, predicted by the first classifier, that labels $l_i$ and $l_j$ belong to a related label pair.
S508, determining the second loss function value of the first classifier based on the actual related class and the first prediction related class of the tag sample group of each text sample, and proceeding to step S512.
For a text sample, the actual related category of a label sample group represents whether the actually labeled correlations of the labels in the label sample group are the same. The actual related category is determined by the actually labeled correlation of each label in the label sample group.
For example, for the first label sample group, since both labels in the first label sample group come from the first label set related to the text sample, both labels are related to the text sample, i.e., the actual related category of the first label sample group is related (or it belongs to a related label pair). Similarly, the actual related category of the second label sample group is unrelated (or it belongs to an unrelated label pair).
On the basis, for each label sample group of each text sample, based on the actual relevant category and the predicted first prediction relevant category, whether the first classifier accurately predicts the relevant category to which the label sample group belongs can be determined.
It will be appreciated that the loss function value of the first classifier is used to characterize the accuracy with which the first classifier predicts the prediction-related class of the sample set of labels. To facilitate differentiation from the loss function values of the multi-label classification task, the loss function values corresponding to the label-pair co-occurrence prediction and condition-label co-occurrence prediction tasks are referred to as second loss function values.
It is understood that there are many possible forms of the loss function used to determine the second loss function value of the first classifier; they are similar to formula two above and are not described here again.
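The following sketch shows one way to assemble the label-pair co-occurrence task of steps S506 to S508: sample related and unrelated label pairs (here at the 1:2 ratio mentioned above) and score each pair with a first classifier. The internal form of the classifier (a linear layer over the concatenated label prediction features) is an assumption for illustration; the patent does not prescribe it.

    import random
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def sample_label_pairs(first_set, second_set, n_related=1, n_unrelated=2):
        # First label sample groups: two labels drawn from the first (related) set.
        # Second label sample groups: one related label plus one unrelated label.
        related = [tuple(random.sample(first_set, 2)) for _ in range(n_related)]
        unrelated = [(random.choice(first_set), random.choice(second_set))
                     for _ in range(n_unrelated)]
        return related, unrelated

    class PairClassifier(nn.Module):
        # First classifier: predicts the probability that a label pair is a
        # related (co-occurring) pair from the two label prediction features.
        def __init__(self, d):
            super().__init__()
            self.fc = nn.Linear(2 * d, 1)

        def forward(self, h_i, h_j):
            return torch.sigmoid(self.fc(torch.cat([h_i, h_j], dim=-1)))

    # usage with toy data: the second loss function value is again a binary
    # cross-entropy, with target 1 for related pairs and 0 for unrelated pairs
    pairs = sample_label_pairs(["wife", "husband", "child"], ["boss", "driver"])
    clf = PairClassifier(d=768)
    p_ij = clf(torch.randn(1, 768), torch.randn(1, 768))
    second_loss = F.binary_cross_entropy(p_ij, torch.ones_like(p_ij))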
S509, for each text sample, selecting a tag subset from the first tag set of the text sample, selecting at least one tag from the tags in the tag set except the tag subset, and combining each selected tag with the tag subset to form a tag sample group, so as to obtain at least one tag sample group.
Wherein the subset of tags includes at least two first tags in the first set of tags.
As shown in fig. 4, for a certain text sample, it is assumed that a first set of labels comprises at least label 1, label 3 and label n related to the text sample, and a second set of labels comprises at least label 2 and label 4 not related to the text sample.
In fig. 4, label 1 and label n are selected as the label subset $S$. Accordingly, the label subset $S$ can be combined with label 3 in the first label set into a label sample group; the label subset $S$ can also be combined with label 2 in the second label set into a label sample group; and the label subset $S$ can likewise form a label sample group with label 4 in the second label set.
S510, for each label sample group of the text sample, determining the average value of the label prediction features of the first labels in the label subset of the label sample group as the label prediction feature of the label subset, and determining a second prediction related category of the label sample group by using a second classifier to be trained, based on the label prediction feature of the label subset in the label sample group and the label prediction features of the labels in the label sample group other than the label subset.
The second prediction related category is used to characterize whether the correlations of the labels in the label sample group, as predicted by the second classifier, are the same. For example, the second prediction related category may include: a related category (or co-occurring labels) and an unrelated category (or non-co-occurring labels). If the second prediction related category is the related category (or co-occurring labels), it indicates that all labels in the label sample group are predicted to belong to labels co-occurring in the first label set; conversely, if the second prediction related category is the unrelated category, it indicates that the labels in the label sample group are predicted not to all belong to labels co-occurring in the first label set.
For the sake of easy distinction, the prediction related category predicted by determining the label sample group through steps S509 to S510 and based on the second classifier is referred to as a second prediction related category.
The above steps S509 to S510 correspond to the task processing of the conditional label co-occurrence prediction.
For example, still in connection with fig. 4 and the above example: the label prediction feature of the label subset $S$ may be the average of the label prediction feature of label 1 and the label prediction feature of label n.

Accordingly, for the label sample group formed by the label subset $S$ and label 2, the label prediction feature of the subset $S$ and the label prediction feature $h_2$ of label 2 can be input into the second classifier, so as to obtain whether the labels in the label subset and label 2, as predicted by the second classifier, all belong to labels related to the text sample, i.e., whether they belong to labels co-occurring in the first label set, e.g., the probability, predicted by the second classifier, that the label subset $S$ and label 2 belong to a related label pair.

The processing of the label sample groups formed by the subset $S$ with label 3 or label 4 is similar and is not repeated here.
S511, determining a second loss function value of the second classifier based on the actual related class and the second prediction related class of the tag sample group of each text sample.
And the actual relevant category of the label sample group of the text sample represents whether the actual labeled relevance of each label in the label sample group of the text sample is the same.
This step S511 is similar to the previous step S508, and is not described herein again.
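Correspondingly, a minimal sketch of the conditional label co-occurrence task of steps S509 to S511, under the same assumption that the second classifier is a linear layer: the subset's feature is the mean of its members' label prediction features, and the classifier scores the subset against one extra label.

    import torch
    import torch.nn as nn

    class ConditionalClassifier(nn.Module):
        # Second classifier: predicts whether a label subset and one extra
        # label all co-occur in the first (related) label set.
        def __init__(self, d):
            super().__init__()
            self.fc = nn.Linear(2 * d, 1)

        def forward(self, subset_feats, extra_feat):
            # S510: the subset's label prediction feature is the average of the
            # label prediction features of the first labels it contains
            h_subset = subset_feats.mean(dim=0)
            return torch.sigmoid(self.fc(torch.cat([h_subset, extra_feat], dim=-1)))

    # usage: subset {label 1, label n} (two feature vectors) paired with label 2
    clf = ConditionalClassifier(d=768)
    p = clf(torch.randn(2, 768), torch.randn(768))  # probability of co-occurrence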
S512, determining whether the training end condition is reached based on the first loss function value, the second loss function value of the first classifier and the second loss function value of the second classifier; if so, ending the training, and determining the trained BERT model and fully-connected network layer as the multi-label text classification model; if not, adjusting the internal parameters of the BERT model, the fully-connected network layer, the first classifier and the second classifier, returning to step S502, and continuing the training until the training end condition is reached, whereupon the BERT model and fully-connected network layer obtained by training are taken as the multi-label text classification model.
For example, if the first loss function value, the second loss function value of the first classifier and the second loss function value of the second classifier all converge, it is determined that the training end condition is reached.
For another example, it may be detected whether the number of training cycles has reached a set number, and if so, the training end condition is determined to be reached. In this case, if the training end condition is not currently satisfied, the number of training cycles is increased by one, and after the parameters of each model are adjusted, the process returns to step S502 and continues.
For example, in one example, a composite loss function is determined based on the first loss function value, the second loss function value of the first classifier, and the second loss function value of the second classifier.

As explained in connection with fig. 4, the composite loss function can be expressed as formula three below:

$L = L_1 + \lambda_1 L_2 + \lambda_2 L_3$ (formula three);

wherein $L_1$ denotes the first loss function value, $L_2$ is the second loss function value of the first classifier, and $L_3$ is the second loss function value of the second classifier. The weights $\lambda_1$ and $\lambda_2$ can each be 0, 1 or any value between 0 and 1.
It should be noted that this embodiment is described taking as an example determining the second loss function value of the first classifier and that of the second classifier at the same time. However, since both the label-pair co-occurrence prediction and the conditional label co-occurrence prediction analyze whether the correlations between at least two labels and the text sample are the same, i.e., whether there is a correlation between labels, the application may also perform only the label-pair co-occurrence prediction (i.e., steps S506 to S508) or only the conditional label co-occurrence prediction (i.e., steps S509 to S511). Thus, by setting the values of $\lambda_1$ and $\lambda_2$, the composite loss function may include the second loss function value corresponding to either one or both of the two classifiers.
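In code, formula three reduces to a weighted sum; a minimal sketch, assuming the three loss values have already been computed as in the sketches above:

    def composite_loss(first_loss, pair_loss, cond_loss, lambda1=1.0, lambda2=1.0):
        # formula three: L = L1 + lambda1 * L2 + lambda2 * L3.
        # Setting lambda1 or lambda2 to 0 disables the corresponding
        # co-occurrence prediction task; values in (0, 1] weight it.
        return first_loss + lambda1 * pair_loss + lambda2 * cond_loss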
In the embodiments of fig. 4 and 5, during training of the multi-label text classification model, the co-occurrence relationships between labels are comprehensively captured through one or both of the label-pair co-occurrence prediction task and the conditional label co-occurrence prediction task. The label co-occurrence relationship is one of the important signals that accurately reflect label relevance, and it can be obtained without additional manual labeling.
After the multi-label text classification model is obtained through training in any one of the above embodiments, the multi-label text classification model can be used for multi-label classification of the text.
The multi-label text classification method in the present application is described below.
As shown in fig. 6, which shows a flowchart of an implementation of the multi-label text classification method provided in the present application, this embodiment may also be applied to the aforementioned computer device, and the method of this embodiment includes:
S601, obtaining a text to be processed and a set label set.
Wherein the text comprises a sequence of characters consisting of at least one character. The text to be processed is the text of which a plurality of labels related to the text need to be determined.
The set of tags includes a plurality of tags. The set of labels is a set that is preset and includes a plurality of labels that may be related to different texts, and therefore, the application needs to determine a plurality of labels related to the text from the set of labels.
As shown in fig. 2, the text "user A's wife is user B and their child is user C" is a text to be processed, and the label set includes, in addition to the labels that may be related to this text (wife, husband and children), other labels that may be related to other texts, such as boss, driver and grandmother.
S602, based on the character sequence of the text and the label sequence formed by the labels in the label set, and by using a multi-label text classification model, determining the relevance prediction characteristics of the text.
The relevance prediction feature of the text is predicted by a multi-label text classification model and is used for representing the relevance of each label in the label sequence and the text.
The multi-label text classification model comprises a network model obtained through multi-task synchronous training.
The multi-task synchronous training comprises the following steps: training the network model by using training character sequences corresponding to a plurality of text samples and training label sequences corresponding to a training label set labeled by the text samples to predict the relevance of the text samples and each label in the training label sequences as a training target; and in the process of training the network model, based on the label prediction characteristics of each label in at least one label sample group of the text sample, predicting whether the relevance of each label in the label sample group is the same as a training target, and synchronously training the classifier.
Wherein, the training label sequence is composed of each label in the training label set. The label sample group of the text sample comprises at least two labels selected from the training label set of the text sample, and the label prediction characteristic of the label is the label characteristic of the label predicted by the network model.
It can be understood that, in order to distinguish from a set label set, in the present application, a character sequence corresponding to a text sample used for training a multi-label text classification model is referred to as a training character sequence, a label set labeled by the text sample is referred to as a training label set, and a label sequence corresponding to the training label set is referred to as a training label sequence.
For a specific process of training the multi-label text classification model, reference may be made to the related description of the foregoing embodiment, which is not described herein again.
In one possible scenario, the multi-label text classification model includes only the network model, which, as previously described, may have a variety of possibilities. As a preferred approach, the network model may be a BERT model.
In yet another possible case, the multi-label text classification model may include the network model and a fully-connected network layer; for the training of this multi-label text classification model, reference may be made to the foregoing description, which is not repeated here.
S603, based on the relevance prediction features of the text, a plurality of labels relevant to the text are determined from the label set.
For example, the relevance between each label in the label set and the text is determined based on the relevance prediction features, and a plurality of labels relevant to the text are selected based on the relevance between each label and the text.
In a possible implementation manner, the multi-label text classification model further includes a fully-connected network layer, and accordingly, the relevance prediction feature of the text can be input into the trained fully-connected network layer, so as to obtain the prediction relevance between each label in the label sequence and the text. For example, the predicted relevance may be a relevance probability that characterizes how relevant the label is to the text, or a classification result that characterizes whether the label is relevant to the text.
Accordingly, a plurality of labels in the set of labels that are associated with the text are determined based on the predicted association of each label in the sequence of labels with the text. If the predicted correlation is a correlation probability, the label with the correlation probability greater than the set threshold in the label set may be determined as the label related to the text.
It can be understood that, in combination with the training process of the multi-label text classification model, the multi-label text classification model can have a relatively strong capability of capturing the label correlation, so that the text-related label can be more accurately determined by using the multi-label text classification model.
Meanwhile, since the relevance prediction feature of the text is predicted based on the character sequence of the text together with the label sequence formed by the labels in the label set, the multi-label text classification model can analyze not only the relations among the characters in the text, but also the correlations among the labels in the label set and between each label and the characters of the text. The relevance between each label in the label set and the text can therefore be determined more accurately, which further improves the accuracy of multi-label text classification.
In the following, a network model in the multi-label text classification model is taken as a BERT model as an example. Meanwhile, for the convenience of understanding, the multi-label text classification model including the BERT model and the fully connected network layer is taken as an example for illustration. As shown in fig. 7, which shows another flow diagram of the multi-label text classification method provided in the present application, the method of this embodiment may include:
S701, obtaining a text to be processed and a set label set.
Wherein the text comprises a sequence of characters consisting of at least one character.
S702, constructing an input characteristic sequence of the text based on the character sequence of the text and the label sequence formed by the labels in the label set.
The input feature sequence of the text comprises a character vector sequence, a label vector sequence and a separator arranged before the character vector sequence. The character vector sequence is composed of character vectors of all characters in the character sequence, and the label vector sequence is composed of label vectors of all labels in the label sequence; the character vector of the character and the label vector of the label may be determined by a word vector technique, and the like, which is not limited to this.
It can be understood that the process of constructing the input feature sequence of the text is similar to the process of constructing the input feature sequence of the text sample, which can be specifically referred to the related description of the foregoing embodiment, and is not repeated herein.
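As a sketch of step S702, assuming the character vectors and label vectors have already been looked up (e.g., via a word-vector table), the input feature sequence can be assembled as follows; the function name and tensor shapes are illustrative:

    import torch

    def build_input_feature_sequence(sep_vec, char_vecs, label_vecs):
        # Input feature sequence: the separator vector first, then the character
        # vector sequence of the text, then the label vector sequence.
        # sep_vec: (1, 1, d), char_vecs: (1, num_chars, d), label_vecs: (1, n, d)
        return torch.cat([sep_vec, char_vecs, label_vecs], dim=1)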
And S703, inputting the input feature sequence of the text into a BERT model of the multi-label text classification model to obtain the output feature of the separator output by the BERT model.
The output features of the separator are used to represent relevance prediction features of the text.
S704, inputting the relevance prediction features of the text into a fully-connected network layer of the multi-label text classification model to obtain relevance probabilities of each label in the label sequence output by the fully-connected network layer and the text.
S705, a plurality of labels of which the relevance probability with the text in the label set is larger than a set threshold value are determined as labels relevant to the text.
It should be noted that steps S704 and S705 are described taking as an example inputting the relevance prediction feature into the fully-connected network layer and having the fully-connected network layer output the relevance probabilities; it is understood, however, that other ways of determining the plurality of text-related labels based on the relevance prediction feature output by the BERT model are also applicable to this embodiment.
It can be understood that, because the BERT model can more accurately and comprehensively capture the correlations between labels, between characters, and between characters and labels in the input feature sequence, it is more conducive to accurately determining the relevance between the text corresponding to the character sequence and each label in the label set, so that the accuracy of multi-label text classification can be further improved.
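Putting steps S701 to S705 together, the sketch below runs the whole classification pass. A generic Transformer encoder stands in for the trained BERT model, and the label names, dimensions and threshold are illustrative assumptions.

    import torch
    import torch.nn as nn

    d = 768
    labels = ["wife", "husband", "child", "boss", "driver", "grandmother"]

    # stand-ins for the trained multi-label text classification model
    encoder = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=d, nhead=8, batch_first=True),
        num_layers=2)
    head = nn.Linear(d, len(labels))  # fully-connected network layer

    def classify(sep_vec, char_vecs, label_vecs, threshold=0.5):
        # S702: separator first, then character vectors, then label vectors
        seq = torch.cat([sep_vec, char_vecs, label_vecs], dim=1)
        # S703: encode the whole sequence; the separator's output feature is
        # the relevance prediction feature of the text
        h_sep = encoder(seq)[:, 0]
        # S704: relevance probability of each label in the label sequence
        probs = torch.sigmoid(head(h_sep))[0]
        # S705: keep labels whose relevance probability exceeds the threshold
        return [lab for lab, p in zip(labels, probs) if p > threshold]

    related = classify(torch.randn(1, 1, d), torch.randn(1, 12, d),
                       torch.randn(1, len(labels), d))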
In another aspect, the present application further provides a text classification model training apparatus corresponding to the text classification model training method of the present application.
As shown in fig. 8, which shows a schematic structural diagram of a component of the training apparatus for text classification model of the present application, the apparatus of this embodiment may include:
a sample obtaining unit 801, configured to obtain a plurality of text samples and a label set labeled by the text samples, where the label set of a text sample includes: a plurality of labels labeled with a relevance to the text sample, the text sample comprising a sequence of characters consisting of at least one character;
a first training unit 802, configured to determine, for each text sample, a correlation prediction feature of the text sample and a label prediction feature of each label in the label sequence based on a character sequence of the text sample and a label sequence formed by each label in a label set of the text sample, and by using a network model to be trained, where the correlation prediction feature of the text sample is used to characterize the correlation between each predicted label in the label sequence of the text sample and the text sample;
a second training unit 803, configured to select, for each text sample, at least one label sample group from a label set of the text sample, determine a prediction related class of the label sample group based on a label prediction feature of each label in the label sample group and using a classifier to be trained, where the label sample group includes at least two labels, and the prediction related class is used to characterize whether the correlations of the labels in the label sample group are the same;
a first loss determining unit 804, configured to determine a first loss function value of the network model based on the correlation prediction features of each text sample and the correlation actually labeled by each label in the label sequence of each text sample;
a second loss determining unit 805, configured to determine a second loss function value of the classifier based on an actual relevant category and a prediction relevant category of the tag sample group of each text sample, where the actual relevant category of the tag sample group of the text sample represents whether the actually labeled relevance of each tag in the tag sample group of the text sample is the same;
and the training control unit 806 is configured to, if it is determined based on the first loss function value and the second loss function value that the training end condition has not been reached, adjust internal parameters of the network model and the classifier, continue training until the training end condition is reached, and determine the trained network model as the multi-label text classification model.
In one possible implementation, the first loss determining unit includes:
the correlation processing subunit is used for inputting the correlation prediction characteristics of the text samples into a fully-connected network layer to be trained aiming at each text sample to obtain the prediction correlation between each label in the label sequence of the text sample predicted by the fully-connected network layer and the text sample;
and the first loss determining subunit is used for determining a first loss function value of the network model based on the prediction correlation and the actual labeling correlation of each label in the label sequence of each text sample.
In yet another possible implementation manner, the first training unit includes:
the sequence construction subunit is used for constructing an input characteristic sequence based on the character sequence of the text sample and a label sequence formed by all labels in the label set of the text sample; the input feature sequence comprises a character vector sequence, a label vector sequence and a separator arranged in front of the character vector sequence, wherein the character vector sequence consists of character vectors of all characters in the character sequence, and the label vector sequence consists of label vectors of all labels in the label sequence;
and the first training subunit is used for inputting the input feature sequence into a network model to be trained to obtain an output feature sequence output by the network model, the output feature sequence comprises the output features of the separator and the label prediction features of the labels in the label sequence, the output features of the separator are used for representing the correlation prediction features of the text samples, and the network model is a bidirectional coding characterization BERT model based on a transformer.
In yet another possible implementation manner, the tag set of the text sample obtained by the sample obtaining unit includes: a first set of tags and a second set of tags, the first set of tags comprising: a plurality of first labels labeled as being related to the text sample, the second set of labels comprising: a plurality of second labels labeled as irrelevant to the text sample;
when at least one label sample group is selected from the label set of the text sample, the second training unit is specifically configured to:
selecting at least one first label exemplar set and at least one second label exemplar set from the label set of the text exemplar, the first label exemplar set comprising: two first labels from the first label set, the second sample set of labels comprising: a first tag in the first set of tags and a second tag in the second set of tags.
In yet another possible implementation manner, the tag set of the text sample obtained by the sample obtaining unit includes: a first set of tags and a second set of tags, the first set of tags comprising: a plurality of first labels labeled as being related to the text sample, the second set of labels comprising: a plurality of second labels labeled as irrelevant to the text sample;
a second training unit comprising:
a subset selection subunit, configured to select, for each text sample, a tag subset from a first tag set of the text sample, where the tag subset includes at least two first tags in the first tag set;
the tag combination subunit is used for selecting at least one tag from the tags outside the tag subset in the tag set, and combining each selected tag with the tag subset to form a tag sample group to obtain at least one tag sample group;
a feature determining subunit, configured to determine, as the tag prediction feature of the tag subset, an average value of the tag prediction features of each first tag in the tag subset in the tag sample group;
and the second training subunit is used for determining the prediction related category of the label sample group by utilizing the classifier to be predicted based on the label prediction characteristics of the label subset in the label sample group and the label prediction characteristics of the labels except the label subset in the label sample group.
In another aspect, the application further provides a multi-label text classification device.
As shown in fig. 9, which shows a schematic structural diagram of a multi-label text classification apparatus according to the present application, the apparatus may include:
an information obtaining unit 901, configured to obtain a text to be processed and a set tag set, where the text includes a character sequence composed of at least one character, and the tag set includes multiple tags;
a feature determining unit 902, configured to determine, based on a character sequence of the text and a tag sequence formed by tags in a tag set, a relevance prediction feature of the text by using a multi-tag text classification model, where the relevance prediction feature of the text is used to characterize relevance between each tag in the tag sequence and the text; the multi-label text classification model comprises a network model obtained through multi-task synchronous training;
the multi-task synchronous training comprises the following steps: training character sequences corresponding to a plurality of text samples and training label sequences corresponding to a training label set labeled by the text samples are utilized to predict the relevance of the text samples and each label in the training label sequences as a training target training network model; in the process of training the network model, based on the label prediction characteristics of each label in at least one label sample group of the text sample, predicting whether the relevance of each label in the label sample group is the same as a training target, and synchronously training a classifier;
Wherein, the training label sequence is composed of all labels in a training label set; the label sample group of the text sample comprises at least two labels selected from the training label set of the text sample, and the label prediction characteristic of the label is the label characteristic of the label predicted by the network model.
A label determining unit 903, configured to determine, based on the relevance prediction feature of the text, a plurality of labels relevant to the text from the label set.
In one possible implementation, the multi-label text classification model further includes: a fully-connected network layer;
the tag determination unit includes:
the relevance determining subunit is used for inputting the relevance prediction characteristics of the text into a fully-connected network layer to obtain the prediction relevance of each label in the label sequence and the text;
and the label determining subunit is used for determining a plurality of labels in the label set related to the text based on the predicted relevance of each label in the label sequence and the text.
In another possible implementation manner, the feature determining unit includes:
a sequence construction subunit, configured to construct an input feature sequence of the text based on a character sequence of the text and a tag sequence formed by tags in the tag set, where the input feature sequence includes a character vector sequence, a tag vector sequence, and a separator arranged before the character vector sequence, the character vector sequence is formed by character vectors of characters in the character sequence, and the tag vector sequence is formed by tag vectors of tags in the tag sequence;
and the characteristic determining subunit is used for inputting the input characteristic sequence of the text into the multi-label text classification model to obtain the output characteristic of the separator output by the multi-label text classification model, wherein the output characteristic of the separator is used for representing the relevance prediction characteristic of the text, and a network model in the multi-label text classification model is a bidirectional coding representation BERT model based on a transformer.
As shown in fig. 10, a block diagram of an implementation manner of a computer device provided in the embodiment of the present application, where the computer device may be the aforementioned terminal, and the computer device may include:
a memory 1001 for storing a program;
the processor 1002 is configured to invoke and execute a program stored in the memory, where the program is specifically configured to execute the text classification model training method or the multi-label text classification method provided in any one of the foregoing embodiments.
The processor 1002 may be a central processing unit CPU or an Application Specific Integrated Circuit (ASIC).
The computer device may further include a communication interface 1003 and a communication bus 1004, wherein the memory 1001, the processor 1002 and the communication interface 1003 communicate with each other through the communication bus 1004.
The embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored, and the computer program is loaded and executed by a processor to implement each step of the text classification model training method or the multi-label text classification method.
The present application also proposes a computer program product or a computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instruction from the computer-readable storage medium, and executes the computer instruction, so that the computer device executes the methods provided in various optional implementation manners in the aspect of the text classification model training method or the multi-label text classification method, or in the aspect of the text classification model training device or the multi-label text classification device, and the specific implementation process may refer to the description of the corresponding embodiments, which is not described in detail.
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. Meanwhile, the features described in the embodiments of the present specification may be replaced or combined with each other, so that those skilled in the art can implement or use the present application. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A text classification model training method is characterized by comprising the following steps:
obtaining a plurality of text samples and a label set labeled by the text samples, wherein the label set of the text samples comprises: a plurality of labels labeled with a relevance to the text sample, the text sample comprising a sequence of characters comprised of at least one character;
for each text sample, determining a relevance prediction feature of the text sample and a label prediction feature of each label in the label sequence based on a character sequence of the text sample and a label sequence formed by each label in a label set of the text sample, and by using a network model to be trained, wherein the relevance prediction feature of the text sample is used for representing the relevance of each label in the predicted label sequence of the text sample and the text sample;
for each text sample, selecting at least one label sample group from a label set of the text sample, and determining a prediction related class of the label sample group based on the label prediction features of the labels in the label sample group and by using a classifier to be trained, wherein the label sample group comprises at least two labels, and the prediction related class is used for representing whether the correlation of the labels in the label sample group is the same or not;
determining a first loss function value of the network model based on the relevance prediction feature of each text sample and the actually labeled relevance of each label in the label sequence of each text sample;
determining a second loss function value of the classifier based on the actual relevant category and the prediction relevant category of the label sample group of each text sample, wherein the actual relevant category of the label sample group of the text sample represents whether the relevance of the actual label in the label sample group of the text sample is the same or not;
and if the training ending condition is determined not to be reached based on the first loss function value and the second loss function value, adjusting internal parameters of the network model and the classifier, continuing training until the training ending condition is reached, and determining the trained network model as the multi-label text classification model.
2. The method of claim 1, wherein determining the first loss function value of the network model based on the correlation prediction feature of each text sample and the correlation actually labeled by each label in the label sequence of each text sample comprises:
for each text sample, inputting the relevance prediction characteristics of the text sample into a full-connection network layer to be trained to obtain the prediction relevance between each label in the label sequence of the text sample predicted by the full-connection network layer and the text sample;
and determining a first loss function value of the network model based on the predicted relevance and the actually marked relevance of each label in the label sequence of each text sample.
3. The method according to claim 1 or 2, wherein the determining the relevance prediction features of the text sample and the label prediction features of the labels in the label sequence based on the character sequence of the text sample and the label sequence composed of the labels in the label set of the text sample and by using a network model to be trained comprises:
constructing an input characteristic sequence based on the character sequence of the text sample and a label sequence formed by all labels in a label set of the text sample; the input feature sequence comprises a character vector sequence, a label vector sequence and a separator arranged in front of the character vector sequence, wherein the character vector sequence consists of character vectors of all characters in the character sequence, and the label vector sequence consists of label vectors of all labels in the label sequence;
and inputting the input feature sequence into a network model to be trained to obtain an output feature sequence output by the network model, wherein the output feature sequence comprises the output features of the separator and the label prediction features of all labels in the label sequence, the output features of the separator are used for representing the correlation prediction features of the text samples, and the network model is a transformer-based bidirectional encoder representation (BERT) model.
4. The method of claim 1, wherein the set of labels for the text sample comprises: a first set of tags and a second set of tags, the first set of tags comprising: a plurality of first labels labeled as being related to the text sample, the second set of labels comprising: a plurality of second labels labeled as irrelevant to the text sample;
the selecting at least one label exemplar set from the label set of text exemplars includes:
selecting at least one first label exemplar set and at least one second label exemplar set from the label set of text exemplars, the first label exemplar set comprising: two first labels from the first label set, the second sample set of labels comprising: a first tag in the first set of tags and a second tag in the second set of tags.
5. The method of claim 1, wherein the set of labels for the text sample comprises: a first set of tags and a second set of tags, the first set of tags comprising: a plurality of first labels labeled as being related to the text sample, the second set of labels comprising: a plurality of second labels labeled as irrelevant to the text sample;
the selecting at least one label exemplar set from the label set of text exemplars includes:
selecting a subset of tags from a first set of tags for the text sample, the subset of tags including at least two first tags in the first set of tags;
selecting at least one label from the labels out of the label subsets in the label set, and combining each selected label with the label subsets to form a label sample group to obtain at least one label sample group;
the determining the prediction related category of the label sample group based on the label prediction features of the labels in the label sample group and by using the classifier to be trained comprises:
determining an average value of the tag prediction features of each first tag in the tag subset in the tag sample group as the tag prediction features of the tag subset;
and determining a prediction related class of the label sample group by utilizing a classifier to be trained based on the label prediction characteristics of the label subset in the label sample group and the label prediction characteristics of labels except the label subset in the label sample group.
6. A multi-label text classification method is characterized by comprising the following steps:
the method comprises the steps of obtaining a text to be processed and a set label set, wherein the text comprises a character sequence formed by at least one character, and the label set comprises a plurality of labels;
based on the character sequence of the text and the label sequence formed by the labels in the label set, determining the relevance prediction characteristics of the text by utilizing a multi-label text classification model, wherein the relevance prediction characteristics of the text are used for representing the relevance of the labels in the label sequence and the text;
determining a plurality of labels related to the text from the label set based on relevance prediction features of the text;
the multi-label text classification model comprises a network model obtained through multi-task synchronous training;
the multi-task synchronous training comprises the following steps: training the network model by using training character sequences corresponding to a plurality of text samples and training label sequences corresponding to a training label set labeled by the text samples to predict the relevance of the text samples and each label in the training label sequences as a training target; in the process of training the network model, synchronously training a classifier by predicting whether the relevance of each label in at least one label sample group of the text sample is the same as a training target based on the label prediction characteristics of each label in the label sample group;
wherein the training label sequence consists of each label in a training label set; the label sample group of the text sample comprises at least two labels selected from a training label set of the text sample, and the label prediction features of the labels are the label features of the labels predicted by the network model.
7. A text classification model training device, comprising:
a sample obtaining unit, configured to obtain a plurality of text samples and a label set labeled by the text samples, where the label set of the text samples includes: a plurality of labels labeled with a relevance to the text sample, the text sample comprising a sequence of characters comprised of at least one character;
a first training unit, configured to determine, for each text sample, a correlation prediction feature of the text sample and a label prediction feature of each label in a label sequence of the text sample based on the character sequence of the text sample and the label sequence formed by each label in the label set of the text sample, and by using a network model to be trained, where the correlation prediction feature of the text sample is used to characterize the correlation between each label in the predicted label sequence of the text sample and the text sample;
a second training unit, configured to select, for each text sample, at least one label sample group from a label set of the text sample, determine a prediction related class of the label sample group based on the label prediction features of the labels in the label sample group and by using a classifier to be trained, where the label sample group includes at least two labels, and the prediction related class is used to characterize whether the correlations of the labels in the label sample group are the same;
the first loss determining unit is used for determining a first loss function value of the network model based on the relevance prediction characteristics of each text sample and the relevance of each label in the label sequence of each text sample;
a second loss determining unit, configured to determine a second loss function value of the classifier based on an actual relevant category and a prediction relevant category of the tag sample group of each text sample, where the actual relevant category of the tag sample group of the text sample represents whether correlations actually labeled by each tag in the tag sample group of the text sample are the same or not;
and the training control unit is used for adjusting the internal parameters of the network model and the classifier if the training ending condition is not met based on the first loss function value and the second loss function value, continuing training until the training ending condition is met, and determining the trained network model as the multi-label text classification model.
8. A multi-label text classification apparatus, comprising:
the information acquisition unit is used for acquiring a text to be processed and a set label set, wherein the text comprises a character sequence consisting of at least one character, and the label set comprises a plurality of labels;
the feature determination unit is used for determining a relevance prediction feature of the text based on a character sequence of the text and a label sequence formed by labels in the label set and by using a multi-label text classification model, wherein the relevance prediction feature of the text is used for representing the relevance of each label in the label sequence and the text; the multi-label text classification model comprises a network model obtained through multi-task synchronous training;
the multi-task synchronous training comprises the following steps: training the network model by using training character sequences corresponding to a plurality of text samples and training label sequences corresponding to a training label set labeled by the text samples to predict the relevance of the text samples and each label in the training label sequences as a training target; in the process of training the network model, synchronously training a classifier by predicting whether the relevance of each label in at least one label sample group of the text sample is the same as a training target based on the label prediction characteristics of each label in the label sample group;
wherein the training label sequence consists of each label in a training label set; the label sample group of the text sample comprises at least two labels selected from a training label set of the text sample, and the label prediction characteristics of the labels are the label characteristics of the labels predicted by the network model;
and the label determining unit is used for determining a plurality of labels related to the text from the label set based on the relevance prediction characteristics of the text.
9. A computer device, comprising: the system comprises a processor and a memory, wherein the processor and the memory are connected through a communication bus;
the processor is used for calling and executing the program stored in the memory;
the memory is used for storing a program, and the program is used for implementing the text classification model training method according to any one of claims 1 to 5 or the multi-label text classification method according to claim 6.
10. A computer-readable storage medium, having stored thereon a computer program which, when being loaded and executed by a processor, carries out a method for training a text classification model according to any one of claims 1 to 5 or a method for multi-label text classification according to claim 6.
CN202110630360.XA 2021-06-07 2021-06-07 Multi-label text classification and model training method, device, equipment and storage medium Active CN113076426B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110630360.XA CN113076426B (en) 2021-06-07 2021-06-07 Multi-label text classification and model training method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110630360.XA CN113076426B (en) 2021-06-07 2021-06-07 Multi-label text classification and model training method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113076426A true CN113076426A (en) 2021-07-06
CN113076426B CN113076426B (en) 2021-08-13

Family

ID=76617137

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110630360.XA Active CN113076426B (en) 2021-06-07 2021-06-07 Multi-label text classification and model training method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113076426B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109840531A (en) * 2017-11-24 2019-06-04 华为技术有限公司 The method and apparatus of training multi-tag disaggregated model
CN109582789A (en) * 2018-11-12 2019-04-05 北京大学 Text multi-tag classification method based on semantic primitive information
CN110851596A (en) * 2019-10-11 2020-02-28 平安科技(深圳)有限公司 Text classification method and device and computer readable storage medium
CN110807495A (en) * 2019-11-08 2020-02-18 腾讯科技(深圳)有限公司 Multi-label classification method and device, electronic equipment and storage medium
CN111078885A (en) * 2019-12-18 2020-04-28 腾讯科技(深圳)有限公司 Label classification method, related device, equipment and storage medium
CN111737476A (en) * 2020-08-05 2020-10-02 腾讯科技(深圳)有限公司 Text processing method and device, computer readable storage medium and electronic equipment
CN112434157A (en) * 2020-11-05 2021-03-02 平安直通咨询有限公司上海分公司 Document multi-label classification method and device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XIMING ZHANG et al.: "Enhancing Label Correlation Feedback in Multi-Label Text Classification via Multi-Task Learning", arXiv.org *
LYU Xueqiang et al.: "Multi-label text classification method fusing BERT and label semantic attention", Journal of Computer Applications *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114091472A (en) * 2022-01-20 2022-02-25 北京零点远景网络科技有限公司 Training method of multi-label classification model

Also Published As

Publication number Publication date
CN113076426B (en) 2021-08-13

Similar Documents

Publication Publication Date Title
CN109033068B (en) Method and device for reading and understanding based on attention mechanism and electronic equipment
CN108875074B (en) Answer selection method and device based on cross attention neural network and electronic equipment
CN111444326B (en) Text data processing method, device, equipment and storage medium
CN111950269A (en) Text statement processing method and device, computer equipment and storage medium
CN109471938A (en) A kind of file classification method and terminal
CN111159485B (en) Tail entity linking method, device, server and storage medium
CN111626362A (en) Image processing method, image processing device, computer equipment and storage medium
CN112819023A (en) Sample set acquisition method and device, computer equipment and storage medium
CN113761105A (en) Text data processing method, device, equipment and medium
CN112231347A (en) Data processing method and device, computer equipment and storage medium
CN112328778A (en) Method, apparatus, device and medium for determining user characteristics and model training
CN113011172A (en) Text processing method and device, computer equipment and storage medium
CN113392640A (en) Title determining method, device, equipment and storage medium
CN113705315A (en) Video processing method, device, equipment and storage medium
CN110852071B (en) Knowledge point detection method, device, equipment and readable storage medium
CN110750677A (en) Audio and video recognition method and system based on artificial intelligence, storage medium and server
CN114492669B (en) Keyword recommendation model training method, recommendation device, equipment and medium
CN113076426B (en) Multi-label text classification and model training method, device, equipment and storage medium
CN117494051A (en) Classification processing method, model training method and related device
CN113408282B (en) Method, device, equipment and storage medium for topic model training and topic prediction
CN113761887A (en) Matching method and device based on text processing, computer equipment and storage medium
CN112486467A (en) Interactive service recommendation method based on dual interaction relation and attention mechanism
CN115269961A (en) Content search method and related device
CN115017356A (en) Image text pair judgment method and device
CN111538898B (en) Web service package recommendation method and system based on combined feature extraction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40048394

Country of ref document: HK