CN115272780B - Method for training multi-label classification model and related product - Google Patents

Method for training multi-label classification model and related product

Info

Publication number
CN115272780B
Authority
CN
China
Prior art keywords
picture set
pictures
classification model
label classification
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211197000.6A
Other languages
Chinese (zh)
Other versions
CN115272780A (en)
Inventor
贺婉佶
史晓宇
和超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Airdoc Technology Co Ltd
Original Assignee
Beijing Airdoc Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Airdoc Technology Co Ltd filed Critical Beijing Airdoc Technology Co Ltd
Priority to CN202211197000.6A priority Critical patent/CN115272780B/en
Publication of CN115272780A publication Critical patent/CN115272780A/en
Application granted granted Critical
Publication of CN115272780B publication Critical patent/CN115272780B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a method for training a multi-label classification model, which comprises the following steps: acquiring a first picture set to be used as a training sample of a multi-label classification model, wherein the multi-label classification model is a basic model pre-trained based on a second picture set, pictures in the second picture set have N class labels, and pictures in the first picture set have M class labels and lack N-M class labels; creating a new picture set based on the first picture set and the second picture set, wherein pictures in the new picture set which lack part of the class labels are configured with N-M soft labels determined based on the multi-label classification model, so as to label the classes missing from those pictures based on the soft labels; and performing fine tuning training on the multi-label classification model based on the new picture set. With the scheme of the invention, the performance of the multi-label classification model can be improved by effectively utilizing data resources with missing labels. In addition, the invention also provides an apparatus and a computer-readable storage medium.

Description

Method for training multi-label classification model and related product
Technical Field
The present invention relates generally to the field of picture processing technology. More particularly, the present invention relates to a method of training a multi-label classification model, an apparatus for performing the aforementioned method, and a computer-readable storage medium.
Background
This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims. The description herein may include concepts that could be pursued, but are not necessarily ones that have been previously conceived or pursued. Thus, unless otherwise indicated herein, what is described in this section is not prior art to the description and claims in this application and is not admitted to be prior art by inclusion in this section.
Nowadays, multi-label classification models based on deep learning (at present mainly convolutional neural networks and the Transformer family) are increasingly applied in practical scenarios. However, in practical applications (for example, in an image category identification process), the problem of incremental data learning is often encountered. Specifically, a model for a multi-label image classification task has limited data resources at the initial development stage, and a basic multi-label classification model is obtained by supervised learning training using these limited data resources. After a period of time, a new labeled data set may be obtained from some source (e.g., a public data set) so that the model can continue to be trained on it. However, the category system of the new data set is not exactly the same as the category system required by the task; for example, the new data set's category system may be a proper subset of the category system needed for the task. That is, only part of the categories are labeled in the new data set, and for the other categories, even if instances of those categories appear in the pictures of the new data set, they have no corresponding labels. In other words, the pictures in the new data set lack part of the category labels. If such a new data set with missing class labels is used directly to train the multi-label classification model, the model loses its recognition capability for the unlabeled categories, thereby affecting the performance of the multi-label classification model. Therefore, how to effectively utilize data resources with missing labels to improve the performance of the multi-label classification model has become an urgent technical problem to be solved.
Disclosure of Invention
In order to solve at least the technical problems described in the background section, the present invention provides a scheme for training a multi-label classification model. By using the scheme of the invention, the recognition capability of the multi-label classification model for the labeled classes can be improved without forgetting its recognition capability for the unlabeled classes, so that the overall performance of the multi-label classification model is effectively improved.
In view of this, the present invention provides solutions in the following aspects.
A first aspect of the present invention provides a method for training a multi-label classification model, comprising: acquiring a first picture set to be used as a training sample of a multi-label classification model, wherein the multi-label classification model is a basic model pre-trained based on a second picture set, pictures in the second picture set have N class labels, pictures in the first picture set have M class labels and lack N-M class labels, and M is less than N; creating a new picture set based on the first picture set and the second picture set, wherein pictures in the new picture set which lack part of the class labels are configured with N-M soft labels determined based on the multi-label classification model, so as to label the classes missing from those pictures based on the soft labels; and performing fine tuning training on the multi-label classification model based on the new picture set.
In one embodiment, creating a new picture set based on the first picture set and the second picture set comprises: merging the first picture set and the second picture set to obtain the new picture set, wherein the pictures in the new picture set comprise all pictures in the first picture set and all pictures in the second picture set.
In one embodiment, the method further comprises: determining a class label of the picture serving as the training sample based on the source of the picture in the new picture set.
In one embodiment, determining the class labels that the pictures as training samples have based on the source of the pictures in the new picture set comprises: in response to determining that the pictures in the new picture set serving as training samples are from the second picture set, determining N class labels corresponding to the second picture set as class labels of the pictures serving as training samples; or in response to determining that the pictures in the new picture set serving as training samples are from the first picture set, determining M class labels and N-M soft labels corresponding to the first picture set as class labels of the pictures serving as training samples.
In one embodiment, the method further comprises: performing class prediction on unlabeled classes of pictures from the first picture set based on the multi-label classification model; and determining N-M soft labels corresponding to the unlabeled categories according to the result of the category prediction.
In one embodiment, the class predicting the unlabeled class of the picture from the first picture set based on the multi-label classification model comprises: predicting a category probability value corresponding to an unlabeled category of a picture from the first picture set based on the multi-label classification model, and determining the category probability value as a result of the category prediction.
In one embodiment, performing fine tuning training on the multi-label classification model based on the new picture set comprises: performing supervised fine tuning training on the multi-label classification model by taking the pictures in the new picture set as supervision signals, wherein a target activation function is used as the last layer of the multi-label classification model in the training process, and cross entropy is used as the loss function of the multi-label classification model.
In one embodiment, the method further comprises: when the multi-label classification model is fine-tuned using pictures from the first picture set, determining the target of the cross entropy according to the class probability values corresponding to the unlabeled classes of the pictures serving as training samples.
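For concreteness, and assuming a per-class binary cross entropy behind a sigmoid output layer (this explicit form is an illustrative assumption; the invention only states that the cross-entropy target for an unlabeled class is its class probability value), the loss for a single picture from the first picture set could be written as:

$$
\mathcal{L} = -\sum_{c=1}^{N}\Big[\, t_c \log \hat{p}_c + (1 - t_c)\log(1 - \hat{p}_c) \,\Big],
\qquad
t_c =
\begin{cases}
y_c \in \{0,1\}, & c \le M \ \text{(annotated category)}\\
\mathrm{prob}_c \in [0,1], & c > M \ \text{(soft label from the basic model)}
\end{cases}
$$

where $\hat{p}_c$ is the fine-tuned model's sigmoid output for category $c$.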
A second aspect of the invention provides an apparatus, comprising: a processor; and a memory storing computer instructions for training a multi-label classification model which, when executed by the processor, cause the apparatus to perform the method according to the foregoing first aspect and the embodiments described below.
A third aspect of the invention provides a computer-readable storage medium comprising computer instructions for training a multi-label classification model which, when executed by a processor, cause the method according to the foregoing first aspect and the embodiments described below to be implemented.
By utilizing the scheme provided by the invention, a new picture set can be constructed from the picture set with complete labels and the picture set with missing category labels, and corresponding soft labels are configured for the pictures in the new picture set whose category annotations are missing, so that all pictures in the new picture set have complete labels. Training the multi-label classification model with this new picture set therefore improves its recognition capability for the labeled classes without forgetting its recognition capability for the unlabeled classes, effectively improving the overall performance of the multi-label classification model.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present invention will become readily apparent from the following detailed description, which proceeds with reference to the accompanying drawings. In the accompanying drawings, which are meant to be exemplary and not limiting, several embodiments of the invention are shown and indicated by like or corresponding reference numerals, wherein:
FIG. 1 is a flow diagram illustrating a method of training a multi-label classification model according to one embodiment of the invention;
FIG. 2 is a flow diagram illustrating a method of training a multi-label classification model according to another embodiment of the invention;
FIG. 3 is a schematic diagram showing the structure of each picture set according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating a method for predicting a category of a picture with no category labeled in a new picture set according to an embodiment of the present invention; and
FIG. 5 is a schematic configuration diagram illustrating an apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without making creative efforts based on the embodiments of the present invention, belong to the protection scope of the present invention.
It should be understood that the terms "first," "second," "third," and "fourth," etc. in the claims, description, and drawings of the present invention are used for distinguishing between different objects and not for describing a particular order. The terms "comprises" and "comprising," when used in the specification and claims of this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification and claims of this application, the singular form of "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in the specification and claims of this specification refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.
In the process of image classification using deep learning, especially in application scenarios such as assisted fundus disease classification, the problem of incremental data learning is often encountered. Generally, at the early development stage of a multi-label classification task model for image classification, data resources are limited, and a basic multi-label classification model is obtained by supervised learning training using these limited data resources. Then, after a period of time, the model continues to be trained using newly available data resources. The inventors have found that, when the pictures in a new data resource lack annotations for some classes (it can be understood that the pictures in the new data resource themselves contain all the classes, but only some classes have corresponding labels), directly training the model with this new data resource that lacks labels for some classes causes the model to lose its recognition capability for the unlabeled classes. For example, suppose the multi-label classification task of the basic model is to identify {apple, pear, banana} in a picture; if the new data resource only annotates whether each picture contains {apple, pear} and does not annotate whether the picture contains {banana}, the subsequently trained model loses its recognition capability for {banana}.
In this regard, the inventors also found that, when performing category labeling on pictures, the related art mainly either annotates the unlabeled categories manually, or assigns hard labels (0 or 1) to the unlabeled pictures. The manual labeling approach consumes a great deal of manpower, material resources and time and greatly prolongs the overall training period, while the hard-label approach is suited to annotating pictures that carry no labels at all, but not to pictures that already carry partial labels. Therefore, how to effectively utilize data resources with missing labels to improve the performance of the multi-label classification model has become an urgent technical problem to be solved.
To solve this technical problem, the inventors found that a new picture set can be constructed from the picture set with complete labels and the picture set with part of the labels missing, and soft labels can be configured for the pictures in the new picture set whose category annotations are partly missing, so that all pictures in the new picture set have complete labels. Training the multi-label classification model with this new picture set therefore improves its recognition capability for the labeled classes without forgetting its recognition capability for the unlabeled classes, so that the overall performance of the multi-label classification model is effectively improved.
The following detailed description of embodiments of the invention refers to the accompanying drawings.
FIG. 1 is a flow diagram illustrating a method 100 of training a multi-label classification model according to one embodiment of the invention.
As shown in fig. 1, at step S101, a first picture set to be used as training samples of a multi-label classification model may be obtained. The aforementioned multi-label classification model may be a basic model pre-trained based on the second picture set. It should be noted that the specific sources of the first picture set and the second picture set are not limited herein; they may, for example, be selected from publicly available picture resources. The pictures in the second picture set have N category labels, that is, they have the complete category labels required by the task of the multi-label classification model. The pictures in the first picture set have M category labels and lack N-M category labels (where M < N), that is, they carry only part of the category labels required by the task of the multi-label classification model and lack the labels for the remaining categories. For example, suppose the task of the multi-label classification model serving as the basic model is to identify {apple, pear, banana} in pictures; the pictures in the second picture set then carry the 3 category labels apple, pear and banana, and if each picture in the first picture set is only annotated with the 2 category labels apple and pear, the pictures in the first picture set lack the 1 category label banana.
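As a minimal illustration of this labeling situation (the variable names and data layout below are assumptions chosen for the example, not part of the invention), a picture from the second picture set carries hard labels for all N categories, while a picture from the first picture set carries hard labels only for the M annotated categories:

```python
# Illustrative only: N = 3 task categories {apple, pear, banana}, M = 2 annotated
# categories in the first picture set.
CLASSES = ["apple", "pear", "banana"]        # full category system of the task (N labels)
ANNOTATED_IN_FIRST_SET = ["apple", "pear"]   # categories labeled in the first picture set

# A picture from the second picture set: complete hard labels for all N categories.
second_set_picture_labels = {"apple": 1, "pear": 0, "banana": 1}

# A picture from the first picture set: hard labels only for the M annotated
# categories; the label for 'banana' is simply missing.
first_set_picture_labels = {"apple": 0, "pear": 1}
```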
It should be noted that the process of training the basic multi-label classification model with the second picture set, which carries the complete class labels, may be implemented with a general classification neural network architecture; the specific training process is not described further herein.
After the first picture set has been acquired, a new picture set may then be created at step S102 based on the first picture set and the second picture set. The pictures in the new picture set whose category annotations are partly missing are configured with N-M soft labels determined based on the multi-label classification model, so that the missing categories of these pictures are labeled based on the soft labels. A soft label differs from a traditional hard label (which takes only the value 0 or 1) in that it weights the sample according to a probability. Soft labels can therefore carry more useful information, representing the multi-label classification model's recognition of each class.
Next, at step S103, the multi-label classification model may be fine-tuned and trained using the new picture set as described above.
In this way, creating the new picture set from the first picture set and the second picture set effectively increases the diversity of the samples, while configuring soft labels for the pictures in the new picture set whose category annotations are partly missing ensures that all pictures in the new picture set have complete labels. Training the multi-label classification model with this new picture set therefore improves its recognition capability for the labeled classes without forgetting its recognition capability for the unlabeled classes, so that the overall performance of the multi-label classification model is effectively improved.
FIG. 2 is a flow diagram illustrating a method 200 of training a multi-label classification model according to another embodiment of the invention. It should be noted that the method 200 can be understood as a further limitation or extension of the method 100. Therefore, the details described above in connection with fig. 1 also apply to the following description.
As shown in fig. 2, at step S201, a first picture set to be used as training samples of the multi-label classification model may be obtained. For detailed descriptions of the first picture set, of the second picture set used to train the multi-label classification model, and of the multi-label classification model itself, reference may be made to the foregoing; for example, the pictures in the second picture set have the complete class labels required by the task of the multi-label classification model, while the pictures in the first picture set have only part of those class labels and lack the labels for the remaining classes. In addition, the pictures in the first picture set and the second picture set may come from a variety of application fields, for example medical pictures such as fundus pictures, and may be obtained from publicly available image resources.
Next, at step S202, the first picture set and the second picture set may be merged to obtain a new picture set. The pictures in the new picture set can include all pictures in the first picture set and all pictures in the second picture set, which ensures the diversity of the pictures in the new picture set and provides a precondition for the effective training of the multi-label classification model that follows.
Next, at step S203, the category labels that a picture serving as a training sample has can be determined based on the source of the picture in the new picture set. In some embodiments, in response to determining that a picture in the new picture set serving as a training sample comes from the second picture set, the N class labels corresponding to the second picture set may be determined as the class labels of that picture. That is, the complete class labels in the second picture set may be used as the class labels of the training-sample picture. In other embodiments, in response to determining that a picture in the new picture set serving as a training sample comes from the first picture set, the M class labels and the N-M soft labels corresponding to the first picture set may be determined as the class labels of that picture. That is, the partial category labels already present in the first picture set may be used as part of the category labels of the training-sample picture, and the soft labels are used to label the remaining categories of that picture.
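A minimal sketch of this merge-and-relabel step is given below, assuming each picture is stored as a small record with a file path, a per-category label dictionary, and a source tag, and that the soft labels for first-set pictures have already been computed (the record layout and helper names are assumptions, not taken from the patent):

```python
from typing import Dict, List

# One picture record: file path, per-category labels (a category is absent from the
# dictionary when it was never annotated), and the picture set it came from.
Record = Dict[str, object]

def build_new_picture_set(first_set: List[Record],
                          second_set: List[Record],
                          soft_labels: Dict[str, Dict[str, float]]) -> List[Record]:
    """Merge both picture sets into the new picture set and assign labels by source."""
    new_set: List[Record] = []
    # Pictures from the second picture set keep their complete N hard category labels.
    for rec in second_set:
        new_set.append({"path": rec["path"], "labels": dict(rec["labels"]), "source": "second"})
    # Pictures from the first picture set keep their M hard labels; the missing
    # N-M categories are filled with the base model's predicted probabilities.
    for rec in first_set:
        labels = dict(rec["labels"])
        for category, prob in soft_labels[rec["path"]].items():
            labels.setdefault(category, prob)   # only fills categories with no hard label
        new_set.append({"path": rec["path"], "labels": labels, "source": "first"})
    return new_set
```

Because `setdefault` only fills categories that have no entry yet, annotated categories always keep their original 0/1 hard labels.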
The process of constructing the new picture set is further described with reference to the specific structure of each picture set in fig. 3. As shown in FIG. 3, assume an image multi-label classification task over N classes {label_1, label_2, …, label_M, …, label_N}. The second picture set is a data set 02 with complete multi-label classification labels, and the data set 02 can be used for supervised training to obtain the basic multi-label classification model. Then a new data set 01 (i.e., the first picture set) is obtained, which comprises a plurality of pictures 01, where each picture 01 has labels {label_1, label_2, …, label_M} for only part of the categories, with M < N.
At this point, the first picture set and the second picture set may be merged to obtain the new picture set, which includes all pictures of the first picture set and the second picture set. Then, for a picture in the new picture set: if the picture comes from the data set 02, the labels {label_1, label_2, …, label_N} of the data set 02 can be used directly. If the picture comes from the data set 01, the picture's own labels {label_1, label_2, …, label_M} from the data set 01 can be used directly for the first M categories, while its labels {prob_(M+1), …, prob_N} are soft labels for the corresponding categories derived from the base model's prediction and inference on the picture.
FIG. 4 is a flowchart illustrating a method for predicting the categories of pictures whose categories are not labeled in the new picture set according to an embodiment of the present invention. As shown in fig. 4, at step S401, class prediction may be performed on the unlabeled classes of pictures from the first picture set based on the multi-label classification model. For example, the class prediction may be performed for classes M+1 to N of the pictures in the first picture set. In some embodiments, a category probability value corresponding to an unlabeled category of a picture from the first picture set may be predicted based on the multi-label classification model, and this category probability value is determined as the result of the category prediction. Next, at step S402, the N-M soft labels corresponding to the unlabeled categories may be determined according to the result of the aforementioned category prediction.
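A sketch of how the class prediction in steps S401 to S402 could be realized with a PyTorch model is shown below; the model interface, the preprocessing of the image batch, and the use of a sigmoid output are assumptions consistent with the activation function mentioned later, not details quoted from the patent:

```python
import torch

@torch.no_grad()
def predict_soft_labels(base_model: torch.nn.Module,
                        images: torch.Tensor,
                        unlabeled_class_indices: list) -> torch.Tensor:
    """Predict soft labels for the N-M categories that the first picture set lacks.

    `images` is a preprocessed batch of pictures from the first picture set; the
    returned tensor holds per-class probabilities prob_(M+1) ... prob_N.
    """
    base_model.eval()                          # freeze batch-norm / dropout behavior
    logits = base_model(images)                # shape: (batch_size, N)
    probs = torch.sigmoid(logits)              # probability value for every category
    return probs[:, unlabeled_class_indices]   # keep only the unlabeled categories
```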
After the merging and the label processing of the new picture set are completed, the flow returns to fig. 2. Next, at step S204, the multi-label classification model may be fine-tuned using the new picture set. In some embodiments, the pictures in the new picture set may be used as supervision signals to perform supervised fine tuning training on the multi-label classification model. In the training process, a target activation function (including but not limited to a sigmoid function or another activation function) is used as the last layer of the multi-label classification model, and cross entropy is used as the loss function of the multi-label classification model. In particular, when the multi-label classification model is fine-tuned using pictures from the first picture set, the target of the cross entropy needs to be determined according to the class probability values corresponding to the unlabeled classes of the pictures serving as training samples. For example, during the fine tuning of the multi-label classification model, the cross-entropy targets for the soft-label portion {prob_(M+1), …, prob_N} are the probability values generated by the base model.
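Under the same assumptions (sigmoid output layer, per-class binary cross entropy), one supervised fine-tuning step on the new picture set might look as follows; since binary cross entropy accepts probabilistic targets, the soft labels of first-set pictures act directly as the cross-entropy targets for their unlabeled categories. All names and the batching convention are illustrative:

```python
import torch
import torch.nn.functional as F

def fine_tune_step(model: torch.nn.Module,
                   optimizer: torch.optim.Optimizer,
                   images: torch.Tensor,
                   targets: torch.Tensor) -> float:
    """One supervised fine-tuning step on a batch from the new picture set.

    `targets` has shape (batch_size, N): hard 0/1 entries for annotated categories,
    and soft probabilities from the base model for the N-M missing categories of
    pictures that came from the first picture set.
    """
    model.train()
    logits = model(images)                                      # (batch_size, N)
    # Sigmoid output + cross entropy; soft targets are valid here, so the
    # cross-entropy target of an unlabeled category is the base model's probability.
    loss = F.binary_cross_entropy_with_logits(logits, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```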
With the above scheme, a new picture set is constructed from the picture set with complete labels and the picture set with part of the labels missing, and soft labels are configured for the pictures in the new picture set whose category annotations are partly missing, so that all pictures in the new picture set have complete labels. In this way, the performance of the model on the labeled categories can be improved without additional manual labeling effort. At the same time, during fine tuning the model learns the base model's probability values for the unlabeled categories, so that the fine-tuned model does not lose its recognition capability for those unlabeled categories.
In addition, the scheme of the invention does not require changing the backbone of a general classification neural network, so the method can be applied to general classification neural networks. Moreover, if different first picture sets are obtained at different points in time, corresponding new picture sets can be constructed for each of them, and the model can be fine-tuned with the corresponding new picture set at each point in time, i.e., fine-tuned multiple times. That is, in the process of training the multi-label classification model, the training method can be used to fine-tune the model multiple times at different points in time.
FIG. 5 is a block diagram illustrating a device 500 according to an embodiment of the present invention. As shown in fig. 5, the device 500 may include a processor 501 and a memory 502. The memory 502 stores computer instructions for training the multi-label classification model which, when executed by the processor 501, cause the device 500 to perform the method described above in connection with figs. 1 to 3. For example, in some embodiments, the device 500 may obtain a first picture set to be used as training samples of the multi-label classification model, train the multi-label classification model as a base model using a second picture set, construct a new picture set, fine-tune the multi-label classification model using the new picture set, and so on. On this basis, the device 500 constructs a new picture set from the picture set with complete labels and the picture set with missing category labels, and configures corresponding soft labels for the pictures in the new picture set whose category annotations are missing, so that all pictures in the new picture set have complete labels. Training the multi-label classification model with this new picture set therefore improves its recognition capability for the labeled classes without forgetting its recognition capability for the unlabeled classes, so that the overall performance of the multi-label classification model is effectively improved.
It should also be appreciated that the subject matter (e.g., a device, module or component, etc.) performing the operations of the present examples can include or otherwise access a computer-readable medium, such as a storage medium, a computer storage medium, or a data storage device (removable and/or non-removable) such as, for example, a magnetic disk, optical disk, or tape. Computer storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for the storage of information, such as computer-readable instructions, data structures, program modules or other data. In this regard, the present invention also discloses a computer-readable storage medium having stored thereon computer-readable instructions for training a multi-label classification model which, when executed by one or more processors, perform the methods and operations described above in connection with the figures.
While various embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous modifications, changes, and substitutions will occur to those skilled in the art without departing from the spirit and scope of the present invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is intended that the following claims define the scope of the invention and that the module compositions, equivalents, or alternatives falling within the scope of these claims be covered thereby.

Claims (9)

1. A method of training a multi-label classification model, comprising:
acquiring a first picture set to be used as a training sample of a multi-label classification model, wherein the multi-label classification model is a basic model pre-trained based on a second picture set, pictures in the second picture set have N class labels, pictures in the first picture set have M class labels and lack N-M class labels, and M is less than N;
creating a new picture set based on the first picture set and the second picture set, wherein the pictures in the new picture set which lack part of the class labels are configured with N-M soft labels determined based on the multi-label classification model, so as to label the classes which those pictures lack based on the soft labels, wherein class prediction is performed on the unlabeled classes of the pictures from the first picture set based on the multi-label classification model, and the N-M soft labels corresponding to the unlabeled classes are determined according to the result of the class prediction; and
performing fine tuning training on the multi-label classification model based on the new picture set.
2. The method of claim 1, wherein creating a new picture set based on the first picture set and the second picture set comprises:
and merging the first picture set and the second picture set to obtain the new picture set, wherein the pictures in the new picture set comprise all pictures in the first picture set and all pictures in the second picture set.
3. The method of claim 2, further comprising:
and determining a class label of the picture serving as the training sample based on the source of the picture in the new picture set.
4. The method of claim 3, wherein determining the class label that the picture as the training sample has based on the source of the picture in the new picture set comprises:
in response to determining that the pictures in the new picture set serving as the training samples are from the second picture set, determining N class labels corresponding to the second picture set as class labels of the pictures serving as the training samples; or
In response to determining that the pictures in the new picture set serving as training samples are from the first picture set, determining M class labels and N-M soft labels corresponding to the first picture set as class labels of the pictures serving as training samples.
5. The method of claim 1, wherein performing class prediction for unlabeled classes of pictures from the first set of pictures based on the multi-label classification model comprises:
predicting a category probability value corresponding to an unlabeled category of a picture from the first picture set based on the multi-label classification model, and determining the category probability value as a result of the category prediction.
6. The method of claim 5, wherein fine-tuning the multi-label classification model based on the new picture set comprises:
and carrying out supervised fine tuning training on the multi-label classification model by taking the pictures in the new picture set as supervision signals, wherein a target activation function is taken as the last layer of the multi-label classification model in the training process, and cross entropy is taken as a loss function of the multi-label classification model.
7. The method of claim 6, further comprising:
and when the multi-label classification model is finely adjusted by using the pictures from the first picture set, determining a target of cross entropy according to class probability values corresponding to the unlabeled classes of the pictures serving as training samples.
8. An electronic device, comprising:
a processor; and
a memory storing computer instructions for training a multi-label classification model, which when executed by the processor, cause the electronic device to perform the method of any of claims 1-7.
9. A computer readable storage medium containing computer instructions for training a multi-label classification model, which when executed by a processor, cause the method according to any one of claims 1-7 to be carried out.
CN202211197000.6A 2022-09-29 2022-09-29 Method for training multi-label classification model and related product Active CN115272780B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211197000.6A CN115272780B (en) 2022-09-29 2022-09-29 Method for training multi-label classification model and related product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211197000.6A CN115272780B (en) 2022-09-29 2022-09-29 Method for training multi-label classification model and related product

Publications (2)

Publication Number Publication Date
CN115272780A CN115272780A (en) 2022-11-01
CN115272780B true CN115272780B (en) 2022-12-23

Family

ID=83756336

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211197000.6A Active CN115272780B (en) 2022-09-29 2022-09-29 Method for training multi-label classification model and related product

Country Status (1)

Country Link
CN (1) CN115272780B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106960219A (en) * 2017-03-10 2017-07-18 百度在线网络技术(北京)有限公司 Image identification method and device, computer equipment and computer-readable medium
CN109137392A (en) * 2018-10-26 2019-01-04 无锡小天鹅股份有限公司 Laundry process, device and device for clothing processing
CN111061889A (en) * 2018-10-16 2020-04-24 京东方科技集团股份有限公司 Automatic identification method and device for multiple labels of pictures
JP2021086350A (en) * 2019-11-27 2021-06-03 富士フイルム株式会社 Image learning device, image learning method, neural network, and image classification device
CN113095338A (en) * 2021-06-10 2021-07-09 季华实验室 Automatic labeling method and device for industrial product image, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN115272780A (en) 2022-11-01

Similar Documents

Publication Publication Date Title
CN109104620B (en) Short video recommendation method and device and readable medium
CN111914156A (en) Cross-modal retrieval method and system for self-adaptive label perception graph convolution network
CN113128478B (en) Model training method, pedestrian analysis method, device, equipment and storage medium
CN110929524A (en) Data screening method, device, equipment and computer readable storage medium
CN113128536A (en) Unsupervised learning method, system, computer device and readable storage medium
CN112651467B (en) Training method and system and prediction method and system for convolutional neural network
CN110866564A (en) Season classification method, system, electronic device and medium for multiple semi-supervised images
CN110751191A (en) Image classification method and system
CN113435499A (en) Label classification method and device, electronic equipment and storage medium
CN112651324A (en) Method and device for extracting semantic information of video frame and computer equipment
CN115905528A (en) Event multi-label classification method and device with time sequence characteristics and electronic equipment
CN117523218A (en) Label generation, training of image classification model and image classification method and device
CN112949456B (en) Video feature extraction model training and video feature extraction method and device
CN111160526A (en) Online testing method and device for deep learning system based on MAPE-D annular structure
CN112612884B (en) Automatic labeling method for entity tags based on public text
CN115272780B (en) Method for training multi-label classification model and related product
CN116701637B (en) Zero sample text classification method, system and medium based on CLIP
CN116433977B (en) Unknown class image classification method, unknown class image classification device, computer equipment and storage medium
CN116029394B (en) Self-adaptive text emotion recognition model training method, electronic equipment and storage medium
CN111612021A (en) Error sample identification method and device and terminal
CN116151236A (en) Training method of text processing model, text processing method and related equipment
CN115576789A (en) Method and system for identifying lost user
CN115599953A (en) Training method and retrieval method of video text retrieval model and related equipment
CN114358284A (en) Method, device and medium for training neural network step by step based on category information
CN112699908B (en) Method for labeling picture, electronic terminal, computer readable storage medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant