CN112328823A - Training method and device for multi-label classification model, electronic equipment and storage medium - Google Patents


Info

Publication number
CN112328823A
CN112328823A
Authority
CN
China
Prior art keywords
label
classification model
training
label classification
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202011335091.6A
Other languages
Chinese (zh)
Inventor
罗彤
郭彦东
李亚乾
杨林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jinsheng Communication Technology Co ltd
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Shanghai Jinsheng Communication Technology Co ltd
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jinsheng Communication Technology Co ltd, Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Shanghai Jinsheng Communication Technology Co ltd
Priority to CN202011335091.6A priority Critical patent/CN112328823A/en
Publication of CN112328823A publication Critical patent/CN112328823A/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/5866Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a method and apparatus for training a multi-label classification model, an electronic device, and a storage medium, and belongs to the technical field of image processing. The method comprises the following steps: acquiring a plurality of first labels; for each first label, determining a trained single-label classification model, wherein the single-label classification model is used for classifying the first label; obtaining the first sample set used to train the single-label classification model, wherein the first sample set comprises a plurality of sample images and each sample image is labeled with a first label; labeling the sample images in the first sample set with at least one second label to obtain a second sample set; and training a first multi-label classification model on the second sample set to obtain a second multi-label classification model. When the multi-label classification model is trained, only the labels not yet present in each sample image need to be annotated, which reduces the time cost of data labeling for training the multi-label classification model and improves the efficiency of model training.

Description

Training method and device for multi-label classification model, electronic equipment and storage medium
Technical Field
The embodiment of the application relates to the technical field of image processing, in particular to a training method and device for a multi-label classification model, electronic equipment and a storage medium.
Background
At present, users store more and more images in their albums. To make these images easy to find, the terminal adds a tag to each image, so that the user can retrieve the corresponding image by searching for its tag. For example, if an image contains a beach, the terminal tags the image with "beach"; in a subsequent search, the user can quickly find the image simply by entering "beach".
Disclosure of Invention
The embodiment of the application provides a training method and device for a multi-label classification model, an electronic device, and a storage medium, which reduce the cost of constructing a sample set. The technical solution is as follows:
in one aspect, a method for training a multi-label classification model is provided, the method including:
acquiring a plurality of first labels;
for each first label, determining a trained single-label classification model, wherein the single-label classification model is used for classifying the first label;
obtaining a first sample set used for training the single-label classification model, wherein the first sample set comprises a plurality of sample images, and each sample image is marked with the first label;
labeling the sample images in the first sample set with at least one second label to obtain a second sample set, wherein the second sample set comprises the plurality of sample images, and each sample image is labeled with the first label and the at least one second label;
and training the first multi-label classification model according to the second sample set to obtain a second multi-label classification model.
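The claimed sample-set construction can be sketched in plain Python as follows. The data structures, image identifiers, and label names are hypothetical illustrations, not from the patent:

```python
# Sketch of the claimed sample-set construction (hypothetical data structures):
# the first sample set maps each sample image to the single first label it was
# annotated with for single-label training; only the missing second labels are
# then annotated to obtain the second sample set for multi-label training.

def build_second_sample_set(first_sample_set, second_labels):
    """first_sample_set: {image_id: first_label}
    second_labels: {image_id: [second labels to annotate]}
    Returns {image_id: [first_label, *second labels]}."""
    second_sample_set = {}
    for image_id, first_label in first_sample_set.items():
        # The existing first label is reused, never re-annotated.
        second_sample_set[image_id] = [first_label] + list(second_labels.get(image_id, []))
    return second_sample_set

first_set = {"img_001": "beach", "img_002": "sun"}
added = {"img_001": ["sea", "sky"], "img_002": ["cloud"]}
second_set = build_second_sample_set(first_set, added)
```

The point of the sketch is that per-image annotation effort is limited to the second labels; the first label comes for free from the single-label training data.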
In one possible implementation, the first multi-label classification model includes a first classification module and a second classification module;
the training the first multi-label classification model according to the second sample set to obtain a second multi-label classification model, including:
training the first classification module according to the second sample set to obtain a third multi-label classification model;
determining, by a first classification module in the third multi-label classification model, image features of the plurality of sample images;
and training a second classification module in the third multi-label classification model according to the image characteristics of the sample images and the labels labeled on each sample image to obtain the second multi-label classification model.
In a possible implementation manner, the training the first classification module according to the second sample set to obtain a third multi-label classification model includes:
training the first classification module according to the second sample set to obtain a fourth multi-label classification model, wherein the fourth multi-label classification model comprises a third classification module and a second classification module, and the third classification module is obtained by training the first classification module;
deleting the fully connected layer in the third classification module of the fourth multi-label classification model to obtain the third multi-label classification model.
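As a toy illustration of this step, the fourth multi-label classification model can be represented as an ordered list of named layers, with the third classification module's fully connected layer removed to obtain the third model. The layer names are hypothetical; a real implementation would manipulate modules in a deep-learning framework rather than strings:

```python
# Toy sketch: the fourth multi-label classification model as an ordered list of
# (name, layer) pairs. Deleting the fully connected ("fc") layer of the third
# classification module leaves a feature extractor whose output feeds the
# second classification module, i.e. the third multi-label classification model.

fourth_model = [
    ("conv_backbone", "feature extraction layers"),
    ("fc", "fully connected classification layer"),  # third classification module's FC layer
    ("second_classifier", "second classification module"),
]

def drop_fully_connected(model):
    """Return the model without its 'fc' layer."""
    return [(name, layer) for name, layer in model if name != "fc"]

third_model = drop_fully_connected(fourth_model)
```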
In one possible implementation manner, the obtaining a first sample set used for training the single label classification model includes:
based on a preset distillation algorithm, carrying out data distillation on the single label classification model to obtain a plurality of sample images used for training the single label classification model;
composing the plurality of sample images into the first sample set.
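The patent does not specify the distillation algorithm. One common reading of data distillation is to score a pool of candidate images with the trained single-label classification model and keep the most confidently classified ones; the sketch below assumes that reading, with a stand-in confidence table instead of a real model:

```python
# Hypothetical sketch of recovering sample images via data distillation:
# score a pool of candidate images with the trained single-label model and
# keep those it classifies most confidently. The confidence table stands in
# for a real model and is not from the patent.

def distill_sample_images(image_pool, score, threshold=0.9):
    """image_pool: iterable of image ids; score: image_id -> confidence in [0, 1]."""
    return [img for img in image_pool if score(img) >= threshold]

confidences = {"img_a": 0.97, "img_b": 0.42, "img_c": 0.91}
first_sample_set = distill_sample_images(confidences, confidences.get)
```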
In one possible implementation manner, the obtaining the plurality of first tags includes:
acquiring label information;
extracting a plurality of first keywords from the tag information;
determining the plurality of first tags based on the plurality of first keywords.
In one possible implementation manner, the determining the plurality of first labels based on the plurality of first keywords includes:
taking the plurality of first keywords as the plurality of first labels; alternatively,
for each first keyword, determining at least one second keyword, wherein the at least one second keyword belongs to a first category indicated by the first keyword, and forming the first keyword and the at least one second keyword into the plurality of first labels; alternatively,
for each first keyword, determining a second category to which the first keyword belongs, determining at least one third keyword included in the second category, and combining the first keyword and the at least one third keyword into a plurality of first tags.
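The three alternatives above can be illustrated with a small, hypothetical category map (the keywords, categories, and strategy names below are illustrative only, not from the patent):

```python
# Sketch of the three claimed strategies for turning first keywords into first
# labels. The category map is hypothetical, for illustration only.

SUBCATEGORIES = {"animal": ["cat", "dog"]}            # category -> member keywords
PARENT_CATEGORY = {"cat": "animal", "dog": "animal"}  # keyword -> its category

def labels_from_keywords(keywords, strategy):
    labels = []
    for kw in keywords:
        labels.append(kw)
        if strategy == "expand_down":
            # second keywords: members of the category the keyword names
            labels.extend(SUBCATEGORIES.get(kw, []))
        elif strategy == "expand_up":
            # third keywords: other members of the keyword's parent category
            parent = PARENT_CATEGORY.get(kw)
            labels.extend(k for k in SUBCATEGORIES.get(parent, []) if k != kw)
    return labels
```

Calling the function with `"as_is"` (or any other strategy name) returns the keywords unchanged, matching the first alternative; the other two strategies add related keywords as extra labels.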
In one possible implementation, the tag information includes at least one of a historical search record and questionnaire information;
the historical search record comprises a plurality of first keywords;
the questionnaire information includes a plurality of first keywords.
In one possible implementation, the method further includes:
acquiring a target image to be classified;
and performing label classification on the target image through the second multi-label classification model to determine a plurality of third labels.
In another aspect, an apparatus for training a multi-label classification model is provided, the apparatus comprising:
the first acquisition module is used for acquiring a plurality of first labels;
the determining module is used for determining a trained single-label classification model for each first label, and the single-label classification model is used for classifying the first labels;
the second acquisition module is used for acquiring a first sample set used for training the single-label classification model, wherein the first sample set comprises a plurality of sample images, and each sample image is labeled with the first label;
the labeling module is used for labeling the sample images in the first sample set with at least one second label to obtain a second sample set, wherein the second sample set comprises the plurality of sample images, and each sample image is labeled with the first label and the at least one second label;
and the training module is used for training the first multi-label classification model according to the second sample set to obtain a second multi-label classification model.
In one possible implementation, the first multi-label classification model includes a first classification module and a second classification module; the training module comprises:
the first training unit is used for training the first classification module according to the second sample set to obtain a third multi-label classification model;
a first determining unit, configured to determine, by a first classification module in the third multi-label classification model, image features of the plurality of sample images;
and the second training unit is used for training a second classification module in the third multi-label classification model according to the image characteristics of the plurality of sample images and the plurality of labels labeled on each sample image to obtain the second multi-label classification model.
In a possible implementation manner, the first training unit is configured to train the first classification module according to the second sample set to obtain a fourth multi-label classification model, where the fourth multi-label classification model includes a third classification module and the second classification module, and the third classification module is a classification module obtained by training the first classification module; and to delete the fully connected layer in the third classification module of the fourth multi-label classification model to obtain the third multi-label classification model.
In a possible implementation manner, the second obtaining module is configured to perform data distillation on the single-label classification model based on a preset distillation algorithm to obtain a plurality of sample images used for training the single-label classification model; composing the plurality of sample images into the first sample set.
In a possible implementation manner, the first obtaining module includes:
an acquisition unit configured to acquire tag information;
an extracting unit configured to extract a plurality of first keywords from the tag information;
a second determining unit configured to determine the plurality of first tags based on the plurality of first keywords.
In a possible implementation manner, the second determining unit is configured to use the plurality of first keywords as the plurality of first tags; alternatively,
the second determining unit is configured to determine, for each first keyword, at least one second keyword, wherein the at least one second keyword belongs to a first category indicated by the first keyword, and combine the first keyword and the at least one second keyword into the plurality of first tags; alternatively,
the second determining unit is configured to determine, for each first keyword, a second category to which the first keyword belongs, determine at least one third keyword included in the second category, and combine the first keyword and the at least one third keyword into the plurality of first tags.
In one possible implementation, the tag information includes at least one of a historical search record and questionnaire information;
the historical search record comprises a plurality of first keywords;
the questionnaire information includes a plurality of first keywords.
In one possible implementation, the apparatus further includes:
the third acquisition module is used for acquiring a target image to be classified;
and the classification module is used for performing label classification on the target image through the second multi-label classification model and determining a plurality of third labels.
In another aspect, an electronic device is provided, the electronic device comprising a processor and a memory; the memory stores at least one program code for execution by the processor to implement the method of training a multi-label classification model as described in the above aspect.
In another aspect, a computer-readable storage medium is provided, which stores at least one program code for execution by a processor of an electronic device to implement the method for training a multi-label classification model according to the above aspect.
In another aspect, a computer program product is provided, comprising program code which, when executed by a processor of an electronic device, causes the electronic device to perform the method for training a multi-label classification model according to the above aspect.
In the embodiment of the application, when the multi-label classification model is trained, the sample images used to train the single-label classification models are reused as the sample images for training the multi-label classification model. Because each sample image was already labeled with one label when the single-label classification model was trained, that label does not need to be annotated again; only the other labels not yet present in the sample image need to be annotated. This reduces the number of labels to be annotated, reduces the time cost of data labeling for multi-label classification model training, and improves the efficiency of model training.
Drawings
Fig. 1 shows a block diagram of a terminal provided in an exemplary embodiment of the present application;
FIG. 2 illustrates a block diagram of a server provided in an exemplary embodiment of the present application;
FIG. 3 illustrates a flow chart of a method of training a multi-label classification model in accordance with an exemplary embodiment of the present application;
FIG. 4 illustrates a schematic diagram of a method of training a multi-label classification model according to an exemplary embodiment of the present application;
FIG. 5 illustrates a flow chart of a method for multi-label classification of images, shown in an exemplary embodiment of the present application;
FIG. 6 illustrates a schematic diagram of a target image shown in an exemplary embodiment of the present application;
fig. 7 is a block diagram illustrating a structure of a training apparatus for a multi-label classification model according to an exemplary embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Reference herein to "a plurality" means two or more. "And/or" describes the association relationship of associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
In the embodiment of the application, a method for training a multi-label classification model is provided, and the method is implemented by an electronic device. The trained multi-label classification model is used to identify a plurality of objects included in an image, output a plurality of labels corresponding to the plurality of objects, and classify the image according to the plurality of labels.
An image comprises a plurality of objects, and the semantic information of each object may be the same or different. Tags are used to represent the semantic information of objects in an image. For example, if an image includes the object "sun", the "sun" is visual information visible to the naked eye; the electronic device labels the object with "sun", thereby converting the visual information of the image into semantic information.
For example, the objects in image A are the sun, a cloud, and a house. The electronic device inputs image A into the multi-label classification model, and the model outputs the labels "sun", "cloud", and "house". The objects in image B are the sun and a girl; the electronic device inputs image B into the multi-label classification model, and the model outputs the labels "sun" and "girl". The electronic device can then classify image A and image B according to these labels. For example, in response to receiving the user-input label "girl", the electronic device determines, from among the plurality of images in the album, image B, which includes the object girl.
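This album-retrieval example can be sketched as follows, with the model's label outputs hard-coded in place of a real multi-label classifier:

```python
# Sketch of retrieving album images by label. The label lists stand in for the
# outputs of the multi-label classification model on images A and B above.

album_labels = {
    "image_A": ["sun", "cloud", "house"],
    "image_B": ["sun", "girl"],
}

def search_album(album, query_label):
    """Return the images whose label set contains the queried label."""
    return [img for img, labels in album.items() if query_label in labels]
```

Searching for "girl" returns only image B, while searching for "sun" returns both images, mirroring the behavior described in the text.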
In an embodiment of the present application, an electronic device is provided, the electronic device comprising a processor and a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions to implement a training method for executing the multi-label classification model provided by the embodiment of the application.
In one possible implementation, the electronic device may be provided as a terminal, please refer to fig. 1, which shows a block diagram of a terminal 100 according to an exemplary embodiment of the present application. The terminal 100 may be a terminal having an image processing function, such as a smart phone or a tablet computer. The terminal 100 in the present application may include one or more of the following components: processor 110, memory 120, display 130.
Processor 110 may include one or more processing cores. The processor 110 connects various parts of the terminal 100 using various interfaces and lines, and performs the functions of the terminal 100 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 120 and calling data stored in the memory 120. Optionally, the processor 110 may be implemented in hardware using at least one of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 110 may integrate one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a Neural-Network Processing Unit (NPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is responsible for rendering and drawing the content to be displayed on the display screen 130; the NPU is used to implement Artificial Intelligence (AI) functions; and the modem is used to handle wireless communications. It is understood that the modem may not be integrated into the processor 110 and may instead be implemented by a separate chip.
The memory 120 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). Optionally, the memory 120 includes a non-transitory computer-readable medium. The memory 120 may be used to store instructions, programs, code sets, or instruction sets. The memory 120 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the various method embodiments described below, and the like; the data storage area may store data created according to the use of the terminal 100 (such as audio data and a phonebook).
The display screen 130 is a display component for displaying a user interface. Optionally, the display screen 130 is a display screen with a touch function, and through the touch function, a user may use any suitable object such as a finger, a touch pen, and the like to perform a touch operation on the display screen 130.
The display 130 is generally provided on the front panel of the terminal 100. The display screen 130 may be designed as a full screen, a curved screen, a shaped screen, a double-sided screen, or a folding screen. The display 130 may also be designed as a combination of a full screen and a curved screen, a combination of a shaped screen and a curved screen, etc., which is not limited in this embodiment.
In addition, those skilled in the art will appreciate that the configuration of terminal 100 illustrated in the above figures does not limit terminal 100; terminal 100 may include more or fewer components than shown, some components may be combined, or the components may be arranged differently. For example, the terminal 100 may further include a microphone, a speaker, a radio frequency circuit, an input unit, a sensor, an audio circuit, a Wireless Fidelity (Wi-Fi) module, a power supply, a Bluetooth module, and other components, which are not described herein again.
In another possible implementation manner, the electronic device may be provided as a server. Referring to fig. 2, a block diagram of a server 200 provided in an exemplary embodiment of the present application is shown. The server 200 may vary considerably depending on its configuration or performance, and may include one or more processors (CPUs) 201 and one or more memories 202, where the memory 202 stores executable instructions and the processor 201 is configured to execute those instructions to implement the method for training a multi-label classification model provided by the above method embodiments. Of course, the server may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for performing input and output, and may include other components for implementing the functions of the device, which are not described herein again.
Referring to fig. 3, a flowchart of a method for training a multi-label classification model according to an exemplary embodiment of the present application is shown. This embodiment is described with an electronic device as the execution subject. The method comprises the following steps:
step 301: the electronic device obtains a plurality of first tags.
The first label is used to represent semantic information of an object in an image. The first label comprises at least one of an entity, a scene, a behavior, an event, and the like. The entity may include at least one of an animal, a human, food, and the like. The scene includes at least one of a landscape, a building, a life and entertainment venue, an office venue, and the like; the behavior includes at least one of sports, work, and the like; and the event includes at least one of a dinner party, a show, a trip, and the like.
In step 301, the electronic device acquires the plurality of first labels through the following steps A1-A3:
a1: the electronic device obtains tag information.
Wherein the tag information includes at least one of a historical search record and questionnaire information. The historical search record includes a plurality of first keywords. The questionnaire information includes a plurality of first keywords.
The historical search record includes at least one of a historical search record of the album, a historical search record of web pages, and the like. The embodiments of the present application are described taking the album's historical search record as an example. Accordingly, the historical search record is the record of keywords entered in the search box when the user uses the album.
In one possible implementation, the electronic device acquires all search records from the user's use of the album; in another possible implementation, the electronic device obtains the search records within a preset duration closest to the current time. The preset duration can be set and changed as required and is not specifically limited in the embodiments of the present application; for example, the preset duration is three months or half a year.
Questionnaire information can be collected while the user uses the album. Correspondingly, when the user uses the album, the electronic device displays a questionnaire on the current display interface and acquires questionnaire information based on the questionnaire. The content of the questionnaire may be keywords that the user frequently searches for, or keywords that the user wishes the album to be able to recognize.
In one possible implementation, the questionnaire includes a plurality of keywords, and the user can select the keywords from the plurality of keywords; correspondingly, the step of acquiring the questionnaire information by the electronic equipment based on the questionnaire comprises the following steps: the electronic equipment acquires a selected keyword from the plurality of keywords, and takes the selected keyword as questionnaire information.
In another possible implementation manner, the questionnaire includes a plurality of input boxes, and the user can input keywords in the plurality of input boxes; correspondingly, the step of acquiring the questionnaire information by the electronic equipment based on the questionnaire comprises the following steps: the electronic equipment acquires the input keywords in the input boxes, and takes the input keywords as questionnaire information.
The electronic device can display the questionnaire on the current display interface in the form of a questionnaire link. Correspondingly, the step of displaying the questionnaire on the current display interface by the electronic device comprises: the electronic device displays a prompt box above the current display interface, in which the link information of the questionnaire is shown; in response to the link information being triggered, the questionnaire is presented.
A2: the electronic device extracts a plurality of first keywords from the tag information.
In one possible implementation, the electronic device extracts a plurality of first keywords from the historical search records. In this step, the electronic device may extract all the first keywords in the historical search records. The electronic device may also extract only the keywords that are searched more frequently; correspondingly, the electronic device determines the number of searches of each first keyword in the historical search records and, according to that number, extracts the first keywords whose number of searches is greater than a first preset number of times.
For example, if the first preset number of times is 3 and the first keywords "sky", "bird" and "two-dimensional code" have been searched 6, 1 and 4 times respectively, the electronic device extracts "sky" and "two-dimensional code". For another example, if the first preset number of times is 2 and the first keywords "bungee", "model" and "sticker pack" have been searched 3, 4 and 7 times respectively, the electronic device extracts "bungee", "model" and "sticker pack". The value of the first preset number of times is not specifically limited in the present application.
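As an illustration, the frequency filtering above can be sketched in a few lines; the records and the threshold of 3 are taken from the example, everything else is hypothetical:

```python
from collections import Counter

def extract_frequent_keywords(search_records, first_preset_times):
    """Keep only keywords searched more than the preset number of times."""
    counts = Counter(search_records)
    return [kw for kw, n in counts.items() if n > first_preset_times]

records = ["sky"] * 6 + ["bird"] * 1 + ["two-dimensional code"] * 4
print(extract_frequent_keywords(records, 3))  # ['sky', 'two-dimensional code']
```

`Counter.items()` preserves first-occurrence order, so the extracted keywords keep the order in which they first appeared in the search records.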
In another possible implementation, the electronic device extracts a plurality of first keywords from the questionnaire information.
It should be noted that the electronic device may extract all the first keywords in the questionnaire information, or may extract only the first keywords whose number of occurrences in the questionnaire information exceeds a second preset number of times.
In this step, data on how the user uses the album is collected by obtaining the historical search records or questionnaire information of the user, and the labels to be classified by the second multi-label classification model provided by the embodiment of the application are then determined according to that data.
After step A2, the electronic device composes the extracted plurality of first keywords into a label list and displays the label list on the current display interface. The label list is displayed, for example, at the top or on the left side of the current display interface, so that the user can search for a desired keyword according to the label list.
A3: the electronic device determines the plurality of first labels based on the plurality of first keywords.
In one possible implementation manner, the electronic device takes the plurality of first keywords as the plurality of first labels. For example, if the plurality of first keywords are "selfie", "one-piece dress" and "snowflake", the electronic device composes "selfie", "one-piece dress" and "snowflake" into the plurality of first labels.
In another possible implementation manner, for each first keyword, the electronic device determines at least one second keyword whose first category is the first keyword itself, and combines the first keyword and the at least one second keyword into the plurality of first labels.
The electronic equipment takes the first keyword as a first category and determines at least one second keyword belonging to that category. For example, if the first keyword is "cat", the second keywords may be "blue cat" and "tabby cat". The electronic device combines "cat", "blue cat" and "tabby cat" into the plurality of first labels.
In another possible implementation manner, for each first keyword, the electronic device determines a second category to which the first keyword belongs, determines at least one third keyword included in the second category, and combines the first keyword and the at least one third keyword into the plurality of first tags.
For example, if the first keyword is "running", the second category to which the first keyword belongs is "sports", and the at least one third keyword is "skiing", "wrestling", etc. The electronic device combines "running", "skiing" and "wrestling" into the plurality of first labels.
In the embodiment of the application, the hypernym and hyponym keywords of the plurality of first keywords in the album classification are determined according to the categories of the plurality of first keywords, so that the plurality of first keywords are refined and supplemented, and a complete label list is obtained.
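A minimal sketch of the keyword expansion described in the implementations above; the category table is entirely hypothetical and would in practice come from a taxonomy of album labels:

```python
# Hypothetical category map: each category lists the keywords it contains.
CATEGORY_KEYWORDS = {
    "sports": ["running", "skiing", "wrestling"],
    "cat": ["blue cat", "tabby cat"],
}

def expand_keyword(keyword):
    """Return the keyword plus category-related keywords (hyponyms or siblings)."""
    # Case 1: the keyword itself names a category (e.g. "cat").
    if keyword in CATEGORY_KEYWORDS:
        return [keyword] + CATEGORY_KEYWORDS[keyword]
    # Case 2: the keyword belongs to a category (e.g. "running" -> "sports").
    for members in CATEGORY_KEYWORDS.values():
        if keyword in members:
            return [keyword] + [m for m in members if m != keyword]
    # Case 3: no category information; keep the keyword as-is.
    return [keyword]

print(expand_keyword("running"))  # ['running', 'skiing', 'wrestling']
print(expand_keyword("cat"))      # ['cat', 'blue cat', 'tabby cat']
```

The same table is looked up in both directions, which mirrors the two expansion strategies: a keyword that is itself a category yields its hyponyms, while a keyword inside a category yields its siblings.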
It should be noted that, in addition to the above steps a1-a3, the electronic device may also obtain tags of a plurality of images in the album, so as to obtain a plurality of first tags.
Step 302: for each first label, the electronic device determines a trained single-label classification model used for classifying the first label.
The trained single-label classification model is an open-source model. The electronic equipment determines the trained single-label classification model as follows: according to the first label, a single-label classification model used for classifying the first label is collected from an open-source model website. The open-source model website can be a document retrieval website, a technical community, or the like. For example, if the first label is "vehicle", a single-label classification model capable of classifying the vehicle label is collected from the technical community. Since there are a plurality of first labels, a plurality of single-label classification models are collected.
In this step, the label classification model collected by the electronic device may be a multi-label classification model, i.e. the model is used not only for classifying the first label but also for classifying the third label. The third label belongs to the plurality of first labels, or the third label does not belong to the plurality of first labels. In response to the third tag not belonging to the plurality of first tags, the electronic device adds the third tag to the plurality of first tags.
In the embodiment of the application, a single-label classification model used for classifying the first label and trained is determined according to each first label, so that the classification capability of the model covers the classes of the plurality of first labels, and a source is provided for subsequently acquiring the sample image corresponding to each first label.
Step 303: the electronic device obtains a first sample set used for training the single label classification model, wherein the first sample set comprises a plurality of sample images, and each sample image is marked with the first label.
In step 303, the electronic device obtains the first sample set used for training the single-label classification model through steps A1-A2:
A1: the electronic equipment performs data distillation on the single-label classification model based on a preset distillation algorithm to obtain a plurality of sample images used for training the single-label classification model.
The preset distillation algorithm may be a DeepInversion algorithm. During training, the single-label classification model takes a plurality of sample images as input. In the training process, the single-label classification model encodes the sample images, and the resulting historical feature statistics of the sample images are stored in the Batch Normalization (BN) layers. Therefore, through the preset distillation algorithm, the electronic equipment can synthesize a plurality of sample images from the historical feature statistics, where the similarity between each synthesized sample image and the original training images is greater than a first similarity threshold, for example 95%. The electronic device treats the plurality of synthesized sample images as the plurality of sample images.
A2: the electronic device groups the plurality of sample images into the first sample set.
Each first label corresponds to one single label classification model, each single label classification model corresponds to a plurality of sample images, and the first sample set comprises the sample images corresponding to the plurality of single label classification models.
In the embodiment of the application, data distillation is performed on the plurality of determined single-label classification models to obtain synthesized sample images whose similarity to the sample images originally input to the single-label classification models during training is greater than the first similarity threshold. A first sample set that is not completely labeled is thus obtained without collecting sample images, which reduces the data cost of obtaining sample images.
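The core of this distillation step — synthesizing inputs whose statistics match those stored in the BN layers — can be illustrated with a one-dimensional toy sketch. The real DeepInversion algorithm also uses the classification loss and image priors; the stored statistics and all numbers below are illustrative:

```python
import random

# Stored BN statistics from the trained model (illustrative values).
bn_mean, bn_var = 0.5, 0.04

def distill_batch(n=8, steps=5000, lr=0.05):
    """Optimize a synthetic batch so its mean/variance match the BN statistics.

    Gradient descent on (m - bn_mean)^2 + (v - bn_var)^2, where m and v are
    the batch mean and variance of the synthetic samples x.
    """
    random.seed(0)
    x = [random.random() for _ in range(n)]
    for _ in range(steps):
        m = sum(x) / n
        v = sum((xi - m) ** 2 for xi in x) / n
        for i in range(n):
            # d(m)/dx_i = 1/n and d(v)/dx_i = 2(x_i - m)/n.
            grad = 2 * (m - bn_mean) / n + 4 * (v - bn_var) * (x[i] - m) / n
            x[i] -= lr * grad
    return x

batch = distill_batch()
mean = sum(batch) / len(batch)
var = sum((b - mean) ** 2 for b in batch) / len(batch)
print(round(mean, 3), round(var, 3))  # 0.5 0.04
```

In the full algorithm the "batch" is a set of images, the statistics are matched per BN layer, and the optimization runs through the frozen single-label model by backpropagation.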
Step 304: the electronic equipment performs labeling processing of at least one second label on the sample images in the first sample set to obtain a second sample set, where the second sample set comprises the plurality of sample images and each sample image is labeled with the first label and the at least one second label.
Although the first label is already marked in the acquired sample image, some objects in the sample image remain unlabeled; that is, the sample image is not completely labeled. Therefore, the electronic device labels the sample image with at least one second label to obtain a completely labeled sample image. The implementation of the labeling processing is not specifically limited in the present application; for example, the labeling may be performed with an image annotation tool.
In the embodiment of the application, the completely labeled sample image is obtained by labeling the at least one second label on the obtained sample image, so that only a small amount of labeled data needs to be prepared, and the cost for constructing a large number of sample sets is reduced.
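The completion step can be pictured as a simple merge of the existing single label with the newly annotated ones; the record layout and file name below are hypothetical:

```python
def complete_annotation(sample, second_labels):
    """Merge the original single label with the newly annotated labels.

    `sample` is a record {"image": ..., "labels": [first_label]}; the first
    label never needs re-annotation, only the missing labels are added.
    """
    merged = list(sample["labels"])
    for label in second_labels:
        if label not in merged:
            merged.append(label)
    return {**sample, "labels": merged}

sample = {"image": "img_001.jpg", "labels": ["cat"]}
print(complete_annotation(sample, ["sofa", "window", "cat"])["labels"])
# ['cat', 'sofa', 'window']
```

Note that a second label identical to the existing first label (here "cat") is skipped, which is exactly the saving the embodiment describes: each image carries one free label from the single-label training data.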
Step 305: the first multi-label classification model comprises a first classification module and a second classification module, and the electronic equipment trains the first classification module according to the second sample set to obtain a third multi-label classification model.
After obtaining the second sample set, the electronic device takes the second sample set as the input of the first multi-label classification model, and trains the first multi-label classification model. Accordingly, step 305 is implemented by steps A1-A2:
A1: the electronic equipment trains the first classification module according to the second sample set to obtain a fourth multi-label classification model, where the fourth multi-label classification model comprises a third classification module and the second classification module, and the third classification module is the classification module obtained by training the first classification module.
The first classification module is a Convolutional Neural Network (CNN) model. The electronic device determines the type of the first classification module according to the computational resources of the hardware. For example, in response to the hardware having strong computing power, the electronic device selects a deep, large model such as a Residual Network (ResNet); in response to the hardware having weak computing power, the electronic device selects a lightweight model such as MobileNet or ShuffleNet.
The second classification module is a Recurrent Neural Network (RNN) model, such as a Long Short-Term Memory (LSTM) network. The output of the fourth multi-label classification model is a plurality of labels corresponding to the objects in the sample image.
A2: and deleting the full connection layer in the third classification module in the fourth multi-label classification model by the electronic equipment to obtain the third multi-label classification model.
The third classification module comprises an input layer, a convolution layer, a pooling layer and a full connection layer. The third classification module is trained to extract image features of the sample image: the image features are produced by the pooling layer, and the full connection layer converts those features into labels. Therefore, the electronic device deletes the full connection layer of the third classification module to obtain the third multi-label classification model.
In the embodiment of the application, the electronic device trains the first classification module independently to obtain a third classification module, and deletes the full connection layer in the third classification module, so that independent training of the image feature extraction capability of the third multi-label classification model is realized, and the image feature extraction capability of the model is improved.
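The truncation can be sketched with stand-in layers; in a real framework the module would be a sequential network and the slice would drop its last submodule. The layer functions below are purely illustrative:

```python
# Each layer is modeled as a plain function; the operations are stand-ins only.
def conv(x):   return [v * 2 for v in x]                  # "convolution"
def pool(x):   return [sum(x) / len(x)]                   # global average pooling
def fc(feat):  return {"cat": feat[0], "dog": -feat[0]}   # features -> label scores

trained_module = [conv, pool, fc]

def run(layers, x):
    """Apply the layers in order, like a sequential network."""
    for layer in layers:
        x = layer(x)
    return x

feature_extractor = trained_module[:-1]   # delete the full connection layer
print(run(trained_module, [1.0, 2.0, 3.0]))      # {'cat': 4.0, 'dog': -4.0}
print(run(feature_extractor, [1.0, 2.0, 3.0]))   # [4.0] -- pooled image features
```

After the slice, the module's output is the pooled feature vector rather than label scores, which is exactly what step 306 consumes.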
Step 306: the electronic device determines, by the first classification module in the third multi-label classification model, image features of the plurality of sample images.
The first classification module of the third multi-label classification model is the third classification module from which the full connection layer has been deleted. In this step, the electronic device takes the image features of the sample image extracted by the pooling layer as the output of the first classification module of the third multi-label classification model.
In the embodiment of the application, since the full connection layer of the third classification module is deleted, the image features of the sample image extracted by the pooling layer are used as the output of the third multi-label classification model, so that data support is provided for training the second classification module.
Step 307: the electronic equipment trains the second classification module in the third multi-label classification model according to the image features of the plurality of sample images and the plurality of labels marked on each sample image to obtain the second multi-label classification model.
The input of the second classification module is the image features of the sample image output by the first classification module, and the output is a plurality of labels corresponding to the sample image. That is, the second classification module is used to translate the image features of the sample image into the plurality of labels. The electronic equipment trains the first multi-label classification model only once to obtain the second multi-label classification model; when subsequently performing label classification on the images in the album, the electronic equipment directly calls the second multi-label classification model without any further training.
Referring to fig. 4, if the first classification module is a CNN model and the second classification module is an RNN model, the input of the CNN model is an image, the image features obtained by training the CNN model are input into the RNN model, and the output of the RNN model is a label.
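The second training stage can be illustrated with a toy stand-in: the image features are fixed, as if produced by the frozen first module, and only the second module's parameters are updated. A one-layer perceptron replaces the RNN here purely for brevity; the features, labels and learning rate are illustrative:

```python
# Stage 2: train only the second module to turn image features into labels.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]      # frozen CNN outputs
targets  = [{"beach"}, {"bird"}, {"beach", "bird"}]  # multi-label ground truth
labels   = ["beach", "bird"]

weights = {lbl: [0.0, 0.0] for lbl in labels}        # second-module parameters

def predict(feat):
    """Predict every label whose score exceeds a fixed threshold."""
    return {lbl for lbl in labels
            if sum(w * f for w, f in zip(weights[lbl], feat)) > 0.5}

for _ in range(50):                                  # simple perceptron updates
    for feat, tgt in zip(features, targets):
        for lbl in labels:
            err = (1.0 if lbl in tgt else 0.0) - (1.0 if lbl in predict(feat) else 0.0)
            weights[lbl] = [w + 0.1 * err * f for w, f in zip(weights[lbl], feat)]

print([sorted(predict(f)) for f in features])
# [['beach'], ['bird'], ['beach', 'bird']]
```

The point of the sketch is the data flow of fig. 4: the feature extractor is held fixed while the label head learns a feature-to-labels mapping, so the two modules can be trained independently.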
In the embodiment of the application, when the multi-label classification model is trained, the sample images used for training the single-label classification models are reused as the sample images for training the multi-label classification model. Since each sample image was already labeled with one label when the single-label classification model was trained, that label does not need to be labeled again; only the other, unlabeled labels in the sample image need to be annotated. This reduces the amount of label annotation, reduces the time cost of data annotation for training the multi-label classification model, and improves the efficiency of model training.
Referring to fig. 5, a flowchart of a multi-label classification method for an image according to an exemplary embodiment of the present application is shown. The embodiment takes the execution subject as an electronic device as an example for explanation. The method comprises the following steps:
step 501: the electronic device obtains a target image to be classified.
The target image to be classified is an image in the album that has not yet undergone label classification.
In one possible implementation, the electronic device labels image tags each time an image is added to the album. In response to a newly added image in the album, the electronic equipment takes the newly added image as the target image to be classified. For example, when a user takes a photo and a new photo is added to the album, the electronic device obtains the new photo and uses it as the target image to be classified.
In another possible implementation manner, the electronic device labels image tags only when an image is newly added to the album and the battery level of the electronic device is sufficient. In response to a newly added image in the album while the electronic equipment is in a charging state, the electronic equipment acquires the newly added image and takes it as the target image to be classified. For example, when the electronic device is a terminal, it obtains newly added photos in the album only while charging, thereby saving the battery of the terminal.
In another possible implementation manner, when the user performs image search, the electronic device performs annotation of the image tag; in response to receiving the search request, the electronic device obtains the images in the album which are not subjected to the label classification, and takes the images which are not subjected to the label classification as target images.
In the embodiment of the application, the target images to be classified are acquired either in real time or, to save energy, only when the electronic equipment is in a charging state, so that newly added images are labeled in a timely manner and the battery of the electronic equipment is conserved.
Step 502: and the electronic equipment performs label classification on the target image through the second multi-label classification model to determine a plurality of third labels.
The electronic equipment inputs the target image into the second multi-label classification model, which outputs a plurality of third labels of the target image. For example, fig. 6 shows a target image containing a plurality of objects: a beach, the sun, clouds, an island, a bird and a tree. The input of the second multi-label classification model is the target image; the first classification module performs feature extraction on the target image to obtain image features, the second classification module converts the image features into a plurality of third labels, namely "beach", "sun", "cloud", "island", "bird" and "tree", and the second multi-label classification model outputs these third labels. The electronic device then classifies the target image according to the third labels. For example, if the labels all belong to a "landscape" category, the electronic device adds the target image to the "landscape" category, so that in response to the user inputting the keyword "landscape", the electronic device displays the target image on the display interface.
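As an illustration, this inference-and-filing flow might be sketched as follows; the model stand-in, the file names and the label-to-category table are all hypothetical:

```python
# A hypothetical stand-in for the trained second multi-label classification
# model: it maps an image to its predicted third labels.
def second_multi_label_model(image):
    predictions = {
        "IMG_0001.jpg": ["beach", "sun", "cloud", "island", "bird", "tree"],
        "IMG_0002.jpg": ["cat", "sofa"],
    }
    return predictions.get(image, [])

# Hypothetical label -> album-category mapping.
LABEL_CATEGORY = {"beach": "landscape", "sun": "landscape", "cloud": "landscape",
                  "island": "landscape", "bird": "landscape", "tree": "landscape",
                  "cat": "pet", "sofa": "furniture"}

def classify_into_album(image):
    """File the image under a category only if all its labels share it."""
    labels = second_multi_label_model(image)
    categories = {LABEL_CATEGORY[lbl] for lbl in labels}
    return categories.pop() if len(categories) == 1 else None

print(classify_into_album("IMG_0001.jpg"))  # landscape
print(classify_into_album("IMG_0002.jpg"))  # None
```

A real implementation would write the predicted labels into the image attributes as well, so that keyword search can match them later.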
After the electronic equipment determines the plurality of third tags of the target image, the plurality of third tags are marked in the image attribute of the target image, so that the subsequent image searching based on the tags is facilitated.
In the embodiment of the application, the target image to be classified is acquired, labeled by the trained second multi-label classification model, and classified accordingly, which makes the album more convenient for the user to use.
Referring to fig. 7, a block diagram of a training apparatus for a multi-label classification model according to an embodiment of the present application is shown. The training apparatus 700 of the multi-label classification model may be implemented as all or a part of the processor 110 or all or a part of the processor 201 by software, hardware or a combination of both.
The apparatus 700 comprises:
a first obtaining module 701, configured to obtain a plurality of first tags;
a determining module 702, configured to determine, for each first label, a trained single-label classification model, where the single-label classification model is used to classify the first label;
a second obtaining module 703, configured to obtain a first sample set used for training the single-label classification model, where the first sample set includes a plurality of sample images, and each sample image is labeled with the first label;
an labeling module 704, configured to perform labeling processing on at least one second label on the sample images in the first sample set to obtain a second sample set, where the second sample set includes the multiple sample images, and each sample image is labeled with the first label and the at least one second label;
the training module 705 is configured to train the first multi-label classification model according to the second sample set to obtain a second multi-label classification model.
In one possible implementation, the first multi-label classification model includes a first classification module and a second classification module; the training module 705 includes:
the first training unit is used for training the first classification module according to the second sample set to obtain a third multi-label classification model;
a first determining unit, configured to determine, by a first classification module in the third multi-label classification model, image features of the plurality of sample images;
and the second training unit is used for training a second classification module in the third multi-label classification model according to the image characteristics of the plurality of sample images and the plurality of labels labeled on each sample image to obtain the second multi-label classification model.
In a possible implementation manner, the first training unit is configured to train the first classification module according to the second sample set to obtain a fourth multi-label classification model, where the fourth multi-label classification model includes a third classification module and the second classification module, and the third classification module is the classification module obtained by training the first classification module; and to delete the full connection layer in the third classification module of the fourth multi-label classification model to obtain the third multi-label classification model.
In a possible implementation manner, the second obtaining module 703 is configured to perform data distillation on the single label classification model based on a preset distillation algorithm to obtain a plurality of sample images used for training the single label classification model; the plurality of sample images are grouped into the first sample set.
In a possible implementation manner, the first obtaining module 701 includes:
an acquisition unit configured to acquire tag information;
an extracting unit configured to extract a plurality of first keywords from the tag information;
a second determining unit, configured to determine the plurality of first tags based on the plurality of first keywords.
In a possible implementation manner, the second determining unit is configured to use the plurality of first keywords as the plurality of first tags; or,
the second determining unit is configured to determine, for each first keyword, at least one second keyword, where the first category to which the at least one second keyword belongs is the first keyword, and combine the first keyword and the at least one second keyword into the plurality of first tags; or,
the second determining unit is configured to determine, for each first keyword, a second category to which the first keyword belongs, determine at least one third keyword included in the second category, and combine the first keyword and the at least one third keyword into the plurality of first tags.
In one possible implementation, the tag information includes at least one of historical search records and questionnaire information;
the historical search record comprises a plurality of first keywords;
the questionnaire information includes a plurality of first keywords.
In one possible implementation, the apparatus further includes:
the third acquisition module is used for acquiring a target image to be classified;
and the classification module is used for performing label classification on the target image through the second multi-label classification model and determining a plurality of third labels.
In the embodiment of the application, when the multi-label classification model is trained, the sample images used for training the single-label classification models are reused as the sample images for training the multi-label classification model. Since each sample image was already labeled with one label when the single-label classification model was trained, that label does not need to be labeled again; only the other, unlabeled labels in the sample image need to be annotated. This reduces the amount of label annotation, reduces the time cost of data annotation for training the multi-label classification model, and improves the efficiency of model training.
An embodiment of the present application also provides a computer-readable storage medium storing at least one program code, where the program code is loaded and executed by a processor to implement the training method of the multi-label classification model shown in the above embodiments.
Embodiments of the present application further provide a computer program product, wherein when a processor of an electronic device executes program code in the computer program product, the electronic device is enabled to execute the training method of the multi-label classification model as shown in the above embodiments.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in the embodiments of the present application may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over, as one or more instructions or code, a computer-readable medium. Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.
The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (11)

1. A method for training a multi-label classification model, the method comprising:
acquiring a plurality of first labels;
for each first label, determining a trained single-label classification model, wherein the single-label classification model is used for classifying the first label;
obtaining a first sample set used for training the single-label classification model, wherein the first sample set comprises a plurality of sample images, and each sample image is marked with the first label;
labeling at least one second label on the sample images in the first sample set to obtain a second sample set, wherein the second sample set comprises the plurality of sample images, and each sample image is labeled with the first label and the at least one second label;
and training the first multi-label classification model according to the second sample set to obtain a second multi-label classification model.
2. The training method of claim 1, wherein the first multi-label classification model comprises a first classification module and a second classification module;
the training the first multi-label classification model according to the second sample set to obtain a second multi-label classification model, including:
training the first classification module according to the second sample set to obtain a third multi-label classification model;
determining, by a first classification module in the third multi-label classification model, image features of the plurality of sample images;
and training a second classification module in the third multi-label classification model according to the image characteristics of the sample images and the labels labeled on each sample image to obtain the second multi-label classification model.
3. The training method of claim 2, wherein the training the first classification module according to the second sample set to obtain a third multi-label classification model comprises:
training the first classification module according to the second sample set to obtain a fourth multi-label classification model, wherein the fourth multi-label classification model comprises a third classification module and a second classification module, and the third classification module is obtained by training the first classification module;
deleting the full connection layer in the third classification module in the fourth multi-label classification model to obtain the third multi-label classification model.
4. The training method of claim 1, wherein the obtaining a first set of samples used for training the single label classification model comprises:
based on a preset distillation algorithm, carrying out data distillation on the single label classification model to obtain a plurality of sample images used for training the single label classification model;
composing the plurality of sample images into the first sample set.
5. The training method of claim 1, wherein the obtaining a plurality of first labels comprises:
acquiring label information;
extracting a plurality of first keywords from the tag information;
determining the plurality of first tags based on the plurality of first keywords.
6. The training method of claim 5, wherein the determining the first plurality of labels based on the first plurality of keywords comprises:
taking the plurality of first keywords as the plurality of first tags; or,
for each first keyword, determining at least one second keyword, wherein the first category to which the at least one second keyword belongs is the first keyword, and forming the first keyword and the at least one second keyword into the plurality of first labels; or,
for each first keyword, determining a second category to which the first keyword belongs, determining at least one third keyword included in the second category, and combining the first keyword and the at least one third keyword into a plurality of first tags.
7. The training method of claim 5, wherein the tag information comprises at least one of historical search records and questionnaire information;
the historical search record comprises a plurality of first keywords;
the questionnaire information includes a plurality of first keywords.
8. Training method according to claim 1, characterized in that the method further comprises:
acquiring a target image to be classified;
and performing label classification on the target image through the second multi-label classification model to determine a plurality of third labels.
9. An apparatus for training a multi-label classification model, the apparatus comprising:
the first acquisition module is used for acquiring a plurality of first labels;
the determining module is used for determining a trained single-label classification model for each first label, and the single-label classification model is used for classifying the first labels;
the second acquisition module is used for acquiring a first sample set used for training the single-label classification model, wherein the first sample set comprises a plurality of sample images, and each sample image is labeled with the first label;
the labeling module is used for labeling the sample images in the first sample set with at least one second label to obtain a second sample set, wherein the second sample set comprises the plurality of sample images, and each sample image is labeled with the first label and the at least one second label;
and the training module is used for training the first multi-label classification model according to the second sample set to obtain a second multi-label classification model.
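For illustration only: a common objective for a training module in this kind of pipeline is per-label binary cross-entropy, treating a sample's first label and each added second label as independent binary targets. The claims do not specify a loss function; this sketch is an assumption, not the patented method:

```python
import math

def bce_multilabel_loss(probs, targets):
    """Mean binary cross-entropy across labels: each entry of `targets` is 1
    if the sample carries that label (the first label or an added second
    label) and 0 otherwise; `probs` are the model's per-label sigmoid
    probabilities. Probabilities are clamped to avoid log(0)."""
    eps = 1e-7
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(probs)
```

Minimizing this loss over the second sample set is one way the first multi-label classification model could be updated into the second multi-label classification model.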
10. An electronic device, comprising a processor and a memory; the memory stores at least one program code for execution by the processor to implement the method of training the multi-label classification model according to any one of claims 1 to 8.
11. A computer-readable storage medium, characterized in that the storage medium stores at least one program code for execution by a processor to implement the method of training a multi-label classification model according to any of claims 1 to 8.
CN202011335091.6A 2020-11-25 2020-11-25 Training method and device for multi-label classification model, electronic equipment and storage medium Withdrawn CN112328823A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011335091.6A CN112328823A (en) 2020-11-25 2020-11-25 Training method and device for multi-label classification model, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN112328823A true CN112328823A (en) 2021-02-05

Family

ID=74308231

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011335091.6A Withdrawn CN112328823A (en) 2020-11-25 2020-11-25 Training method and device for multi-label classification model, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112328823A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108171268A (en) * 2018-01-02 2018-06-15 联想(北京)有限公司 A kind of image processing method and electronic equipment
CN108921040A (en) * 2018-06-08 2018-11-30 Oppo广东移动通信有限公司 Image processing method and device, storage medium, electronic equipment
CN109389220A (en) * 2018-09-28 2019-02-26 北京达佳互联信息技术有限公司 Processing method, device, electronic equipment and the storage medium of neural network model
CN110458245A (en) * 2019-08-20 2019-11-15 图谱未来(南京)人工智能研究院有限公司 A kind of multi-tag disaggregated model training method, data processing method and device
CN110807495A (en) * 2019-11-08 2020-02-18 腾讯科技(深圳)有限公司 Multi-label classification method and device, electronic equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
WANG J ET AL.: "CNN-RNN: A unified framework for multi-label image classification", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2285-2294 *
LIU Chong (刘冲): "Research on Multi-label Image Classification Based on Deep Metric Learning with Reconstruction Regularization", China Master's Theses Full-text Database, Information Science and Technology Section, no. 07 *
LYU Fan (吕凡) et al.: "An Image Multi-label Classification Method with a Dual-LSTM Structure", Journal of Suzhou University of Science and Technology (Natural Science Edition), vol. 35, no. 3, pages 79-83 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113204660A (en) * 2021-03-31 2021-08-03 北京达佳互联信息技术有限公司 Multimedia data processing method, label identification method, device and electronic equipment
CN113204660B (en) * 2021-03-31 2024-05-17 北京达佳互联信息技术有限公司 Multimedia data processing method, tag identification device and electronic equipment
CN113011534B (en) * 2021-04-30 2024-03-29 平安科技(深圳)有限公司 Classifier training method and device, electronic equipment and storage medium
CN113011534A (en) * 2021-04-30 2021-06-22 平安科技(深圳)有限公司 Classifier training method and device, electronic equipment and storage medium
WO2022237215A1 (en) * 2021-05-11 2022-11-17 华为云计算技术有限公司 Model training method and system, and device and computer-readable storage medium
CN113222050A (en) * 2021-05-26 2021-08-06 北京有竹居网络技术有限公司 Image classification method and device, readable medium and electronic equipment
CN113222050B (en) * 2021-05-26 2024-05-03 北京有竹居网络技术有限公司 Image classification method and device, readable medium and electronic equipment
CN113222055A (en) * 2021-05-28 2021-08-06 新疆爱华盈通信息技术有限公司 Image classification method and device, electronic equipment and storage medium
CN113222055B (en) * 2021-05-28 2023-01-10 新疆爱华盈通信息技术有限公司 Image classification method and device, electronic equipment and storage medium
CN114519404B (en) * 2022-04-20 2022-07-12 四川万网鑫成信息科技有限公司 Image sample classification labeling method, device, equipment and storage medium
CN114519404A (en) * 2022-04-20 2022-05-20 四川万网鑫成信息科技有限公司 Image sample classification labeling method, device, equipment and storage medium
CN115546568B (en) * 2022-12-01 2023-03-10 合肥中科类脑智能技术有限公司 Insulator defect detection method, system, equipment and storage medium
CN115546568A (en) * 2022-12-01 2022-12-30 合肥中科类脑智能技术有限公司 Insulator defect detection method, system, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN112328823A (en) Training method and device for multi-label classification model, electronic equipment and storage medium
CN110458107B (en) Method and device for image recognition
US20190188222A1 (en) Thumbnail-Based Image Sharing Method and Terminal
CN109634698B (en) Menu display method and device, computer equipment and storage medium
WO2023185785A1 (en) Image processing method, model training method, and related apparatuses
CN107993191A (en) A kind of image processing method and device
CN107368550B (en) Information acquisition method, device, medium, electronic device, server and system
CN112040273B (en) Video synthesis method and device
CN111241340A (en) Video tag determination method, device, terminal and storage medium
CN109242042B (en) Picture training sample mining method and device, terminal and computer readable storage medium
CN114066718A (en) Image style migration method and device, storage medium and terminal
CN113657087B (en) Information matching method and device
CN111191503A (en) Pedestrian attribute identification method and device, storage medium and terminal
CN113434716A (en) Cross-modal information retrieval method and device
CN110737811A (en) Application classification method and device and related equipment
JP2018169972A (en) Object detection device, detection model generator, program, and method capable of learning based on search result
CN115131604A (en) Multi-label image classification method and device, electronic equipment and storage medium
CN113283432A (en) Image recognition and character sorting method and equipment
CN113313066A (en) Image recognition method, image recognition device, storage medium and terminal
CN110929057A (en) Image processing method, device and system, storage medium and electronic device
CN112069335A (en) Image classification method and device, electronic equipment and storage medium
CN110532448B (en) Document classification method, device, equipment and storage medium based on neural network
CN112069342A (en) Image classification method and device, electronic equipment and storage medium
CN110895555B (en) Data retrieval method and device, storage medium and electronic device
WO2023168997A1 (en) Cross-modal retrieval method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210205