CN114842251A - Training method and device of image classification model, image processing method and device and computing equipment

Info

Publication number
CN114842251A
Authority
CN
China
Prior art keywords: model, sample set, sample, image, training
Legal status: Pending
Application number
CN202210452722.5A
Other languages
Chinese (zh)
Inventor
黄泱柯
洪伟
胡光龙
李雪莉
Current Assignee
Netease Hangzhou Network Co Ltd
Original Assignee
Netease Hangzhou Network Co Ltd
Application filed by Netease Hangzhou Network Co Ltd
Priority to CN202210452722.5A
Publication of CN114842251A

Classifications

    • G06V 10/764 Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06N 3/04 Neural networks; architecture, e.g. interconnection topology
    • G06N 3/088 Neural networks; learning methods; non-supervised learning, e.g. competitive learning
    • G06V 10/40 Extraction of image or video features
    • G06V 10/774 Processing image or video features in feature spaces; generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/82 Image or video recognition or understanding using pattern recognition or machine learning, using neural networks


Abstract

Embodiments of the disclosure provide a training method and device for an image classification model, an image processing method and device, and a computing device. The training method of the image classification model comprises the following steps: processing an initial first sample set and an initial second sample set according to a first model, and processing the initial second sample set according to a second model, to obtain a 1st-updated first sample set and a 1st-updated second sample set; performing at least one model training process on the first model and the second model according to the 1st-updated first sample set and the 1st-updated second sample set to obtain a trained first model, wherein the model training process comprises a semi-supervised training process, a supervised training process and an unsupervised training process for the first model, and an unsupervised training process for the second model; and obtaining an image classification model according to the trained first model. The labor cost of training the image classification model is thereby reduced.

Description

Training method and device of image classification model, image processing method and device and computing equipment
Technical Field
Embodiments of the disclosure relate to the technical field of artificial intelligence, and in particular to a training method, an image processing method, corresponding devices, and a computing device for an image classification model.
Background
This section is intended to provide a background or context to the embodiments of the disclosure that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
Image classification refers to the process of assigning an image to a specific category under a given classification type. Image classification has a wide range of application scenarios, such as image recommendation, commodity identification, and so forth.
The image classification is mainly realized based on an image classification model. In the training process of the image classification model, a sample image and the labeling information of the sample image need to be acquired, and then the image classification model is trained through the sample image and the labeling information. After the training is finished, the image classification model has the capability of classifying the images, and then the images needing to be classified are input into the image classification model, so that the corresponding image classes can be obtained.
Because the number of training samples required to train an image classification model is huge, a large number of annotators are often needed to label the sample images, so the labor cost of training an image classification model is high.
Disclosure of Invention
To solve the problem of the high labor cost of training image classification models, the disclosure provides a training method and device for an image classification model, and an image processing method and device.
In a first aspect of embodiments of the present disclosure, there is provided a training method of an image classification model, including:
processing an initial first sample set and an initial second sample set according to a first model, and processing the initial second sample set according to a second model, to obtain a 1st-updated first sample set and a 1st-updated second sample set;
performing at least one model training process on the first model and the second model according to the 1st-updated first sample set and the 1st-updated second sample set to obtain a trained first model, wherein the model training process comprises a semi-supervised training process, a supervised training process and an unsupervised training process for the first model, and an unsupervised training process for the second model;
and obtaining an image classification model according to the trained first model, wherein the initial first sample set comprises first sample images labeled with corresponding first class labels, and the initial second sample set comprises second sample images, the second sample images being unlabeled.
In a second aspect of the disclosed embodiments, there is provided an image processing method comprising:
acquiring a first image to be processed;
inputting the first image into an image classification model to obtain the category of the first image under each classification type, wherein the image classification model is obtained according to any one of the first aspect.
In a third aspect of the disclosed embodiments, there is provided an apparatus for training an image classification model, comprising:
an updating module, configured to process the initial first sample set and the initial second sample set according to the first model, and process the initial second sample set according to the second model, to obtain a 1st-updated first sample set and a 1st-updated second sample set;
a training module, configured to perform at least one model training process on the first model and the second model according to the 1st-updated first sample set and the 1st-updated second sample set to obtain a trained first model, where the model training process includes a semi-supervised training process, a supervised training process, and an unsupervised training process for the first model, and an unsupervised training process for the second model;
and a processing module, configured to obtain an image classification model according to the trained first model, where the initial first sample set includes first sample images labeled with corresponding first class labels, and the initial second sample set includes second sample images, the second sample images being unlabeled.
In a fourth aspect of the disclosed embodiments, there is provided an image processing apparatus comprising:
the acquisition module is used for acquiring a first image to be processed;
and the processing module is used for inputting the first image into an image classification model to obtain the category of the first image under each classification type, wherein the image classification model is a model obtained by training according to any one of the first aspect.
In a fifth aspect of embodiments of the present disclosure, there is provided a computing device comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the training method of an image classification model according to any one of the first aspect, or causing the at least one processor to perform the image processing method according to the second aspect.
In a sixth aspect of embodiments of the present disclosure, a computer-readable storage medium is provided, in which computer-executable instructions are stored, and when a processor executes the computer-executable instructions, the method for training an image classification model according to any one of the first aspect is implemented, or the method for processing an image according to the second aspect is implemented.
With the training method, devices, and computing device for the image classification model provided by the embodiments of the disclosure, an initial first sample set and an initial second sample set are first processed according to a first model, and the initial second sample set is processed according to a second model, to obtain a 1st-updated first sample set and a 1st-updated second sample set; at least one model training process is then performed on the first model and the second model according to the 1st-updated first sample set and the 1st-updated second sample set to obtain a trained first model; and finally an image classification model is obtained according to the trained first model. Because the initial first sample set comprises first sample images labeled with first class labels while the initial second sample set comprises unlabeled second sample images, the scheme of the embodiments of the disclosure uses both a labeled sample set and an unlabeled sample set for model training, so training of the image classification model can be completed without labeling all of the sample images, and the labor cost is low.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
FIG. 1 is a schematic diagram of a training method for an image classification model;
fig. 2 is a schematic diagram of an application scenario provided by the embodiment of the present disclosure;
fig. 3 is a schematic flowchart of a training method of an image classification model according to an embodiment of the present disclosure;
FIG. 4 is a schematic flow chart illustrating sample set update provided by an embodiment of the present disclosure;
FIG. 5 is a schematic diagram illustrating the 1st update process of a sample set according to an embodiment of the disclosure;
FIG. 6 is a schematic flow chart of a model training process provided by an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of a model training process provided by an embodiment of the present disclosure;
FIG. 8 is a flow chart illustrating a one-time semi-supervised training process provided by an embodiment of the present disclosure;
fig. 9 is a schematic diagram of obtaining a third sample set according to the embodiment of the disclosure;
fig. 10 is a schematic diagram of obtaining a third sample set according to an embodiment of the present disclosure;
FIG. 11 is a schematic flow chart of a first model training process provided by the embodiments of the present disclosure;
fig. 12 is a schematic flowchart of a merging model training process provided in the embodiment of the present disclosure;
fig. 13 is a schematic network structure diagram of a merging model provided in the embodiment of the present disclosure;
fig. 14 is a schematic flowchart of an image processing method according to an embodiment of the disclosure;
FIG. 15 is a schematic diagram of a program product provided by an embodiment of the present disclosure;
fig. 16 is a schematic structural diagram of a training apparatus for an image classification model according to an embodiment of the present disclosure;
fig. 17 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present disclosure;
FIG. 18 is a schematic structural diagram of a computing device provided by an embodiment of the present disclosure;
in the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
The principles and spirit of the present disclosure will be described with reference to a number of exemplary embodiments. It is understood that these embodiments are given solely for the purpose of enabling those skilled in the art to better understand and to practice the present disclosure, and are not intended to limit the scope of the present disclosure in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be embodied as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
According to an embodiment of the disclosure, a method, medium, apparatus, and computing device for training an image classification model and image processing are provided.
In this document, it is to be understood that any number of elements in the figures are provided by way of illustration and not limitation, and any nomenclature is used for differentiation only and not in any limiting sense.
The basic concepts to which the disclosure relates will first be described.
Web crawlers: according to a certain rule, the program or script of the world wide web information is automatically captured.
Deep learning image classification: the image processing method is an image processing method for distinguishing different categories of targets according to different characteristics reflected in image information, and utilizes a computer to carry out quantitative analysis on an image, and classifies each pixel or area in the image or the image into one of a plurality of categories to replace the visual interpretation of a human.
Active learning: and automatically selecting a sample with high-value information from the mass data according to a set query strategy, and providing the sample with the high-value information for a human expert to label.
Semi-supervised learning: also known as semi-supervised training process, is a method for deep learning model training using both large amounts of unlabeled data and labeled data.
And (3) supervised learning: also known as supervised training process, is a method for deep learning model training using large amounts of labeled data.
Unsupervised learning: the method is a process of processing unlabeled data through a deep learning model and outputting a corresponding result.
swin-transformer: a transform-based image classification neural network is used for carrying out feature extraction on an image.
The principles and spirit of the present disclosure are explained in detail below with reference to several representative embodiments of the present disclosure.
Summary of the Invention
Image classification is an image processing process that divides images into different categories based on their characteristics. At present, image classification is usually implemented with an image classification model, which is a deep learning model: when an image needs to be classified, the image is input into the image classification model, which processes the image and outputs its category.
Before images can be classified by an image classification model, the model needs to be trained, and model training requires corresponding training samples. For example, fig. 1 illustrates a training method of an image classification model. As shown in fig. 1, images are crawled by a web crawler, or images in a database are used as sample images; the sample images are then manually labeled by a large number of annotators to obtain the sample images' labeled categories, thereby forming a training set. Once the training set is obtained, the image classification model can be trained on it, finally yielding an image classification model with image classification capability.
With the continuous development of image classification technology, neural networks have become deeper and more accurate; at the same time, the training of image classification models demands ever larger numbers of training samples, and the scale of the training samples directly affects the accuracy of the image classification model.
The inventors found that current training of image classification models uses supervised learning, that is, fully labeled images are used as training samples. Because the scale of the training samples needed is huge, a large number of annotators are needed to perform the labeling, and the labor cost is high.
Having described the general principles of the present disclosure, various non-limiting embodiments of the present disclosure are described in detail below.
Application scene overview
An application scenario in which the embodiments of the present disclosure are applicable is first described with reference to fig. 2.
Fig. 2 is a schematic view of an application scenario provided by the embodiment of the present disclosure, as shown in fig. 2, including a client 21 and a server 22, where the client 21 and the server 22 are connected by a wired or wireless connection.
The client 21 may send an image 23 to the server 22, and the image 23 is then processed by the server 22. The image 23 may be a sample image for model training, in which case the server 22 performs the training process of the image classification model; the image 23 may also be an image to be classified, in which case the server 22 performs an image classification process.
By training the image classification model, the model acquires the capability of image classification, so that images can be classified with it. After classifying images according to different classification types, the method can be applied to various scenes, such as image identification, commodity identification, personalized image recommendation, and the like.
It should be noted that the client 21 and the server 22 may be two independent devices, or may be two different components integrated in the same device, and in the example of fig. 2, the client 21 and the server 22 are described as two independent devices.
Although the execution subject is the server 22 in the scenario illustrated in fig. 2, the execution subject of each embodiment in the present disclosure may be, for example, a device with a data processing function, such as a server, a processor, a microprocessor, a chip, and the like, and for example, the execution subject may also be the client 21. The specific execution subject of each embodiment in the present disclosure is not limited, and may be selected and set according to actual needs, and any device having a data processing function may be used as the execution subject of each embodiment in the present disclosure. Further, an execution subject of the training method for executing the image classification model and an execution subject of the image processing method may be the same or different.
It should be noted that fig. 2 is only an example of an application scenario applicable to the embodiment of the present disclosure, and does not constitute a limitation to the application scenario.
Exemplary method
In connection with the application scenario of fig. 2, a training method of an image classification model according to an exemplary embodiment of the present disclosure is described with reference to fig. 3. It should be noted that the above application scenarios are only illustrated for the convenience of understanding the spirit and principles of the present disclosure, and the embodiments of the present disclosure are not limited in any way in this respect. Rather, embodiments of the present disclosure may be applied to any scenario where applicable.
Fig. 3 is a schematic flowchart of a training method of an image classification model according to an embodiment of the present disclosure, and as shown in fig. 3, the method may include:
and S31, processing the initial first sample set and the initial second sample set according to the first model, and processing the initial second sample set according to the second model to obtain the first sample set after the 1 st update and the second sample set after the 1 st update.
The initial first sample set and the initial second sample set are sample sets used for image classification model training. The initial first sample set comprises first sample images labeled with corresponding first class labels, where a first class label indicates the class to which the first sample image belongs. For example, if the model is to classify the gender of the person in an image, the first class label is male or female, where male indicates that the gender of the person in the corresponding first sample image is male, and female indicates that it is female.
In some embodiments, the first category labels may be represented by specific numerical values. For example, the gender of the person in the image is taken as an example, the first category label may be a number 0 or 1, where the number 0 represents a male and the number 1 represents a female, or the number 0 represents a female and the number 1 represents a male, and so on.
The initial second sample set includes a second sample image, and the second sample image is unlabeled, that is, the second sample image in the initial second sample set is an unlabeled image.
After the initial first sample set and the initial second sample set are obtained, the initial first sample set and the initial second sample set can be processed according to the first model, and the initial second sample set can be processed according to the second model. A certain number of second sample images are selected from the initial second sample set according to the processing results output by the first model and the second model; after the selected second sample images are labeled by an annotator, the initial first sample set and the initial second sample set are updated according to the labeled second sample images, yielding the 1st-updated first sample set and the 1st-updated second sample set.
S32: perform at least one model training process on the first model and the second model according to the 1st-updated first sample set and the 1st-updated second sample set to obtain a trained first model, where the model training process includes a semi-supervised training process, a supervised training process and an unsupervised training process for the first model, and an unsupervised training process for the second model.
The first sample set comprises labeled first sample images, and supervised training processing can be carried out on the first model according to the first sample set, so that the first model has preliminary label prediction capability. The second sample set includes unlabeled second sample images, and the first model and the second model may be unsupervised trained based on the second sample set. Further, the first model may be semi-supervised trained from the first set of samples and the second set of samples. After the first model and the second model are subjected to at least one model training process, the trained first model can be obtained, and the trained first model has the capability of classifying images.
S33: obtain an image classification model according to the trained first model.
In the embodiment of the present disclosure, the number of the trained first models is one or more, and any one of the trained first models may perform classification on the image under a certain classification type. For example, a trained first model is used to classify the gender in the image, and the category of the image can be classified as male or female through the trained first model; for example, a trained first model is used to classify the style of the image, and the trained first model can classify the category of the image into cartoon, real person or other styles, etc.
When the number of the trained first models is one, the trained first models can be directly determined as the image classification models. When the number of the trained first models is multiple, the image classification model can be trained according to the multiple trained first models.
With the training method for an image classification model provided by the embodiments of the disclosure, an initial first sample set and an initial second sample set are first processed according to a first model, and the initial second sample set is processed according to a second model, to obtain a 1st-updated first sample set and a 1st-updated second sample set; at least one model training process is then performed on the first model and the second model according to the 1st-updated first sample set and the 1st-updated second sample set to obtain a trained first model; and finally an image classification model is obtained according to the trained first model. Because the initial first sample set comprises first sample images labeled with first class labels while the initial second sample set comprises unlabeled second sample images, the scheme of the embodiments of the disclosure uses both a labeled sample set and an unlabeled sample set for model training, so training of the image classification model can be completed without labeling all of the sample images, and the labor cost is low.
On the basis of the above embodiment, the following describes the training process of the image classification model in further detail.
First, the 1st update of the initial first sample set and the initial second sample set, involved in S31 of the embodiment of fig. 3, is described with reference to fig. 4. Fig. 4 is a schematic flowchart of a sample set update provided by an embodiment of the disclosure; as shown in fig. 4, it includes:
and S41, initializing the first model for the 1 st time to obtain the initialized first model for the 1 st time.
The model training in the embodiment of the present disclosure involves a first model and a second model, and the structure of the first model and the second model may adopt a conventional image classification neural network, such as a swin-transformer network. The first model and the second model may adopt the same network structure or different network structures.
After the first model and the second model are obtained, the first model may be initialized for the 1st time to obtain the 1st-initialized first model, and the second model may be initialized to obtain the initialized second model. Initialization is the process of assigning initial values to the parameters in a model: once the network structures of the first model and the second model are determined, the structures are fixed, and initialization is the initial assignment of the model parameters. After initialization, the parameters of the 1st-initialized first model and the parameters of the initialized second model may be the same or different.
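As a minimal sketch of this initialization step, assuming a PyTorch-style implementation (the disclosure does not mandate a particular framework, and the layer types below are illustrative assumptions):

    import torch.nn as nn

    def initialize(model: nn.Module) -> nn.Module:
        # Assign fresh initial values to every learnable parameter;
        # the network structure itself is left unchanged.
        for m in model.modules():
            if isinstance(m, (nn.Linear, nn.Conv2d)):
                nn.init.xavier_uniform_(m.weight)
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
        return model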
S42: perform supervised training on the 1st-initialized first model according to the initial first sample set to obtain the 1st-preprocessed first model.
The 1st-initialized first model has not been trained and does not yet have the corresponding image classification capability, so supervised training can be performed on it to obtain the 1st-preprocessed first model, which has a preliminary label prediction capability.
Supervised training is based on a labeled sample set; in the embodiments of the disclosure, the initial first sample set may be used for the supervised training of the 1st-initialized first model. Specifically, the initial first sample set includes first sample images labeled with corresponding first class labels. A first sample image is input into the 1st-initialized first model, which processes the image to produce an output classification result, and the parameters of the model are then adjusted according to the output classification result and the first class label. Supervised training can be performed on the 1st-initialized first model in this manner for every first sample image in the initial first sample set, obtaining the 1st-preprocessed first model.
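A minimal sketch of one such supervised update, under the same PyTorch-style assumptions; the cross-entropy loss is an assumption, since the disclosure only requires that the parameters be adjusted according to the output classification result and the first class label:

    import torch.nn.functional as F

    def supervised_step(model, optimizer, images, labels):
        # Compare the model's classification output with the first class
        # labels and adjust the parameters accordingly.
        optimizer.zero_grad()
        logits = model(images)                  # (batch, num_classes)
        loss = F.cross_entropy(logits, labels)  # labels: first class labels
        loss.backward()
        optimizer.step()
        return loss.item()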
S43: perform semi-supervised training on the 1st-preprocessed first model according to the initial first sample set and the initial second sample set to obtain the 1st-trained first model.
After the 1st-preprocessed first model is obtained, semi-supervised training can be performed on it according to the initial first sample set and the initial second sample set. During semi-supervised training, the parameters of the 1st-preprocessed first model are adjusted as training proceeds, finally yielding the 1st-trained first model. The detailed semi-supervised training procedure is described in the embodiments below.
S44: process the initial second sample set according to the 1st-trained first model and the initialized second model to obtain the 1st-updated first sample set and the 1st-updated second sample set.
The 1st-trained first model is obtained by subjecting the 1st-initialized first model to supervised training and semi-supervised training, so it has a certain label prediction capability. The initialized second model has not undergone any model training and has no label prediction capability.
Then, a plurality of second sample images in the initial second sample set are input into the 1st-trained first model and the initialized second model respectively, obtaining a first class vector of each second sample image output by the 1st-trained first model and a second class vector of each second sample image output by the initialized second model.
The plurality of second sample images may be all of the second sample images in the initial second sample set, or a subset of them. For example, suppose the initial second sample set includes r second sample images, namely second sample image 1, second sample image 2, ..., second sample image r, where r is a positive integer greater than 1. The r second sample images are input in turn into the 1st-trained first model to obtain their first class vectors, and input in turn into the initialized second model to obtain their second class vectors.
The number of elements in the first category vector and the second category vector is related to the corresponding classification type. For example, if the classification type is gender, the number of elements in the first class vector and the second class vector may be 2, and the 2 elements represent the probability that the corresponding second sample image is male and female, respectively. Taking the first class vector as (0.1,0.9) as an example, (0.1,0.9) indicates that the probability that the sex corresponding to the second sample image is male is 0.1, and the probability that the sex corresponding to the second sample image is female is 0.9. Taking the second category vector as (0.8,0.2) as an example, (0.8,0.2) indicates that the probability that the gender corresponding to the second sample image is male is 0.8, and the probability that the gender corresponding to the second sample image is female is 0.2. For example, if the classification type is genre and the genre includes true person, cartoon and others, the number of elements in the first category vector and the second category vector may be 3, where the 3 elements represent the probability that the genre of the corresponding second sample image is true person, cartoon and other genres. Taking the first class vector as (0.1,0.9,0) as an example, the probability that the style of the second sample image is true is 0.1, the probability that the style is animation is 0.9, and the probability that the style is other is 0. The meaning represented by each element in the second category vector is the same as that of the first category vector, and is not described herein again.
It should be noted that, for any second sample image, the corresponding first category vector and second category vector may be the same or different. For example, the first class vector is (0.1,0.9) and the second class vector is (0.8,0.2), in which case the first class vector and the second class vector are different. For example, the first class vector is (0.1,0.9,0) and the second class vector is (0.1,0.9,0), and in this case, the first class vector and the second class vector are the same.
After the first class vectors and second class vectors of the plurality of second sample images are obtained, the Euclidean distance between the first class vector and the second class vector corresponding to each second sample image can be computed. The L second sample images with the smallest Euclidean distances are then selected from the plurality of second sample images and labeled by an annotator, obtaining the class labels of the L second sample images, where L is a positive integer. The value of L may be adjusted as needed; for example, L may be 1/10 to 1/2 of the sample size of the initial first sample set.
The first class vector and the second class vector are obtained by processing the second sample image with the 1st-trained first model and the initialized second model respectively. Because the second model has the same structure as the first model, the initialized second model is equivalent to an initialized first model; in other words, the first and second class vectors are equivalent to the outputs, on the same second sample image, of two first models at different moments in training (the 1st-trained first model and the initialized first model).
The magnitude of the Euclidean distance reflects the degree of difference between the first class vector and the second class vector. In the embodiments of the disclosure, the L second sample images with the smallest Euclidean distance between their first and second class vectors are selected because their degree of difference is small, that is, these images mainly contain common features. After they are labeled by the annotator, the L second sample images and their class labels are added to the initial first sample set for subsequent training of the first model, so that the first model can quickly acquire the ability to recognize common sample images (i.e. images mainly containing common features).
After the class labels of the L second sample images are obtained, the initial first sample set and the initial second sample set can be updated accordingly. Since the L second sample images now carry corresponding class labels, they can be added, together with their labels, to the initial first sample set as samples, yielding the 1st-updated first sample set. The L second sample images can then be deleted from the initial second sample set, yielding the 1st-updated second sample set. A sketch of this selection-and-update step is given below.
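A minimal sketch of the distance-based selection and the sample set update, under the same PyTorch-style assumptions as above; class vectors are taken directly as the model outputs, and annotate (used later) stands in for the human labeling step:

    import torch

    def select_by_distance(model_a, model_b, unlabeled_images, num, largest=False):
        # Run each unlabeled second sample image through both models and rank
        # the images by the Euclidean distance between the two class vectors.
        with torch.no_grad():
            f1 = model_a(unlabeled_images)    # first class vectors
            f2 = model_b(unlabeled_images)    # second class vectors
        dist = torch.norm(f1 - f2, dim=1)     # Euclidean distance per image
        # largest=False picks the L smallest distances (this 1st update);
        # largest=True picks the K largest (later training rounds).
        _, idx = torch.topk(dist, k=num, largest=largest)
        return idx

    def update_sample_sets(labeled_set, unlabeled_set, idx, new_labels):
        # Move the newly annotated images from the second (unlabeled) sample
        # set into the first (labeled) sample set.
        chosen = set(idx.tolist())
        for j, label in zip(idx.tolist(), new_labels):
            labeled_set.append((unlabeled_set[j], label))
        unlabeled_set = [img for j, img in enumerate(unlabeled_set)
                         if j not in chosen]
        return labeled_set, unlabeled_set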
The 1st update of the sample sets can also be understood in conjunction with fig. 5. Fig. 5 is a schematic diagram of the 1st update process of the sample sets according to an embodiment of the disclosure. As shown in fig. 5, M_a denotes the first model, M_b denotes the second model, the initial first sample set is X_0, and the initial second sample set is Y_0.
M_a is initialized for the 1st time to obtain M_a_1, i.e. the 1st-initialized first model. M_b is initialized to obtain M_b_t1, i.e. the initialized second model.
Supervised training is then performed on M_a_1 with the initial first sample set X_0 to obtain M_a_1', the 1st-preprocessed first model. Semi-supervised training is performed on M_a_1' according to the initial first sample set X_0 and the initial second sample set Y_0 to obtain M_a_t1, i.e. the 1st-trained first model.
The initial second sample set Y_0 is processed by M_a_t1 and M_b_t1 respectively to obtain the first class vectors f_01 and the second class vectors f_02. According to f_01 and f_02, the L second sample images with the smallest Euclidean distances are selected and given to the annotator for labeling, thereby obtaining the 1st-updated first sample set X_1 and the 1st-updated second sample set Y_1.
The 1st update process of the initial first sample set and the initial second sample set has been described in detail above with reference to fig. 4 and fig. 5; the model training process of S32 in the embodiment of fig. 3 is described below.
Fig. 6 is a schematic flow chart of a model training process provided in the embodiment of the present disclosure, as shown in fig. 6, including:
and S61, copying the parameters of the first model trained for the (i-1) th time to the (i-1) th second model to obtain the (i) th second model.
i is an integer greater than or equal to 2, i is initially 2, and the 1 st second model is an initialized second model. The parameter replication can be realized by a parameter replication related function, which can be an own parameter replication function in a module framework, such as an own parameter replication function in a swin-transformer network. In an implementation manner, if the first model and the second model have the same structure, the parameters of the first model trained for the (i-1) th time are copied to the (i-1) th second model, and after the ith second model is obtained, the ith second model is the first model trained for the (i-1) th time.
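In PyTorch terms (an assumption; the disclosure only requires some parameter copying function), this copy step could look like:

    import copy

    def copy_parameters(first_model, second_model):
        # After this call the second model has the same structure and the
        # same parameters as the (i-1)-th trained first model.
        second_model.load_state_dict(copy.deepcopy(first_model.state_dict()))
        return second_model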
Because the first model and the second model have the same structure, after the parameter copying operation the i-th second model records the (i-1)-th trained first model (same structure, same parameters). Therefore, in the next round of model training, performing unsupervised training on the i-th trained first model and the i-th second model with the (i-1)-th updated second sample set is equivalent to performing it on the i-th trained first model and the (i-1)-th trained first model. Through the first models of two adjacent training rounds (the i-th trained and the (i-1)-th trained first models), K second sample images are selected from the (i-1)-th updated second sample set for the annotator to label; the sample sets are updated according to the resulting class labels, and the next iteration of first model training then proceeds on the updated sample sets.
S62: perform the i-th model training process on the first model and the i-th second model according to the (i-1)-th updated first sample set and the (i-1)-th updated second sample set, obtaining the i-th trained first model, the i-th updated first sample set, and the i-th updated second sample set.
First, the first model is initialized for the i-th time to obtain the i-th initialized first model; for the initialization process, refer to the description of the embodiment of fig. 4, which is not repeated here.
The i-th initialized first model has not been trained and does not have the corresponding image classification capability, so supervised training can be performed on it to obtain the i-th preprocessed first model, which has a preliminary label prediction capability.
Supervised training is based on a labeled sample set; in the embodiments of the disclosure, the (i-1)-th updated first sample set may be used for the supervised training of the i-th initialized first model. Specifically, the (i-1)-th updated first sample set includes first sample images labeled with corresponding first class labels. A first sample image is input into the i-th initialized first model, which processes the image to produce an output classification result, and the parameters of the model are then adjusted according to the output classification result and the first class label. Supervised training can be performed on the i-th initialized first model in this manner for every first sample image in the (i-1)-th updated first sample set, obtaining the i-th preprocessed first model.
Then, semi-supervised training is performed on the i-th preprocessed first model according to the (i-1)-th updated first sample set and the (i-1)-th updated second sample set to obtain the i-th trained first model. During semi-supervised training, the parameters of the i-th preprocessed first model are adjusted as training proceeds, finally yielding the i-th trained first model. The detailed semi-supervised training procedure is described in the embodiments below.
Furthermore, unsupervised training can be performed on the i-th trained first model and the i-th second model according to the (i-1)-th updated second sample set, obtaining the i-th updated first sample set and the i-th updated second sample set.
The i-th trained first model is obtained by subjecting the i-th initialized first model to supervised training and semi-supervised training, so it has a certain label prediction capability, and the i-th second model is equivalent to the (i-1)-th trained first model.
A plurality of second sample images in the (i-1)-th updated second sample set are input into the i-th trained first model and the i-th second model respectively, obtaining a first class vector of each second sample image output by the i-th trained first model and a second class vector of each second sample image output by the i-th second model.
The plurality of second sample images may be all of the second sample images in the (i-1)-th updated second sample set, or a subset of them. For example, suppose the (i-1)-th updated second sample set includes x second sample images, namely second sample image 1, second sample image 2, ..., second sample image x, where x is a positive integer greater than 1. The x second sample images are input in turn into the i-th trained first model to obtain their first class vectors, and input in turn into the i-th second model to obtain their second class vectors.
The number of elements in the first category vector and the second category vector is related to the corresponding classification type. For example, if the classification type is gender, the number of elements in the first class vector and the second class vector may be 2, and the 2 elements represent the probability that the corresponding second sample image is male and female, respectively. Taking the first class vector as (0.1,0.9) as an example, (0.1,0.9) indicates that the probability that the sex corresponding to the second sample image is male is 0.1, and the probability that the sex corresponding to the second sample image is female is 0.9. Taking the second category vector as (0.8,0.2) as an example, (0.8,0.2) indicates that the probability that the gender corresponding to the second sample image is male is 0.8, and the probability that the gender corresponding to the second sample image is female is 0.2. For example, if the classification type is genre and the genre includes true person, cartoon and others, the number of elements in the first category vector and the second category vector may be 3, where the 3 elements represent the probability that the genre of the corresponding second sample image is true person, cartoon and other genres. Taking the first class vector as (0.1,0.9,0) as an example, the probability that the style of the second sample image is true is 0.1, the probability that the style is animation is 0.9, and the probability that the style is other is 0. The meaning represented by each element in the second category vector is the same as that of the first category vector, and is not described herein again.
It should be noted that, for any second sample image, the corresponding first category vector and second category vector may be the same or different. For example, the first class vector is (0.1,0.9) and the second class vector is (0.8,0.2), in which case the first class vector and the second class vector are different. For example, the first class vector is (0.1,0.9,0) and the second class vector is (0.1,0.9,0), and in this case, the first class vector and the second class vector are the same.
After the first class vectors and second class vectors of the plurality of second sample images are obtained, the Euclidean distance between the first class vector and the second class vector corresponding to each second sample image can be computed. The K second sample images with the largest Euclidean distances are then selected from the plurality of second sample images and labeled by an annotator, obtaining the class labels of the K second sample images, where K is a positive integer.
In the later stages of model training, the model already discriminates common features well but still struggles with features that are hard to distinguish, so some hard samples, i.e. images with a large 'information content', need to be added to the sample set for training. For a hard sample, the model's output features fluctuate strongly, so the difficulty of a second sample image can be judged by the difference between the features output by the first model and the second model (i.e. the first class vector and the second class vector), which in the embodiments of the disclosure manifests as a large Euclidean distance between them. Therefore, the K second sample images with the largest Euclidean distances are selected and labeled by the annotator; these high-information images are added to the first sample set for model training, improving the model's accuracy on features that are hard to distinguish and thus its overall precision.
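With the select_by_distance sketch given earlier, this later-round selection differs from the 1st update only in the ranking direction:

    # K hardest samples: largest distance between the two class vectors.
    hard_idx = select_by_distance(model_a_ti, model_b_ti, unlabeled_images,
                                  num=K, largest=True)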
After the class labels of the K second sample images are obtained, the (i-1)-th updated first sample set and the (i-1)-th updated second sample set can be updated accordingly. Since the K second sample images now carry corresponding class labels, they can be added, together with their labels, to the (i-1)-th updated first sample set as samples, yielding the i-th updated first sample set. The K second sample images can then be deleted from the (i-1)-th updated second sample set, yielding the i-th updated second sample set.
S63: in response to the i-th trained first model not meeting a first training termination condition, perform the (i+1)-th model training process; in response to the i-th trained first model meeting the first training termination condition, determine the i-th trained first model as the trained first model.
S61 and S62 constitute the i-th model training process, which yields the i-th trained first model, the i-th updated first sample set, and the i-th updated second sample set. Whether the i-th trained first model meets the first training termination condition is then judged, and different operations are performed according to the result.
The first training termination condition may be that the classification accuracy of the i-th trained first model meets a requirement, or that the number of second sample images in the i-th updated second sample set is less than or equal to a preset value, for example equal to 0, or less than or equal to 5, and so on.
When the first training termination condition is not met, the (i+1)-th model training process is performed: the parameters of the i-th trained first model are copied to the i-th second model to obtain the (i+1)-th second model, and the (i+1)-th model training process is performed on the first model and the (i+1)-th second model according to the i-th updated first sample set and the i-th updated second sample set; for the specific procedure, see the descriptions of S61 and S62, which are not repeated here. When the first training termination condition is met, the i-th trained first model is determined as the trained first model.
The model training process may also be understood in conjunction with Fig. 7, a schematic diagram of a model training process according to an embodiment of the disclosure. As shown in Fig. 7, M_a denotes the first model, M_b denotes the second model, X_i denotes the first sample set after the i-th update, Y_i denotes the second sample set after the i-th update, and M_b_ti denotes the i-th second model.
The parameters of M_a_t1 are copied to M_b_t1 to obtain M_b_t2, and the 2nd model training process is performed on M_a and M_b_t2 according to X_1 and Y_1 to obtain M_a_t2, X_2 and Y_2; the parameters of M_a_t2 are copied to M_b_t2 to obtain M_b_t3, and the 3rd model training process is performed on M_a and M_b_t3 according to X_2 and Y_2 to obtain M_a_t3, X_3 and Y_3; ...; the parameters of M_a_ti are copied to M_b_ti to obtain M_b_t(i+1), and the (i+1)-th model training process is performed on M_a and M_b_t(i+1) according to X_i and Y_i to obtain M_a_t(i+1), X_(i+1) and Y_(i+1), and so on.
For any i-th model training process, as shown in Fig. 7, the i-th initialization of M_a yields M_a_i, the first model of the i-th initialization.
M_a_i is then supervised-trained on the first sample set X_(i-1) after the (i-1)-th update to obtain M_a_i', the first model of the i-th preprocessing. Semi-supervised training is performed on M_a_i' according to X_(i-1) and the second sample set Y_(i-1) after the (i-1)-th update to obtain M_a_ti, the first model of the i-th training.
Y_(i-1) is then processed by M_a_ti and M_b_ti respectively to obtain the first category vectors f_(i-1)1 and the second category vectors f_(i-1)2; according to f_(i-1)1 and f_(i-1)2, the K second sample images with the largest Euclidean distance are selected for the annotator to label, thereby obtaining the first sample set X_i after the i-th update and the second sample set Y_i after the i-th update.
The above embodiments have described the 1st update of the initial first sample set and the initial second sample set, as well as the model training process for the first model and the second model. Both involve semi-supervised training of the first model, which is described below.
For the i-th preprocessed first model, at least one semi-supervised training process may be performed on it according to the first sample set after the (i-1)-th update and the second sample set after the (i-1)-th update, so as to obtain the first model of the i-th training; any single semi-supervised training process proceeds as described in the embodiment of Fig. 8.
Fig. 8 is a schematic flow chart of a single semi-supervised training process provided in an embodiment of the present disclosure; as shown in Fig. 8, it includes:
S81, determining M×n first sample images in the first sample set after the (i-1)-th update, and determining M×(1-n) second sample images in the second sample set after the (i-1)-th update, where M is a positive integer and n is a preset value greater than 0 and less than 1.
In any iteration of training, M sample images are fed in for a step of stochastic gradient descent of the neural network. The M sample images comprise M×n first sample images and M×(1-n) second sample images, where M is a positive integer and n is a preset value greater than 0 and less than 1, which may be set to 3/4, 4/5, and so on.
Since the numbers of first sample images and second sample images must be integers, when M×n or M×(1-n) is not an integer, the values may be rounded up or down: the rounded value of M×n is used as the number of first sample images determined in the first sample set after the (i-1)-th update, and the rounded value of M×(1-n) is used as the number of second sample images determined in the second sample set after the (i-1)-th update.
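For example, one way of handling the rounding (an illustrative choice; the embodiment allows rounding either up or down) is:

```python
def split_batch(m, n):
    """Split a batch of M images into labeled / unlabeled parts so the
    two integer counts always sum to M."""
    labeled = round(m * n)     # number of first sample images
    unlabeled = m - labeled    # number of second sample images
    return labeled, unlabeled

print(split_batch(64, 3 / 4))  # (48, 16)
print(split_batch(64, 4 / 5))  # (51, 13), since 64 * 0.8 = 51.2
```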
S82, obtaining a third sample set according to the M×(1-n) second sample images, where the third sample set includes a plurality of third sample images and second category labels of the plurality of third sample images.
Specifically, for any one of the M×(1-n) second sample images, first-type image transformation processing is performed on the second sample image to obtain a plurality of first transformed images corresponding to it. The first-type image transformation includes at least one of contrast transformation, brightness transformation, color transformation, image cropping, histogram equalization, and image sharpening, and is used to apply strong enhancement processing to the second sample image.
Then, second-type image transformation processing is performed on the second sample image to obtain a second transformed image corresponding to it. The second-type image transformation includes at least one of image scaling and image rotation, and is used to apply weak enhancement processing to the second sample image.
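As an illustration, the two branches could be built with torchvision; the specific operations and magnitudes below are example choices, not values fixed by the embodiment:

```python
from torchvision import transforms

# First-type (strong enhancement) transformations.
strong_aug = transforms.Compose([
    transforms.ColorJitter(brightness=0.5, contrast=0.5, hue=0.2),
    transforms.RandomCrop(224, padding=16),   # image cropping
    transforms.RandomEqualize(),              # histogram equalization
    transforms.RandomAdjustSharpness(2.0),    # image sharpening
])

# Second-type (weak enhancement) transformations.
weak_aug = transforms.Compose([
    transforms.Resize(256),                   # image scaling
    transforms.RandomRotation(10),            # slight image rotation
])
```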
Because strong and weak enhancement are applied to the second sample image separately, the resulting first transformed images and second transformed image differ from it, i.e. a certain amount of interference information is added. However, since the first transformed images and the second transformed image all derive from the same second sample image, taking the class probabilities of the second transformed image as the second class labels of the corresponding first transformed images satisfies the consistency regularization principle, under which inputs of the same class (here, the several first transformed images corresponding to the same second transformed image) receive the same label; this improves the robustness of the model for image recognition.
The second transformed image is then input into the first model after the (k-1)-th semi-supervised training to obtain the probability of each class for the second transformed image, from which the plurality of third sample images and their second class labels are obtained.
This process can be understood, for example, in conjunction with fig. 9 and 10. Fig. 9 is a schematic diagram of a first example of obtaining a third sample set according to the embodiment of the disclosure, and fig. 10 is a schematic diagram of a second example of obtaining a third sample set according to the embodiment of the disclosure.
As shown in Fig. 9, the first-type image transformation processing is performed on the second sample image to obtain a first transformed image, and the second-type image transformation processing is performed on it to obtain a second transformed image. The second transformed image is then processed by the first model after the (k-1)-th semi-supervised training to obtain a corresponding pseudo label, and a third sample set is generated from the first transformed image and the pseudo label.
The process of generating the third sample set from the first transformed images and the pseudo label can be understood with reference to Fig. 10. As shown in Fig. 10, the first-type image transformation processing is performed on the second sample image 100 to obtain 5 first transformed images, namely the first transformed images 101, 102, 103, 104 and 105. The second-type image transformation processing is performed on the second sample image 100 to obtain 1 second transformed image, namely the second transformed image 106.
The second transformed image 106 is then input into the first model after the (k-1)-th semi-supervised training to obtain the probability of each category for the second transformed image 106. Fig. 10 illustrates classifying images by gender: the first model after the (k-1)-th semi-supervised training outputs a probability of 0.98 that the gender is male and 0.02 that the gender is female.
Since 0.98 is greater than or equal to the preset value, the second transformed image 106 is an image with a target category: it is a target transformed image, and the target category is "male". In this case, the plurality of first transformed images corresponding to the second transformed image 106 (namely, the first transformed images 101, 102, 103, 104 and 105) are taken as third sample images, and the target category "male" is taken as their second class label (i.e. the pseudo label), giving five groups of training samples: first transformed image 101-second class label (male), first transformed image 102-second class label (male), first transformed image 103-second class label (male), first transformed image 104-second class label (male), and first transformed image 105-second class label (male).
In this way, multiple groups of training samples of the third sample set are obtained from the second sample images; each training sample comprises a first transformed image and its corresponding second class label, effectively enlarging the sample set available for model training. Meanwhile, because the several first transformed images obtained from the same second sample image are not identical while their second class labels are the same, the robustness of image recognition by the model can be improved.
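A compact sketch of this pseudo-labeling step follows; the 0.95 threshold and the data interfaces are assumptions for illustration only:

```python
import torch
import torch.nn.functional as F

def build_third_samples(model, weak_images, strong_groups, threshold=0.95):
    """weak_images[i] is the second transformed image of source image i;
    strong_groups[i] is the list of its first transformed images.
    Returns (third_sample_images, second_class_labels)."""
    model.eval()
    with torch.no_grad():
        probs = F.softmax(model(weak_images), dim=1)
    conf, labels = probs.max(dim=1)
    images, pseudo = [], []
    for i in torch.nonzero(conf >= threshold).flatten().tolist():
        for img in strong_groups[i]:       # all first transformed images
            images.append(img)
            pseudo.append(int(labels[i]))  # shared pseudo label
    return images, pseudo
```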
S83, performing cross-entropy loss function training on the first model after the (k-1)-th semi-supervised training according to the M×n first sample images, the first class labels of the M×n first sample images, the plurality of third sample images, and the second class labels of the plurality of third sample images, to obtain the first model after the k-th semi-supervised training.
Specifically, for any sample image among the M×n first sample images and the plurality of third sample images, the sample image is input into the first model after the (k-1)-th semi-supervised training to obtain an output sample classification result containing the probability corresponding to each category.
A loss value between the sample classification result and the corresponding class label (a first class label or a second class label) is then obtained, and the parameters of the first model after the (k-1)-th semi-supervised training are adjusted according to this loss value to obtain the first model after the k-th semi-supervised training.
The embodiments of Figs. 8-10 describe how the first model after the (k-1)-th semi-supervised training is semi-supervised-trained according to the first sample set after the (i-1)-th update and the second sample set after the (i-1)-th update to obtain the first model after the k-th semi-supervised training, where k is an integer greater than or equal to 1 and is initially 1. That is, the 1st semi-supervised training process is performed on the first model after the 0th semi-supervised training (the first model of the i-th preprocessing) to obtain the first model after the 1st semi-supervised training, the 2nd semi-supervised training process is performed on the first model after the 1st semi-supervised training to obtain the first model after the 2nd semi-supervised training, and so on.
When the first model after the k-th semi-supervised training does not satisfy the second training termination condition, the (k+1)-th semi-supervised training process is performed on it to obtain the first model after the (k+1)-th semi-supervised training. When the first model after the k-th semi-supervised training satisfies the second training termination condition, it is determined as the first model of the i-th training. The second training termination condition is convergence of the first model after the k-th semi-supervised training.
The process of obtaining the trained first model is detailed in the embodiments of fig. 3-10, and is briefly summarized below with reference to fig. 11.
Fig. 11 is a schematic flowchart of a first model training process provided in the embodiment of the present disclosure, as shown in fig. 11, including:
and S111, initializing the first model for the 1 st time to obtain the initialized first model for the 1 st time, and initializing the second model to obtain the initialized second model.
S112, performing supervised training processing on the initialized first model for the 1 st time to obtain a preprocessed first model for the 1 st time, and performing semi-supervised training processing on the preprocessed first model for the 1 st time to obtain a trained first model for the 1 st time.
S113, performing unsupervised training processing on the first model trained at the 1 st time and the initialized second model, and performing 1 st updating of the first sample set and the second sample set according to the result of the unsupervised training processing.
Processing a plurality of second sample images in the initial second sample set according to the first model trained at the 1 st time and the initialized second model to obtain first class vectors and second class vectors of the plurality of second sample images, selecting the first L second sample images with the minimum Euclidean distance from the plurality of second sample images according to the first class vectors and the second class vectors, and labeling the L second sample images by a labeling person, so that the initial first sample set and the initial second sample set are updated according to the labeled L second sample images.
Wherein S111-S113 are steps performed once, i.e., without iteration.
S114, copying the parameters of the first model trained at the (i-1)-th time to the (i-1)-th second model to obtain the i-th second model.
And S115, performing ith initialization on the first model to obtain the ith initialized first model.
S116, performing supervised training processing on the first model initialized at the ith time to obtain a first model preprocessed at the ith time, and performing semi-supervised training processing on the first model preprocessed at the ith time to obtain a first model trained at the ith time.
And S117, performing unsupervised training processing on the first model and the ith second model which are trained for the ith time, and updating the ith time of the first sample set and the second sample set according to the result of the unsupervised training processing.
Processing a plurality of second sample images in the initial second sample set according to the first model trained at the ith time and the ith second model to obtain first class vectors and second class vectors of the plurality of second sample images, selecting the first K second sample images with the largest Euclidean distance from the plurality of second sample images according to the first class vectors and the second class vectors, and labeling the K second sample images by a labeling person, so that the first sample set after the updating at the i-1 time and the second sample set after the updating at the i-1 time are updated according to the K second sample images after the labeling.
S114-S117 exemplify the procedure of the i-th model training process, i is initially 2, and i is an integer greater than or equal to 2. It should be noted that, for any ith model training process, after the ith trained first model is obtained, it is necessary to determine whether the ith trained first model meets the first training termination condition, and if not, the (i + 1) th model training process needs to be executed, and specific steps may be referred to as S114-S117. And if so, the first model trained at the ith time is the trained first model. That is, S114-S117 are steps performed for multiple iterations.
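The iterative loop of S114-S117 can be summarized as follows. This is only a sketch: the step functions are passed in as callables standing for the supervised, semi-supervised, selection, and annotation routines described above, and the termination test shown is the pool-size variant of the first training termination condition:

```python
import copy

def train_first_model(model_a, labeled, unlabeled, supervised_step,
                      semi_supervised_step, select_hard, annotate,
                      min_pool=5):
    """Outer loop of S114-S117 (illustrative)."""
    while len(unlabeled) > min_pool:        # first training termination condition
        model_b = copy.deepcopy(model_a)    # S114: i-th second model
        supervised_step(model_a, labeled)   # S115-S116: re-init + supervised
        semi_supervised_step(model_a, labeled, unlabeled)
        hard = select_hard(model_a, model_b, unlabeled)  # S117
        labeled.extend(annotate(hard))      # annotator supplies class labels
        unlabeled = [x for x in unlabeled if x not in hard]
    return model_a
```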
A trained first model can classify images under its corresponding classification type, but in some cases images need to be classified under several classification types, e.g. gender and style. If a separate first model is trained for each classification type, every additional classification type requires an additional first model, and at inference the image must be input into the first model of each classification type, which is cumbersome. Directly training an image classification model over multiple classification types, however, requires labeling each training sample image with a category under every classification type, which is also cumbersome. The embodiment of the present disclosure therefore provides a model merging scheme to reduce inference cost.
Fig. 12 is a schematic flowchart of a process of training a merging model according to an embodiment of the present disclosure, as shown in fig. 12, including:
and S121, inputting the fourth sample image to each trained first model to obtain a first classification result of the fourth sample image under each classification type, wherein the first classification result comprises first probabilities of all classes under the corresponding classification type.
The fourth sample image may include sample images in the first sample set and the second sample set, or may be any other sample images. And each trained first model can classify the image under the corresponding classification type, and the fourth sample image is input into each trained first model, so that a first classification result of the fourth sample image under each classification type can be obtained.
The process of model merging may be understood in connection with Fig. 13, a schematic diagram of the network structure of a merged model provided in an embodiment of the present disclosure. As shown in Fig. 13, the merged model includes 2 trained first models, model M1 and model M2. The classification type corresponding to model M1 is gender, which includes 2 categories, namely "male" and "female"; the classification type corresponding to model M2 is style, which includes 3 categories, namely "animation", "real person" and "other".
The fourth sample image is included in the training set and then input to model M1 and model M2, respectively. The first classification result output by the model M1 is (0.99,0.01), which means that the model M1 determines that the fourth sample image has a probability of 0.99 for belonging to the category "male" and a probability of 0.01 for belonging to the category "female". The first classification result output by the model M2 is (0.99,0.01,0), which means that the model M2 determines that the probability that the fourth sample image belongs to the category "animation" is 0.99, the probability that the fourth sample image belongs to the category "real person" is 0.01, and the probability that the fourth sample image belongs to the category "other" is 0.
And S122, inputting the fourth sample image into the image classification model to obtain a second classification result output by the image classification model under each classification type, wherein the second classification result comprises a second probability of each classification under the corresponding classification type.
For example, in Fig. 13, the fourth sample image is input into the image classification model and the second classification result is (0.3, 0.7, 0.8, 0.1, 0.1), meaning that the image classification model determines that the probabilities of the fourth sample image belonging to the categories "male", "female", "animation", "real person" and "other" are 0.3, 0.7, 0.8, 0.1 and 0.1 respectively.
And S123, obtaining loss values of the first probabilities and the corresponding second probabilities.
In classification model training, the model aims to map the input features to a single output point, so the result approaches one-hot encoded class information. The output of a trained first model is therefore close to a one-hot encoding: the probability of one class under the classification type is close to 1 and those of the other classes are close to 0, so the supervision information it provides is very limited.
Therefore, the embodiment of the disclosure provides a softmax function with a preset temperature coefficient T as a loss function to calculate loss values of the first probability and the second probability, so that the trained first model provides more supervision signals, and the image classification model learns the output characteristics of the trained first model better.
Specifically, a corresponding first normalized probability value may be obtained according to a preset temperature coefficient and the first probability, a corresponding second normalized probability value may be obtained according to the preset temperature coefficient and the second probability, and then loss values of the first probability and the second probability may be obtained according to the first normalized probability value and the second normalized probability value.
The first normalized probability value or the second normalized probability value can be calculated by the following formula (1):

q_i = exp(z_i / T) / Σ_j exp(z_j / T)    (1)

where q_i is the first normalized probability value or the second normalized probability value, z_i is the corresponding first probability or second probability, and T is the preset temperature coefficient.
Taking the first classification result (0.99, 0.01) output by model M1 in Fig. 13 as an example, the first normalized probability value corresponding to the first probability 0.99 is:

q_i1 = exp(0.99 / T) / (exp(0.99 / T) + exp(0.01 / T))
Taking the second classification result (0.3, 0.7, 0.8, 0.1, 0.1) output by the image classification model in Fig. 13 as an example, the first probability 0.99 corresponds to the second probability 0.3, and the corresponding second normalized probability value (normalizing over the probabilities under the same classification type, gender) is:

q_i2 = exp(0.3 / T) / (exp(0.3 / T) + exp(0.7 / T))
The difference between q_i1 and q_i2 is then taken as the loss value of the first probability 0.99 and the corresponding second probability 0.3.
The preset temperature coefficient T amplifies attention to the negative labels, making the output probabilities (i.e. the first normalized probability value corresponding to the first probability, or the second normalized probability value corresponding to the second probability) smoother, which ensures that the image classification model learns more features.
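Formula (1) and the loss computation can be written out as follows. This is a sketch: squaring the difference of the softened probabilities is one concrete reading of "the difference is the loss value", and T = 4 is only an example value:

```python
import torch
import torch.nn.functional as F

def softened(z, T):
    """Formula (1): q_i = exp(z_i / T) / sum_j exp(z_j / T)."""
    return F.softmax(z / T, dim=-1)

T = 4.0
q1 = softened(torch.tensor([0.99, 0.01]), T)  # trained first model M1
q2 = softened(torch.tensor([0.30, 0.70]), T)  # gender part of merged model
loss = ((q1 - q2) ** 2).sum()                 # difference as the loss value
print(q1, q2, loss.item())
```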
And S124, adjusting parameters of the image classification model according to the loss values of the first probabilities and the corresponding second probabilities to obtain the trained image classification model.
S123 illustrates the scheme for calculating the loss value of one first probability and its corresponding second probability; the parameters of the image classification model may be adjusted according to the loss values of all first probabilities and corresponding second probabilities. For any fourth sample image, the above scheme can be applied to adjust the parameters of the image classification model, finally yielding the trained image classification model.
For multi-classification scenes of images, compared with current schemes in which the image classification model can only be trained by labeling sample images with a category under every classification type, the scheme of the embodiment of the present disclosure only requires category labeling under a single classification type, which reduces the labeling requirement for classification data; meanwhile, the model merging strategy integrates the functions of the first models under multiple classification types, saving inference cost.
After the training of the image classification model is completed, the image classification model can be used for classifying the image.
Fig. 14 is a schematic flowchart of an image processing method according to an embodiment of the present disclosure, and as shown in fig. 14, the method may include:
and S141, acquiring a first image to be processed.
And S142, inputting the first image into the image classification model to obtain the category of the first image under each classification type.
The first image is an image to be classified, and the first image may be input to an image classification model, and the image classification model processes the first image to obtain a category of the first image under each classification type, thereby implementing classification of the first image. The image classification model is a model trained according to the method described in the embodiment of fig. 3-13.
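For illustration, applying the merged model of Fig. 13 might look like the sketch below; the slice layout of the output vector is an assumption based on that figure, not mandated by the method:

```python
import torch

def classify(model, image, type_slices):
    """Return one category index per classification type from the
    merged image classification model's output vector."""
    model.eval()
    with torch.no_grad():
        out = model(image.unsqueeze(0))[0]
    return {name: int(out[s].argmax()) for name, s in type_slices.items()}

# e.g. classify(model, first_image,
#               {"gender": slice(0, 2), "style": slice(2, 5)})
```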
In summary, the present disclosure provides a training method for an image classification model that trains with a labeled first sample set and an unlabeled second sample set; compared with training only on labeled samples, this reduces the amount of labeling required at the same training-set scale and hence the labor cost of annotation. During model training, high-value sample images are automatically selected for the annotator, and a model merging scheme is provided to integrate the functions of multiple trained first models, reducing the labeling workload of multi-label classification and saving inference cost.
Exemplary Medium
Having described the method of the exemplary embodiment of the present disclosure, next, a storage medium of the exemplary embodiment of the present disclosure will be described with reference to fig. 15.
Fig. 15 is a schematic diagram of a program product provided by an embodiment of the present disclosure, and referring to fig. 15, a program product 150 for implementing the above method according to an embodiment of the present disclosure is described, which may employ a portable compact disc read only memory (CD-ROM) and include program codes, and may be run on a terminal device, such as a personal computer. However, the program product of the present disclosure is not limited thereto.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. The readable signal medium may also be any readable medium other than a readable storage medium.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN).
Exemplary devices
After the media of the exemplary embodiment of the present disclosure are introduced, next, a training apparatus and an image processing apparatus of an image classification model of the exemplary embodiment of the present disclosure are described with reference to fig. 16 and fig. 17, for implementing the method in any of the above method embodiments, which have similar implementation principles and technical effects, and are not repeated herein.
Fig. 16 is a schematic structural diagram of a training apparatus for an image classification model according to an embodiment of the present disclosure, as shown in fig. 16, including:
an updating module 161, configured to process the initial first sample set and the initial second sample set according to the first model, and process the initial second sample set according to the second model to obtain a first sample set after 1 st update and a second sample set after 1 st update;
a training module 162, configured to perform at least one model training process on the first model and the second model according to the 1 st updated first sample set and the 1 st updated second sample set to obtain a trained first model, where the model training process includes a semi-supervised training process, a supervised training process, and an unsupervised training process for the first model, and an unsupervised training process for the second model;
the processing module 163 is configured to obtain an image classification model according to the trained first model, where the initial first sample set includes first sample images, the first sample images are labeled with corresponding first class labels, the initial second sample set includes second sample images, and the second sample images are unlabeled.
In a possible implementation manner, the update module 161 is specifically configured to:
initializing the first model for the 1 st time to obtain a first model initialized for the 1 st time;
carrying out supervised training processing on the first model initialized at the 1 st time according to the initial first sample set to obtain a first model preprocessed at the 1 st time;
performing semi-supervised training processing on the 1 st preprocessed first model according to the initial first sample set and the initial second sample set to obtain a 1 st trained first model;
and processing the initial second sample set according to the 1 st trained first model and the initialized second model to obtain the 1 st updated first sample set and the 1 st updated second sample set.
In a possible implementation manner, the update module 161 is specifically configured to:
respectively inputting a plurality of second sample images in the initial second sample set into the 1 st trained first model and the initialized second model to obtain a first category vector of each second sample image output by the 1 st trained first model and a second category vector of each second sample image output by the initialized second model;
acquiring category labels of L second sample images from the plurality of second sample images, wherein the L second sample images are the first L second sample images with the smallest Euclidean distance between corresponding first category vectors and second category vectors in the plurality of second sample images, and L is a positive integer;
and updating the initial first sample set and the initial second sample set according to the class labels of the L second sample images to obtain the 1 st updated first sample set and the 1 st updated second sample set.
In a possible implementation, the training module 162 is specifically configured to:
copying the parameters of the first model trained for the (i-1) th time to the (i-1) th second model to obtain the (i) th second model;
performing model training processing of the ith time on the first model and the ith second model according to the first sample set after the ith-1 time of updating and the second sample set after the ith-1 time of updating to obtain a first model after the ith time of training, a first sample set after the ith time of updating and a second sample set after the ith time of updating;
in response to that the ith training first model does not meet a first training termination condition, copying parameters of the ith training first model to an ith second model to obtain an i +1 th second model, and performing i +1 th model training processing on the first model and the i +1 th second model according to the i-th updated first sample set and the i-th updated second sample set;
determining the ith training first model as the training-completed first model in response to the ith training first model satisfying the first training termination condition;
wherein i is an integer greater than or equal to 2, i is initially 2, and the 1 st second model is the initialized second model.
In a possible implementation, the training module 162 is specifically configured to:
performing initialization processing on the first model for the ith time to obtain an initialized first model for the ith time;
carrying out supervised training processing on the first model initialized at the ith time according to the first sample set updated at the ith-1 time to obtain a first model preprocessed at the ith time;
performing at least one semi-supervised training process on the ith preprocessed first model according to the first sample set after the ith-1 time of updating and the second sample set after the ith-1 time of updating to obtain the ith trained first model;
and performing unsupervised training processing on the ith trained first model and the ith second model according to the ith-1 th updated second sample set to obtain the ith updated first sample set and the ith updated second sample set.
In a possible implementation, the training module 162 is specifically configured to:
performing semi-supervised training processing on the first model after the (k-1)-th semi-supervised training according to the first sample set after the (i-1)-th update and the second sample set after the (i-1)-th update, to obtain the first model after the k-th semi-supervised training;
in response to the first model after the k-th semi-supervised training not satisfying a second training termination condition, performing semi-supervised training processing on the first model after the k-th semi-supervised training according to the first sample set after the (i-1)-th update and the second sample set after the (i-1)-th update, to obtain the first model after the (k+1)-th semi-supervised training;
in response to the first model after the kth semi-supervised training meeting the second training termination condition, determining the first model after the kth semi-supervised training as the first model of the ith training;
and k is an integer greater than or equal to 1, the k is initially 1, and the first model after the 0 th semi-supervised training is the first model of the ith preprocessing.
In a possible implementation, the training module 162 is specifically configured to:
determining M×n first sample images in the first sample set after the (i-1)-th update, and determining M×(1-n) second sample images in the second sample set after the (i-1)-th update, where M is a positive integer and n is a preset value greater than 0 and less than 1;
obtaining a third sample set according to the M×(1-n) second sample images, where the third sample set comprises a plurality of third sample images and second class labels of the plurality of third sample images;
and performing cross-entropy loss function training on the first model after the (k-1)-th semi-supervised training according to the M×n first sample images, the first class labels of the M×n first sample images, the plurality of third sample images and the second class labels of the plurality of third sample images, to obtain the first model after the k-th semi-supervised training.
In a possible implementation, the training module 162 is specifically configured to:
performing first-type image transformation processing on any one of the M×(1-n) second sample images to obtain a plurality of first transformed images corresponding to the second sample image;
performing second-type image transformation processing on the second sample image to obtain a second transformation image corresponding to the second sample image;
inputting the second transformation image into the first model after the k-1 th semi-supervised training to obtain the probability of each category corresponding to the second transformation image;
and acquiring the plurality of third sample images and second class labels of the plurality of third sample images according to the probability of each class corresponding to the second conversion image.
In a possible implementation, the first type of image transformation includes at least one of contrast transformation, brightness transformation, color transformation, image cropping, histogram equalization, and image sharpening, and the first type of image transformation is used for performing strong enhancement processing on the second sample image;
the second type of image transformation comprises at least one of image scaling and image rotation, and the second type of image transformation is used for performing weak enhancement processing on the second sample image.
In a possible implementation, the training module 162 is specifically configured to:
determining an image with a target class as a target transformation image in each second transformation image, wherein the target class is a class of which the corresponding probability is greater than or equal to a preset value;
determining a plurality of first transformed images corresponding to the target transformed image as the third sample image;
determining the target class as a second class label for the third sample image.
In a possible implementation, the training module 162 is specifically configured to:
inputting, for any sample image among the M×n first sample images and the plurality of third sample images, the sample image into the first model after the (k-1)-th semi-supervised training to obtain an output sample classification result, where the sample classification result comprises the probability corresponding to each category;
obtaining a loss value between the sample classification result and the corresponding class label, where the corresponding class label is the first class label or the second class label;
and adjusting parameters of the first model after the (k-1)-th semi-supervised training according to the loss value between the sample classification result and the corresponding class label, to obtain the first model after the k-th semi-supervised training.
In a possible implementation, the training module 162 is specifically configured to:
respectively inputting a plurality of second sample images in the updated second sample set of the (i-1) th time into the first model trained at the ith time and the ith second model to obtain a first class vector of each second sample image output by the first model trained at the ith time and a second class vector of each second sample image output by the ith second model;
acquiring category labels of K second sample images from the plurality of second sample images, wherein the K second sample images are the first K second sample images with the largest Euclidean distance between corresponding first category vectors and second category vectors in the plurality of second sample images, and K is a positive integer;
and updating the first sample set after the i-1 th updating and the second sample set after the i-1 th updating according to the category labels of the K second sample images to obtain the first sample set after the i-th updating and the second sample set after the i-th updating.
In a possible implementation, the processing module 163 is specifically configured to:
inputting a fourth sample image to each trained first model to obtain a first classification result of the fourth sample image under each classification type, wherein the first classification result comprises first probabilities of all classes under the corresponding classification type;
inputting the fourth sample image into the image classification model to obtain a second classification result under each classification type output by the image classification model, wherein the second classification result comprises a second probability of each category under the corresponding classification type;
obtaining loss values of each first probability and each corresponding second probability;
and adjusting parameters of the image classification model according to the first probabilities and the corresponding loss values of the second probabilities to obtain the trained image classification model.
In a possible implementation manner, for any first probability in the first classification result and a second probability corresponding to the first probability, the processing module 163 is specifically configured to:
acquiring a corresponding first normalized probability value according to a preset temperature coefficient and the first probability;
acquiring a corresponding second normalized probability value according to the preset temperature coefficient and the second probability;
and obtaining loss values of the first probability and the second probability according to the first normalized probability value and the second normalized probability value.
The training device for the image classification model provided by the embodiment of the disclosure can be used for executing the technical scheme of the method embodiment, and the implementation principle and the technical effect are similar, and are not repeated here.
Fig. 17 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present disclosure, as shown in fig. 17, including:
an obtaining module 171, configured to obtain a first image to be processed;
the processing module 172 is configured to input the first image into an image classification model, to obtain a category of the first image under each classification type, where the image classification model is a model obtained by training according to the training method of the image classification model in the foregoing embodiment.
The image processing apparatus provided in the embodiment of the present disclosure may be used to implement the technical solutions of the above method embodiments, and the implementation principles and technical effects thereof are similar and will not be described herein again.
Exemplary computing device
Having described the methods, media, and apparatus of the exemplary embodiments of the present disclosure, a computing device of the exemplary embodiments of the present disclosure is described next with reference to fig. 18.
The computing device 180 shown in fig. 18 is only one example and should not impose any limitations on the functionality or scope of use of embodiments of the disclosure.
Fig. 18 is a schematic structural diagram of a computing device provided in the embodiment of the present disclosure, and as shown in fig. 18, the computing device 180 is represented in the form of a general-purpose computing device. Components of computing device 180 may include, but are not limited to: the at least one processing unit 181, the at least one memory unit 182, and a bus 183 that couples various system components including the processing unit 181 and the memory unit 182.
The bus 183 includes a data bus, a control bus, and an address bus.
The storage unit 182 may include readable media in the form of volatile memory, such as Random Access Memory (RAM)1821 and/or cache memory 1822, and may further include readable media in the form of non-volatile memory, such as Read Only Memory (ROM) 1823.
The storage unit 182 may also include a program/utility 1825 having a set (at least one) of program modules 1824, such program modules 1824 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Computing device 180 may also communicate with one or more external devices 184 (e.g., keyboard, pointing device, etc.). Such communication may occur via input/output (I/O) interfaces 185. Moreover, computing device 180 may also communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via network adapter 186. As shown in FIG. 18, network adapter 186 communicates with the other modules of computing device 180 via bus 183. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with computing device 180, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
It should be noted that although several units/modules or sub-units/modules of the above-described apparatus are mentioned in the detailed description, this division is merely exemplary and not mandatory. Indeed, the features and functionality of two or more of the units/modules described above may be embodied in one unit/module, in accordance with embodiments of the present disclosure. Conversely, the features and functions of one unit/module described above may be further divided and embodied by a plurality of units/modules.
Further, while the operations of the disclosed methods are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
While the spirit and principles of the present disclosure have been described with reference to several particular embodiments, it is to be understood that the present disclosure is not limited to the particular embodiments disclosed, nor is the division of aspects, which is for convenience only as the features in such aspects may not be combined to benefit. The disclosure is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (10)

1. A training method of an image classification model comprises the following steps:
processing an initial first sample set and an initial second sample set according to a first model, and processing the initial second sample set according to a second model to obtain a 1 st updated first sample set and a 1 st updated second sample set;
performing at least one model training process on the first model and the second model according to the first sample set after the 1 st update and the second sample set after the 1 st update to obtain a trained first model, wherein the model training process comprises a semi-supervised training process, a supervised training process and an unsupervised training process aiming at the first model, and an unsupervised training process aiming at the second model;
and obtaining an image classification model according to the trained first model, wherein the initial first sample set comprises first sample images, the first sample images are labeled with corresponding first class labels, the initial second sample set comprises second sample images, and the second sample images are not labeled.
2. The method of claim 1, wherein processing the initial first sample set and the initial second sample set according to a first model and processing the initial second sample set according to a second model to obtain a 1 st updated first sample set and a 1 st updated second sample set comprises:
initializing the first model for the 1 st time to obtain a first model initialized for the 1 st time;
carrying out supervised training processing on the first model initialized at the 1 st time according to the initial first sample set to obtain a first model preprocessed at the 1 st time;
performing semi-supervised training processing on the 1 st preprocessed first model according to the initial first sample set and the initial second sample set to obtain a 1 st trained first model;
and processing the initial second sample set according to the first model trained for the 1 st time and the initialized second model to obtain the first sample set updated for the 1 st time and the second sample set updated for the 1 st time.
3. The method of claim 2, wherein performing at least one model training process on the first model and the second model according to the 1 st updated first sample set and the 1 st updated second sample set to obtain a trained first model, comprises:
copying the parameters of the first model trained for the (i-1) th time to the (i-1) th second model to obtain the (i) th second model;
performing model training processing of the ith time on the first model and the ith second model according to the first sample set after the ith-1 time of updating and the second sample set after the ith-1 time of updating to obtain a first model after the ith time of training, a first sample set after the ith time of updating and a second sample set after the ith time of updating;
in response to that the ith training first model does not meet a first training termination condition, copying parameters of the ith training first model to an ith second model to obtain an i +1 th second model, and performing i +1 th model training processing on the first model and the i +1 th second model according to the i-th updated first sample set and the i-th updated second sample set;
determining the ith training first model as the training-completed first model in response to the ith training first model satisfying the first training termination condition;
wherein i is an integer greater than or equal to 2, i is initially 2, and the 1 st second model is the initialized second model.
4. The method of claim 3, wherein performing an i-th model training process on the first model and the i-th second model according to the i-1 th updated first sample set and the i-1 th updated second sample set to obtain an i-th trained first model, an i-th updated first sample set, and an i-th updated second sample set, comprises:
performing initialization processing on the first model for the ith time to obtain an initialized first model for the ith time;
carrying out supervised training processing on the first model initialized at the ith time according to the first sample set updated at the ith-1 time to obtain a first model preprocessed at the ith time;
performing at least one semi-supervised training process on the ith preprocessed first model according to the first sample set after the ith-1 time of updating and the second sample set after the ith-1 time of updating to obtain the ith trained first model;
and performing unsupervised training processing on the ith trained first model and the ith second model according to the ith-1 th updated second sample set to obtain the ith updated first sample set and the ith updated second sample set.
5. The method of claim 4, wherein performing unsupervised training on the i-th trained first model and the i-th second model according to the i-1-th updated second sample set to obtain the i-th updated first sample set and the i-th updated second sample set comprises:
respectively inputting a plurality of second sample images in the second sample set after the (i-1)-th update into the i-th trained first model and the i-th second model, to obtain a first category vector of each second sample image output by the i-th trained first model and a second category vector of each second sample image output by the i-th second model;
acquiring category labels of K second sample images from the plurality of second sample images, wherein the K second sample images are the first K second sample images with the largest Euclidean distance between corresponding first category vectors and second category vectors in the plurality of second sample images, and K is a positive integer;
and updating the first sample set after the i-1 th updating and the second sample set after the i-1 th updating according to the category labels of the K second sample images to obtain the first sample set after the i-th updating and the second sample set after the i-th updating.
6. An image processing method comprising:
acquiring a first image to be processed;
inputting the first image into an image classification model to obtain the category of the first image under each classification type, wherein the image classification model is a model obtained by training according to any one of claims 1 to 5.
7. An apparatus for training an image classification model, comprising:
the updating module is used for processing the initial first sample set and the initial second sample set according to the first model and processing the initial second sample set according to the second model to obtain a first sample set after 1 time of updating and a second sample set after 1 time of updating;
a training module, configured to perform at least one model training process on the first model and the second model according to the 1 st updated first sample set and the 1 st updated second sample set to obtain a trained first model, where the model training process includes a semi-supervised training process, a supervised training process, and an unsupervised training process for the first model, and an unsupervised training process for the second model;
and the processing module is used for obtaining an image classification model according to the trained first model, wherein the initial first sample set comprises first sample images, the first sample images are marked with corresponding first class labels, the initial second sample set comprises second sample images, and the second sample images are not labeled.
8. An image processing apparatus comprising:
the acquisition module is used for acquiring a first image to be processed;
a processing module, configured to input the first image into an image classification model, so as to obtain a category of the first image under each classification type, where the image classification model is a model trained according to any one of claims 1 to 5.
9. A computing device, comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executing the computer-executable instructions stored by the memory causes the at least one processor to perform the method of training an image classification model according to any one of claims 1-5 or causes the at least one processor to perform the method of image processing according to claim 6.
10. A computer-readable storage medium having stored thereon computer-executable instructions which, when executed by a processor, implement a method of training an image classification model according to any one of claims 1 to 5, or an image processing method according to claim 6.
CN202210452722.5A 2022-04-27 2022-04-27 Training method and device of image classification model, image processing method and device and computing equipment Pending CN114842251A (en)

Priority Applications / Applications Claiming Priority (1)

Application Number: CN202210452722.5A
Priority Date: 2022-04-27
Filing Date: 2022-04-27
Title: Training method and device of image classification model, image processing method and device and computing equipment

Publications (1)

Publication Number: CN114842251A
Publication Date: 2022-08-02

Family ID: 82567329

Family Applications (1)

Application Number: CN202210452722.5A
Status: Pending
Publication: CN114842251A

Country Status (1)

Country: CN
Publication: CN114842251A

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination