CN116824194A - Training method of image classification model, image processing method and device

Info

Publication number
CN116824194A
CN116824194A
Authority
CN
China
Prior art keywords
image
prediction
classifier
category
loss
Prior art date
Legal status
Pending
Application number
CN202210259085.XA
Other languages
Chinese (zh)
Inventor
吕永春
朱徽
王洪斌
周迅溢
曾定衡
Current Assignee
Mashang Consumer Finance Co Ltd
Original Assignee
Mashang Consumer Finance Co Ltd
Priority date
Filing date
Publication date
Application filed by Mashang Consumer Finance Co Ltd filed Critical Mashang Consumer Finance Co Ltd
Priority to CN202210259085.XA
Publication of CN116824194A

Landscapes

  • Image Analysis (AREA)

Abstract

The application discloses a training method for an image classification model, an image processing method, and an image processing apparatus, which are used to train an image classification model with higher prediction accuracy from a limited number of labeled images and to accurately predict image categories. The training method comprises the following steps: inputting a sample set into a first classifier of an image classification model and outputting a prediction category embedding vector for each image in the sample set, wherein the sample set comprises unlabeled images and labeled images; inputting the sample set into a second classifier of the image classification model and outputting a prediction category code for each image in the sample set; determining the total prediction loss of the image classification model based on the prediction category embedding vectors and prediction category codes of the images in the sample set and the label information of the sample set, wherein the label information comprises a category label and a label embedding vector for each labeled image, and the category label is a code of the real category information of the labeled image; and adjusting the respective network parameters of the first classifier and the second classifier based on the total prediction loss.

Description

Training method of image classification model, image processing method and device
Technical Field
The present application relates to the field of computer technologies, and in particular to a training method for an image classification model, an image processing method, and related apparatus.
Background
Semi-supervised learning (Semi-Supervised Learning, SSL) is a key problem in pattern recognition and machine learning research, and is a learning method that combines supervised learning and unsupervised learning. Semi-supervised learning uses a large number of unlabeled samples together with a limited number of labeled samples to perform pattern recognition and machine learning tasks.
At present, the category labels of the labeled samples used for training an image classification model are usually in an encoded format, i.e., they uniquely determine the category to which the input image belongs, and the number of labeled samples is limited. As a result, a great deal of useful information is easily lost during model training, in particular the similarities and differences between images of different categories: for example, a husky and a wolf are visually similar, while a wolf and a sofa are visually very different. The image classification model therefore easily falls into an incorrect training direction and cannot fully utilize the unlabeled samples, so the training effect is unsatisfactory and the prediction accuracy of the image classification model suffers.
Based on this, how to train an image classification model with higher prediction accuracy using a limited number of labeled images is a problem that currently needs to be solved.
Disclosure of Invention
The embodiments of the application aim to provide a training method for an image classification model, an image processing method, and corresponding apparatus, so as to train an image classification model with higher prediction accuracy using a limited number of labeled images and to accurately predict image categories.
In order to achieve the above object, the embodiment of the present application adopts the following technical scheme:
in a first aspect, an embodiment of the present application provides a training method for an image classification model, including:
inputting a sample set into a first classifier of an image classification model, and outputting a predicted category embedded vector of an image in the sample set, wherein the predicted category embedded vector is an embedded vector of predicted category information of the corresponding image, and the sample set comprises an unlabeled image and a labeled image;
inputting the sample set into a second classifier of the image classification model, and outputting a prediction type code of an image in the sample set, wherein the prediction type code is a code of prediction type information of the corresponding image;
Determining total prediction loss of the image classification model based on a prediction category embedded vector and a prediction category code of the image in the sample set and label information of the sample set, wherein the label information comprises a category label of the labeled image and a label embedded vector, the category label is a code of real category information of the labeled image, and the label embedded vector is an embedded vector of real category information of the labeled image;
based on the total predicted loss, respective network parameters of the first classifier and the second classifier are adjusted.
It can be seen that, in the embodiment of the application, embedding processing of category information converts the category information of an image into a category embedding vector, so that the category embedding vector retains more of the latent information in the real category information and can reflect its inherent associations with other category information, such as differences and similarities; encoding of category information converts the category information of an image into a corresponding code, so that the category code clearly identifies the category to which the image belongs. With an image classification model comprising a first classifier and a second classifier, the first classifier performs semi-supervised learning based on the labeled images with their label embedding vectors and the unlabeled images: it uses the information the labeled images and their label embedding vectors provide to extract useful information from the unlabeled images for its own training, and thus better learns and understands the inherent associations among images of different categories, preventing the image classification model from falling into an incorrect learning direction. Meanwhile, the second classifier performs semi-supervised learning based on the labeled images with their category labels and the unlabeled images: it uses the information the labeled images and their category labels provide to extract useful information from the unlabeled images for its own training, and thus outputs prediction category information with clear boundaries. The first classifier and the second classifier therefore understand and learn the input images from different directions, which helps the image classification model learn more knowledge, enhances the semi-supervised learning effect, and yields an image classification model with higher prediction accuracy from training with a limited number of labeled images.
In a second aspect, an embodiment of the present application provides an image processing method, including:
inputting an image to be processed into an image classification model, outputting a prediction result set of the image to be processed, wherein the prediction result set comprises a prediction category embedding vector and/or a prediction category code, the image classification model comprises a first classifier and a second classifier, the first classifier is used for classifying and predicting the image to be processed and embedding the obtained prediction category information to obtain the prediction category embedding vector of the image to be processed, the second classifier is used for classifying and predicting the image to be processed and coding the obtained prediction category information to obtain the prediction category code of the image to be processed, and the image classification model is an image classification model trained based on the method of the first aspect;
and determining the category of the image to be processed based on the prediction result set of the image to be processed.
It can be seen that, in the embodiment of the application, inputting an image to be processed into the trained image classification model yields a prediction result set for the image, from which the category of the image can be determined; the approach is simple, convenient, fast, and efficient. In addition, during training of the image classification model, embedding processing of category information converts the category information of an image into a category embedding vector, so that the category embedding vector retains more of the latent information in the real category information and can reflect its inherent associations with other category information, such as differences and similarities; encoding of category information converts the category information of an image into a corresponding code, so that the category code clearly identifies the category to which the image belongs. With an image classification model comprising a first classifier and a second classifier, the first classifier performs semi-supervised learning based on the labeled images with their label embedding vectors and the unlabeled images, better learning and understanding the inherent associations among images of different categories and preventing the image classification model from falling into an incorrect learning direction; meanwhile, the second classifier performs semi-supervised learning based on the labeled images with their category labels and the unlabeled images, outputting prediction category information with clear boundaries. The two classifiers therefore understand and learn the input images from different directions, which helps the image classification model learn more knowledge, enhances the semi-supervised learning effect, and yields an image classification model with higher prediction accuracy from training with a limited number of labeled images.
In a third aspect, an embodiment of the present application provides a training apparatus for an image classification model, including:
the first prediction module is used for inputting a sample set into a first classifier of an image classification model, outputting a prediction category embedded vector of an image in the sample set, wherein the prediction category embedded vector is an embedded vector of prediction category information of a corresponding image, and the sample set comprises an unlabeled image and a labeled image;
the second prediction module is used for inputting the sample set into a second classifier of the image classification model and outputting a prediction type code of an image in the sample set, wherein the prediction type code is a code of prediction type information of the corresponding image;
the loss determination module is used for determining total prediction loss of the image classification model based on a prediction category embedded vector and a prediction category code of the image in the sample set and label information of the sample set, wherein the label information comprises a category label of the labeled image and a label embedded vector, the category label is the code of real category information of the labeled image, and the label embedded vector is the embedded vector of the real category information of the labeled image;
And the adjusting module is used for adjusting the network parameters of each of the first classifier and the second classifier based on the total prediction loss.
In a fourth aspect, an embodiment of the present application provides an image processing apparatus including:
the third prediction module is used for inputting an image to be processed into an image classification model and outputting a prediction result set of the image to be processed, wherein the prediction result set comprises a prediction category embedded vector and/or a prediction category code, the image classification model comprises a first classifier and a second classifier, the first classifier is used for carrying out classification prediction on the image to be processed and carrying out embedding processing on obtained prediction category information to obtain the prediction category embedded vector of the image to be processed, the second classifier is used for carrying out classification prediction on the image to be processed and carrying out coding on obtained prediction category information to obtain the prediction category code of the image to be processed, and the image classification model is an image classification model obtained based on the training of the method of the first aspect;
and the category determining module is used for determining the category to which the image to be processed belongs based on the prediction result set of the image to be processed.
In a fifth aspect, an electronic device includes:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the method according to the first aspect.
In a sixth aspect, a computer readable storage medium stores instructions which, when executed by a processor of an electronic device, enable the electronic device to perform the method according to the first aspect.
In a seventh aspect, an electronic device includes:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the method as described in the second aspect.
In an eighth aspect, a computer readable storage medium stores instructions which, when executed by a processor of an electronic device, enable the electronic device to perform the method according to the second aspect.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a flowchart of a training method of an image classification model according to an embodiment of the present application;
FIG. 2 is a flowchart of a training method of an image classification model according to another embodiment of the present application;
FIG. 3 is a flowchart of an image processing method according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a training device for image classification model according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to specific embodiments of the present application and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terms "first," "second," and the like in the description and in the claims, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the application may be practiced otherwise than as specifically illustrated or described herein. Furthermore, in the present specification and claims, "and/or" means at least one of the connected objects, and the character "/" generally means a relationship in which the associated object is an "or" before and after.
In order to train an image classification model with higher accuracy using a limited number of labeled images, the embodiments of the application provide a training method for an image classification model based on semi-supervised learning. The real category information of labeled images is represented in an embedding space, so as to better reflect the differences and similarities between images of different categories; the first classifier and the second classifier of the image classification model are then trained cooperatively using both the embedding-space representation and the encoded representation of the images' category information, realizing a semi-supervised learning gain, helping the image classification model learn and understand the similarities and differences between images of different categories, and improving its prediction accuracy. The embodiments of the application also provide an image processing method that uses the trained image classification model to accurately classify images.
It should be understood that, the training method and the image processing method for the image classification model provided by the embodiment of the application may be executed by an electronic device or software installed in the electronic device, and in particular may be executed by a terminal device or a server device.
The following describes in detail the technical solutions provided by the embodiments of the present application with reference to the accompanying drawings.
Referring to fig. 1, a flowchart of a training method of an image classification model according to an embodiment of the application is shown, and the method may include the following steps:
s102, inputting the sample set into a first classifier of the image classification model, and outputting a prediction category embedding vector of the image in the sample set.
In the embodiment of the application, the sample set comprises a labeled image and an unlabeled image, and the labeled image refers to an image with label information. In practical application, to further improve the prediction accuracy of the image classification model, the sample set may include a plurality of labeled images and a plurality of unlabeled images, where the plurality of labeled images may belong to different classes.
The label information of a labeled image is used to represent the real category information of the labeled image, specifically the real category to which the content presented in the image belongs. For example, the category to which the image belongs may be a person, an animal, scenery, and the like; as another example, the category may be a sub-category subdivided under a certain large category, such as, for the large category of persons, sadness, happiness, anger, and the like.
The label information of a labeled image may include a category label of the labeled image and a label embedding vector. The category label of the labeled image is the code of the real category information of the labeled image. In practical applications, the category label may be obtained by encoding the real category information of the labeled image, for example by one-hot encoding, so that the category label uniquely determines the real category to which the labeled image belongs.
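As an illustration, the following minimal sketch shows one-hot encoding of real category information; the category names and indices are hypothetical.

```python
import torch
import torch.nn.functional as F

# Hypothetical category vocabulary; each category gets an integer index.
categories = ["person", "animal", "scenery"]
true_class = torch.tensor([1])  # the labeled image's real category: "animal"

# One-hot category label: uniquely determines the real category of the image.
class_label = F.one_hot(true_class, num_classes=len(categories)).float()
print(class_label)  # tensor([[0., 1., 0.]])
```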
The label embedding vector of a labeled image is an embedding vector of the real category information of the labeled image. In practical applications, it may be obtained by performing embedding processing on the real category information of the labeled image. The real category information of the labeled image is thereby converted into a fixed-length vector representation, which retains more of the latent information in the real category information and makes the differences and similarities between it and other category information recognizable.
Specifically, as an alternative implementation, the real category information of the labeled image may be input into an embedding model, which outputs the label embedding vector of the labeled image, where the embedding model is trained based on the real category information of sample images and the label embedding vectors of those sample images.
In practical application, the embedding model may be any suitable model with an embedding processing function, for example, the embedding model may be a word embedding (word embedding) model such as Glove, word2vec, etc., and the type of the embedding model may be selected according to actual needs, which is not limited in the embodiment of the present application. In addition, the embedded model may be obtained by training by any suitable training method, for example, training with the real class information of the sample image as a training sample and the tag embedded vector of the sample image as a tag corresponding to the training sample, where the specific training method may be selected according to the actual needs, which is not limited in the embodiment of the present application.
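As a sketch of how a label embedding vector might be looked up, assuming a pretrained word-embedding model; the table below is a random stand-in for vectors that a trained GloVe/word2vec-style model would supply.

```python
import torch

# Stand-in for a trained embedding model: in practice the vectors would come
# from a word-embedding model (e.g., GloVe or word2vec) trained as described
# above, so that "husky" and "wolf" end up close while "wolf" and "sofa" do not.
label_embedding_table = {
    "husky": torch.randn(300),
    "wolf": torch.randn(300),
    "sofa": torch.randn(300),
}

def embed_label(category_name: str) -> torch.Tensor:
    # Fixed-length embedding vector of the real category information.
    return label_embedding_table[category_name]

label_embedding = embed_label("husky")  # label embedding vector of the image
```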
In order to enable the image classification model to fully learn the differences and similarities between images of different categories when the number of labeled images is limited, and to output well-defined prediction results so as to improve prediction accuracy, the image classification model in the embodiment of the application includes a first classifier and a second classifier. In practical applications, the first classifier and the second classifier may have different network structures.
Specifically, the first classifier is configured to classify and predict an input image, and then perform embedding processing on the obtained prediction type information to obtain a prediction type embedding vector of the prediction type information of the image, that is, the prediction type embedding vector is an embedding vector of the prediction type information of the corresponding image. In S102, the sample set is input into the first classifier, the first classifier classifies and predicts the tagged image and the untagged image in the sample set, and then embeds the obtained prediction category information, so as to obtain the prediction category embedded vector of the tagged image and the prediction category embedded vector of the untagged image.
S104, inputting the sample set into a second classifier of the image classification model, and outputting the prediction type codes of the images in the sample set.
The second classifier is configured to classify and predict an input image, and then encode the obtained prediction type information, for example, one-hot encoding, to obtain a prediction type encoding of the prediction type information of the image, that is, the prediction type encoding is an encoding of the prediction type information of the corresponding image.
In S104, the sample set is input into a second classifier, and the second classifier classifies and predicts the tagged image and the untagged image in the sample set, and then encodes the obtained prediction type information, thereby obtaining the prediction type code of the tagged image and the prediction type code of the untagged image.
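For concreteness, a toy sketch of an image classification model with two classifiers follows; the patent does not specify the network structures, so the layers, dimensions, and the fact that the two classifiers are fully separate are all assumptions.

```python
import torch
import torch.nn as nn

def make_classifier(out_dim: int) -> nn.Sequential:
    # Toy convolutional classifier; the real network structure is unspecified.
    return nn.Sequential(
        nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(32, out_dim))

class ImageClassificationModel(nn.Module):
    def __init__(self, embed_dim: int = 300, num_classes: int = 10):
        super().__init__()
        self.classifier1 = make_classifier(embed_dim)    # prediction category embedding vector
        self.classifier2 = make_classifier(num_classes)  # logits to be encoded as the prediction category code

    def forward(self, x: torch.Tensor):
        return self.classifier1(x), self.classifier2(x)
```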
Optionally, in order to enable each classifier in the image classification model to fully understand and learn the input images and to improve the expressive capability of the image classification model, data enhancement processing may also be performed on the images in the sample set before they are input into the corresponding classifier, so as to introduce perturbations to the images.
Specifically, for an unlabeled image, multiple data enhancement processes can be performed on it to obtain multiple enhanced images; the enhanced images are then input into the first classifier, which outputs a prediction category embedding vector for each enhanced image, and into the second classifier, which outputs a prediction category code for each enhanced image. It can be understood that different data enhancement processes introduce perturbations of different magnitudes; in an ideal state, the same image input to the same classifier after perturbations of different magnitudes yields the same prediction result. The prediction category embedding vectors output by the first classifier under different perturbations can therefore supervise the first classifier and improve its expressive capability; likewise, the prediction category codes output by the second classifier under different perturbations can supervise the second classifier and improve its expressive capability, thereby improving the prediction accuracy of the image classification model.
More specifically, the multiple data enhancement processes on the unlabeled image can be implemented as follows: performing weak enhancement processing on the unlabeled image to obtain a weakly enhanced image, and performing strong enhancement processing on the unlabeled image to obtain a strongly enhanced image. The weak enhancement processing may include, but is not limited to, translation, flipping, and the like, and the strong enhancement processing may include, but is not limited to, occlusion, color transformation, and the like. It can be understood that the perturbation introduced by weak enhancement is small and does not distort the unlabeled image, so the classifier is unlikely to obtain a wrong prediction result; however, using only weakly enhanced images may cause the classifier to overfit and fail to extract essential features. The perturbation introduced by strong enhancement is larger and may distort the unlabeled image, but features sufficient for identifying the category are still retained, and the prediction result of the weakly enhanced image can be used to guide the prediction result of the strongly enhanced image, further improving the expressive capability of the classifier.
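A sketch of weak and strong enhancement pipelines in torchvision-style transforms; the exact operations and magnitudes are assumptions beyond the translation/flipping and occlusion/color-transformation examples named above.

```python
import torchvision.transforms as T

# Weak enhancement: small perturbations such as translation and flipping.
weak_enhance = T.Compose([
    T.RandomHorizontalFlip(),
    T.RandomCrop(32, padding=4),  # padded crop acts as a small random translation
    T.ToTensor(),
])

# Strong enhancement: larger perturbations such as color transformation and occlusion.
strong_enhance = T.Compose([
    T.RandomHorizontalFlip(),
    T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1),
    T.ToTensor(),
    T.RandomErasing(p=1.0),       # random occlusion of a rectangular region
])
```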
For tagged images, weak enhancement processing may be performed on the tagged image. Correspondingly, in the step S102, the weakly enhanced tagged image is input into the first classifier, and the prediction category embedded vector of the tagged image is output; in S104, the weakly enhanced tagged image is input to the second classifier, and the prediction type code of the tagged image is output.
S106, determining the total prediction loss of the image classification model based on the prediction type embedded vector and the prediction type code of the image in the sample set and the label information of the sample set.
The label information of the sample set comprises a class label of the labeled image in the sample set and a label embedding vector.
In the embodiment of the application, the total prediction loss of the image classification model is used for representing the deviation between the prediction result output by the image classification model for classifying and predicting the input image and the real category to which the input image belongs.
Considering that the first classifier and the second classifier each process and predict the input image from a different direction, each of them generates a certain prediction bias. In an alternative implementation, in order to determine the total prediction loss of the image classification model more accurately and further improve its prediction accuracy, the above S106 may include:
s161, determining a first prediction loss based on the prediction category embedded vector of the image in the sample set and the label embedded vector of the labeled image.
The first prediction loss is used for representing the prediction loss generated by the first classifier, and reflects the difference between the prediction result obtained by the classification prediction of the input image by the first classifier and the real category of the input image.
The first classifier performs a semi-supervised learning task based on the input images, combining supervised learning on the labeled images with their label embedding vectors and unsupervised learning on the unlabeled images, and each learning task generates a certain prediction loss. More specifically, the first prediction loss therefore includes a first supervised loss and a first unsupervised loss, where the first supervised loss represents the prediction loss generated by the first classifier's supervised learning, and the first unsupervised loss represents the prediction loss generated by the first classifier's unsupervised learning.
In order to determine the prediction loss generated by the first classifier more accurately, in an alternative implementation the unlabeled image undergoes multiple data enhancement processes before being input into the first classifier, so the resulting prediction category embedding vectors of the unlabeled image include a prediction category embedding vector for each enhanced image. Accordingly, the first supervised loss may be determined based on the prediction category embedding vector of the labeled image and the label embedding vector of the labeled image, and the first unsupervised loss may be determined based on the prediction category embedding vectors of the multiple enhanced images.
More specifically, the plurality of enhanced images include a weakly enhanced image obtained by weakly enhancing the unlabeled image and a strongly enhanced image obtained by strongly enhancing the unlabeled image. Accordingly, the above S161 may be specifically implemented as:
step A1, determining a first supervised penalty based on the prediction category embedded vector of the tagged image and the tag embedded vector of the tagged image.
Specifically, as shown in the following equation (1), the cosine loss function, the prediction category embedding vector of the labeled image, and the label embedding vector of the labeled image may be used to determine the first supervised loss:

$$\mathcal{L}_{s}^{1} = \frac{1}{B} \sum_{\xi=1}^{B} C\left( p_1\left(y \mid \alpha(x_\xi)\right),\ W^{T} y_\xi \right) \tag{1}$$

where $\mathcal{L}_{s}^{1}$ represents the first supervised loss; $C(\cdot)$ represents the cosine loss function; $B$ represents the number of labeled images $x_\xi$; $\alpha(\cdot)$ represents the weak enhancement processing, so $\alpha(x_\xi)$ is the weakly enhanced labeled image; $p_1$ represents the first classifier and $y$ its output, so $p_1(y \mid \alpha(x_\xi))$ is the prediction category embedding vector of the labeled image; and $W^{T} y_\xi$ represents the label embedding vector of the labeled image $x_\xi$, where $y_\xi$ is the category label of $x_\xi$ and $W^{T}$ is the transpose of the embedding matrix $W$ containing the label embedding vectors of the labeled images.
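Under the definitions of equation (1), a minimal sketch of the first supervised loss; taking the cosine loss as 1 - cosine similarity is an assumption, and all names and shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def cosine_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # C(a, b) = 1 - cos(a, b), averaged over the batch (assumed convention).
    return (1.0 - F.cosine_similarity(pred, target, dim=-1)).mean()

def first_supervised_loss(pred_embed: torch.Tensor,
                          labels_onehot: torch.Tensor,
                          W: torch.Tensor) -> torch.Tensor:
    # pred_embed:    [B, d] prediction category embedding vectors p1(y|a(x))
    # labels_onehot: [B, K] one-hot category labels y
    # W:             [K, d] embedding matrix whose rows are label embedding vectors
    target_embed = labels_onehot @ W  # selects W^T y for each sample
    return cosine_loss(pred_embed, target_embed)
```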
And step A2, determining a pseudo label embedding vector of the unlabeled image based on the prediction category embedding vector of the weakly enhanced image and the label embedding vectors of the labeled images.
It can be appreciated that, since the unlabeled image itself does not have category information, determining the pseudo-label embedded vector of the unlabeled image based on the predicted category embedded vector output by the first classifier for the weakly enhanced image is equivalent to labeling the unlabeled image with "artificial labels" to indicate the category that the first classifier predicts for the unlabeled image.
Specifically, the pseudo label embedding vector of the unlabeled image may be determined as follows: first, the classification probabilities of the weakly enhanced image over the various categories are determined based on the similarity between the prediction category embedding vector of the weakly enhanced image and the label embedding vectors of all labeled images in the sample set, where the sample set contains labeled images of multiple categories; then, the pseudo category information of the unlabeled image is determined based on these classification probabilities; further, the pseudo category information of the unlabeled image is subjected to embedding processing, and the resulting embedding vector is taken as the pseudo label embedding vector of the unlabeled image.
More specifically, determining the classification probabilities of the weakly enhanced image over the various categories may be implemented as follows. An embedding matrix $W$ is generated based on the label embedding vectors of the labeled images in the sample set, where each row vector of the embedding matrix is the label embedding vector of a corresponding labeled image. The cosine similarity between the prediction category embedding vector $q_\xi$ of the weakly enhanced image and each row vector of the embedding matrix is then computed, converting the prediction category embedding vector into classification probabilities over the categories:

$$\tilde{q}_\xi = \mathrm{CosSim}(q_\xi, W)$$

where $\tilde{q}_\xi$ represents the classification probabilities of the weakly enhanced image over the various categories, $q_\xi$ represents the prediction category embedding vector of the weakly enhanced image, $W$ represents the embedding matrix containing the label embedding vectors of the labeled images, and $\mathrm{CosSim}(\cdot)$ represents cosine similarity. The classification probabilities are then sharpened, as shown in the following equation (2):

$$\hat{q}_\xi^{\,r} = \frac{\left(\tilde{q}_\xi^{\,r}\right)^{1/\omega}}{\sum_{i=1}^{K} \left(\tilde{q}_\xi^{\,i}\right)^{1/\omega}} \tag{2}$$

where $\hat{q}_\xi$ represents the sharpened classification probabilities of the weakly enhanced image over the various categories, $\tilde{q}_\xi^{\,r}$ represents the classification probability of the weakly enhanced image for the $r$-th category, $\tilde{q}_\xi^{\,i}$ represents the classification probability for the $i$-th category, $\omega$ represents the weight-adjustment hyperparameter, and $K$ represents the total number of categories of labeled images in the sample set.
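A sketch of the similarity-to-probability conversion and the sharpening of equation (2); the non-negativity clamp and renormalization before sharpening are assumptions needed to keep the fractional power well-defined.

```python
import torch
import torch.nn.functional as F

def sharpened_probs(q: torch.Tensor, W: torch.Tensor, omega: float) -> torch.Tensor:
    # q: [d] prediction category embedding vector of the weakly enhanced image
    # W: [K, d] embedding matrix of label embedding vectors
    sims = F.cosine_similarity(q.unsqueeze(0), W, dim=-1)  # CosSim(q, W), shape [K]
    probs = sims.clamp(min=1e-8)                           # assumed: drop negative similarities
    probs = probs / probs.sum()                            # classification probabilities
    sharpened = probs ** (1.0 / omega)                     # numerator of equation (2)
    return sharpened / sharpened.sum()                     # normalized sharpened probabilities
```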
Determining the pseudo category information of the unlabeled image based on the classification probabilities of the weakly enhanced image over the various categories may be specifically implemented as: selecting, from among those classification probabilities, the maximum classification probability that exceeds a first preset probability threshold, and determining the category information corresponding to the selected classification probability as the pseudo category information of the unlabeled image. It can be understood that, because the weakly enhanced image introduces a perturbation of a certain magnitude relative to the original unlabeled image, the category represented by the prediction category embedding vector output by the first classifier may be wrong; selecting the category corresponding to the maximum classification probability that exceeds the first preset probability threshold as the pseudo category information reduces the deviation between the pseudo category information and the real category information of the unlabeled image, thereby improving the prediction accuracy of the first classifier.
And step A3, determining the first unsupervised loss based on the pseudo tag embedded vector of the unlabeled image and the prediction type embedded vector of the strong enhanced image.
Specifically, as shown in the following equation (3), the cosine loss function, the pseudo label embedding vector of the unlabeled image, and the prediction category embedding vector of the strongly enhanced image may be used to determine the first unsupervised loss:

$$\mathcal{L}_{u}^{1} = \frac{1}{\mu B} \sum_{\xi=1}^{\mu B} \mathbb{1}\left( \max(\hat{q}_\xi) \geq \gamma \right) \, C\left( p_1\left(y \mid \mathcal{A}(u_\xi)\right),\ W^{T} \hat{q}_\xi \right) \tag{3}$$

where $\mathcal{L}_{u}^{1}$ represents the first unsupervised loss; $u_\xi$ represents an unlabeled image; $C(\cdot)$ represents the cosine loss function; $\mu B$ represents the number of unlabeled images $u_\xi$; $\mathcal{A}(\cdot)$ represents the strong enhancement processing, so $\mathcal{A}(u_\xi)$ is the strongly enhanced image; $p_1$ represents the first classifier and $y$ its output, so $p_1(y \mid \mathcal{A}(u_\xi))$ is the prediction category embedding vector of the strongly enhanced image; $W^{T} \hat{q}_\xi$ represents the pseudo label embedding vector of the unlabeled image $u_\xi$, where $\hat{q}_\xi$ represents the sharpened classification probabilities of the weakly enhanced image over the various categories and $\max(\hat{q}_\xi)$ their maximum; $W^{T}$ represents the transpose of the embedding matrix $W$ containing the label embedding vectors of the labeled images; and $\gamma$ represents the first preset probability threshold.
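A corresponding sketch of equation (3), masking out weakly enhanced images whose maximum sharpened probability falls below the threshold; names and shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def first_unsupervised_loss(pred_embed_strong: torch.Tensor,
                            q_hat: torch.Tensor,
                            W: torch.Tensor,
                            gamma: float) -> torch.Tensor:
    # pred_embed_strong: [muB, d] embeddings p1(y|A(u)) of strongly enhanced images
    # q_hat:             [muB, K] sharpened probabilities from the weakly enhanced images
    # W:                 [K, d] embedding matrix; gamma: first preset probability threshold
    pseudo_embed = q_hat @ W                            # pseudo label embedding W^T q_hat
    mask = (q_hat.max(dim=-1).values >= gamma).float()  # indicator 1(max(q_hat) >= gamma)
    per_sample = 1.0 - F.cosine_similarity(pred_embed_strong, pseudo_embed, dim=-1)
    return (mask * per_sample).mean()
```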
It can be appreciated that the first classifier performs a semi-supervised learning task based on the input images, combining supervised learning on the labeled images with their label embedding vectors and unsupervised learning on the unlabeled images, and each learning task generates a certain prediction loss. For this reason, the first supervised loss is determined based on the prediction category embedding vector output by the first classifier for the labeled image and the label embedding vector of the labeled image, so that it accurately reflects the prediction loss generated by the first classifier's supervised learning. Using the principle that weak and strong enhancement introduce perturbations of different magnitudes while, in theory, the same image input to the same classifier after different perturbations yields the same prediction result, a pseudo label embedding vector is assigned to the unlabeled image based on the prediction category embedding vector of the weakly enhanced image and the label embedding vectors of the labeled images; the first unsupervised loss is then determined using the pseudo label embedding vector of the unlabeled image and the prediction category embedding vector of the strongly enhanced image. The resulting first unsupervised loss accurately reflects the prediction loss generated by the first classifier's unsupervised learning, and lets the first classifier use the prediction category embedding vector of the weakly enhanced image to supervise that of the strongly enhanced image during unsupervised learning, which helps improve the prediction accuracy of the first classifier.
Embodiments of the present application are illustrated herein as one specific implementation of determining a first predictive loss. Of course, it should be understood that the first predictive loss may be determined in other ways, as well, and embodiments of the application are not limited in this regard.
S162, determining a second prediction loss based on the prediction category codes of the images in the sample set and the category labels of the labeled images.
The second prediction loss is used for representing the prediction loss generated by the second classifier, and reflects the difference between the prediction result obtained by classifying and predicting the input image by the second classifier and the real category of the input image.
The second classifier performs a semi-supervised learning task based on the input images, combining supervised learning on the labeled images with their category labels and unsupervised learning on the unlabeled images, and each learning task generates a certain prediction loss. More specifically, the second prediction loss therefore includes a second supervised loss and a second unsupervised loss, where the second supervised loss represents the prediction loss generated by the second classifier's supervised learning, and the second unsupervised loss represents the prediction loss generated by the second classifier's unsupervised learning.
In order to determine the prediction loss generated by the second classifier more accurately, in an alternative implementation the unlabeled image undergoes multiple data enhancement processes before being input into the second classifier, so the resulting prediction category codes of the unlabeled image include a prediction category code for each enhanced image. Accordingly, the second supervised loss may be determined based on the prediction category code of the labeled image and the category label of the labeled image, and the second unsupervised loss may be determined based on the prediction category codes of the multiple enhanced images.
More specifically, the plurality of enhanced images include a weakly enhanced image obtained by weakly enhancing the unlabeled image and a strongly enhanced image obtained by strongly enhancing the unlabeled image. Accordingly, the above S162 may be specifically implemented as:
and step B1, determining a second supervised loss based on the prediction category codes and the category labels of the tagged images.
Specifically, as shown in the following equation (4), the cross entropy loss function, the prediction category code of the labeled image, and the category label of the labeled image may be used to determine the second supervised loss:

$$\mathcal{L}_{s}^{2} = \frac{1}{B} \sum_{\xi=1}^{B} \mathcal{H}\left( y_\xi,\ p_2\left(y \mid \alpha(x_\xi)\right) \right) \tag{4}$$

where $\mathcal{L}_{s}^{2}$ represents the second supervised loss; $\mathcal{H}(\cdot)$ represents the cross entropy loss function; $x_\xi$ represents a labeled image and $B$ the number of labeled images; $\alpha(\cdot)$ represents the weak enhancement processing, so $\alpha(x_\xi)$ is the weakly enhanced labeled image; $p_2$ represents the second classifier and $y$ its output, so $p_2(y \mid \alpha(x_\xi))$ is the prediction category code of the labeled image; and $y_\xi$ represents the category label of the labeled image.
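Equation (4) maps directly onto a standard cross-entropy call; a minimal sketch with assumed names:

```python
import torch
import torch.nn.functional as F

def second_supervised_loss(logits_weak: torch.Tensor,
                           class_indices: torch.Tensor) -> torch.Tensor:
    # logits_weak:   [B, K] second-classifier outputs p2(y|a(x))
    # class_indices: [B] integer category labels y
    return F.cross_entropy(logits_weak, class_indices)  # equation (4)
```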
And step B2, determining a pseudo category label of the label-free image based on the prediction category coding of the weak enhanced image.
It will be appreciated that, since the unlabeled image itself has no category information, determining a pseudo category label for the unlabeled image based on the prediction category code output by the second classifier for the weakly enhanced image is equivalent to giving the unlabeled image a "manual label" indicating the category that the second classifier predicts for it.
Specifically, the predictive category codes output by the second classifier for the input image include predictive category codes of the input image corresponding to a plurality of categories and classification probabilities thereof. Based on this, as an alternative, a prediction type code with the highest corresponding classification probability may be selected from prediction type codes of weak enhanced images, and determined as a pseudo type label of a label-free image.
As another, preferable scheme, considering that the weakly enhanced image introduces a perturbation of a certain magnitude relative to the original unlabeled image, the prediction category code output by the second classifier may contain a certain error. In order to ensure that the pseudo category label accurately represents the category to which the unlabeled image belongs, the prediction category code whose classification probability exceeds a second preset probability threshold can be selected from the prediction category codes of the weakly enhanced image and determined as the pseudo category label of the unlabeled image.
And step B3, determining a second unsupervised loss based on the pseudo category labels of the unlabeled images and the prediction category codes of the strong enhanced images.
Specifically, as shown in the following equation (5), the cross entropy loss function, the pseudo category label of the unlabeled image, and the prediction category code of the strongly enhanced image may be used to determine the second unsupervised loss:

$$\mathcal{L}_{u}^{2} = \frac{1}{\mu B} \sum_{\xi=1}^{\mu B} \mathbb{1}\left( \max(q'_\xi) \geq \tau \right) \, \mathcal{H}\left( \hat{y}'_\xi,\ p_2\left(y \mid \mathcal{A}(u_\xi)\right) \right) \tag{5}$$

where $\mathcal{L}_{u}^{2}$ represents the second unsupervised loss; $\mathcal{H}(\cdot)$ represents the cross entropy loss function; $u_\xi$ represents an unlabeled image and $\mu B$ the number of unlabeled images; $\mathcal{A}(u_\xi)$ represents the strongly enhanced image; $p_2$ represents the second classifier and $y$ its output, so $p_2(y \mid \mathcal{A}(u_\xi))$ is the prediction category code of the strongly enhanced image; $\hat{y}'_\xi$ represents the pseudo category label of the unlabeled image $u_\xi$; $q'_\xi = p_2(y \mid \alpha(u_\xi))$ represents the prediction category code of the weakly enhanced image $\alpha(u_\xi)$, and $\max(q'_\xi)$ the maximum classification probability corresponding to it; and $\tau$ represents the second preset probability threshold.
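A sketch of equation (5): pseudo category labels come from the weakly enhanced predictions, and only confident ones (above the threshold) supervise the strongly enhanced predictions. Names are illustrative.

```python
import torch
import torch.nn.functional as F

def second_unsupervised_loss(logits_weak: torch.Tensor,
                             logits_strong: torch.Tensor,
                             tau: float) -> torch.Tensor:
    # logits_weak:   [muB, K] p2(y|a(u));  logits_strong: [muB, K] p2(y|A(u))
    q = torch.softmax(logits_weak, dim=-1)   # q' with classification probabilities
    max_prob, pseudo_label = q.max(dim=-1)   # pseudo label = most probable category
    mask = (max_prob >= tau).float()         # indicator 1(max(q') >= tau)
    per_sample = F.cross_entropy(logits_strong, pseudo_label, reduction="none")
    return (mask * per_sample).mean()
```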
Embodiments of the present application are illustrated herein as one specific implementation of determining the second predictive loss. Of course, it should be understood that the second predictive loss may be determined in other ways, as well, and embodiments of the application are not limited in this regard.
It will be appreciated that the second classifier performs a semi-supervised learning task based on the input images, combining supervised learning on the labeled images with their category labels and unsupervised learning on the unlabeled images, and each learning task generates a certain prediction loss. The second supervised loss is determined based on the prediction category code output by the second classifier for the labeled image and the category label of the labeled image, so that it accurately reflects the prediction loss generated by the second classifier's supervised learning. Using the principle that weak and strong enhancement introduce perturbations of different magnitudes while, in theory, the same image input to the same classifier after different perturbations yields the same prediction result, a pseudo label is assigned to the unlabeled image based on the prediction category code of the weakly enhanced image; the second unsupervised loss is then determined using the pseudo category label of the unlabeled image and the prediction category code of the strongly enhanced image. The resulting second unsupervised loss accurately reflects the prediction loss generated by the second classifier's unsupervised learning, and lets the second classifier use the prediction category code of the weakly enhanced image to supervise that of the strongly enhanced image during unsupervised learning, which helps improve the prediction accuracy of the second classifier.
S163, determining the total prediction loss of the image classification model based on the first prediction loss and the second prediction loss.
In an alternative implementation, to more accurately determine the total prediction loss of the image classification model, the first prediction loss and the second prediction loss are weighted and summed based on weights corresponding to the first classifier and the second classifier, to obtain the total prediction loss of the image classification model.
Specifically, the total prediction loss of the image classification model can be determined by the following equation (6):

$$\mathcal{L} = \lambda_1 \left( \mathcal{L}_{s}^{1} + \mathcal{L}_{u}^{1} \right) + \mathcal{L}_{s}^{2} + \mathcal{L}_{u}^{2} \tag{6}$$

where $\mathcal{L}$ represents the total prediction loss of the image classification model, $\mathcal{L}_{s}^{1}$ the first supervised loss, $\mathcal{L}_{u}^{1}$ the first unsupervised loss, $\mathcal{L}_{s}^{2}$ the second supervised loss, $\mathcal{L}_{u}^{2}$ the second unsupervised loss, and $\lambda_1$ the weight corresponding to the first classifier; the weight corresponding to the second classifier is 1.
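Equation (6) in code form, with the second classifier's weight fixed at 1 as stated above:

```python
import torch

def total_prediction_loss(l_s1: torch.Tensor, l_u1: torch.Tensor,
                          l_s2: torch.Tensor, l_u2: torch.Tensor,
                          lambda1: float) -> torch.Tensor:
    # Weighted sum of the first and second classifiers' losses, equation (6).
    return lambda1 * (l_s1 + l_u1) + (l_s2 + l_u2)
```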
It can be understood that the first classifier processes and predicts the input image from the angle of the embedding vector of the category information, while the second classifier does so from the angle of the encoding of the category information, so a certain deviation exists between the prediction result output by each classifier and the real category to which the input image belongs, and the prediction accuracy of the image classification model is affected by the prediction deviations generated by both classifiers. In an ideal state, the prediction results output after the same image is input into different classifiers for classification prediction are the same, and the prediction deviations generated by the classifiers are the same or close. Determining the total prediction loss of the image classification model from the respective prediction losses of the first classifier and the second classifier therefore lets the total prediction loss reflect the prediction deviation of the image classification model more accurately, and adjusting the network parameters of each classifier using the total prediction loss helps improve the prediction accuracy of the image classification model.
Embodiments of the application are illustrated herein as one specific implementation of determining total predicted loss. Of course, it should be understood that the total predicted loss may be determined in other ways, as well, and embodiments of the application are not limited in this regard.
S108, based on the total prediction loss, adjusting network parameters of each of the first classifier and the second classifier.
The network parameters of each classifier may include, but are not limited to, the number of neurons in each network layer, the connection relationship and connection edge weight between neurons in different network layers, the bias corresponding to the neurons in each network layer, and so on.
Because the total prediction loss of the image classification model can reflect the difference between the prediction result output by the image classification model for classifying and predicting the input image and the real category to which the input image belongs, in order to obtain the image classification model with high accuracy, a back propagation algorithm can be adopted, and the respective network parameters of the first classifier and the second classifier can be adjusted based on the total prediction loss of the image classification model.
More specifically, when the back propagation algorithm is adopted to adjust the network parameters of each of the first classifier and the second classifier, the back propagation algorithm can be adopted to determine the prediction loss caused by each network layer of each of the first classifier and the second classifier based on the total prediction loss value of the image classification model, the current network parameters of the first classifier and the current network parameters of the second classifier; and then, aiming at reducing the total prediction loss value of the image classification model, adjusting the relevant parameters of each network layer in the first classifier and the relevant parameters of each network layer in the second classifier layer by layer.
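In a typical deep-learning framework, one such parameter adjustment over both classifiers might look like the following sketch; the optimizer choice and the compute_batch_losses helper are assumptions, the latter standing in for the loss computations of S106.

```python
import torch

def train_step(classifier1, classifier2, optimizer, batch, lambda1: float) -> float:
    # One adjustment of the network parameters of both classifiers (S108).
    optimizer.zero_grad()
    l_s1, l_u1, l_s2, l_u2 = compute_batch_losses(classifier1, classifier2, batch)  # hypothetical helper (S106)
    loss = total_prediction_loss(l_s1, l_u1, l_s2, l_u2, lambda1)  # equation (6)
    loss.backward()   # back-propagate the total prediction loss through every network layer
    optimizer.step()  # adjust both classifiers' parameters to reduce the total loss
    return loss.item()
```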
An embodiment of the present application herein shows a specific implementation of S108 described above. Of course, it should be understood that S108 may be implemented in other manners, and embodiments of the present application are not limited thereto.
It should be noted that, the above-mentioned process is only one adjustment process, and in practical applications, multiple adjustments may be required, so the above-mentioned steps S102 to S108 may be repeatedly performed multiple times until the preset training stop condition is met, thereby obtaining the final image classification model. The preset training stop condition may be that the total predicted loss of the image classification model is smaller than a preset loss threshold, or may be that the adjustment frequency reaches a preset frequency, or the like, which is not limited in the embodiment of the present application.
Specifically, after S108 described above, the training method for an image classification model provided by the embodiment of the present application may further include: if the image classification model does not meet the preset training stop condition, the weight corresponding to the first classifier is reduced, and the steps S102 to S108 are repeatedly executed until the image classification model meets the preset training stop condition.
It can be understood that the first classifier outputs the prediction category embedding vector of the input image, which retains more of the latent information in the image's category information, while the second classifier outputs the prediction category code of the input image, which states the predicted category unambiguously. By gradually reducing the weight of the first classifier during training, the image classification model can fully learn the latent information in the label information of the labeled images in the early training stage, especially the differences and similarities between images of different categories, alleviating the problem that limited labeled images provide insufficient supervision information during semi-supervised training; in the late training stage, the image classification model can output predicted categories with clear boundaries. Semi-supervised training of the image classification model thereby proceeds more smoothly and scientifically, further improving the prediction accuracy of the image classification model.
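The patent does not give a concrete schedule for reducing the first classifier's weight; a simple multiplicative decay, shown purely as an assumption and reusing train_step from the sketch above, illustrates the idea.

```python
lambda1 = 1.0           # initial weight of the first classifier (assumed)
decay = 0.95            # per-round decay factor (assumed)
loss_threshold = 0.01   # preset loss threshold (assumed)
max_rounds = 1000       # preset maximum number of adjustments (assumed)

for round_idx in range(max_rounds):
    loss = train_step(classifier1, classifier2, optimizer, next_batch(), lambda1)
    if loss < loss_threshold:
        break           # preset training stop condition met
    lambda1 *= decay    # gradually reduce the first classifier's weight
```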
According to the training method of an image classification model provided by the embodiments of the present application, embedding processing of category information is introduced so that the category information of an image is converted into a category embedding vector; the category embedding vector thus retains more of the latent information in the real category information and can reflect its inherent associations with other categories, such as differences and similarities. Encoding of category information is introduced so that the category information of an image is converted into a corresponding code; the category code can therefore represent the category of the image unambiguously. By adopting an image classification model comprising a first classifier and a second classifier, the first classifier performs semi-supervised learning based on the labeled images with their label embedding vectors and on the unlabeled images; that is, using the information that the labeled images and their label embedding vectors provide to it, the first classifier extracts useful information from the unlabeled images for its own training, and thereby better learns and understands the inherent associations among images of different categories, which prevents the image classification model from sliding into an incorrect learning direction. Meanwhile, the second classifier performs semi-supervised learning based on the labeled images with their category labels and on the unlabeled images; that is, using the information that the labeled images and their category labels provide to it, the second classifier extracts useful information from the unlabeled images for its own training, and thereby outputs prediction category information with clear demarcations. The first classifier and the second classifier are thus equivalent to understanding and learning the input images from two different directions, which helps the image classification model learn more knowledge, enhances the semi-supervised learning effect, and yields an image classification model with higher prediction accuracy trained from a limited number of labeled images.
The above embodiments introduce a training method for an image classification model; with this method, image classification models for different application scenarios can be trained, and the sample set and its label information used for model training can be selected according to the application scenario. Application scenarios of the training method provided by the embodiments of the present application may include, but are not limited to, facial expression classification, natural animal classification, handwritten digit recognition, and the like. The training method of the image classification model provided by the embodiments of the present application is described in detail below, taking the application scenario of facial expression classification as an example.
In this scenario, a labeled image may be a face image having label information for representing the expression presented by the face image, such as happiness, anger, sadness, and the like. The label information of a labeled face image comprises the category label of the labeled face image and a label embedding vector, where the category label is the code of the expression presented by the labeled face image, and the label embedding vector is the embedding vector of that expression. An unlabeled image may be a face image without label information.
After the sample set and its label information are acquired, the above steps S102 to S108 may be performed, so that the trained image classification model can recognize the expression presented by a face image.
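For illustration, and assuming a one-hot category code, a three-class expression list, and a 64-dimensional embedding (none of which are fixed by the embodiments), the label information of this scenario could be constructed as:

import torch
import torch.nn as nn

# Illustrative label information for the expression scenario; the class
# list, the one-hot coding, and the embedding size are assumptions.
EXPRESSIONS = ["happiness", "anger", "sadness"]

def category_label(expr: str) -> torch.Tensor:
    # Category label: one-hot code of the expression presented by the image.
    onehot = torch.zeros(len(EXPRESSIONS))
    onehot[EXPRESSIONS.index(expr)] = 1.0
    return onehot

# One embedding vector per expression class, standing in for the label
# embedding vectors produced by an embedding model.
label_embedding = nn.Embedding(num_embeddings=len(EXPRESSIONS), embedding_dim=64)

idx = torch.tensor(EXPRESSIONS.index("sadness"))
print(category_label("sadness"))   # tensor([0., 0., 1.])
print(label_embedding(idx).shape)  # torch.Size([64])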
Based on the training method of the image classification model shown in the embodiments of the present application, the image classification model obtained by training can be applied to any scenario requiring classification prediction of images. The application process based on the image classification model is described in detail below.
The embodiment of the application also provides an image processing method which can be used for carrying out classification prediction on the image to be processed based on the image classification model trained by the method shown in fig. 1.
Referring to fig. 3, which is a flowchart of an image processing method provided by an embodiment of the present application, the method may include the following steps:
S302, inputting the image to be processed into an image classification model, and outputting a prediction result set of the image to be processed.
The prediction result set of the image to be processed comprises a prediction category embedding vector and/or a prediction category code of the image to be processed. The image classification model comprises a first classifier and a second classifier: the first classifier is used for performing classification prediction on the image to be processed and embedding the obtained prediction category information to obtain the prediction category embedding vector of the image to be processed; the second classifier is used for performing classification prediction on the image to be processed and encoding the obtained prediction category information to obtain the prediction category code of the image to be processed. The image classification model is an image classification model obtained by training based on the above-described training method of the image classification model.
S304, determining the category of the image to be processed based on the prediction result set of the image to be processed.
Alternatively, the category to which the image to be processed belongs may be determined based on the predicted category embedding vector of the image to be processed. For example, the predicted class embedding vector of the image to be processed may be converted into a class to which the image to be processed belongs.
Alternatively, the category to which the image to be processed belongs may also be determined based on the predictive category encoding of the image to be processed. For example, the category indicated by the prediction category encoding of the image to be processed may be determined as the category to which the image to be processed belongs.
Optionally, the prediction category embedding vector and the prediction category code of the image to be processed may be considered jointly to determine the category to which the image to be processed belongs. For example, if the prediction category determined based on the prediction category embedding vector coincides with the prediction category determined based on the prediction category code, that category may be determined as the category to which the image to be processed belongs, and so on.
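A minimal sketch of S304, assuming cosine similarity for the embedding branch, an argmax over per-class scores for the encoding branch, and an agreement rule for combining them (all illustrative choices, not fixed by the embodiments):

import torch
import torch.nn.functional as F

def predict_category(pred_embedding, pred_code, label_embeddings, class_names):
    # Sketch of S304; all names and the agreement rule are illustrative.
    # pred_embedding:   prediction category embedding vector of the image
    # pred_code:        prediction category code (per-class scores) of the image
    # label_embeddings: (num_classes, dim) label embedding vectors, one per class
    sims = F.cosine_similarity(pred_embedding.unsqueeze(0), label_embeddings, dim=1)
    cls_from_embedding = int(sims.argmax())  # category from the embedding branch
    cls_from_code = int(pred_code.argmax())  # category from the encoding branch

    if cls_from_embedding == cls_from_code:  # both branches agree
        return class_names[cls_from_embedding]
    return None  # undecided; fall back to either single branch as desired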
According to the image processing method provided by the embodiments of the present application, the prediction result set of the image to be processed can be obtained by inputting the image into the trained image classification model, and the category to which the image belongs can then be determined based on that result set; the method is simple and fast to implement and highly efficient. In addition, during the training of the image classification model, embedding processing of category information is introduced so that the category information of an image is converted into a category embedding vector, which retains more of the latent information in the real category information and can reflect its inherent associations with other categories, such as differences and similarities; encoding of category information is introduced so that the category information of an image is converted into a corresponding code, which can represent the category of the image unambiguously. With an image classification model comprising a first classifier and a second classifier, the first classifier performs semi-supervised learning based on the labeled images with their label embedding vectors and on the unlabeled images, extracting useful information from the unlabeled images for its own training and thereby better learning and understanding the inherent associations among images of different categories, which prevents the model from sliding into an incorrect learning direction; meanwhile, the second classifier performs semi-supervised learning based on the labeled images with their category labels and on the unlabeled images, extracting useful information from the unlabeled images for its own training and thereby outputting prediction category information with clear demarcations. The two classifiers are thus equivalent to understanding and learning the input images from two different directions, which helps the model learn more knowledge, enhances the semi-supervised learning effect, and yields an image classification model with higher prediction accuracy trained from a limited number of labeled images.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In addition, corresponding to the training method of the image classification model shown in fig. 1, the embodiment of the application further provides a training device of the image classification model. Referring to fig. 4, a schematic structural diagram of an image classification model training apparatus 400 according to an embodiment of the present application is provided, where the apparatus includes:
a first prediction module 410, configured to input a sample set into a first classifier of an image classification model, and output a predicted class embedding vector of an image in the sample set, where the predicted class embedding vector is an embedding vector of predicted class information of a corresponding image, and the sample set includes a non-labeled image and a labeled image;
A second prediction module 420, configured to input the sample set into a second classifier of the image classification model, and output a prediction category code of an image in the sample set, where the prediction category code is a code of the prediction category information of the corresponding image;
a loss determination module 430, configured to determine a total prediction loss of the image classification model based on a prediction category embedding vector and a prediction category code of an image in the sample set and tag information of the sample set, where the tag information includes a category tag of the tagged image and a tag embedding vector, the category tag is a code of real category information of the tagged image, and the tag embedding vector is an embedding vector of real category information of the tagged image;
an adjustment module 440 for adjusting respective network parameters of the first classifier and the second classifier based on the total predicted loss.
According to the training apparatus for an image classification model provided by the embodiments of the present application, embedding processing of category information is introduced so that the category information of an image is converted into a category embedding vector; the category embedding vector thus retains more of the latent information in the real category information and can reflect its inherent associations with other categories, such as differences and similarities. Encoding of category information is introduced so that the category information of an image is converted into a corresponding code; the category code can therefore represent the category of the image unambiguously. By adopting an image classification model comprising a first classifier and a second classifier, the first classifier performs semi-supervised learning based on the labeled images with their label embedding vectors and on the unlabeled images, extracting useful information from the unlabeled images for its own training and thereby better learning and understanding the inherent associations among images of different categories, which prevents the image classification model from sliding into an incorrect learning direction. Meanwhile, the second classifier performs semi-supervised learning based on the labeled images with their category labels and on the unlabeled images, extracting useful information from the unlabeled images for its own training and thereby outputting prediction category information with clear demarcations. The first classifier and the second classifier are thus equivalent to understanding and learning the input images from two different directions, which helps the image classification model learn more knowledge, enhances the semi-supervised learning effect, and yields an image classification model with higher prediction accuracy trained from a limited number of labeled images.
Optionally, the loss determination module includes:
a first loss determination sub-module configured to determine a first prediction loss based on a prediction category embedding vector of an image in the sample set and a label embedding vector of the labeled image, the first prediction loss being indicative of a prediction loss generated by the first classifier;
a second loss determination sub-module for determining a second prediction loss based on a prediction class coding of the images in the sample set and a class label of the tagged images, the second prediction loss being indicative of a prediction loss generated by the second classifier;
a total loss determination sub-module for determining a total predicted loss of the image classification model based on the first predicted loss and the second predicted loss.
Optionally, the apparatus 400 further includes:
the enhancement processing module is used for performing various enhancement processes on the unlabeled image before the first prediction module inputs a sample set into a first classifier of an image classification model and the second prediction module inputs the sample set into a second classifier of the image classification model, so as to obtain a plurality of enhanced images;
the first prediction module includes:
The first prediction submodule is used for inputting the tagged image and the enhanced images into the first classifier to obtain respective prediction category embedded vectors of the tagged image and the enhanced images;
the second prediction module includes:
and the second prediction submodule is used for inputting the tagged image and the enhanced images into the second classifier to obtain respective prediction category codes of the tagged image and the enhanced images.
Optionally, the enhancement processing module includes:
the first enhancement processing submodule is used for carrying out weak enhancement processing on the unlabeled image to obtain a weak enhancement image; and,
and the second enhancement processing sub-module is used for carrying out strong enhancement processing on the label-free image to obtain a strong enhancement image.
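The embodiments do not fix the concrete enhancement operations; as a sketch, a common FixMatch-style choice pairs a light flip-and-crop weak enhancement with a RandAugment-based strong enhancement:

from torchvision import transforms

# Illustrative weak/strong augmentation pair; the specific operations,
# crop size, and RandAugment settings are assumptions of this sketch.
weak_augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
])

strong_augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.RandAugment(num_ops=2, magnitude=9),  # heavier distortions
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.5),                 # cutout-like erasing
])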
Optionally, the first predicted loss includes a first supervised loss and a first unsupervised loss, and the first loss determination submodule is configured to:
determining the first supervised loss based on the predicted category embedded vector for the tagged image and the tag embedded vector for the tagged image;
determining a pseudo tag embedding vector of the unlabeled image based on the predicted category embedding vector of the weakly enhanced image and the tag embedding vector of the labeled image;
The first unsupervised loss is determined based on the pseudo tag embedded vector of the unlabeled image and the prediction category embedded vector of the strongly enhanced image.
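A minimal sketch of the first prediction loss, assuming a cosine-distance loss (the embodiments fix only which embedding vectors are compared, not the loss function; the pseudo label embedding computation is sketched further below):

import torch.nn.functional as F

def first_prediction_loss(pred_emb_labeled, label_emb,
                          pseudo_emb_unlabeled, pred_emb_strong,
                          unsup_weight=1.0):
    # First supervised loss: labeled images' predicted embeddings
    # compared with their label embedding vectors.
    sup = 1.0 - F.cosine_similarity(pred_emb_labeled, label_emb, dim=1).mean()
    # First unsupervised loss: pseudo label embeddings of unlabeled images
    # compared with predicted embeddings of their strongly enhanced versions.
    unsup = 1.0 - F.cosine_similarity(pseudo_emb_unlabeled, pred_emb_strong, dim=1).mean()
    return sup + unsup_weight * unsup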
Optionally, the first loss determination submodule determines a pseudo tag embedding vector of the unlabeled image based on the prediction category embedding vector of the weak enhanced image and the tag embedding vector of the labeled image, including:
determining classification probabilities of the weak enhanced image corresponding to various categories respectively based on the similarity between the predicted category embedded vector of the weak enhanced image and the tag embedded vectors of the tagged images in the sample set, wherein the sample set comprises the tagged images of the various categories;
based on the classification probabilities of the weak enhanced image corresponding to various categories respectively, determining pseudo category information of the unlabeled image;
and performing embedding processing on the pseudo category information of the weak enhanced image, and determining the obtained embedded vector as a pseudo tag embedded vector of the label-free image.
Optionally, the first loss determination submodule determines, based on classification probabilities of the weak enhanced image corresponding to multiple categories respectively, pseudo category information of the unlabeled image, including:
Selecting the maximum classification probability exceeding a first preset probability threshold from the classification probabilities respectively corresponding to the weak enhanced image in a plurality of categories;
and determining the category information corresponding to the selected classification probability as the pseudo category information of the label-free image.
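Putting the similarity, probability, threshold, and embedding steps together, a sketch of the pseudo-label-embedding pipeline might read as follows (cosine similarity, softmax, the threshold value, and reusing the class's label embedding are all assumptions of this sketch):

import torch.nn.functional as F

def pseudo_label_embedding(pred_emb_weak, label_embeddings, tau=0.95):
    # pred_emb_weak:    (dim,) predicted embedding of a weakly enhanced image
    # label_embeddings: (num_classes, dim) label embeddings, one per category
    sims = F.cosine_similarity(pred_emb_weak.unsqueeze(0), label_embeddings, dim=1)
    probs = F.softmax(sims, dim=0)    # classification probabilities per category

    max_prob, cls = probs.max(dim=0)  # maximum classification probability
    if max_prob < tau:                # below the first preset probability threshold
        return None                   # no reliable pseudo label for this image

    # Embed the selected pseudo category information; this sketch simply
    # reuses the corresponding class's label embedding as that vector.
    return label_embeddings[cls]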
Optionally, the second predicted loss includes a second supervised loss and a second unsupervised loss, and the second loss determination submodule is configured to:
determining the second supervised loss based on a predictive class encoding and class labels for the tagged images;
determining a pseudo class label of the unlabeled image based on a predictive class coding of the weakly enhanced image;
the second unsupervised loss is determined based on the pseudo-class label of the unlabeled image and the predictive class coding of the strongly enhanced image.
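A minimal sketch of the second unsupervised loss, assuming a FixMatch-style cross-entropy loss and treating the prediction category codes as per-class scores (both assumptions; the embodiments fix only the pseudo labels from weak views, a second preset probability threshold, and the comparison with strong-view predictions):

import torch.nn.functional as F

def second_unsupervised_loss(logits_weak, logits_strong, tau=0.95):
    # Pseudo class labels from predictions on the weakly enhanced images.
    probs_weak = F.softmax(logits_weak.detach(), dim=1)
    max_probs, pseudo_labels = probs_weak.max(dim=1)
    mask = (max_probs >= tau).float()  # keep confident pseudo labels only
    # Compare predictions on the strongly enhanced images with the pseudo labels.
    loss = F.cross_entropy(logits_strong, pseudo_labels, reduction="none")
    return (loss * mask).mean()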
Optionally, the second loss determination submodule determines a pseudo category label of the unlabeled image based on the prediction category coding of the weakly enhanced image, including:
and selecting, from the prediction category codes of the weakly enhanced image, a prediction category code whose corresponding classification probability exceeds a second preset probability threshold, and determining it as the pseudo category label of the unlabeled image.
Optionally, the total loss determination submodule is configured to:
based on the weights corresponding to the first classifier and the second classifier, carrying out weighted summation on the first prediction loss and the second prediction loss to obtain the total prediction loss of the image classification model;
after adjusting the respective network parameters of the first classifier and the second classifier based on the total predicted loss, the method further comprises:
and if the image classification model does not meet the preset training stop condition, reducing the weight corresponding to the first classifier, and repeatedly executing the steps from the first classifier for inputting the sample set into the image classification model to the step of adjusting the network parameters of each of the first classifier and the second classifier based on the total prediction loss until the image classification model meets the preset training stop condition.
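Written as a formula (the geometric decay is an illustrative choice, not fixed by the embodiments), the weighted total prediction loss at training round t can be expressed as

L_total(t) = λ1(t) · L1 + λ2 · L2, with λ1(t+1) = γ · λ1(t), 0 < γ < 1, applied whenever the stop condition is not yet met,

where L1 and L2 are the first and second prediction losses, and λ1 and λ2 are the weights corresponding to the first classifier and the second classifier, respectively.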
Optionally, the apparatus 400 further includes:
the embedding module is used for inputting the real class information of the tagged image into an embedding model and outputting the tag embedding vector of the tagged image before the loss determining module determines the total prediction loss of the image classification model based on the prediction class embedded vector and the prediction class code of the image in the sample set and the label information of the sample set, wherein the embedding model is obtained by training based on the real class information of a sample image and the tag embedding vector of the sample image; and/or,
And the encoding module is used for encoding the real class information of the tagged image before the loss determination module determines the total prediction loss of the image classification model based on the prediction class embedded vector and the prediction class encoding of the image in the sample set and the label information of the sample set, so as to obtain the class label of the tagged image.
Obviously, the training apparatus for an image classification model provided by the embodiment of the present application can serve as the execution subject of the training method shown in fig. 1, and can therefore realize the functions of that method. Since the principle is the same, the description is not repeated here.
In addition, corresponding to the image processing method shown in fig. 3, the embodiment of the application further provides an image processing device. Referring to fig. 5, a schematic structural diagram of an image processing apparatus 500 according to an embodiment of the present application is provided, the apparatus includes:
the third prediction module 510 is configured to input an image to be processed into an image classification model, and output a prediction result set of the image to be processed, where the prediction result set includes a prediction class embedding vector and/or a prediction class code, and the image classification model includes a first classifier and a second classifier, where the first classifier is configured to perform classification prediction on the image to be processed and perform embedding processing on obtained prediction class information to obtain a prediction class embedding vector of the image to be processed, and the second classifier is configured to perform classification prediction on the image to be processed and encode the obtained prediction class information to obtain a prediction class code of the image to be processed, and the image classification model is an image classification model that is trained based on the training method of the image classification model provided by the embodiment of the present application;
The category determination module 520 is configured to determine, based on the prediction result set of the image to be processed, a category to which the image to be processed belongs.
According to the image processing apparatus provided by the embodiments of the present application, the prediction result set of the image to be processed can be obtained by inputting the image into the trained image classification model, and the category to which the image belongs can then be determined based on that result set; the apparatus is simple and fast to implement and highly efficient. In addition, during the training of the image classification model, embedding processing of category information is introduced so that the category information of an image is converted into a category embedding vector, which retains more of the latent information in the real category information and can reflect its inherent associations with other categories, such as differences and similarities; encoding of category information is introduced so that the category information of an image is converted into a corresponding code, which can represent the category of the image unambiguously. With an image classification model comprising a first classifier and a second classifier, the first classifier performs semi-supervised learning based on the labeled images with their label embedding vectors and on the unlabeled images, extracting useful information from the unlabeled images for its own training and thereby better learning and understanding the inherent associations among images of different categories, which prevents the model from sliding into an incorrect learning direction; meanwhile, the second classifier performs semi-supervised learning based on the labeled images with their category labels and on the unlabeled images, extracting useful information from the unlabeled images for its own training and thereby outputting prediction category information with clear demarcations. The two classifiers are thus equivalent to understanding and learning the input images from two different directions, which helps the model learn more knowledge, enhances the semi-supervised learning effect, and yields an image classification model with higher prediction accuracy trained from a limited number of labeled images.
Obviously, the image processing apparatus provided in the embodiment of the present application may be used as an execution subject of the image processing method shown in fig. 3, so that the functions of the image processing method implemented in fig. 3 can be implemented. Since the principle is the same, the description is not repeated here.
Fig. 6 is a schematic structural view of an electronic device according to an embodiment of the present application. Referring to fig. 6, at the hardware level the electronic device includes a processor and, optionally, an internal bus, a network interface, and a memory. The memory may include a random-access memory (RAM) and may further include a non-volatile memory, such as at least one magnetic disk memory. Of course, the electronic device may also include hardware required by other services.
The processor, the network interface, and the memory may be interconnected by an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The buses may be classified into address buses, data buses, control buses, etc. For ease of illustration, only one bi-directional arrow is shown in FIG. 6, but this does not mean that there is only one bus or only one type of bus.
The memory is used for storing programs. Specifically, a program may include program code, and the program code includes computer operating instructions. The memory may include a random-access memory and a non-volatile storage, and provides instructions and data to the processor.
The processor reads the corresponding computer program from the nonvolatile memory into the memory and then runs the computer program to form the training device of the image classification model on a logic level. The processor is used for executing the programs stored in the memory and is specifically used for executing the following operations:
inputting a sample set into a first classifier of an image classification model, and outputting a predicted category embedded vector of an image in the sample set, wherein the predicted category embedded vector is an embedded vector of predicted category information of the corresponding image, and the sample set comprises an unlabeled image and a labeled image;
inputting the sample set into a second classifier of the image classification model, and outputting a prediction category code of an image in the sample set, wherein the prediction category code is a code of the prediction category information of the corresponding image;
determining total prediction loss of the image classification model based on a prediction category embedded vector and a prediction category code of the image in the sample set and label information of the sample set, wherein the label information comprises a category label of the labeled image and a label embedded vector, the category label is a code of real category information of the labeled image, and the label embedded vector is an embedded vector of real category information of the labeled image;
Based on the total predicted loss, respective network parameters of the first classifier and the second classifier are adjusted.
Alternatively, the processor reads the corresponding computer program from the nonvolatile memory into the memory and then runs the computer program to form the image processing apparatus on a logic level. The processor is used for executing the programs stored in the memory and is specifically used for executing the following operations:
inputting an image to be processed into an image classification model, outputting a prediction result set of the image to be processed, wherein the prediction result set comprises a prediction category embedding vector and/or a prediction category code, the image classification model comprises a first classifier and a second classifier, the first classifier is used for carrying out classification prediction on the image to be processed and carrying out embedding processing on obtained prediction category information to obtain the prediction category embedding vector of the image to be processed, the second classifier is used for carrying out classification prediction on the image to be processed and carrying out coding on obtained prediction category information to obtain the prediction category code of the image to be processed, and the image classification model is an image classification model trained based on the training method of the image classification model.
And determining the category of the image to be processed based on the prediction result set of the image to be processed.
The method performed by the training device of the image classification model disclosed in the embodiment of fig. 1 of the present application or the method performed by the image processing device disclosed in the embodiment of fig. 3 of the present application may be applied to or implemented by a processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; but also digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components. The disclosed methods, steps, and logic blocks in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be embodied directly in the execution of a hardware decoding processor, or in the execution of a combination of hardware and software modules in a decoding processor. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in a memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method.
The electronic device may further perform the method of fig. 1 and implement the function of the training device of the image classification model in the embodiment shown in fig. 1, or the electronic device may further perform the method of fig. 3 and implement the function of the image processing device in the embodiment shown in fig. 3, which is not described herein.
Of course, other implementations, such as a logic device or a combination of hardware and software, are not excluded from the electronic device of the present application, that is, the execution subject of the following processing flows is not limited to each logic unit, but may be hardware or a logic device.
The embodiments of the present application also provide a computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a portable electronic device comprising a plurality of application programs, enable the portable electronic device to perform the method of the embodiment of fig. 1, and in particular to perform the operations of:
inputting a sample set into a first classifier of an image classification model, and outputting a predicted category embedded vector of an image in the sample set, wherein the predicted category embedded vector is an embedded vector of predicted category information of the corresponding image, and the sample set comprises an unlabeled image and a labeled image;
Inputting the sample set into a second classifier of the image classification model, and outputting a prediction category code of an image in the sample set, wherein the prediction category code is a code of the prediction category information of the corresponding image;
determining total prediction loss of the image classification model based on a prediction category embedded vector and a prediction category code of the image in the sample set and label information of the sample set, wherein the label information comprises a category label of the labeled image and a label embedded vector, the category label is a code of real category information of the labeled image, and the label embedded vector is an embedded vector of real category information of the labeled image;
based on the total predicted loss, respective network parameters of the first classifier and the second classifier are adjusted.
Alternatively, the instructions, when executed by a portable electronic device comprising a plurality of applications, enable the portable electronic device to perform the method of the embodiment shown in fig. 3, and in particular to:
inputting an image to be processed into an image classification model, outputting a prediction result set of the image to be processed, wherein the prediction result set comprises a prediction category embedding vector and/or a prediction category code, the image classification model comprises a first classifier and a second classifier, the first classifier is used for carrying out classification prediction on the image to be processed and carrying out embedding processing on obtained prediction category information to obtain the prediction category embedding vector of the image to be processed, the second classifier is used for carrying out classification prediction on the image to be processed and carrying out coding on obtained prediction category information to obtain the prediction category code of the image to be processed, and the image classification model is an image classification model trained based on the training method of the image classification model.
And determining the category of the image to be processed based on the prediction result set of the image to be processed.
In summary, the foregoing description is only of the preferred embodiments of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Computer-readable media, including both persistent and non-persistent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a/an ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, as relevant to see a section of the description of method embodiments.

Claims (13)

1. A method for training an image classification model, comprising:
inputting a sample set into a first classifier of an image classification model, and outputting a predicted category embedded vector of an image in the sample set, wherein the predicted category embedded vector is an embedded vector of predicted category information of the corresponding image, and the sample set comprises an unlabeled image and a labeled image;
Inputting the sample set into a second classifier of the image classification model, and outputting a prediction category code of an image in the sample set, wherein the prediction category code is a code of the prediction category information of the corresponding image;
determining total prediction loss of the image classification model based on a prediction category embedded vector and a prediction category code of the image in the sample set and label information of the sample set, wherein the label information comprises a category label of the labeled image and a label embedded vector, the category label is a code of real category information of the labeled image, and the label embedded vector is an embedded vector of real category information of the labeled image;
based on the total predicted loss, respective network parameters of the first classifier and the second classifier are adjusted.
2. The method of claim 1, wherein the determining the total prediction loss of the image classification model based on the prediction category embedding vector and the prediction category encoding of the image in the sample set and the label information of the sample set comprises:
determining a first prediction loss based on a prediction category embedding vector of the image in the sample set and a label embedding vector of the labeled image, the first prediction loss being indicative of a prediction loss generated by the first classifier;
Determining a second prediction loss based on a prediction class encoding of the images in the sample set and a class label of the tagged images, the second prediction loss being indicative of a prediction loss generated by the second classifier;
based on the first predicted loss and the second predicted loss, a total predicted loss of the image classification model is determined.
3. The method of claim 2, wherein prior to inputting a sample set into a first classifier of an image classification model and inputting the sample set into a second classifier of the image classification model, the method further comprises:
performing various enhancement treatments on the label-free image to obtain a plurality of enhancement images;
the inputting a sample set into a first classifier of an image classification model comprises:
inputting the tagged image and the plurality of enhanced images into the first classifier to obtain respective prediction category embedded vectors of the tagged image and the plurality of enhanced images;
the inputting the sample set into a second classifier of the image classification model comprises:
and inputting the tagged image and the plurality of enhanced images into the second classifier to obtain respective prediction category codes of the tagged image and the plurality of enhanced images.
4. A method according to claim 3, wherein the unlabeled image is subjected to a plurality of enhancement processes to obtain a plurality of enhanced images, comprising:
performing weak enhancement processing on the label-free image to obtain a weak enhancement image; and,
and carrying out strong enhancement processing on the label-free image to obtain a strong enhancement image.
5. The method of claim 4, wherein the first predicted loss comprises a first supervised loss and a first unsupervised loss;
the determining a first prediction loss based on a prediction category embedding vector of the image in the sample set and a label embedding vector of the labeled image comprises:
determining the first supervised loss based on the predicted category embedded vector for the tagged image and the tag embedded vector for the tagged image;
determining a pseudo tag embedding vector of the unlabeled image based on the predicted category embedding vector of the weakly enhanced image and the tag embedding vector of the labeled image;
the first unsupervised loss is determined based on the pseudo tag embedded vector of the unlabeled image and the prediction category embedded vector of the strongly enhanced image.
6. The method of claim 5, wherein the determining the pseudo tag-embedded vector for the unlabeled image based on the predicted class-embedded vector for the weakly enhanced image and the tag-embedded vector for the labeled image comprises:
Determining classification probabilities of the weak enhanced image corresponding to various categories respectively based on the similarity between the predicted category embedded vector of the weak enhanced image and the tag embedded vectors of the tagged images in the sample set, wherein the sample set comprises the tagged images of the various categories;
based on the classification probabilities of the weak enhanced image corresponding to various categories respectively, determining pseudo category information of the unlabeled image;
and performing embedding processing on the pseudo category information of the weak enhanced image, and determining the obtained embedded vector as a pseudo tag embedded vector of the label-free image.
7. The method of claim 6, wherein determining pseudo-category information for the unlabeled image based on classification probabilities of the weakly enhanced image in respective ones of a plurality of categories comprises:
selecting the maximum classification probability exceeding a first preset probability threshold from the classification probabilities respectively corresponding to the weak enhanced image in a plurality of categories;
and determining the category information corresponding to the selected classification probability as the pseudo category information of the label-free image.
8. The method of claim 4, wherein the second predicted loss comprises a second supervised loss and a second unsupervised loss;
Said determining a second prediction loss based on a prediction class coding of the images in the sample set and a class label of the tagged image, comprising:
determining the second supervised loss based on a predictive class encoding and class labels for the tagged images;
determining a pseudo class label of the unlabeled image based on a predictive class coding of the weakly enhanced image;
the second unsupervised loss is determined based on the pseudo-class label of the unlabeled image and the predictive class coding of the strongly enhanced image.
9. The method of claim 2, wherein the determining the total predicted loss of the image classification model based on the first predicted loss and the second predicted loss comprises:
based on the weights corresponding to the first classifier and the second classifier, carrying out weighted summation on the first prediction loss and the second prediction loss to obtain the total prediction loss of the image classification model;
after adjusting the respective network parameters of the first classifier and the second classifier based on the total predicted loss, the method further comprises:
and if the image classification model does not meet the preset training stop condition, reducing the weight corresponding to the first classifier, and repeatedly executing the steps from the first classifier for inputting the sample set into the image classification model to the step of adjusting the network parameters of each of the first classifier and the second classifier based on the total prediction loss until the image classification model meets the preset training stop condition.
10. An image processing method, comprising:
inputting an image to be processed into an image classification model, and outputting a prediction result set of the image to be processed, wherein the prediction result set comprises a prediction category embedding vector and/or a prediction category code, the image classification model comprises a first classifier and a second classifier, the first classifier is used for carrying out classification prediction on the image to be processed and carrying out embedding processing on obtained prediction category information to obtain the prediction category embedding vector of the image to be processed, and the second classifier is used for carrying out classification prediction on the image to be processed and coding the obtained prediction category information to obtain the prediction category code of the image to be processed;
and determining the category of the image to be processed based on the prediction result set of the image to be processed.
11. A training device for an image classification model, comprising:
the first prediction module is used for inputting a sample set into a first classifier of an image classification model, outputting a prediction category embedded vector of an image in the sample set, wherein the prediction category embedded vector is an embedded vector of prediction category information of a corresponding image, and the sample set comprises an unlabeled image and a labeled image;
The second prediction module is used for inputting the sample set into a second classifier of the image classification model and outputting a prediction category code of an image in the sample set, wherein the prediction category code is a code of the prediction category information of the corresponding image;
the loss determination module is used for determining total prediction loss of the image classification model based on a prediction category embedded vector and a prediction category code of the image in the sample set and label information of the sample set, wherein the label information comprises a category label of the labeled image and a label embedded vector, the category label is the code of real category information of the labeled image, and the label embedded vector is the embedded vector of the real category information of the labeled image;
and the adjusting module is used for adjusting the network parameters of each of the first classifier and the second classifier based on the total prediction loss.
12. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the method of any one of claims 1 to 10.
13. A computer readable storage medium, characterized in that instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the method of any one of claims 1 to 10.
CN202210259085.XA 2022-03-16 2022-03-16 Training method of image classification model, image processing method and device Pending CN116824194A (en)



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination