CN114385846A - Image classification method, electronic device, storage medium and program product - Google Patents

Image classification method, electronic device, storage medium and program product

Info

Publication number
CN114385846A
Authority
CN
China
Prior art keywords
image
sample
classification
sample image
branch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111621526.8A
Other languages
Chinese (zh)
Inventor
张培圳
何银银
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Kuangshi Jinzhi Technology Co ltd
Beijing Kuangshi Technology Co Ltd
Beijing Megvii Technology Co Ltd
Original Assignee
Shenzhen Kuangshi Jinzhi Technology Co ltd
Beijing Kuangshi Technology Co Ltd
Beijing Megvii Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Kuangshi Jinzhi Technology Co Ltd, Beijing Kuangshi Technology Co Ltd and Beijing Megvii Technology Co Ltd
Priority to CN202111621526.8A
Publication of CN114385846A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides an image classification method, an electronic device, a storage medium and a program product, relates to the technical field of image processing, and aims to classify images to be classified accurately with an image classification model. The method comprises the following steps: acquiring an image to be classified; and inputting the image to be classified into an image classification model to obtain a classification prediction result of the image to be classified, wherein the image classification model is obtained by training a preset model with a basic loss and an inter-class loss. The basic loss is determined according to the classification prediction result of each sample image predicted by the preset model and the real class label of each sample image. The inter-class loss is determined according to the classification prediction result of each sample image predicted by the preset model and the soft class label of each sample image, where the soft class label of a sample image is determined according to the confidence with which each sample image is predicted as the real class of that sample image.

Description

Image classification method, electronic device, storage medium and program product
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image classification method, an electronic device, a storage medium, and a program product.
Background
With the development of computer technology, image classification models can classify all kinds of images to be classified; during the training stage, an image classification model learns to distinguish the characteristics of each sample image. If the numbers of sample images of the different categories in the sample set used for model training are unbalanced, the image classification model may fail to learn and distinguish the characteristics of each category of sample images well, which affects the accuracy of the image classification model.
However, most sample sets suffer from such category imbalance, and the trained image classification model therefore often has considerable room for improvement in accuracy. How to improve the accuracy of image classification models is thus of great significance for their future development.
Disclosure of Invention
In view of the above, embodiments of the present invention provide an image classification method, an electronic device, a storage medium, and a program product to overcome or at least partially solve the above problems.
In a first aspect of the embodiments of the present invention, there is provided an image classification method, where the method includes:
acquiring an image to be classified;
inputting the image to be classified into an image classification model to obtain a classification prediction result of the image to be classified, wherein the image classification model is obtained by training a preset model with a basic loss and an inter-class loss;
the basic loss is determined according to the classification prediction result of each sample image predicted by the preset model and the real class label of each sample image;
the inter-class loss is determined according to the classification prediction result of each sample image predicted by the preset model and the soft class label of each sample image, wherein the soft class label of a sample image is determined according to the confidence with which each sample image is predicted as the real class of that sample image.
Optionally, the training process of the image classification model includes the following steps:
obtaining a plurality of sample images carrying real category labels, and inputting the sample images into the preset model to obtain a classification prediction result of each sample image;
establishing the basic loss according to the difference between the classification prediction result of each sample image and the real class label of the sample image;
establishing the inter-class loss according to the difference between the classification prediction result of each sample image and the soft class label of the sample image;
and training the preset model based on the basic loss and the inter-class loss to obtain the classification model.
Optionally, establishing the inter-class loss according to a difference between the classification prediction result of each sample image and the soft class label of the sample image, includes:
establishing a confusion matrix according to the classification prediction result of each sample image and the real class label carried by the sample image, wherein each element in the confusion matrix represents the average confidence with which the samples whose real category is the category of the element's row are predicted as the category of the element's column;
taking the column vector corresponding to each sample image in the confusion matrix as the soft class label of the sample image, wherein the column vector corresponding to a sample image represents the confidence with which each sample is predicted as the real class of that sample image;
and establishing the inter-class loss according to the difference between the classification prediction result of each sample image and the column vector corresponding to the sample image.
Optionally, establishing a confusion matrix according to the classification prediction result of each sample image and the real class label carried by the sample image, including:
obtaining the confidence with which each sample image is predicted as each category other than the background category;
normalizing, for each sample image, the confidences with which it is predicted as the categories other than the background category, to obtain a normalized prediction result of each sample image;
and establishing a confusion matrix according to the normalized prediction result of each sample image and the carried real category label.
Optionally, obtaining a classification prediction result of each sample image includes:
obtaining a classification prediction result of each sample image in different training batches;
and establishing a confusion matrix according to the classification prediction result of each sample image and the real class label carried by the sample image comprises the following steps:
establishing confusion matrixes of different training batches according to the real class label of each sample image and the classification prediction result of the sample image in each training batch;
and averaging the elements at corresponding positions in the confusion matrixes of the different training batches to obtain the elements of the confusion matrix, thereby establishing the confusion matrix.
Optionally, the preset model is a classification branch of an untrained instance segmentation model, and the untrained instance segmentation model further includes a position prediction branch; the training process of the classification branch in the instance segmentation model at least comprises the following steps:
acquiring image characteristics of image samples containing sample objects, wherein each sample object carries a real class label of the sample object;
inputting the image characteristics of the image sample into the untrained instance segmentation model to obtain a first prediction position frame of each sample object in the image sample output by the position prediction branch and a first prediction category of each sample object in the image sample output by the classification branch;
updating the image characteristics of the image samples based on the first prediction position frame and the first prediction category of each sample object;
obtaining a second prediction category of each sample object in the image sample output by the classification branch based on the updated image characteristics of the image sample;
establishing the basic loss of the classification branch according to the difference between the second prediction category of each sample object in each sample and the real category label of that sample object;
establishing the inter-class loss of the classification branch according to the difference between the second prediction category of each sample object in each sample and the confidence with which each sample object is predicted as the real category of that sample object;
training the classification branch based on the base loss and the inter-class loss of the classification branch.
Optionally, the method further comprises:
acquiring a second prediction position frame of each sample object in the image sample, which is output by the position prediction branch based on the updated image characteristics of the image sample;
training the classification branch based on the basic loss and the inter-class loss of the classification branch, including:
training the classification branch based on the basic loss of the classification branch and its weight, and the inter-class loss of the classification branch and its weight, to obtain an intermediate branch;
repeating the step of training the classification branch to obtain an intermediate branch, taking the intermediate branch as the classification branch, the second prediction position frame as the first prediction position frame and the second prediction category as the first prediction category;
and taking the intermediate branch obtained in the last repetition as the classification branch of the instance segmentation model.
Optionally, repeating the step of training the classification branch to obtain an intermediate branch includes:
in the process of repeatedly training the classification branch to obtain intermediate branches, gradually increasing the weight of the inter-class loss of the classification branch.
Optionally, the method further comprises:
obtaining an unclassified image containing an object to be classified;
and inputting the unclassified image into the instance segmentation model to obtain the prediction category of each object to be classified in the unclassified image, as determined by the classification branch of the instance segmentation model.
In a second aspect of the embodiments of the present invention, an electronic device is provided, which includes a memory, a processor, and a computer program stored on the memory, and the processor executes the computer program to implement the image classification method disclosed in the embodiments of the present application.
In a third aspect of the embodiments of the present invention, a computer-readable storage medium is provided, on which a computer program/instruction is stored, which when executed by a processor implements the image classification method as disclosed in the embodiments of the present application.
In a fourth aspect of the embodiments of the present invention, a computer program product is provided, which includes a computer program/instruction, and when the computer program/instruction is executed by a processor, the computer program/instruction implements the image classification method as disclosed in the embodiments of the present application.
In a fifth aspect of the embodiments of the present invention, there is provided an image classification apparatus, including:
the acquisition module is used for acquiring an image to be classified;
the classification module is used for inputting the image to be classified into an image classification model to obtain a classification prediction result of the image to be classified, wherein the image classification model is obtained by training a preset model with a basic loss and an inter-class loss; the basic loss is determined according to the classification prediction result of each sample image predicted by the preset model and the real class label of each sample image; the inter-class loss is determined according to the classification prediction result of each sample image predicted by the preset model and the soft class label of each sample image, where the soft class label of a sample image is determined according to the confidence with which each sample image is predicted as the real class of that sample image.
The embodiment of the invention has the following advantages:
in this embodiment, an image to be classified may be acquired and input into an image classification model to obtain a classification prediction result of the image to be classified, where the image classification model is obtained by training a preset model with a basic loss and an inter-class loss. The basic loss is determined according to the classification prediction result of each sample image predicted by the preset model and the real class label of each sample image; the inter-class loss is determined according to the classification prediction result of each sample image predicted by the preset model and the soft class label of each sample image, where the soft class label of a sample image is determined according to the confidence with which each sample image is predicted as the real class of that sample image. Because the basic loss is determined based on the classification prediction results of the sample images and their real class labels, training the preset model with the basic loss guides the preset model to classify each sample image into the real class to which it belongs. The inter-class loss is determined based on the classification prediction result of each sample image and the confidence with which each sample image is predicted as its real class; training the preset model with the inter-class loss therefore matches, for each sample image, the confidence with which the preset model misclassifies sample images of other classes into the real class of that sample image against the confidence with which that sample image is misclassified into other classes, using the idea of adversarial training to correct misclassifications. Training the preset model with both the basic loss and the inter-class loss thus yields an image classification model with higher accuracy, which can then classify images to be classified accurately.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the description of the embodiments of the present application will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive exercise.
FIG. 1 is a flow chart of the steps of a method for image classification according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a confusion matrix according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating an example of averaging confusion matrices from different training batches to obtain a final confusion matrix;
fig. 4 is a schematic structural diagram of an image classification apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, the present application is described in further detail with reference to the accompanying drawings and the detailed description.
In recent years, technical research based on artificial intelligence, such as computer vision, deep learning, machine learning, image processing, and image recognition, has developed rapidly. Artificial Intelligence (AI) is an emerging science and technology that studies and develops theories, methods, techniques and application systems for simulating and extending human intelligence. Artificial intelligence is a comprehensive discipline that involves many technical categories, such as chips, big data, cloud computing, the Internet of Things, distributed storage, deep learning, machine learning and neural networks. Computer vision, an important branch of artificial intelligence that in particular lets machines recognize the world, generally includes technologies such as face recognition, liveness detection, fingerprint recognition and anti-counterfeiting verification, biometric recognition, face detection, pedestrian detection, target detection, pedestrian recognition, image processing, image recognition, image semantic understanding, image retrieval, character recognition, video processing, video content recognition, behavior recognition, three-dimensional reconstruction, virtual reality, augmented reality, simultaneous localization and mapping (SLAM), computational photography, and robot navigation and positioning. With the research and progress of artificial intelligence technology, the technology has been applied in many fields, such as security, city management, traffic management, building management, park management, face-based access, face-based attendance, logistics management, warehouse management, robots, intelligent marketing, computational photography, mobile phone imaging, cloud services, smart homes, wearable devices, unmanned driving, automatic driving, smart medical treatment, face payment, face unlocking, fingerprint unlocking, identity verification, smart screens, smart televisions, cameras, the mobile internet, webcast, beauty, medical beauty, and intelligent temperature measurement.
In order to solve the technical problem in the related art that the accuracy of image classification models is not high, the applicant proposes: training the model with both the basic loss and the inter-class loss, so as to improve the accuracy of the trained image classification model.
Referring to fig. 1, a flowchart illustrating steps of an image classification method according to an embodiment of the present invention is shown, and as shown in fig. 1, the image classification method may specifically include the following steps:
step S11: acquiring an image to be classified;
step S12: inputting the image to be classified into an image classification model to obtain a classification prediction result of the image to be classified, wherein the image classification model is obtained by training a preset model by using basic loss and inter-class loss; the basic loss is determined according to the classification prediction result of each sample image predicted by the preset model and the real class label of each sample image; the inter-class loss is determined according to the classification prediction result of each sample image predicted by the preset model and the soft class label of each sample image, and the soft class label of one sample image is determined according to the confidence degree that each sample image is predicted as the real class of the sample image.
The classification of an image to be classified may refer to the classification of the image itself (for example, as a landscape image, a person image, an animal image, or the like), or to the classification of a foreground object in the image (for example, an image whose foreground object is a cat is classified as cat, and an image whose foreground object is a dog is classified as dog). The embodiments of the present application describe the image classification method taking the classification of foreground objects in an image as an example; it can be understood that classifying the image itself follows a similar idea.
After the image to be classified is input into the image classification model, the model can predict the confidence that the image to be classified belongs to each class, and can output either the confidences of all classes as the classification prediction result of the image, or only the class with the highest confidence.
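For illustration, the following Python sketch mirrors these two output modes; the class names, their order and the function name are assumptions taken from the running example below, not from the claims:

```python
import numpy as np

CLASS_NAMES = ["cat", "dog", "pig", "background"]  # assumed class order

def classification_result(confidences, return_top_class=True):
    """Output either the per-class confidence vector or only the
    single most confident class, mirroring the two output modes
    described above."""
    confidences = np.asarray(confidences, dtype=np.float64)
    if return_top_class:
        return CLASS_NAMES[int(np.argmax(confidences))]
    return confidences

print(classification_result([0.5, 0.3, 0.1, 0.1]))  # -> "cat"
```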
The image classification model is obtained by training a preset model with the basic loss and the inter-class loss. The classification prediction result of a sample image determined by the preset model during training is the confidence that the sample image belongs to each class; the classes are the classes of all sample images used in training plus a background class. The order of the classes may be fixed, and a vector may then be generated from the confidences that the sample image belongs to each class; this vector represents the classification prediction result of the sample image. For example, suppose there are four classes ordered as cat, dog, pig and background, and the confidences that a sample image determined by the preset model belongs to the classes are: cat 0.5, dog 0.3, pig 0.1 and background 0.1; the classification prediction result of the sample image determined by the preset model can then be represented by the vector [0.5,0.3,0.1,0.1].
Optionally, in order to make the preset model learn the features of the sample images of each class other than the background class as much as possible, the confidence of the background class may be deleted from the classification prediction result, and the remaining confidences may then be normalized. Following the previous example, after removing the background-class confidence from the classification prediction result [0.5,0.3,0.1,0.1] and normalizing, the new classification prediction result is approximately [0.56,0.33,0.11].
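A minimal Python sketch of this deletion-and-normalization step, assuming the background class occupies the last position of the prediction vector:

```python
import numpy as np

def drop_background_and_normalize(pred, background_index=-1):
    """Remove the background-class confidence and renormalize the
    remaining confidences so they sum to 1. The position of the
    background class is an assumption."""
    pred = np.asarray(pred, dtype=np.float64)
    kept = np.delete(pred, background_index)
    return kept / kept.sum()

# The example from the text: [cat, dog, pig, background]
print(drop_background_and_normalize([0.5, 0.3, 0.1, 0.1]).round(2))
# -> [0.56 0.33 0.11]
```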
The real class label of a sample image represents the class to which the sample image really belongs and corresponds to the classification prediction result of the sample image. The real class label can also be a vector, in which the classes represented by the elements appear in the same order as in the classification prediction result of the sample image. Continuing the previous example, if the class to which the sample image belongs is dog, the real class label of the sample image may be [0,1,0,0]; after the background class is deleted, the real class label of the sample image is [0,1,0].
For each sample image, the basic loss may be established based on the classification prediction result of the sample image and the real class label of the sample image. Training the preset model with the basic loss guides the preset model to classify each sample image into the real class to which it belongs.
The preset model may also be trained with the inter-class loss, which is determined based on the classification prediction results of the sample images and their soft class labels; the soft class label of a sample image is determined based on the confidence with which each sample image is predicted as the real class of that sample image. It will be appreciated that sample images of the same real class share the same soft class label. The soft class label can also be a vector corresponding to the classification prediction result of the sample image, with the classes represented by its elements in the same order as in the classification prediction result.
For example, if the classes of all the sample images are cat, dog and pig, the soft class label of a sample image whose real class is dog is built from the confidences with which all the sample images are predicted as dog: if all cat sample images are predicted as dog with an average confidence of 0.2, all dog sample images are predicted as dog with an average confidence of 0.6, and all pig sample images are predicted as dog with an average confidence of 0.1, the soft class label of the dog sample images may be [0.2,0.6,0.1]; optionally, the soft class label may be normalized to [0.22,0.67,0.11].
Training the preset model with the inter-class loss matches, for each sample image, the confidence with which the preset model misclassifies sample images of other classes into the real class of that sample image against the confidence with which that sample image is misclassified into other classes. This forms a kind of adversarial training that guides the model to distinguish the differences between sample images predicted as the same class, thereby correcting misclassifications.
With the technical solution of this embodiment of the application, the basic loss is determined based on the classification prediction results of the sample images and their real class labels, so training the preset model with the basic loss guides the preset model to classify each sample image into the real class to which it belongs. The inter-class loss is determined based on the classification prediction result of each sample image and the confidence with which each sample image is predicted as its real class; training the preset model with the inter-class loss therefore matches, for each sample image, the confidence with which the preset model misclassifies sample images of other classes into the real class of that sample image against the confidence with which that sample image is misclassified into other classes, using the idea of adversarial training to correct misclassifications. Training the preset model with both the basic loss and the inter-class loss thus yields an image classification model with higher accuracy.
Optionally, the image classification model is obtained by training the preset model based on the basic loss and the inter-class loss with a plurality of sample images carrying real class labels. Optionally, the basic loss is a cross-entropy loss established according to the difference between the classification prediction result of each sample image and the real class label of that sample image, and the inter-class loss is a cross-entropy loss established according to the difference between the classification prediction result of each sample image and the soft class label of that sample image.
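As a concrete illustration, the following PyTorch-style sketch establishes the two cross-entropy terms; the additive combination, the weight value and all function names are assumptions, since the description only states that both losses are used for training:

```python
import torch
import torch.nn.functional as F

def inter_class_loss(logits, soft_labels):
    """Cross-entropy against a soft class label (a probability
    vector) instead of a hard class index."""
    log_probs = F.log_softmax(logits, dim=-1)
    return -(soft_labels * log_probs).sum(dim=-1).mean()

def training_loss(logits, real_labels, soft_labels, inter_weight=0.1):
    """Basic loss (cross-entropy against the real class labels) plus
    a weighted inter-class loss (cross-entropy against the soft
    class labels)."""
    base = F.cross_entropy(logits, real_labels)
    return base + inter_weight * inter_class_loss(logits, soft_labels)

# e.g. loss = training_loss(model_logits, labels, soft_labels)
```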
Since some sample images are easily misclassified into other categories, training the preset model with the inter-class loss prompts the model to pay attention to the differences between sample images that are easily misclassified into another category, so that the model learns to distinguish the sample images of one category from those of the other. For example, suppose the real categories of the sample images are cat and dog, and some cats (for example, hairless cats) are easily predicted as dogs by the preset model; soft class labels can then be generated according to the confidence with which each sample image is predicted as dog, and the inter-class loss established from these soft class labels makes the model focus more on the differences between the sample images predicted as dog (hairless cats and real dogs), so that the trained model can distinguish hairless cats from dogs as far as possible.
Alternatively, when training the model, the inter-class loss may be calculated only for the categories whose sample images have a high probability of being mispredicted, and only with respect to the categories into which those sample images are easily misclassified.
Alternatively, the inter-class loss can be established with a confusion matrix. A confusion matrix is a visualization tool that intuitively reflects the real classes of the sample images, the classification prediction results of the preset model, and the relation between the two. In the embodiments of the present application, the real class of a sample image is characterized by a row and the predicted class by a column; it can be understood that the roles of rows and columns may also be exchanged, with the other technical means adjusted accordingly.
The elements in the confusion matrix are determined based on the average classification prediction result of each class of sample images. Each row of the confusion matrix holds the average classification prediction result of the sample images whose real class corresponds to that row, and each column holds the average confidence with which the sample images of each class are predicted as the class corresponding to that column.
FIG. 2 shows a schematic diagram of a confusion matrix. It characterizes that each sample image whose real class is cat is predicted as cat with a confidence of 0.6, as dog with a confidence of 0.3, and as pig with a confidence of 0.1; each sample whose real class is dog is predicted as cat with a confidence of 0.2, as dog with a confidence of 0.7, and as pig with a confidence of 0.1; and each sample whose real class is pig is predicted as cat with a confidence of 0.1, as dog with a confidence of 0.1, and as pig with a confidence of 0.8. The first column of the confusion matrix thus characterizes that, over all samples, cat samples are predicted as cat with an average confidence of 0.6, dog samples are mispredicted as cat with an average confidence of 0.2, and pig samples are mispredicted as cat with an average confidence of 0.1.
The column vector corresponding to a sample image is the vector formed by the elements of the column of the confusion matrix whose predicted class corresponds to the real class of that sample image, and the inter-class loss can be established according to the difference between the classification prediction result of each sample image and the column vector corresponding to that sample image. For example, in FIG. 2 the soft class label of all cat sample images is [0.6,0.2,0.1], the soft class label of all dog sample images is [0.3,0.7,0.1], and the soft class label of all pig sample images is [0.1,0.1,0.8].
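The following Python sketch builds such a confusion matrix and reads the soft class labels from its columns; the array layout and the function names are illustrative assumptions:

```python
import numpy as np

def build_confusion_matrix(pred_probs, true_labels, num_classes):
    """Row c holds the average classification prediction result of
    all sample images whose real class is c.
    pred_probs: (N, C) per-sample class confidences;
    true_labels: (N,) integer class indices."""
    cm = np.zeros((num_classes, num_classes))
    for c in range(num_classes):
        cm[c] = pred_probs[true_labels == c].mean(axis=0)
    return cm

def soft_labels_from_columns(cm, true_labels):
    """The soft class label of a sample image is the column of the
    confusion matrix for the sample's real class."""
    return cm[:, true_labels].T  # shape (N, C)

# The FIG. 2 example, rows/columns ordered cat, dog, pig:
cm = np.array([[0.6, 0.3, 0.1],
               [0.2, 0.7, 0.1],
               [0.1, 0.1, 0.8]])
print(soft_labels_from_columns(cm, np.array([1])))  # dog -> [[0.3 0.7 0.1]]
```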
In the actual training process, when the preset model determines the confidence that each sample image belongs to each class, it also determines the confidence that each sample image belongs to the background class. Optionally, in order to let the preset model learn the features of the sample images of each class other than the background class as much as possible and improve the classification accuracy of the trained image classification model, when the confusion matrix is established, the column of the background class may be deleted and the remaining elements of the confusion matrix normalized; or only the confidences with which each sample image is predicted as the classes other than the background class are obtained and normalized to obtain a normalized prediction result of each sample image, and the confusion matrix is then established from the normalized prediction results of the sample images and the real class labels they carry.
Optionally, multiple rounds of training are performed on the preset model, and the classification prediction results of the sample images obtained in each round are accumulated to obtain the confusion matrix. One way is to generate a confusion matrix for each training batch directly from the classification prediction results of the sample images in that round, and then average the elements at corresponding positions of the several confusion matrices to obtain the elements of the final confusion matrix. FIG. 3 shows a schematic diagram of averaging the confusion matrices of different training batches, where the confusion matrices of two different training batches are averaged to obtain the final confusion matrix.
Alternatively, the classification prediction results of each sample image over several training batches may be averaged directly, and the confusion matrix established from the averaged classification prediction results. Averaging the classification prediction results means averaging the confidences class by class: for example, if the classification prediction result of a sample image is cat 0.5 and dog 0.3 in one batch, and cat 0.3 and dog 0.1 in another, the average of the two is cat 0.4 and dog 0.2.
Optionally, the averaging of the confusion matrices of different training batches, and the averaging of the classification prediction results of different training batches, may be exponential moving averages, which give greater weight to the classification prediction results of later training batches.
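A minimal sketch of such an exponential moving average over the per-batch confusion matrices, with an assumed momentum value:

```python
def ema_confusion_matrix(cm_running, cm_batch, momentum=0.9):
    """Exponential moving average over per-batch confusion matrices
    (NumPy arrays), giving later batches more weight as described
    above. The momentum value is an illustrative assumption."""
    if cm_running is None:
        return cm_batch
    return momentum * cm_running + (1.0 - momentum) * cm_batch
```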
With this technical solution, the confusion matrix is established by averaging the classification prediction results of multiple rounds of training, so the obtained confusion matrix is more accurate; the result of a single round of training is prevented from seriously affecting the training of the model, and the accuracy of the trained image classification model is improved.
Optionally, as an embodiment, the preset model is a classification branch of an untrained instance segmentation model, and the untrained instance segmentation model further includes a position prediction branch. The classification branch in the untrained instance segmentation model can be trained with the following steps to obtain the classification branch of the instance segmentation model:
step S21: acquiring image characteristics of image samples containing sample objects, wherein each sample object carries a real class label of the sample object.
The classification branch in the instance segmentation model is trained with image samples, where each image sample contains at least one sample object and each sample object carries its real class label. Image features of the image sample are acquired, including color features, texture features, shape features, spatial relationship features and the like. The embodiments of the present application do not limit the method used to extract the image features of the image sample.
Step S22: and inputting the image characteristics of the image sample into the untrained instance segmentation model to obtain a first prediction position frame of each sample object in the image sample output by the position prediction branch and a first prediction category of each sample object in the image sample output by the classification branch.
Alternatively, the image features of the image sample may be input into the untrained instance segmentation model, or the image sample may be input directly and the image features extracted inside the instance segmentation model.
The position prediction branch of the instance segmentation model predicts the positions of the sample objects in the image sample, and the classification branch predicts their classes. When the instance segmentation model is trained, the position prediction branch outputs a first prediction position frame for each sample object in the image sample, and the classification branch outputs a first prediction category for each sample object in the image sample.
Step S23: and updating the image characteristics of the image sample based on the first prediction position frame and the first prediction category of each sample object.
After the first prediction position frame and the first prediction category are obtained, their features may be fused into the image features of the image sample to obtain updated image features.
Step S24: and obtaining a second prediction category of each sample object in the image sample output by the classification branch based on the updated image characteristics of the image sample.
Based on the updated image features, the classification branch may output a second prediction class for each sample object in the image sample.
Step S25: establishing the basic loss of the classification branch according to the difference between the second prediction category of each sample object in each sample and the real category label of that sample object.
The basic loss of the classification branch may be established based on the second prediction category of each sample object and the real category label of each sample object, and the specific establishment method of the basic loss of the classification branch may refer to the method for establishing the basic loss of the classification model, which is not described herein again.
Step S26: establishing the inter-class loss of the classification branch according to the difference between the second prediction category of each sample object in each sample and the confidence with which each sample object is predicted as the real category of that sample object.
based on the second prediction category of each sample object and the confidence that each sample object is predicted as the true category of the sample object, the inter-class loss of the classification branch may be established, and the specific establishment method of the inter-class loss of the classification branch may refer to the method for establishing the inter-class loss of the classification model, which is not described herein again.
Step S27: training the classification branch based on the base loss and the inter-class loss of the classification branch.
And training the classification branches based on the basic loss and the inter-class loss of the classification branches to obtain the trained classification branches.
With this technical solution, when the classification branch is trained, the image features are updated with the first prediction position frame and the first prediction category, and the sample objects are classified with the updated image features, so that a more accurate second prediction category can be obtained; at the same time, training the classification branch with both its basic loss and its inter-class loss improves the accuracy of the trained classification branch.
Optionally, as an embodiment, multiple rounds of training may be performed on the classification branch. In each round of training, the second prediction position frame of each sample object output by the position prediction branch is obtained, the once-updated image features are updated again based on the second prediction position frame and the second prediction category to obtain twice-updated image features, and the classification branch after this round of training is taken as an intermediate branch.
The step of training the classification branch to obtain an intermediate branch is then repeated, taking the intermediate branch as the classification branch, the second prediction position frame as the first prediction position frame and the second prediction category as the first prediction category, until a preset condition is met; training of the classification branch then stops, and the intermediate branch obtained in the last repetition is taken as the trained classification branch of the instance segmentation model. The preset condition may be that the accuracy of the instance segmentation model reaches a preset threshold. In actual training, the applicant found that repeating the step of training the classification branch 2 or 3 times yields a classification branch with a good training effect. It will be appreciated that the basic loss and the inter-class loss of the classification branch are continually updated with each round of classification prediction.
That is, the next round of training after the twice-updated image features are obtained is: training the classification branch based on its basic loss and inter-class loss using the twice-updated image features, obtaining a third prediction category for the sample objects and a third prediction position frame output by the position prediction branch.
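To make the flow of steps S21 to S27 and the repetition described above concrete, the following PyTorch-style sketch ties the rounds together; the module shapes, the fusion of the position frame and category back into the image features by concatenation, the linear weight ramp, and all function and variable names are illustrative assumptions rather than the patented architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stand-ins for the parts of the instance segmentation
# model; the real architecture and feature sizes are assumptions.
backbone = nn.Linear(32, 64)               # extracts image features
position_branch = nn.Linear(64, 4)         # predicts a position frame
classification_branch = nn.Linear(64, 3)   # predicts category logits
fuse = nn.Linear(64 + 4 + 3, 64)           # folds predictions back into features

def refine_and_train(x, real_labels, soft_labels, num_rounds=3):
    """Each round fuses the previous round's predicted position frame
    and category into the image features, re-predicts, and trains the
    classification branch with the basic loss plus a gradually
    increasing inter-class loss weight (a linear ramp is assumed)."""
    opt = torch.optim.SGD(classification_branch.parameters(), lr=0.01)
    feats = backbone(x)
    boxes = position_branch(feats)          # first prediction position frame
    logits = classification_branch(feats)   # first prediction category
    for r in range(num_rounds):
        # update the image features with the previous predictions
        feats = fuse(torch.cat([feats, boxes, logits], dim=-1).detach())
        logits = classification_branch(feats)   # next prediction category
        boxes = position_branch(feats)          # next prediction position frame
        weight = 0.5 * (r + 1) / num_rounds     # gradually increased weight
        base = F.cross_entropy(logits, real_labels)
        inter = -(soft_labels * F.log_softmax(logits, dim=-1)).sum(-1).mean()
        opt.zero_grad()
        (base + weight * inter).backward()
        opt.step()
    return classification_branch
```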
Optionally, the basic loss and the inter-class loss of the classification branch each have a corresponding weight, and directly giving a large weight to the inter-class loss of the classification branch may harm the classification accuracy of the classification branch. The weight of the inter-class loss of the classification branch may therefore be increased gradually over the rounds of training, as in the linear ramp of the sketch above.
After the classification branch of the instance segmentation model has been trained, an unclassified image containing objects to be classified is input into the instance segmentation model, and the prediction category of each object to be classified in the unclassified image is obtained as determined by the classification branch of the instance segmentation model. The objects to be classified in the unclassified image are the foreground objects of the unclassified image.
It can be understood that the basic loss and the inter-class loss of the classification branch are used only when the classification branch is trained. When the trained instance segmentation model is actually applied, the classification branch still goes through the multiple updates of the image features inside the model, but the basic loss and the inter-class loss of the classification branch are no longer involved.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
Fig. 4 is a schematic structural diagram of an image classification apparatus according to an embodiment of the present invention. As shown in fig. 4, the image classification apparatus includes an acquisition module and a classification module, where:
the acquisition module is used for acquiring images to be classified;
the classification module is used for inputting the image to be classified into an image classification model to obtain a classification prediction result of the image to be classified, wherein the image classification model is obtained by training a preset model with a basic loss and an inter-class loss; the basic loss is determined according to the classification prediction result of each sample image predicted by the preset model and the real class label of each sample image; the inter-class loss is determined according to the classification prediction result of each sample image predicted by the preset model and the soft class label of each sample image, where the soft class label of a sample image is determined according to the confidence with which each sample image is predicted as the real class of that sample image.
Optionally, as an embodiment, the training process of the image classification model includes the following steps:
obtaining a plurality of sample images carrying real category labels, and inputting the sample images into the preset model to obtain a classification prediction result of each sample image;
establishing the basic loss according to the difference between the classification prediction result of each sample image and the real class label of the sample image;
establishing the inter-class loss according to the difference between the classification prediction result of each sample image and the soft class label of the sample image;
and training the preset model based on the basic loss and the inter-class loss to obtain the classification model.
Optionally, as an embodiment, establishing the inter-class loss according to a difference between the classification prediction result of each sample image and the soft class label of the sample image includes:
establishing a confusion matrix according to the classification prediction result of each sample image and the real class label carried by the sample image, wherein each element in the confusion matrix represents the average confidence with which the samples whose real category is the category of the element's row are predicted as the category of the element's column;
taking the column vector corresponding to each sample image in the confusion matrix as the soft class label of the sample image, wherein the column vector corresponding to a sample image represents the confidence with which each sample is predicted as the real class of that sample image;
and establishing the inter-class loss according to the difference between the classification prediction result of each sample image and the column vector corresponding to the sample image.
Optionally, as an embodiment, establishing a confusion matrix according to the classification prediction result of each sample image and the real class label carried by the sample image includes:
obtaining the confidence with which each sample image is predicted as each category other than the background category;
normalizing, for each sample image, the confidences with which it is predicted as the categories other than the background category, to obtain a normalized prediction result of each sample image;
and establishing a confusion matrix according to the normalized prediction result of each sample image and the carried real category label.
Optionally, as an embodiment, obtaining the classification prediction result of each sample image includes:
obtaining a classification prediction result of each sample image in different training batches;
establishing a confusion matrix according to the classification prediction result of each sample image and the real class label carried by the sample image comprises the following steps:
establishing confusion matrixes of different training batches according to the real class label of each sample image and the classification prediction result of the sample image in each training batch;
and averaging the elements at corresponding positions in the confusion matrixes of the different training batches to obtain the elements of the confusion matrix, thereby establishing the confusion matrix.
Optionally, as an embodiment, the preset model is a classification branch of an untrained instance segmentation model, and the untrained instance segmentation model further includes a position prediction branch; the training process of the classification branch in the instance segmentation model at least comprises the following steps:
acquiring image characteristics of image samples containing sample objects, wherein each sample object carries a real class label of the sample object;
inputting the image characteristics of the image sample into the untrained instance segmentation model to obtain a first prediction position frame of each sample object in the image sample output by the position prediction branch and a first prediction category of each sample object in the image sample output by the classification branch;
updating the image characteristics of the image samples based on the first prediction position frame and the first prediction category of each sample object;
obtaining a second prediction category of each sample object in the image sample output by the classification branch based on the updated image characteristics of the image sample;
establishing the basic loss of the classification branch according to the difference between the second prediction category of each sample object in each sample and the real category label of that sample object;
establishing the inter-class loss of the classification branch according to the difference between the second prediction category of each sample object in each sample and the confidence with which each sample object is predicted as the real category of that sample object;
training the classification branch based on the base loss and the inter-class loss of the classification branch.
Optionally, as an embodiment, the apparatus further includes:
the second obtaining module is used for obtaining the second prediction position frame of each sample object in the image sample, output by the position prediction branch based on the updated image features of the image sample;
training the classification branch based on the basic loss and the inter-class loss of the classification branch includes:
training the classification branch based on the basic loss of the classification branch and its weight, and the inter-class loss of the classification branch and its weight, to obtain an intermediate branch;
repeating the step of training the classification branch to obtain an intermediate branch, taking the intermediate branch as the classification branch, the second prediction position frame as the first prediction position frame and the second prediction category as the first prediction category;
and taking the intermediate branch obtained in the last repetition as the classification branch of the instance segmentation model.
Optionally, as an embodiment, repeating the step of training the classification branch to obtain an intermediate branch includes:
in the process of repeatedly training the classification branch to obtain intermediate branches, gradually increasing the weight of the inter-class loss of the classification branch.
Optionally, as an embodiment, the apparatus further includes:
the image acquisition module is used for acquiring an unclassified image containing an object to be classified;
and the class prediction module is used for inputting the unclassified image into the instance segmentation model to obtain the prediction category of each object to be classified in the unclassified image, as determined by the classification branch of the instance segmentation model.
It should be noted that the device embodiments are similar to the method embodiments, so that the description is simple, and reference may be made to the method embodiments for relevant points.
The embodiment of the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory, where the processor executes the computer program to implement the image classification method disclosed in the embodiment of the present application.
Embodiments of the present invention further provide a computer-readable storage medium, on which a computer program/instruction is stored, and when the computer program/instruction is executed by a processor, the computer program/instruction implements the image classification method disclosed in the embodiments of the present application.
Embodiments of the present invention further provide a computer program product, which includes a computer program/instruction, and the computer program/instruction, when executed by a processor, implements the image classification method disclosed in the embodiments of the present application.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus, electronic devices and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The image classification method, electronic device, storage medium and program product provided by the present application have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present application, and the description of the above embodiments is only intended to help understand the method and its core idea. Meanwhile, for those skilled in the art, there may be changes in the specific implementation and application scope according to the idea of the present application. In summary, the contents of this specification should not be construed as limiting the present application.

Claims (12)

1. An image classification method, comprising:
acquiring an image to be classified;
inputting the image to be classified into an image classification model to obtain a classification prediction result of the image to be classified, wherein the image classification model is obtained by training a preset model by using basic loss and inter-class loss;
the basic loss is determined according to the classification prediction result of each sample image predicted by the preset model and the real class label of each sample image;
the inter-class loss is determined according to the classification prediction result of each sample image predicted by the preset model and the soft class label of each sample image, and the soft class label of one sample image is determined according to the confidence degree that each sample image is predicted as the real class of the sample image.
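By way of illustration only, the training objective described in claim 1 can be sketched in a few lines of PyTorch-style Python. This is a minimal sketch under assumptions, not the patented implementation: the claim only requires each loss to be built from a "difference", so the choice of cross-entropy for the basic loss, KL divergence for the inter-class loss, and the weighting parameter inter_class_weight are all illustrative assumptions.

```python
# Illustrative sketch of claim 1's objective; all names are assumptions.
import torch
import torch.nn.functional as F

def combined_loss(logits, true_labels, soft_labels, inter_class_weight=0.5):
    """logits: (N, C) model outputs; true_labels: (N,) real class indices;
    soft_labels: (N, C) rows summing to 1, derived from the confidence with
    which each class's samples are predicted as a sample's real class."""
    # Basic loss: difference between predictions and the real class labels.
    base_loss = F.cross_entropy(logits, true_labels)
    # Inter-class loss: difference between predictions and the soft labels,
    # measured here (as one plausible choice) with KL divergence.
    inter_loss = F.kl_div(F.log_softmax(logits, dim=1), soft_labels,
                          reduction="batchmean")
    return base_loss + inter_class_weight * inter_loss
```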
2. The method of claim 1, wherein the training process of the image classification model comprises the steps of:
obtaining a plurality of sample images carrying real category labels, and inputting the sample images into the preset model to obtain a classification prediction result of each sample image;
establishing the basic loss according to the difference between the classification prediction result of each sample image and the real class label of the sample image;
establishing the inter-class loss according to the difference between the classification prediction result of each sample image and the soft class label of the sample image;
and training the preset model based on the basic loss and the inter-class loss to obtain the image classification model.
3. The method of claim 2, wherein establishing the inter-class loss based on a difference between the classification prediction result of each sample image and the soft class label of the sample image comprises:
establishing a confusion matrix according to the classification prediction result of each sample image and the real class label carried by the sample image, wherein each element in the confusion matrix characterizes the average confidence with which the samples whose real category is the category of the element's row are predicted as the category of the element's column;
taking the column vector corresponding to each sample image in the confusion matrix as the soft class label of the sample image, wherein the column vector corresponding to a sample image characterizes the confidence with which the samples of each category are predicted as the real category of that sample image;
and establishing the inter-class loss according to the difference between the classification prediction result of each sample image and the column vector corresponding to the sample image.
4. The method of claim 3, wherein establishing a confusion matrix according to the classification prediction result of each sample image and the real class label carried by the sample image comprises:
obtaining the confidence with which each sample image is predicted as each category other than the background category;
normalizing, for each sample image, the confidences over the categories other than the background category to obtain a normalized prediction result of the sample image;
and establishing the confusion matrix according to the normalized prediction result of each sample image and the real category label carried by the sample image.
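The confusion-matrix construction of claims 3 and 4 can likewise be sketched in NumPy. This is only an illustration of the stated steps: the function and variable names (build_confusion_matrix, probs, labels) are assumptions, and the sketch assumes each category appears at least once among the samples.

```python
# Illustrative sketch of claims 3-4; names are assumptions.
import numpy as np

def build_confusion_matrix(probs, labels, num_classes):
    """probs: (N, C+1) confidences, with the last column being the background
    category; labels: (N,) real class indices in [0, num_classes)."""
    # Claim 4: drop the background confidence and renormalize per sample.
    fg = probs[:, :num_classes]
    fg = fg / fg.sum(axis=1, keepdims=True)
    m = np.zeros((num_classes, num_classes))
    for c in range(num_classes):
        # Element (c, k): average confidence with which samples whose real
        # category is c are predicted as category k.
        m[c] = fg[labels == c].mean(axis=0)
    return m

def soft_class_label(m, real_class):
    # Claim 3: a sample's soft label is the column of its real category.
    return m[:, real_class]
```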
5. The method of claim 3 or 4, wherein obtaining the classification prediction result of each sample image comprises:
obtaining a classification prediction result of each sample image in different training batches;
and wherein establishing the confusion matrix according to the classification prediction result of each sample image and the real class label carried by the sample image comprises:
establishing confusion matrices for the different training batches according to the real class label of each sample image and the classification prediction result of the sample image in each training batch;
and averaging the elements at corresponding positions in the confusion matrices of the different training batches to obtain the elements of the final confusion matrix, thereby establishing the confusion matrix.
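Claim 5's batch-wise averaging smooths out the noise of any single batch's matrix. A short sketch, reusing the assumed build_confusion_matrix helper above:

```python
# Illustrative sketch of claim 5; builds on the helper sketched above.
import numpy as np

def averaged_confusion_matrix(batches, num_classes):
    """batches: iterable of (probs, labels) pairs, one pair per training batch."""
    mats = [build_confusion_matrix(p, l, num_classes) for p, l in batches]
    # Average the elements at corresponding positions across the batches.
    return np.mean(mats, axis=0)
```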
6. The method of claim 1, wherein the preset model is a classification branch of an untrained instance segmentation model, the untrained instance segmentation model further comprising a position prediction branch; and the training process of the classification branch in the instance segmentation model at least comprises the following steps:
acquiring image characteristics of image samples containing sample objects, wherein each sample object carries a real class label of the sample object;
inputting the image characteristics of the image sample into the untrained instance segmentation model to obtain a first prediction position frame of each sample object in the image sample output by the position prediction branch and a first prediction category of each sample object in the image sample output by the classification branch;
updating the image characteristics of the image samples based on the first prediction position frame and the first prediction category of each sample object;
obtaining a second prediction category of each sample object in the image sample output by the classification branch based on the updated image characteristics of the image sample;
establishing the basic loss of the classification branch according to the difference between the second prediction category of each sample object in each image sample and the real category label of the sample object;
establishing the inter-class loss of the classification branch according to the difference between the second prediction category of each sample object in each image sample and the confidence with which each sample object is predicted as the real category of the sample object;
and training the classification branch based on the basic loss and the inter-class loss of the classification branch.
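The two forward passes of claim 6 — predict, update the features, predict again — can be made explicit in a schematic training step. Everything here beyond the claim's wording is an assumption: position_branch, classification_branch and update_features stand in for the patent's components, whose internals the claim does not fix.

```python
# Schematic sketch of one training step of claim 6; names are assumptions.
import torch
import torch.nn.functional as F

def classification_branch_step(features, real_labels, soft_labels,
                               position_branch, classification_branch,
                               update_features, optimizer,
                               inter_class_weight=0.5):
    # First pass: position frames and categories from the raw image features.
    first_frames = position_branch(features)
    first_logits = classification_branch(features)
    # Update the image features using the first-pass predictions.
    updated = update_features(features, first_frames, first_logits)
    # Second pass: the classification branch re-predicts on updated features.
    second_logits = classification_branch(updated)
    # Basic loss against the real labels; inter-class loss against the soft
    # labels (confidences of being predicted as each object's real category).
    base_loss = F.cross_entropy(second_logits, real_labels)
    inter_loss = F.kl_div(F.log_softmax(second_logits, dim=1), soft_labels,
                          reduction="batchmean")
    loss = base_loss + inter_class_weight * inter_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```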
7. The method of claim 6, further comprising:
acquiring a second prediction position frame of each sample object in the image sample, which is output by the position prediction branch based on the updated image characteristics of the image sample;
wherein training the classification branch based on the basic loss and the inter-class loss of the classification branch comprises:
training the classification branch based on the basic loss of the classification branch and its weight, and on the inter-class loss of the classification branch and its weight, to obtain an intermediate branch;
repeating the step of training the classification branch to obtain an intermediate branch, with the intermediate branch serving as the classification branch, the second prediction position frame as the first prediction position frame, and the second prediction category as the first prediction category;
and taking the intermediate branch obtained in the last repetition as the classification branch of the instance segmentation model.
8. The method of claim 7, wherein repeating the step of training the classification branch to obtain the intermediate branch comprises:
gradually increasing the weight of the inter-class loss of the classification branch in the process of repeatedly training the classification branch to obtain intermediate branches.
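Claim 8 only requires the inter-class loss weight to increase across the repeated training rounds; the schedule below (linear, from 0.1 to 1.0) is one assumed realization:

```python
# One assumed realization of claim 8's gradually increasing weight.
def inter_class_weight(round_idx, num_rounds, w_start=0.1, w_end=1.0):
    t = round_idx / max(num_rounds - 1, 1)
    return w_start + t * (w_end - w_start)

# With num_rounds=5 the weight steps through 0.1, 0.325, 0.55, 0.775, 1.0,
# so early rounds lean on the basic loss and later rounds on the soft labels.
```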
9. The method of any of claims 6-8, further comprising:
obtaining an unclassified image containing an object to be classified;
and inputting the unclassified image into the instance segmentation model to obtain the prediction category of each object to be classified in the unclassified image, as determined by the classification branch of the instance segmentation model.
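At inference time (claim 9), only a forward pass is needed. A minimal sketch, assuming the trained model returns position frames and per-object classification logits:

```python
# Minimal inference sketch for claim 9; the model interface is assumed.
import torch

@torch.no_grad()
def predict_classes(model, unclassified_image):
    frames, logits = model(unclassified_image)  # position / classification branches
    return frames, logits.argmax(dim=1)         # predicted category per object
```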
10. An electronic device comprising a memory, a processor and a computer program stored on the memory, wherein the processor executes the computer program to implement the image classification method of any one of claims 1 to 9.
11. A computer-readable storage medium on which is stored a computer program/instructions, characterized in that the computer program/instructions, when executed by a processor, implement the image classification method of any one of claims 1 to 9.
12. A computer program product comprising computer programs/instructions, characterized in that the computer programs/instructions, when executed by a processor, implement the image classification method of any of claims 1 to 9.
CN202111621526.8A 2021-12-23 2021-12-23 Image classification method, electronic device, storage medium and program product Pending CN114385846A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111621526.8A CN114385846A (en) 2021-12-23 2021-12-23 Image classification method, electronic device, storage medium and program product


Publications (1)

Publication Number Publication Date
CN114385846A (en) 2022-04-22

Family

ID=81197311


Country Status (1)

Country Link
CN (1) CN114385846A (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090161947A1 (en) * 2007-12-21 2009-06-25 Sony Corporation Image processing device and method, learning device and method, program, and recording medium
CN111598190A (en) * 2020-07-21 2020-08-28 腾讯科技(深圳)有限公司 Training method of image target recognition model, image recognition method and device
CN112613539A (en) * 2020-12-11 2021-04-06 北京迈格威科技有限公司 Method, device, equipment and medium for constructing classification network and object detection model
CN113139628A (en) * 2021-06-22 2021-07-20 腾讯科技(深圳)有限公司 Sample image identification method, device and equipment and readable storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
B. FEIZIZADEH: "A Novel Approach of Fuzzy Dempster–Shafer Theory for Spatial Uncertainty Analysis and Accuracy Assessment of Object-Based Image Classification", IEEE Geoscience and Remote Sensing Letters, vol. 15, no. 1, 7 December 2017, pages 18-22 *
ZHANG Lu et al.: "Adaptive Multimodal Biometric Fusion Based on Classification Distance Score", Journal of Computer Research and Development, vol. 55, no. 1, 15 January 2018, pages 151-162 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024075947A1 (en) * 2022-10-04 2024-04-11 삼성전자 주식회사 Electronic device for generating image query for object search, and operation method of electronic device
CN117011575A (en) * 2022-10-27 2023-11-07 腾讯科技(深圳)有限公司 Training method and related device for small sample target detection model

Similar Documents

Publication Publication Date Title
CN111507378A (en) Method and apparatus for training image processing model
CN111738172B (en) Cross-domain target re-identification method based on feature counterstudy and self-similarity clustering
CN111079847B (en) Remote sensing image automatic labeling method based on deep learning
CN114912612A (en) Bird identification method and device, computer equipment and storage medium
CN114385846A (en) Image classification method, electronic device, storage medium and program product
CN113780243B (en) Training method, device, equipment and storage medium for pedestrian image recognition model
CN113052295B (en) Training method of neural network, object detection method, device and equipment
CN113569627A (en) Human body posture prediction model training method, human body posture prediction method and device
CN115713715A (en) Human behavior recognition method and system based on deep learning
CN113496148A (en) Multi-source data fusion method and system
CN113379045A (en) Data enhancement method and device
CN115620122A (en) Training method of neural network model, image re-recognition method and related equipment
CN116541507A (en) Visual question-answering method and system based on dynamic semantic graph neural network
CN114627085A (en) Target image identification method and device, storage medium and electronic equipment
CN112989088B (en) Visual relation example learning method based on reinforcement learning
CN114005017A (en) Target detection method and device, electronic equipment and storage medium
CN114781582A (en) Method, device, equipment and storage medium for learning diagram characteristics with distribution generalization
CN116777814A (en) Image processing method, apparatus, computer device, storage medium, and program product
CN117011566A (en) Target detection method, detection model training method, device and electronic equipment
CN111339952B (en) Image classification method and device based on artificial intelligence and electronic equipment
CN113591859A (en) Image segmentation method, apparatus, device and medium
CN114743043B (en) Image classification method, electronic device, storage medium and program product
CN116758332A (en) Training method of scene classification model, electronic equipment and storage medium
CN114782767A (en) Model training method, electronic device and storage medium
CN114821150A (en) Image classification method, electronic device, storage medium and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination