CN111930935A - Image classification method, device, equipment and storage medium - Google Patents

Image classification method, device, equipment and storage medium

Info

Publication number
CN111930935A
Authority
CN
China
Prior art keywords
image
classifier
probability
target
category
Prior art date
Legal status
Pending
Application number
CN202010565300.XA
Other languages
Chinese (zh)
Inventor
张文俊
Current Assignee
Pulian International Co ltd
Original Assignee
Pulian International Co ltd
Priority date
Filing date
Publication date
Application filed by Pulian International Co ltd filed Critical Pulian International Co ltd
Priority to CN202010565300.XA
Publication of CN111930935A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 - Clustering; Classification
    • G06F16/353 - Clustering; Classification into predefined classes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate

Abstract

The invention discloses an image classification method, which comprises the following steps: inputting an image to be classified into a pre-trained image classifier so that the image classifier outputs a plurality of probability vectors of different classes of the image to be classified; wherein the probability vector is the probability that the image to be classified is correctly classified into the current category; acquiring a category corresponding to the maximum probability vector from the probability vectors of the different categories as a target category; judging whether the product of the probability vector of the target category and the output response of the image classifier is larger than a preset threshold value or not; if yes, judging that the image to be classified belongs to the target class; if not, the image to be classified is judged to belong to the background image. The invention also discloses an image classification apparatus, an image classification device and a computer-readable storage medium. By adopting the embodiment of the invention, the accuracy of image classification can be effectively improved.

Description

Image classification method, device, equipment and storage medium
Technical Field
The present invention relates to the field of image classification, and in particular, to an image classification method, apparatus, device, and storage medium.
Background
In recent years, with the rapid development of science and technology, the capabilities of cognitive systems based on machine learning, especially in image classification, have approached those of humans. However, compared with a laboratory that tightly controls environmental variables, real-world conditions contain many uncertainties, which pose many challenges for image classification. For example, if a machine learning classifier whose image classification targets are cats and dogs is given a picture of a vehicle, the model will often fail to classify it correctly. To achieve accurate classification, conventional image classification techniques usually train a classifier to extract image features, first perform similarity calculation on the features, and then cluster and optimize the feature map according to the similarity to obtain the cluster center of each category, thereby classifying images into open-set categories. However, this approach does not sufficiently consider the intra-class information within images of the same class, which may lead to poorly placed cluster centers and, in turn, to low image classification accuracy. In addition, conventional image classification techniques require the features of all unknown classes to be gathered in one part of the hyperspace, yet the data of the unknown classes are not restricted during training, which is difficult to achieve with a data set of limited size. The resulting feature distribution may be as shown in fig. 1, where the black dots represent unknown classes: a relatively large number of unknown-class samples remain unclassified, and the classification accuracy is low.
Disclosure of Invention
The embodiment of the invention aims to provide an image classification method, device, equipment and storage medium, which can effectively improve the accuracy of image classification.
In order to achieve the above object, an embodiment of the present invention provides an image classification method, including:
inputting an image to be classified into a pre-trained image classifier so that the image classifier outputs a plurality of probability vectors of different classes of the image to be classified; wherein the probability vector is the probability that the image to be classified is correctly classified into the current category;
acquiring a category corresponding to the maximum probability vector from the probability vectors of the different categories as a target category;
judging whether the product of the probability vector of the target category and the L2 norm of the output of the image classifier is greater than a preset threshold value or not;
if yes, judging that the image to be classified belongs to the target class; if not, the image to be classified is judged to belong to the background image.
As an improvement of the above scheme, the training method of the image classifier includes:
acquiring a data set; wherein the data set comprises a number of different categories of target images and a category of background images;
inputting all target images under the current category into a preset classifier so that the preset classifier divides the target images into typical data and atypical data;
calculating a first proportion of the target image of the current category in the data set and a second proportion of the background image in the data set;
constructing a loss function of the image classifier to be trained according to the target image of the current category, the first proportion and the second proportion; the image classifier is used for outputting a vector with a preset length, and inputting the output vector into an output layer to obtain a probability vector with the preset length;
inputting the typical data and the atypical data into the image classifier to train the image classifier.
As an improvement of the above scheme, the inputting all the target images in the current category into a preset classifier so that the preset classifier divides the target images into typical data and atypical data includes:
inputting all target images under the current category into a preset classifier so that the preset classifier outputs the classification probability of the target images; wherein the classification probability is the probability that the target image is correctly classified by the preset classifier;
sorting the classification probabilities of all the target images;
and acquiring typical data from the sorted target images according to a preset proportion, and taking the rest of the target images as atypical data.
As an improvement of the above scheme, the classification probability of the typical data is greater than that of the atypical data.
As an improvement of the above scheme, the output layer is a softmax layer, which satisfies the following definition:
S(c) = e^(f_c(x)) / Σ_j e^(f_j(x))
wherein S(c) is the probability that the image x is classified into category c, f_c(x) is the output of the classifier for category c, e is the base of the natural logarithm, and the sum over j runs over all the categories in the data set.
In order to achieve the above object, an embodiment of the present invention further provides an image classification apparatus, including:
the image classifier classification module is used for inputting an image to be classified into a pre-trained image classifier so that the image classifier outputs a plurality of probability vectors of different classes of the image to be classified; wherein the probability vector is the probability that the image to be classified is correctly classified into the current category;
a target category obtaining module, configured to obtain, from the multiple probability vectors of different categories, a category corresponding to a largest probability vector as a target category;
the judging module is used for judging whether the product of the probability vector of the target category and the L2 norm is larger than a preset threshold value or not; if yes, judging that the image to be classified belongs to the target class; if not, the image to be classified is judged to belong to the background image.
As an improvement of the above scheme, the apparatus further includes an image classifier training module, including:
a data set acquisition unit for acquiring a data set; wherein the data set comprises a number of different categories of target images and a category of background images;
the data dividing unit is used for inputting all target images under the current category into a preset classifier so that the preset classifier divides the target images into typical data and atypical data;
the calculating unit is used for calculating a first proportion occupied by the target image of the current category in the data set and a second proportion occupied by the background image in the data set;
the loss function construction unit is used for constructing a loss function of the image classifier to be trained according to the target image of the current category, the first proportion and the second proportion; the image classifier is used for outputting a vector with a preset length, and inputting the output vector into an output layer to obtain a probability vector with the preset length;
a training unit for inputting the typical data and the atypical data into the image classifier to train the image classifier.
As an improvement of the above scheme, the data dividing unit is configured to:
inputting all target images under the current category into a preset classifier so that the preset classifier outputs the classification probability of the target images; wherein the classification probability is the probability that the target image is correctly classified by the preset classifier;
sorting the classification probabilities of all the target images;
and acquiring typical data from the sorted target images according to a preset proportion, and taking the rest of the target images as atypical data.
To achieve the above object, an embodiment of the present invention further provides an image classification device, which includes a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, and the processor implements the image classification method according to any one of the above embodiments when executing the computer program.
In order to achieve the above object, an embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program, where when the computer program runs, the apparatus where the computer-readable storage medium is located is controlled to execute the image classification method according to any one of the above embodiments.
Compared with the prior art, in the image classification method, apparatus, device and storage medium disclosed by the invention, the image to be classified is first input into the pre-trained image classifier so that the image classifier outputs a plurality of probability vectors of different classes of the image to be classified; then, the category corresponding to the largest probability vector is obtained from the probability vectors of the different categories as the target category; finally, when the product of the probability vector of the target class and the L2 norm of the classifier output is greater than the preset threshold, the image to be classified is judged to belong to the target class. Because the output response of the image classifier is considered when judging whether the image to be classified belongs to the target class, the image classification method has a good filtering effect on background-class images, so that the response of the background-class images is suppressed, and the accuracy of image classification can be effectively improved.
Drawings
FIG. 1 is a feature distribution diagram in a conventional image classification technique;
FIG. 2 is a flowchart of an image classification method according to an embodiment of the present invention;
FIG. 3 is a flowchart of an image classifier training method provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating the classification of typical data and atypical data provided by an embodiment of the present invention;
FIG. 5 is a feature distribution diagram of an image classification method according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an image classification apparatus according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an image classification device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 2, fig. 2 is a flowchart of an image classification method according to an embodiment of the present invention; the image classification method includes:
S11, inputting the image to be classified into a pre-trained image classifier so that the image classifier outputs a plurality of probability vectors of different classes of the image to be classified; wherein the probability vector is the probability that the image to be classified is correctly classified into the current category;
S12, obtaining the category corresponding to the maximum probability vector from the probability vectors of different categories as a target category;
S13, judging whether the product of the probability vector of the target category and the L2 norm of the output of the image classifier is larger than a preset threshold value or not;
S14, if yes, judging that the image to be classified belongs to the target class; if not, the image to be classified is judged to belong to the background image.
Specifically, when training the image classifier, the input data set is divided into closed-set images and open-set images. Closed-set images are the category images that need to be classified; for example, if we want to classify persons, cars, cats and dogs, then images of these four categories are closed-set images. During training there is also a background category, i.e., other images that do not belong to these four categories, such as buildings, flowers and fruits. However, because the data set is limited during training, in practical applications there will certainly be further categories that do not appear in the training set; images of such categories are called open-set images. For example, there may be no image of a bird in the training set, but in practical applications an image of a bird may still be input into the classifier; a conventional method is likely to misclassify it. In the embodiment of the present invention, the response and probability of closed-set image data are enhanced, while for open-set image data outside these categories the output response and the averaged output probability are suppressed, so the image classifier provided by the embodiment of the present invention is in fact a classifier for open-set images.
Suppose the image to be classified is x_i and the output of the image classifier is f(x_i). The vector output by the image classifier is fed into the softmax layer of the image classifier to obtain the probability vectors S(x_i) of the different classes; if the maximum probability corresponds to the target class c, the probability vector of the target class is S_c(x_i). It is then judged whether the following formula is satisfied:
S_c(x_i) * ||f(x_i)|| > T    (formula (1))
wherein ||f(x_i)|| is the L2 norm of the classifier output and T is the preset threshold. If formula (1) is satisfied, the image to be classified is classified into the target class c; otherwise, the image to be classified belongs to the background class.
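For illustration, the decision rule of formula (1) can be sketched in a few lines of Python/NumPy, assuming the raw output vector (the logits f(x_i)) of a trained classifier is available; the function names and the example threshold below are illustrative and are not taken from the patent.

import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class dimension.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def classify_open_set(logits, threshold):
    # Apply formula (1): S_c(x_i) * ||f(x_i)|| > T.
    # logits: raw output vector f(x_i) of the trained image classifier (length N).
    # threshold: the preset threshold T.
    # Returns the target class index, or None for the background class.
    probs = softmax(logits)                # probability vector S(x_i)
    target = int(np.argmax(probs))         # category with the largest probability
    response = np.linalg.norm(logits)      # L2 norm ||f(x_i)||, the output response
    if probs[target] * response > threshold:
        return target                      # classified into the target class c
    return None                            # treated as a background image

# Example with hypothetical values: classify_open_set(np.array([2.1, 0.3, -1.0, 0.5]), threshold=1.5)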
It should be noted that before classifying the image to be classified, an image classifier needs to be trained, referring to fig. 3, an embodiment of the present invention further provides a method for training an image classifier, including:
S21, acquiring a data set; wherein the data set comprises a number of different categories of target images and a category of background images;
S22, inputting all target images under the current category into a preset classifier so that the preset classifier divides the target images into typical data and atypical data;
S23, calculating a first proportion of the target image of the current category in the data set and a second proportion of the background image in the data set;
S24, constructing a loss function of the image classifier to be trained according to the target image of the current category, the first proportion and the second proportion; the image classifier is used for outputting a vector with a preset length, and inputting the output vector into an output layer to obtain a probability vector with the preset length;
S25, inputting the typical data and the atypical data into the image classifier to train the image classifier.
Specifically, in step S21, it is assumed that the data set includes N classes of interest (i.e., the several different categories), each containing several target images, and one background class containing several background images as the (N+1)-th class.
Specifically, in step S22, a preset classifier with N+1 outputs is first trained; the preset classifier may be AdaBoost, an SVM, a CNN, etc., and its output is the probability of being classified into each class. All target images under the current category c (with i denoting the index of each picture) are input into the preset classifier so that the preset classifier divides the target images into typical data and atypical data.
Optionally, all target images under the current category are input into the preset classifier so that the preset classifier outputs the classification probability of each target image, wherein the classification probability is the probability that the target image is correctly classified by the preset classifier; the classification probabilities of all the target images are sorted; and typical data are acquired from the sorted target images according to a preset proportion, with the rest of the target images taken as atypical data.
Illustratively, the classification probability of the typical data is greater than the classification probability of the atypical data. The background class does not need to be divided into typical data and atypical data. Suppose the first category has just been divided; the same operation is then performed for each of the remaining N-1 categories of interest, i.e., for each category the q% of pictures (the preset proportion) with the highest classification probability are selected as the typical data of that category and the remaining (100-q)% as its atypical data, finally yielding N categories of typical data and N categories of atypical data; a schematic diagram after data division is shown in fig. 4. It should be noted that the ratio of typical data to atypical data is adjusted by adjusting q, and the value of q may be an empirical value or a value obtained by analyzing the distribution of the correct-classification probabilities, which is not limited in the present invention.
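As a minimal sketch of this split for a single category, assuming the correct-class probabilities output by the preset classifier have already been collected into an array (the function and variable names below are illustrative, not from the patent):

import numpy as np

def split_typical_atypical(correct_class_probs, q=80.0):
    # correct_class_probs: probability, for each image i of this category,
    #   that the preset classifier classifies it correctly into its category c.
    # q: preset proportion (in percent) of images kept as typical data.
    # Returns (typical_indices, atypical_indices).
    order = np.argsort(correct_class_probs)[::-1]   # highest probability first
    n_typical = int(round(len(order) * q / 100.0))
    typical = order[:n_typical]                     # top q%: typical data
    atypical = order[n_typical:]                    # remaining (100-q)%: atypical data
    return typical, atypical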
In the embodiment of the invention, the training data are divided into typical data and atypical data, so that the intra-class information within data of the same class and the inter-class information between data of different classes are fully considered.
Specifically, in step S23, the proportion of each type of image in the data set is counted. Suppose the first proportion occupied by the target images of the current category c in the data set is p_c (where c is an integer from 1 to N), and the second proportion occupied by the background images in the data set is p_(N+1).
Specifically, in steps S24 to S25, an N-output CNN image classifier (the image classifier to be trained) is defined. Its input is picture data and its output is N real numbers in (-∞, +∞), i.e., a vector of length N; the output vector is fed into the softmax layer to obtain a probability vector of length N, where the definition of the softmax layer satisfies:
S(c) = e^(f_c(x)) / Σ_j e^(f_j(x))
wherein S(c) is the probability that the image x is classified into category c, f_c(x) is the output of the classifier for category c, e is the base of the natural logarithm, and the sum over j runs over all the categories in the data set.
Alternatively, the output vector may instead be input into a sigmoid function. In the embodiment of the invention, the output vector is input into the softmax layer, so that the cross-entropy derivative can be computed conveniently during training.
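For illustration only, a minimal PyTorch-style sketch of such an N-output image classifier is given below; the forward pass returns the raw output vector f(x), and the softmax layer is applied to it separately. The concrete architecture is an arbitrary placeholder and is not specified by the patent.

import torch
import torch.nn as nn

class OpenSetCNN(nn.Module):
    # Minimal N-output CNN: forward() returns the raw output vector f(x);
    # probabilities S(x) are obtained afterwards with torch.softmax(f, dim=1).
    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, num_classes)   # N real numbers in (-inf, +inf)

    def forward(self, x):
        f = self.fc(self.features(x).flatten(1))   # output response vector f(x)
        return f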
A loss function of the image classifier to be trained is constructed according to the target images of the current category, the first proportion and the second proportion. Suppose the input picture is x_i, the output of the classifier is f(x_i), and inputting the output vector into the softmax layer gives the probability vector S(x_i), in which the probability of the j-th class is S_j(x_i). In the loss function, ||f(x_i)|| denotes the L2 norm of the classifier output, and α, β1, β2 and λ are tunable hyper-parameters with β1 < β2; the loss combines, for each training image, a class-probability term weighted according to the class proportions with terms on the output response ||f(x_i)||, as explained below.
Inputting the typical data and the atypical data into the image classifier to train the image classifier.
From the loss function it can be seen that, for images of a known class, if the image is typical data there is an additional hinge-type term of the form λ*max(0, …) involving ||f(x_i)||, whose purpose is to increase the size of the output response; i.e., for typical data of a known class, we expect the output response of the classifier to be relatively large. For atypical data, the class-probability term in the loss function carries the larger exponent β2; the goal is that, for these harder-to-classify samples, the classifier improves the probability of correct classification, but, to prevent overfitting, no term related to the magnitude of the output response is added. For background-class data, we want the classification probability to be low for all classes, so the first term averages the logarithms of the probabilities of all classes; we also want the response of the background class to be as low as possible, hence the second term λ*||f(x_i)||^2.
For different classes c, the probability terms of the loss function have different weights determined by the class proportions, which reduces the effect of the non-uniform numbers of pictures of the different classes in the data set. In addition, for atypical data the larger hyper-parameter β2 is used: the probability that atypical data are correctly classified is lower, and a larger exponent imposes a greater penalty on lower probabilities, so the hard cases within a class are urged to be classified better and to obtain a more accurate classification probability, i.e., a larger S_c(x_i).
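Because the formula of the loss function appears only as an image in this text, the snippet below is not the patent's exact expression; it is a minimal NumPy sketch, under stated assumptions, of the behaviour described above: a class-probability term weighted by the class proportion with exponent β1 for typical data and a larger exponent β2 for atypical data, an extra term encouraging a large output response for typical data, and probability-and-response suppression for background images. The per-class weight 1/p_c, the margin, and the exact way the second proportion enters are assumptions.

import numpy as np

def open_set_loss(logits, label, is_typical, class_props, bg_label,
                  alpha=1.0, beta1=1.0, beta2=2.0, lam=0.1, margin=10.0):
    # logits      : raw classifier output f(x_i), length N (classes of interest)
    # label       : ground-truth class index, or bg_label for a background image
    # is_typical  : True if the image was marked as typical data
    # class_props : first proportions p_c of the N classes of interest
    # alpha, beta1, beta2, lam, margin: tunable hyper-parameters (beta1 < beta2);
    #   the margin and the concrete form of each term are assumptions.
    z = logits - np.max(logits)
    probs = np.exp(z) / np.exp(z).sum()        # S(x_i)
    response = np.linalg.norm(logits)          # output response ||f(x_i)||

    if label == bg_label:
        # Background image: push all class probabilities toward low (uniform)
        # values and suppress the output response.
        return -np.mean(np.log(probs + 1e-12)) + lam * response ** 2

    s_c = probs[label]
    weight = alpha / class_props[label]        # weight derived from the first proportion
    beta = beta1 if is_typical else beta2      # larger exponent for atypical (hard) data
    loss = -weight * (1.0 - s_c) ** beta * np.log(s_c + 1e-12)
    if is_typical:
        # Typical data of a known class: additionally encourage a large output response.
        loss += lam * max(0.0, margin - response)
    return loss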
Compared with methods that compute feature values and then calculate similarities between them, the method provided by the embodiment of the present invention works from the magnitude of the model's output response instead of compressing all the features of unknown classes into one sector of the hyperspace. It can be seen from the loss function that the λ*||f(x_i)||^2 term suppresses the output response of the model. Referring to fig. 5, in the feature distribution diagram of the image classification method provided by the embodiment of the present invention, the black dots represent unknown classes; only a few of them remain, which indicates that the probability that an input image is accurately classified is high, and therefore incorrect classification of unknown-class images can be effectively suppressed. Accordingly, during classification the probability is multiplied in formula (1) by the L2 norm of the output (i.e., the output response of the image classifier), so that the background class can be filtered very well.
Compared with the prior art, the image classification method disclosed by the invention comprises the steps of firstly, inputting an image to be classified into a pre-trained image classifier so that the image classifier outputs a plurality of probability vectors of different types of the image to be classified; then, acquiring a category corresponding to the maximum probability vector from the probability vectors of the different categories as a target category; and finally, when the product of the probability vector of the target class and the L2 norm is greater than a preset threshold, judging that the image to be classified belongs to the target class. Because the output response of the image classifier is considered in the process of judging whether the image to be classified belongs to the target class or not, the image classification method can have a good filtering effect on the background class image, so that the response of the background class image is inhibited, and the accuracy of image classification can be effectively improved.
Referring to fig. 6, fig. 6 is a schematic structural diagram of an image classification apparatus 10 according to an embodiment of the present invention; the image classification device 10 includes:
the image classifier classifying module 11 is configured to input an image to be classified into a pre-trained image classifier, so that the image classifier outputs a plurality of probability vectors of different classes of the image to be classified; wherein the probability vector is the probability that the image to be classified is correctly classified into the current category;
a target category obtaining module 12, configured to obtain, from the multiple probability vectors of different categories, a category corresponding to a largest probability vector as a target category;
the judging module 13 is configured to judge whether a product of the probability vector of the target category and the norm of L2 is greater than a preset threshold; if yes, judging that the image to be classified belongs to the target class; if not, the image to be classified is judged to belong to the background image.
The image classification device 10 further includes an image classifier training module 14, where the image classifier training module 14 includes:
a data set acquisition unit for acquiring a data set; wherein the data set comprises a number of different categories of target images and a category of background images;
the data dividing unit is used for inputting all target images under the current category into a preset classifier so that the preset classifier divides the target images into typical data and atypical data;
the calculating unit is used for calculating a first proportion occupied by the target image of the current category in the data set and a second proportion occupied by the background image in the data set;
the loss function construction unit is used for constructing a loss function of the image classifier to be trained according to the target image of the current category, the first proportion and the second proportion; the image classifier is used for outputting a vector with a preset length, and inputting the output vector into an output layer to obtain a probability vector with the preset length;
a training unit for inputting the typical data and the atypical data into the image classifier to train the image classifier.
Further, the data dividing unit is configured to:
inputting all target images under the current category into a preset classifier so that the preset classifier outputs the classification probability of the target images; wherein the classification probability is the probability that the target image is correctly classified by the preset classifier;
sorting the classification probabilities of all the target images;
and acquiring typical data from the sorted target images according to a preset proportion, and taking the rest of the target images as atypical data.
For the specific working process of each module in the image classification device 10, please refer to the working process of the image classification method described in the above embodiment, which is not described herein again.
Compared with the prior art, the image classification device 10 disclosed by the invention firstly inputs the image to be classified into the pre-trained image classifier so that the image classifier outputs a plurality of probability vectors of different types of the image to be classified; then, acquiring a category corresponding to the maximum probability vector from the probability vectors of the different categories as a target category; and finally, when the product of the probability vector of the target class and the L2 norm is greater than a preset threshold, judging that the image to be classified belongs to the target class. Because the output response of the image classifier is considered in the process of judging whether the image to be classified belongs to the target class or not, the image classification method can have a good filtering effect on the background class image, so that the response of the background class image is inhibited, and the accuracy of image classification can be effectively improved.
Referring to fig. 7, fig. 7 is a schematic structural diagram of an image classification device 20 according to an embodiment of the present invention. The image classification device 20 of this embodiment includes: a processor 21, a memory 22 and a computer program stored in the memory 22 and executable on the processor 21. The processor 21, when executing the computer program, implements the steps in the above-described embodiment of the image classification method, such as steps S11 to S14 shown in fig. 2. Alternatively, the processor 21, when executing the computer program, implements the functions of the modules/units in the above-described device embodiments, such as the image classifier classification module 11.
Illustratively, the computer program may be divided into one or more modules/units, which are stored in the memory 22 and executed by the processor 21 to accomplish the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program in the image classification apparatus 20. For example, the computer program may be divided into an image classifier classification module 11, a target class acquisition module 12, a judgment module 13 and an image classifier training module 14, and for the specific functions of each module, reference is made to the specific working process of the image classification device 10 described in the foregoing embodiment, which is not described herein again.
The image classification device 20 may be a computing device such as a desktop computer, a notebook, a palm computer, and a cloud server. The image classification device 20 may include, but is not limited to, a processor 21, a memory 22. It will be appreciated by those skilled in the art that the schematic diagram is merely an example of the image classification device 20 and does not constitute a limitation of the image classification device 20, and may include more or less components than those shown, or combine some components, or different components, for example, the image classification device 20 may also include an input-output device, a network access device, a bus, etc.
The Processor 21 may be a Central Processing Unit (CPU), another general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor, or the processor 21 may be any conventional processor or the like. The processor 21 is the control center of the image classification device 20, and various interfaces and lines connect the various parts of the overall image classification device 20.
The memory 22 may be used for storing the computer programs and/or modules, and the processor 21 implements various functions of the image classification apparatus 20 by running or executing the computer programs and/or modules stored in the memory 22 and calling data stored in the memory 22. The memory 22 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data (such as audio data, a phonebook, etc.) created according to the use of the device, and the like. In addition, the memory 22 may include high speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other non-volatile solid-state storage device.
Wherein, the integrated module/unit of the image classification device 20 can be stored in a computer readable storage medium if it is implemented in the form of software functional unit and sold or used as a stand-alone product. Based on such understanding, all or part of the flow of the method according to the above embodiments may be implemented by a computer program, which may be stored in a computer readable storage medium and used by the processor 21 to implement the steps of the above embodiments of the method. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer readable medium may contain content that is subject to appropriate increase or decrease as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media does not include electrical carrier signals and telecommunications signals as is required by legislation and patent practice.
It should be noted that the above-described device embodiments are merely illustrative, where the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. In addition, in the drawings of the embodiment of the apparatus provided by the present invention, the connection relationship between the modules indicates that there is a communication connection between them, and may be specifically implemented as one or more communication buses or signal lines. One of ordinary skill in the art can understand and implement it without inventive effort.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (10)

1. An image classification method, comprising:
inputting an image to be classified into a pre-trained image classifier so that the image classifier outputs a plurality of probability vectors of different classes of the image to be classified; wherein the probability vector is the probability that the image to be classified is correctly classified into the current category;
acquiring a category corresponding to the maximum probability vector from the probability vectors of the different categories as a target category;
judging whether the product of the probability vector of the target category and the L2 norm is greater than a preset threshold value or not;
if yes, judging that the image to be classified belongs to the target class; if not, the image to be classified is judged to belong to the background image.
2. The image classification method of claim 1, wherein the training method of the image classifier comprises:
acquiring a data set; wherein the data set comprises a number of different categories of target images and a category of background images;
inputting all target images under the current category into a preset classifier so that the preset classifier divides the target images into typical data and atypical data;
calculating a first proportion of the target image of the current category in the data set and a second proportion of the background image in the data set;
constructing a loss function of the image classifier to be trained according to the target image of the current category, the first proportion and the second proportion; the image classifier is used for outputting a vector with a preset length, and inputting the output vector into an output layer to obtain a probability vector with the preset length;
inputting the typical data and the atypical data into the image classifier to train the image classifier.
3. The image classification method according to claim 2, wherein the inputting of all the target images under the current category into a preset classifier so that the preset classifier divides the target images into typical data and atypical data includes:
inputting all target images under the current category into a preset classifier so that the preset classifier outputs the classification probability of the target images; wherein the classification probability is the probability that the target image is correctly classified by the preset classifier;
sorting the classification probabilities of all the target images;
and acquiring typical data from the sorted target images according to a preset proportion, and taking the rest of the target images as atypical data.
4. The image classification method according to claim 3, characterized in that the classification probability of the typical data is larger than the classification probability of the atypical data.
5. The image classification method according to claim 2, characterized in that the output layer is a softmax layer, which satisfies the following definitions:
S(c) = e^(f_c(x)) / Σ_j e^(f_j(x))
wherein S(c) is the probability that the image x is classified into category c, f_c(x) is the output of the classifier for category c, e is the base of the natural logarithm, and the sum over j runs over all the categories in the data set.
6. An image classification apparatus, comprising:
the image classifier classification module is used for inputting an image to be classified into a pre-trained image classifier so that the image classifier outputs a plurality of probability vectors of different classes of the image to be classified; wherein the probability vector is the probability that the image to be classified is correctly classified into the current category;
a target category obtaining module, configured to obtain, from the multiple probability vectors of different categories, a category corresponding to a largest probability vector as a target category;
the judging module is used for judging whether the product of the probability vector of the target category and the L2 norm is larger than a preset threshold value or not; if yes, judging that the image to be classified belongs to the target class; if not, the image to be classified is judged to belong to the background image.
7. The image classification apparatus of claim 6, further comprising an image classifier training module comprising:
a data set acquisition unit for acquiring a data set; wherein the data set comprises a number of different categories of target images and a category of background images;
the data dividing unit is used for inputting all target images under the current category into a preset classifier so that the preset classifier divides the target images into typical data and atypical data;
the calculating unit is used for calculating a first proportion occupied by the target image of the current category in the data set and a second proportion occupied by the background image in the data set;
the loss function construction unit is used for constructing a loss function of the image classifier to be trained according to the target image of the current category, the first proportion and the second proportion; the image classifier is used for outputting a vector with a preset length, and inputting the output vector into an output layer to obtain a probability vector with the preset length;
a training unit for inputting the typical data and the atypical data into the image classifier to train the image classifier.
8. The image classification apparatus according to claim 7, wherein the data division unit is configured to:
inputting all target images under the current category into a preset classifier so that the preset classifier outputs the classification probability of the target images; wherein the classification probability is the probability that the target image is correctly classified by the preset classifier;
sorting the classification probabilities of all the target images;
and acquiring typical data from the sorted target images according to a preset proportion, and taking the rest of the target images as atypical data.
9. An image classification device comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the image classification method according to any one of claims 1 to 5 when executing the computer program.
10. A computer-readable storage medium, comprising a stored computer program, wherein the computer program, when executed, controls an apparatus in which the computer-readable storage medium is located to perform the image classification method according to any one of claims 1 to 5.
CN202010565300.XA 2020-06-19 2020-06-19 Image classification method, device, equipment and storage medium Pending CN111930935A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010565300.XA CN111930935A (en) 2020-06-19 2020-06-19 Image classification method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010565300.XA CN111930935A (en) 2020-06-19 2020-06-19 Image classification method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111930935A true CN111930935A (en) 2020-11-13

Family

ID=73316643

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010565300.XA Pending CN111930935A (en) 2020-06-19 2020-06-19 Image classification method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111930935A (en)


Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003242160A (en) * 2002-02-14 2003-08-29 Fuji Xerox Co Ltd Image data classifying apparatus
JP2013077127A (en) * 2011-09-30 2013-04-25 Dainippon Screen Mfg Co Ltd Image classification device and image classification method
CN105321188A (en) * 2014-08-04 2016-02-10 江南大学 Foreground probability based target tracking method
CN104463202A (en) * 2014-11-28 2015-03-25 苏州大学 Multi-class image semi-supervised classifying method and system
CN107291737A (en) * 2016-04-01 2017-10-24 腾讯科技(深圳)有限公司 Nude picture detection method and device
CN107895167A (en) * 2017-10-12 2018-04-10 温州大学 A kind of ir data classifying identification method based on rarefaction representation
CN107609604A (en) * 2017-10-19 2018-01-19 北京工业大学 A kind of image-recognizing method of the Two-dimensional Probabilistic linear discriminant analysis based on L1 norms
US20190138860A1 (en) * 2017-11-08 2019-05-09 Adobe Inc. Font recognition using adversarial neural network training
CN109376786A (en) * 2018-10-31 2019-02-22 中国科学院深圳先进技术研究院 A kind of image classification method, device, terminal device and readable storage medium storing program for executing
CN109598304A (en) * 2018-12-04 2019-04-09 北京字节跳动网络技术有限公司 Disaggregated model calibration method, device, equipment and readable medium
CN110334656A (en) * 2019-07-08 2019-10-15 中国人民解放军战略支援部队信息工程大学 Multi-source Remote Sensing Images Clean water withdraw method and device based on information source probability weight
CN110781813A (en) * 2019-10-24 2020-02-11 北京市商汤科技开发有限公司 Image recognition method and device, electronic equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
MING YANG 等: "A Supervised-Learning p -Norm Distance Metric for Hyperspectral Remote Sensing Image Classification", 《IEEE GEOSCIENCE AND REMOTE SENSING LETTERS》, vol. 15, no. 9, pages 1432 - 1436 *
梁蒙蒙 等: "卷积神经网络及其在医学图像分析中的应用研究", 《生物医学工程学杂志》, vol. 35, no. 6, pages 977 - 985 *
陈博恒: "基于特征编码与深度学习的图像识别算法", 《中国博士学位论文全文数据库信息科技辑》, no. 12, pages 138 - 113 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112508062A (en) * 2020-11-20 2021-03-16 普联国际有限公司 Open set data classification method, device, equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination