CN113139597A - Image out-of-distribution detection method based on statistical ideas - Google Patents
Image out-of-distribution detection method based on statistical ideas
- Publication number
- CN113139597A (application number CN202110433494.2A)
- Authority
- CN
- China
- Prior art keywords
- image
- vector
- value
- data set
- classifier
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24133—Distances to prototypes
- G06F18/24137—Distances to cluster centroïds
- G06F18/2414—Smoothing the distance, e.g. radial basis function networks [RBFN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Molecular Biology (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses an image out-of-distribution detection method based on statistical ideas, and belongs to the technical field of image classification. A deep learning model is trained to obtain an image classifier; a reduced value vector Z is solved by using the image classifier and a verification data set V; the threshold of each known image category is solved to obtain an image category threshold vector T. For any input image, Z is subtracted from the logits value calculated by the image classifier and a sigmoid calculation is then performed to obtain an image class label prediction probability vector Y; Y is compared with T, and if the element values in Y are all lower than the corresponding element values in T, the input image is judged not to belong to any known image class; otherwise, the image category label corresponding to the maximum value in Y is used as the detection result. The performance of the method is markedly superior to that of existing methods: it effectively enables the image classifier to reject unknown image classes without being exposed to new image data, which reduces the open-space risk and further reduces the risk of open rejection.
Description
Technical Field
The invention belongs to the technical field of image classification and relates to an image out-of-distribution detection method based on statistical ideas.
Background
The goal of image classification is to assign different images to different categories according to the different characteristics reflected in the image information, with as few classification errors as possible. Existing image classification methods generally build a complete image recognition model, which usually comprises stages such as low-level feature extraction, feature encoding, spatial feature constraints, classifier design and model fusion. In recent years, with the development of deep learning techniques represented by convolutional neural networks, image feature extraction has advanced considerably, and progress on typical computer vision tasks such as image classification has accelerated accordingly. Deep learning introduces the concept of end-to-end learning: instead of searching for specific features through elaborately designed algorithms, a deep neural network is trained to learn the features of the input pictures automatically.
Although deep learning models perform well on various visual recognition tasks, they are generally unable to recognize out-of-distribution inputs, i.e. inputs sampled from a distribution different from that of the training data. For example, convolutional neural networks often classify an out-of-distribution input image into one of the known classes with high confidence. Current supervised learning likewise requires that the categories appearing in the test data are categories of the training set, so at prediction time the results are limited to the categories seen during training.
An image classifier, however, should be able to detect out-of-distribution inputs. Ideally, the image classifier should classify an input image into the correct existing class used in training, and should also detect inputs that do not belong to any existing class. This problem is known as out-of-distribution detection or open-world classification.
Among existing out-of-distribution detection methods, one approach uses a support vector machine to judge unknown types, but for lack of sufficiently general negative training data a single classifier performs poorly. Another approach is easily influenced by the data set: if classes with similar characteristics influence one another, the class centers become close to each other, the decision boundary is not obvious, and the classification result is disturbed. A further approach increases the rejection capability by optimizing the output values of the fully connected layer of a CNN model, but it has the drawback that samples with similar fully-connected-layer output values may come from an unknown class or simply be samples that are difficult to classify.
Disclosure of Invention
In view of the deficiencies of the prior art, the invention provides an image out-of-distribution detection method based on statistical ideas, aiming to detect out-of-distribution input images effectively.
The technical solution of the invention is as follows:
An image out-of-distribution detection method based on statistical ideas comprises the following steps:
step 1: determining a deep learning model, and correspondingly determining a training data set D and a verification data set V with the same m known image categories;
step 2: training the deep learning model by using the training data set D, and taking the trained model as an image classifier;
step 3: solving a reduced value vector Z of the image samples according to the image classifier and the verification data set V;
step 4: solving the threshold value of each known image category according to the verification data set V and the reduced value vector of the image samples, and further obtaining the image category threshold vector T of the m known image categories;
step 5: inputting any image to be detected into the image classifier, and outputting the feature vector of the image by the image classifier;
step 6: subtracting the reduced value vector Z obtained in step 3 from the feature vector of the image to obtain the feature vector reduction value of the image;
step 7: mapping the feature vector reduction value of the image to the interval 0-1 by using a sigmoid function to obtain the image category prediction probability vector Y of the image;
step 8: subtracting the image category threshold vector T obtained in step 4 from Y to obtain the final image category detection value vector L = Y - T of the image, and denoting the maximum value in L as l_max; the final detection result for the image category is: when l_max < 0, the image category prediction probability vector values of the input image are all less than the set threshold of each image category, and the input image is judged not to belong to any known image category; otherwise, the image category label corresponding to l_max is taken as the detection result.
Further, in the image out-of-distribution detection method based on statistical ideas, the deep learning model is a CNN model.
Further, in the image out-of-distribution detection method based on statistical ideas, step 3 includes the following specific steps:
step 3.1: inputting all the image samples in the verification data set V into an image classifier, and outputting the feature vector of each image sample through the image classifier;
the feature vector of each image sample is a feature vector of m dimensions, wherein each dimension corresponds to each known image category;
step 3.2: according to the image category label corresponding to each image sample searched from the verification data set V, respectively taking out the dimension value corresponding to the image category label from the feature vector of each image sample, and respectively putting the dimension values into the array corresponding to each image category label to obtain m arrays;
step 3.3: calculating the standard deviation σ_i of each array and letting the reduced value z_i = kσ_i, where k is a positive number, thereby obtaining the m-dimensional reduced value vector Z = {z_1, z_2, ..., z_m}.
Further, in the image out-of-distribution detection method based on statistical ideas, step 4 includes the following specific steps:
step 4.1: subtracting the reduced value vector Z from the feature vector of each image sample in the verification data set V to obtain a reduced value of the feature vector of each image sample in the verification data set V;
step 4.2: mapping the feature vector reduction value of each image sample in the verification data set V to a range of 0-1 by using a sigmoid function to obtain an image class prediction probability vector Y of each image sample, wherein each dimension in the vector Y represents the prediction probability of an image classifier on the image class corresponding to the current image sample;
step 4.3: according to the image category label of each image sample searched from the verification data set V, respectively taking out the value of the dimensionality corresponding to the image category label of each image sample from the prediction probability vector of each image sample, and respectively putting the value into the probability array corresponding to each image category label;
step 4.4: constructing each probability array as an array of normal distribution with the average value of 1;
step 4.5: calculating the standard deviation σ'_i of each normal distribution array and, according to σ'_i, determining the threshold of each image class as t_i = max(0.5, 1 - kσ'_i), thereby obtaining the threshold vector T = {t_1, t_2, ..., t_m} of the known image classes.
Further, in the image out-of-distribution detection method based on statistical ideas, k = 3.
Compared with the prior art, the image out-of-distribution detection method based on statistical ideas has the following beneficial effects:
1. Based on the statistical idea of outlier detection, a threshold is designed for each known image category, so that the image classifier can effectively and successfully reject unknown categories without ever being exposed to new data.
2. Unlike existing image classifiers, the final layer of the image classifier established by the invention uses a sigmoid function instead of a softmax function, which reduces the open-space risk, namely the risk of identifying an unknown class as a known class. Moreover, the decision boundaries of the sigmoid function are tightened by fitting a normal distribution, which further reduces the risk of open rejection, namely identifying a known class as an unknown class. Experimental results show that the performance of the method is markedly superior to that of existing methods; it can effectively detect out-of-distribution input images and has high practical value.
3. Since text classification can use a convolutional neural network similar to image classification, the out-of-distribution detection method of the present invention is also applicable to text classification.
Drawings
FIG. 1 is a flow chart of the image out-of-distribution detection method based on statistical ideas of the present invention;
FIG. 2 is a flow chart of solving the reduced value vector of the image samples in the present invention;
FIG. 3 is a flow chart of solving the threshold of each known image class in the present invention.
Detailed Description
To facilitate an understanding of the present application, the present application will now be described more fully with reference to the accompanying drawings. Preferred embodiments of the present application are given in the accompanying drawings. This application may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
In order to enable a deep learning model to detect an unknown image that does not belong to any of the image categories used during training, the invention provides an image out-of-distribution detection method based on statistical ideas. The basic idea of the method is as follows. Existing image classifiers usually use a softmax logistic regression model as the final output layer. Because the output values are normalized in the prediction stage of the image classifier training process, that is, for a sample x the output values computed by softmax sum to 1, the final image classification prediction result always belongs to one of the m known image categories, and the classifier has no ability to reject unknown image types, i.e. to decide that the input image belongs to none of the m known image categories. To address this problem, the method uses a sigmoid function as the last output layer of the image classifier. It first obtains the output distribution of all samples in the verification data set V and then, according to the definition of an abnormal value (outlier) in statistics, obtains the abnormal value of each class set as the threshold of that image category, thereby constructing a threshold layer T = {t_1, t_2, ..., t_m} containing m thresholds. In brief, for each input image x, the reduced value vector Z = {z_1, z_2, ..., z_m} is subtracted from the logits value computed by the image classifier, i.e. the output value of the fully connected layer, and the sigmoid is then computed to obtain the image category prediction probability vector Y = {y_1, y_2, ..., y_m}. The prediction vector values are compared with the threshold layer T; if every element value in the prediction vector Y is lower than the corresponding element value in the threshold layer T, i.e. the predicted values are all below the set thresholds, the image x is judged not to belong to any known image category; otherwise the image category label l_i corresponding to y_i is used as the final prediction class of image x, where y_i = max(y_1, y_2, ..., y_m) and y_i ≥ t_i.
The following describes the detailed implementation of the invention with reference to the accompanying drawings. As shown in FIG. 1, the image out-of-distribution detection method based on statistical ideas according to the present embodiment includes the following steps:
step 1: a deep learning model is determined and accordingly a training data set D and a validation data set V with the same m known image classes are determined.
The deep learning model used in the present embodiment is a CNN (Convolutional Neural Networks) model. The required training data set D and verification data set V may be obtained through various channels; for example, they may be created as needed, or a training data set D and a verification data set V may be obtained from existing image classification data sets such as Fashion-MNIST, CIFAR100 and CIFAR10. The determined training data set is denoted as D = {(x_1, y_1), (x_2, y_2), ..., (x_d, y_d)}, where x_i is the i-th image sample, i is any one of 1, 2, ..., d, d is the total number of samples in the training data set, and y_i is the image class label of sample x_i; the same m image classes appear in the training data set D and the verification data set V, and the set of category labels of the m image categories is {l_1, l_2, ..., l_m}. The obtained verification data set is denoted as V = {(x_{d+1}, y_{d+1}), (x_{d+2}, y_{d+2}), ..., (x_{d+v}, y_{d+v})}, where v is the total number of samples in the verification data set.
In this embodiment, the training data set D and the verification data set V are obtained from the CIFAR-10 data set, which contains 60000 color images divided into 10 categories: 50000 images are used for training, 5000 per category, and 10000 images are used for verification, 1000 per category. The training data set determined in this embodiment is denoted as D = {(x_1, y_1), (x_2, y_2), ..., (x_i, y_i), ..., (x_50000, y_50000)}, where x_i is the i-th image sample and y_i is the image category label to which it belongs. The verification data set obtained in this embodiment is denoted as V = {(x_50001, y_50001), (x_50002, y_50002), ..., (x_60000, y_60000)}.
Step 2: training the CNN model by using the training data set D determined in Step 1, and taking the trained CNN model as the image classifier f(x).
The CNN model is trained with the training data set D obtained in Step 1. For an input image sample x, the trained CNN model can identify the image class to which it belongs among the m image classes that appeared in D, or reject the sample, i.e. decide that the sample belongs to none of the m image classes appearing in D and therefore falls into the rejection category. In general, after the CNN deep learning model is trained with the training data set, an image classifier f(x) capable of m+1-way classification is constructed, whose classifiable categories are the m known image categories plus the rejection category.
In the present embodiment, after an existing CNN model is trained with the training data set D = {(x_1, y_1), (x_2, y_2), ..., (x_i, y_i), ..., (x_50000, y_50000)}, an image classifier f(x) capable of 10+1-way classification is constructed, whose classifiable image category label set consists of the 10 known CIFAR-10 categories plus the rejection category.
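Step 2 does not prescribe a particular architecture or training procedure. The sketch below, given purely for illustration, trains a small CNN on CIFAR-10 with PyTorch; the network layout, the per-class sigmoid (binary cross-entropy) loss and every name in it are assumptions of this example rather than requirements of the invention. The final nn.Linear layer plays the role of the fully connected layer whose 10-dimensional logits are used in Steps 3 to 8.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
import torchvision
import torchvision.transforms as T

# Training data set D: the 50000 CIFAR-10 training images (Step 1).
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                          download=True, transform=T.ToTensor())
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)

# A small illustrative CNN; the last Linear layer is the fully connected
# layer whose 10-dimensional logits are used in the later steps.
model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(64 * 8 * 8, 10),
)

# The training loss is not specified in the text; a per-class sigmoid
# (binary cross-entropy) loss is assumed here to match the sigmoid
# output layer used at detection time.
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(10):
    for images, labels in train_loader:
        targets = nn.functional.one_hot(labels, num_classes=10).float()
        loss = criterion(model(images), targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```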
Step 3: solving the reduced value vector Z of the image samples according to the image classifier and the verification data set V; as shown in FIG. 2, this step specifically includes the following steps:
step 3.1: inputting the image samples in the verification data set V into an image classifier f (x), and outputting the feature vector of each image sample through a full connection layer of the image classifier f (x).
The image samples in the verification data set V are input one by one into the image classifier f(x), and the feature vectors of the image samples are obtained from the fully connected layer of f(x) in turn. Each feature vector is an m-dimensional vector, and each of its dimensions corresponds to one image category of the training data set. That is, the feature vector obtained for each image sample is an m-dimensional feature vector and is the initial category prediction result of the image classifier f(x) for that input image sample.
In the present embodiment, the image samples of the verification data set V = {(x_50001, y_50001), (x_50002, y_50002), ..., (x_60000, y_60000)} are input one by one into the image classifier f(x), and the corresponding 10-dimensional feature vectors are obtained from the fully connected layer of f(x); the 10 dimensions of each feature vector correspond one-to-one to the 10 categories of the CIFAR-10 data set, i.e. the initial category prediction value of each image sample in the verification data set V is obtained. Since the verification data set V contains 10000 images in total, 10000 10-dimensional vectors are finally obtained.
step 3.2: according to the image category label corresponding to each image sample found in the verification data set V, respectively taking out from the feature vector of each image sample the value of the dimension corresponding to that image category label, and respectively putting the values into the array corresponding to each image category label, thereby obtaining m arrays.
In the present embodiment, for any image data (x_i, y_i) in the verification data set V = {(x_50001, y_50001), (x_50002, y_50002), ..., (x_60000, y_60000)}, where 50001 ≤ i ≤ 60000 and the corresponding image class label is y_i = l_j with 1 ≤ j ≤ 10, the value of the j-th dimension of the feature vector of image sample x_i, which reflects the ability of the image classifier f(x) to predict the class label of that sample, is taken out and put into the array n_j that is divided according to image class and dedicated to storing data of the j-th class. Following this method, the embodiment traverses every image sample in the verification data set V and obtains 10 arrays, each dedicated to storing data of one of the 10 image classes and each containing 1000 values.
step 3.3: calculating the standard deviation σ_i of each array and letting the reduced value z_i = kσ_i, where k is a positive number, thereby obtaining the m-dimensional reduced value vector Z = {z_1, z_2, ..., z_m}.
In the present embodiment, the standard deviation σ_i is calculated for each of the 10 arrays obtained in step 3.2. Experiments verify that the best effect is obtained when k = 3, i.e. z_i = 3σ_i, and the 10-dimensional reduced value vector Z = {z_1, z_2, ..., z_10} is obtained.
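Steps 3.1 to 3.3 can be summarised in a few lines of NumPy. This is only a sketch of the procedure described above: val_logits and val_labels are assumed to hold the fully-connected-layer outputs and the ground-truth labels of the verification data set V, and compute_reduced_vector is a name chosen here, not one used by the patent.

```python
import numpy as np

def compute_reduced_vector(val_logits, val_labels, k=3.0):
    """Steps 3.1-3.3: reduced value z_i = k * sigma_i for each image category.

    val_logits: array of shape (v, m) with the fully-connected-layer outputs
                of the image classifier f(x) for the v verification samples.
    val_labels: array of shape (v,) with integer class labels in [0, m).
    """
    m = val_logits.shape[1]
    z = np.zeros(m)
    for j in range(m):
        # Step 3.2: for every verification sample of class j, take the value
        # of the j-th dimension of its feature vector.
        own_dim_values = val_logits[val_labels == j, j]
        # Step 3.3: z_j = k * standard deviation of that array (k = 3 here).
        z[j] = k * own_dim_values.std()
    return z
```

For the CIFAR-10 embodiment, calling compute_reduced_vector on the 10000 verification logits would return the 10-dimensional vector Z.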
Step 4: solving the threshold of each known image category according to the verification data set V and the reduced value vector of the image samples, and further obtaining the image category threshold vector T of the m known image categories; as shown in FIG. 3, this step specifically includes the following steps:
step 4.1: subtracting the reduced value vector Z from the feature vector of each image sample in the verification data set V obtained in step 3.1 to obtain the feature vector reduction value of each image sample in the verification data set V.
step 4.2: mapping the feature vector reduction value of each image sample in the verification data set V to the interval 0-1 by using a sigmoid function to obtain the image category prediction probability vector Y of each image sample, where each dimension of Y represents the prediction probability value of the corresponding image category.
The image samples in the verification data set V are classified with the image classifier f(x): for each image sample, the reduced value vector Z is subtracted from the m-dimensional feature vector output by the CNN model, the resulting vector is passed through the sigmoid function so that it is mapped to the interval 0-1, and an m-dimensional image category prediction probability vector Y = {y_1, y_2, ..., y_m} is finally obtained for each image sample, in which each dimension represents the prediction probability of the image classifier f(x) for the corresponding image category of the current image sample.
The reduced value z is a concept defined by the invention and is used to lower the output values of the fully connected layer, so that after the sigmoid function is computed the gap between the known (seen) classes of the training data set and the out-of-distribution (unseen) classes is widened, which ultimately achieves the goal of out-of-distribution detection for image classification. When an image of an unknown class passes through the image classifier, the output value of its fully connected layer is still large; although it is generally lower than the output values produced by images of known classes, it is close to 1 after the sigmoid function is applied, the difference from the sigmoid values of known-class images is small, and out-of-distribution detection by a threshold is therefore not effective. The output values of the image classifier consequently need to be reduced, and the vector of reduced values is Z = {z_1, z_2, ..., z_m}, one reduced value per image category. The size of the reduced value is related to the feature extraction ability of the image classifier for images of that category: the better the classifier extracts the features of a certain category, and the larger the output logits value, the larger the reduced value can be set. The significance of the reduced value is that, after it is subtracted from the logits during the classifier's computation, the sigmoid output for an image of a known class changes little, whereas for an unknown class, for which the feature extraction ability of the classifier is weak and the logits value is small, the sigmoid output after subtraction changes markedly and is far smaller than that of the known classes; the gap between them is thereby enlarged, which makes it convenient to design the thresholds.
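The effect of the reduced value can be illustrated with hypothetical logit values (the numbers below are invented for illustration and are not taken from the patent's experiments):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

known_logit, unknown_logit, z = 10.0, 6.0, 5.0   # hypothetical values

# Without the reduced value both outputs saturate near 1:
print(sigmoid(known_logit), sigmoid(unknown_logit))          # ~0.99995 vs ~0.9975
# After subtracting z, the known class stays high while the unknown class drops:
print(sigmoid(known_logit - z), sigmoid(unknown_logit - z))  # ~0.993 vs ~0.731
```

With the gap widened in this way, the per-class thresholds constructed in the remaining substeps of Step 4 can separate the two cases.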
Step 4.3: according to the image category label of each image sample searched from the verification data set V, the dimension value corresponding to the image category label corresponding to each image sample is taken out from the image category prediction probability vector of each image sample and is respectively put into the probability array corresponding to each image category label.
In the present embodiment, V { (x) is set for the verification set50001,y50001),(x50002,y50002),...,(x60000,y60000) Any one of themImage data (x)i,yi) Wherein i is more than or equal to 50001 and less than or equal to 60000, and the image class label y thereofi=ljWherein j is more than or equal to 1 and less than or equal to 10, the image sample x is taken outiThe value of the j-th dimension in the image class prediction probability vectorThe value represents the prediction probability of the image classifier f (x) for the image class corresponding to the image data, and is put into a probability array n 'which is divided according to the image class and is specially used for storing the j-th class of image data'jIn (1). According to this method, the present embodiment traverses the verification set V { (x)50001,y50001),(x50002,y50002),...,(x60000,y60000) Obtaining 10 probability arrays which are specially used for storing 10 types of image data for each image sample in the image data, wherein each probability array has 1000 data.
Step 4.4: each probability array is constructed as a normal distribution array with a mean of 1.
This embodiment is applied to any one of the probability arraysMake it become Then array n'iHas an average value of 1.
Step 4.5: calculating n 'of each normal distribution array'iOf standard deviation sigma'iAccording to standard deviation σ'iDetermining the threshold value of each image category, and obtaining a threshold value vector T ═ T of the image categories1,t2,...,t10}。
The invention is based on the idea of outlier detection in statistics, where a value (point) is considered an outlier if it is off average, by designing a threshold for each known image class. The threshold for each image category is therefore determined as:
ti=max(0.5,1-kσ′i) (1)
in the present embodiment, as described above, it is experimentally verified that the effect is best when k is 3, and the threshold value t for each image type determined in the present embodiment is determinedi=max(0.5,1-3σ′i) And further obtaining a threshold vector T ═ T of all image categories1,t2,...,t10}。
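A sketch of steps 4.1 to 4.5 under the same assumptions as the earlier sketch. Because the text does not spell out how the mean-1 normal-distribution array of step 4.4 is built, σ'_i is taken here as the root-mean-square deviation of the class-i probabilities from 1; this is one possible reading, not necessarily the exact procedure of the embodiment.

```python
import numpy as np

def compute_threshold_vector(val_logits, val_labels, z, k=3.0):
    """Steps 4.1-4.5: per-class thresholds t_i = max(0.5, 1 - k * sigma'_i)."""
    m = val_logits.shape[1]
    # Steps 4.1-4.2: subtract Z from the feature vectors and map through the sigmoid.
    probs = 1.0 / (1.0 + np.exp(-(val_logits - z)))
    t = np.zeros(m)
    for j in range(m):
        # Step 4.3: probabilities of the ground-truth dimension for class j.
        own_probs = probs[val_labels == j, j]
        # Step 4.4 (assumed reading): spread of the class-j probabilities around 1.
        sigma = np.sqrt(np.mean((own_probs - 1.0) ** 2))
        # Step 4.5: threshold of class j.
        t[j] = max(0.5, 1.0 - k * sigma)
    return t
```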
Step 5: inputting an image of any image category to be detected into the image classifier f(x), and outputting the feature vector of the image by the fully connected layer of the image classifier.
Step 6: subtracting the reduced value vector Z obtained in Step 3 from the feature vector of the image to obtain the feature vector reduction value of the image.
Step 7: mapping the feature vector reduction value of the image to the interval 0-1 by using a sigmoid function to obtain the image category prediction probability vector Y = {y_1, y_2, ..., y_m} of the image.
Step 8: subtracting the image category threshold vector T obtained in Step 4 from the image category prediction probability vector Y of the image to obtain the final image category detection value vector L = Y - T of the image, and letting l_max be the maximum value in L; the final detection result is:
when l_max < 0, the image category prediction probability vector values of the input image are all smaller than the set threshold of each image category, and the input image is judged not to belong to any known image category; otherwise, the label l_i corresponding to the dimension i at which the image category prediction probability vector of the input image takes its maximum value is taken as the detection result.
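Steps 5 to 8 then reduce to a few vector operations. In this sketch, logits_of stands for whatever routine returns the fully-connected-layer output of the trained classifier f(x) for a single image, and class_labels is the list of the m known category labels; both names are assumptions of this illustration.

```python
import numpy as np

def detect(image, logits_of, z, t, class_labels):
    """Steps 5-8: return a known category label or reject the image."""
    logits = logits_of(image)                     # Step 5: feature vector
    y = 1.0 / (1.0 + np.exp(-(logits - z)))       # Steps 6-7: subtract Z, sigmoid
    l = y - t                                     # Step 8: detection value vector L
    if l.max() < 0:                               # every y_i is below its threshold
        return "not a known image category"
    return class_labels[int(np.argmax(y))]        # label of the largest probability
```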
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (5)
1. An image out-of-distribution detection method based on statistical ideas, characterized by comprising the following steps:
step 1: determining a deep learning model, and correspondingly determining a training data set D and a verification data set V with the same m known image categories;
step 2: training the deep learning model by using the training data set D, and taking the trained model as an image classifier;
step 3: solving a reduced value vector Z of the image samples according to the image classifier and the verification data set V;
step 4: solving the threshold value of each known image category according to the verification data set V and the reduced value vector of the image samples, and further obtaining the image category threshold vector T of the m known image categories;
step 5: inputting any image to be detected into the image classifier, and outputting the feature vector of the image by the image classifier;
step 6: subtracting the reduced value vector Z obtained in step 3 from the feature vector of the image to obtain the feature vector reduction value of the image;
step 7: mapping the feature vector reduction value of the image to the interval 0-1 by using a sigmoid function to obtain the image category prediction probability vector Y of the image;
step 8: subtracting the image category threshold vector T obtained in step 4 from Y to obtain the final image category detection value vector L = Y - T of the image, and denoting the maximum value in L as l_max; the final detection result for the image category is: when l_max < 0, the image category prediction probability vector values of the input image are all less than the set threshold of each image category, and the input image is judged not to belong to any known image category; otherwise, the image category label corresponding to l_max is taken as the detection result.
2. The image out-of-distribution detection method based on statistical ideas according to claim 1, wherein the deep learning model is a CNN model.
3. The image out-of-distribution detection method based on statistical ideas according to claim 1, wherein step 3 comprises the following specific steps:
step 3.1: inputting all the image samples in the verification data set V into an image classifier, and outputting the feature vector of each image sample through the image classifier;
the feature vector of each image sample is a feature vector of m dimensions, wherein each dimension corresponds to each known image category;
step 3.2: according to the image category label corresponding to each image sample searched from the verification data set V, respectively taking out the dimension value corresponding to the image category label from the feature vector of each image sample, and respectively putting the dimension values into the array corresponding to each image category label to obtain m arrays;
step 3.3: calculating the standard deviation σ_i of each array and letting the reduced value z_i = kσ_i, where k is a positive number, thereby obtaining the m-dimensional reduced value vector Z = {z_1, z_2, ..., z_m}.
4. The image out-of-distribution detection method based on statistical ideas according to claim 1, wherein step 4 comprises the following specific steps:
step 4.1: subtracting the reduced value vector Z from the feature vector of each image sample in the verification data set V to obtain a reduced value of the feature vector of each image sample in the verification data set V;
step 4.2: mapping the feature vector reduction value of each image sample in the verification data set V to a range of 0-1 by using a sigmoid function to obtain an image class prediction probability vector Y of each image sample, wherein each dimension in the vector Y represents the prediction probability of an image classifier on the image class corresponding to the current image sample;
step 4.3: according to the image category label of each image sample searched from the verification data set V, respectively taking out the dimension value corresponding to the image category label of each image sample from the image category prediction probability vector of each image sample, and respectively putting the dimension value into the probability array corresponding to each image category label;
step 4.4: constructing each probability array as an array of normal distribution with the average value of 1;
step 4.5: calculating the standard deviation σ'_i of each normal distribution array and, according to σ'_i, determining the threshold of each image class as t_i = max(0.5, 1 - kσ'_i), thereby obtaining the threshold vector T = {t_1, t_2, ..., t_m} of the known image classes.
5. The method according to claim 3 or 4, wherein k is 3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110433494.2A CN113139597B (en) | 2021-04-19 | 2021-04-19 | Image out-of-distribution detection method based on statistical ideas |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110433494.2A CN113139597B (en) | 2021-04-19 | 2021-04-19 | Image out-of-distribution detection method based on statistical ideas |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113139597A true CN113139597A (en) | 2021-07-20 |
CN113139597B CN113139597B (en) | 2022-11-04 |
Family
ID=76813451
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110433494.2A Active CN113139597B (en) | 2021-04-19 | 2021-04-19 | Statistical thought-based image distribution external detection method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113139597B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018107760A1 (en) * | 2016-12-16 | 2018-06-21 | 北京大学深圳研究生院 | Collaborative deep network model method for pedestrian detection |
CN108806792A (en) * | 2017-05-03 | 2018-11-13 | 金波 | Deep learning facial diagnosis system |
WO2019136946A1 (en) * | 2018-01-15 | 2019-07-18 | 中山大学 | Deep learning-based weakly supervised salient object detection method and system |
EP3582142A1 (en) * | 2018-06-15 | 2019-12-18 | Université de Liège | Image classification using neural networks |
WO2020164282A1 (en) * | 2019-02-14 | 2020-08-20 | 平安科技(深圳)有限公司 | Yolo-based image target recognition method and apparatus, electronic device, and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN113139597B (en) | 2022-11-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108960073B (en) | Cross-modal image mode identification method for biomedical literature | |
US11704409B2 (en) | Post-training detection and identification of backdoor-poisoning attacks | |
CN110909820B (en) | Image classification method and system based on self-supervision learning | |
CN107527068B (en) | Vehicle type identification method based on CNN and domain adaptive learning | |
CN106951825B (en) | Face image quality evaluation system and implementation method | |
CN111126482B (en) | Remote sensing image automatic classification method based on multi-classifier cascade model | |
CN110717554B (en) | Image recognition method, electronic device, and storage medium | |
CN111915437B (en) | Training method, device, equipment and medium of money backwashing model based on RNN | |
CN109472209B (en) | Image recognition method, device and storage medium | |
US20170032247A1 (en) | Media classification | |
US11847418B2 (en) | Few-shot language model training and implementation | |
CN112070058A (en) | Face and face composite emotional expression recognition method and system | |
CN113269647B (en) | Graph-based transaction abnormity associated user detection method | |
CN111259985A (en) | Classification model training method and device based on business safety and storage medium | |
CN106599864A (en) | Deep face recognition method based on extreme value theory | |
CN110490306A (en) | A kind of neural metwork training and object identifying method, device and electronic equipment | |
CN111598113A (en) | Model optimization method, data identification method and data identification device | |
CN111652320B (en) | Sample classification method and device, electronic equipment and storage medium | |
KR20200071865A (en) | Image object detection system and method based on reduced dimensional | |
CN112183336A (en) | Expression recognition model training method and device, terminal equipment and storage medium | |
CN113139597B (en) | Image out-of-distribution detection method based on statistical ideas | |
CN111401440A (en) | Target classification recognition method and device, computer equipment and storage medium | |
Nebili et al. | Background subtraction using artificial immune recognition system and single gaussian (airs-sg) | |
CN115878896A (en) | Multi-mode false news detection method and device based on semantic authenticity features | |
CN115098681A (en) | Open service intention detection method based on supervised contrast learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |