Image classification method based on frequency-domain contrastive learning
Technical Field
The invention relates to the field of image classification, and in particular to an image classification method based on frequency-domain contrastive learning.
Background
In recent years, with the continuous development of deep learning techniques, computers have achieved considerably high accuracy on various image classification tasks. However, the effectiveness of mainstream image classification methods rests on the premise that the training set and the test set are independent and identically distributed (i.i.d.). In practical applications, training data and test data rarely satisfy the i.i.d. assumption strictly; in such cases, a model trained by a traditional method performs well on the training set but fails to achieve an ideal classification effect on the test data.
Few classification methods have been proposed for non-i.i.d. images, and the existing methods are mainly based on causal inference theory. Such a method inputs images into a deep model to extract features, takes each feature dimension in turn as an intervention variable, and treats the remaining dimensions as confounding factors. The model learns a set of sample weights that minimize the association between feature dimensions, so that the causal effect of each dimension on the classification result can be estimated independently. However, the dimensions of an image feature are never completely free of mutual association, and different feature dimensions bear different causal relationships to the image class. Because this method treats all feature dimensions equally, its classification effect is still not ideal.
The non-i.i.d. nature of training and test data is mainly caused by differing context information in the images (including the background, texture, and action of the classification target), yet the classification target retains certain characteristics that do not change across scenes. Learning these stable features of the target can effectively solve the problem of classifying non-i.i.d. images. However, the diversity of the target's stable features and the abstractness of the high-dimensional features extracted by a neural network pose challenges to learning them.
Disclosure of Invention
(I) Objects of the invention
In order to solve the technical problems in the background art, the invention provides an image classification method based on frequency-domain contrastive learning.
(II) Technical scheme
In order to solve the above problems, the present invention provides an image classification method based on frequency-domain contrastive learning, which comprises the following steps:
S1: performing random data enhancement twice on each training set image, wherein the data enhancement operations comprise cropping and resizing, horizontal flipping, Gaussian blur, color jittering, and grayscale conversion;
whether each data enhancement operation is executed is determined by a preset probability, so that two rounds of random data enhancement on the same image yield two different enhanced images;
S2: performing a discrete cosine transform on the enhanced images to obtain images transferred to the frequency domain;
wherein the images are represented in RGB color coding; extracting the frequency-domain features of an image can be divided into the following two sub-steps:
S201: converting the image from the RGB color space to the YCbCr color space according to the following formula (the standard JPEG conversion):

Y = 0.299R + 0.587G + 0.114B
Cb = -0.1687R - 0.3313G + 0.5B + 128
Cr = 0.5R - 0.4187G - 0.0813B + 128
S202: dividing the picture converted to the YCbCr color space into 8 × 8 blocks, and solving, for each of the three channels of each block, the corresponding discrete cosine transform coefficients according to the formula F = AfA^T; the transform matrix A is the standard 8 × 8 DCT-II basis:

A(i, j) = c(i) cos[(2j + 1)iπ/16], where c(0) = √(1/8) and c(i) = √(2/8) = 1/2 for i > 0.
the original image is divided into 14 × 14 blocks of 8 × 8 pixels; each block contains 64 pixel points, and each pixel point has values in 3 color channels, so 192 frequency-domain coefficients are obtained for each image block, yielding image frequency-domain coefficients of dimension (192, 14, 14);
S3: passing the images transferred to the frequency domain through a deep network, and learning the network parameters through a contrastive learning task to obtain the stable features of the images; the method comprises the following sub-steps:
S301: respectively inputting the image frequency-domain coefficients into the feature extraction layer of the deep network to obtain features h_i and h_j of dimension (N, 2048); the network structure adopts a residual neural network;
S302: inputting h_i and h_j into a multi-layer perceptron to obtain features z_i and z_j of dimension (N, 128) for contrastive learning training;
S303: concatenating z_i and z_j along dimension 0 to obtain the features for calculating the contrastive learning loss:

z = [z_i; z_j], with dimension (2N, 128);
the loss of the contrastive learning pre-training is calculated according to the following formula, taking the two views of the same image as a positive pair (i, j) (the normalized temperature-scaled cross entropy form):

ℓ(i, j) = -log [ exp(sim(z_i, z_j)/σ) / Σ_{k≠i} exp(sim(z_i, z_k)/σ) ]

wherein the sum runs over the 2N concatenated features, sim(u, v) = u·v/(‖u‖‖v‖) is the cosine similarity, σ is a positive temperature parameter, and the total pre-training loss is the average of ℓ(i, j) over all 2N positive pairs;
S304: adjusting the parameters of the deep network by minimizing the contrastive learning pre-training loss, performing global parameter adjustment by the back propagation algorithm until the pre-training loss no longer decreases, at which point the model has converged and the contrastive learning pre-training step ends;
S4: predicting the classification results of the training set images by using the extracted features, so as to further learn the network parameters and perform the classification task;
wherein the 2048-dimensional features extracted in S301 are input into a classifier consisting of a fully connected layer and a softmax layer to obtain the predicted classification results:

ŷ = softmax(Wh + b), with dimension (N, K),

where W and b are the weight and bias of the fully connected layer, N is the number of images in a batch, and K is the number of image categories;
and then calculating the cross entropy loss function from the classification results:

L_CE = -(1/N) Σ_{n=1..N} Σ_{k=1..K} y_{n,k} log ŷ_{n,k},

where y_{n,k} = 1 if image n belongs to category k and 0 otherwise;
and finally, performing global parameter adjustment through the back propagation algorithm, optimizing the network parameters with the goal of minimizing the cross entropy loss function until the function value no longer decreases.
S5: classifying the test set images by using the deep network with optimized parameters:
performing random data enhancement on the test images according to S1, converting them to the frequency domain according to S2, and finally inputting the frequency-domain images into the feature extraction layer of the deep network and then into the classifier consisting of the fully connected layer and the softmax layer to obtain the prediction results.
Preferably, in S2, the frequency-domain image is input into a convolutional neural network to extract features.
Preferably, in S3, combining the frequency-domain representation with a contrastive learning framework, the model learns stable features through pre-training on the contrastive learning task.
In the invention, the images of each category are sub-divided according to their context information, and the relevant data sets are then constructed. The training set comprises images and their corresponding classification labels. The test set likewise comprises images and their corresponding classification labels, but the context information of the test set images differs from that of the training set.
In the invention, random data enhancement is performed twice on each batch of images, which are then converted to the frequency domain to obtain two features of the same image; the model is trained to distinguish whether two features come from the same image, thereby learning the stable features of the images and improving the classification effect on non-i.i.d. images.
The method can learn the stable features of the same object under different backgrounds, and classifies non-i.i.d. images better than traditional classification methods.
Drawings
FIG. 1 is a flowchart of the non-i.i.d. image classification method based on frequency-domain contrastive learning according to the present invention.
FIG. 2 is a structural diagram of the frequency-domain contrastive learning model in the image classification method according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings in conjunction with the following detailed description. It should be understood that the description is intended to be exemplary only, and is not intended to limit the scope of the present invention. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present invention.
As shown in FIGS. 1 and 2, the image classification method based on frequency-domain contrastive learning provided by the present invention comprises the following steps:
S1: performing random data enhancement twice on the training set images [x_1, x_2, x_3, …, x_N], wherein the data enhancement operations comprise cropping and resizing, horizontal flipping, Gaussian blur, color jittering, and grayscale conversion;
whether each data enhancement operation is executed is determined by a preset probability, so that two rounds of random data enhancement on the same image yield two different enhanced images;
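By way of illustration, a minimal sketch of this two-view enhancement in Python (torchvision) is given below. The individual probabilities, jitter strengths, and blur kernel size are assumptions chosen for illustration; the method itself only fixes the set of operations and the preset-probability mechanism.

```python
# Sketch of S1: two rounds of random data enhancement on the same image.
# Probabilities and magnitudes below are illustrative assumptions.
import torchvision.transforms as T

augment = T.Compose([
    T.RandomResizedCrop(112),                                    # cropping and resizing
    T.RandomHorizontalFlip(p=0.5),                               # horizontal flipping
    T.RandomApply([T.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),  # color jittering
    T.RandomGrayscale(p=0.2),                                    # grayscale conversion
    T.RandomApply([T.GaussianBlur(kernel_size=11)], p=0.5),     # Gaussian blur
])

def two_views(img):
    # Two independent passes yield two different enhanced images.
    return augment(img), augment(img)
```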
S2: performing a discrete cosine transform on the enhanced images to obtain images transferred to the frequency domain, namely:

[x_i1, x_i2, x_i3, …, x_iN] and [x_j1, x_j2, x_j3, …, x_jN];
wherein the images are represented in RGB color coding;
the enhanced image dimension is (3, 112, 112), where 3 indicates that the image has 3 color channels, namely R, G, and B, and 112 is the preset image size;
extracting the frequency-domain features of an image can be divided into the following two sub-steps:
S201: converting the image from the RGB color space to the YCbCr color space according to the following formula (the standard JPEG conversion):

Y = 0.299R + 0.587G + 0.114B
Cb = -0.1687R - 0.3313G + 0.5B + 128
Cr = 0.5R - 0.4187G - 0.0813B + 128
S202: dividing the picture converted to the YCbCr color space into 8 × 8 blocks, and solving, for each of the three channels of each block, the corresponding discrete cosine transform coefficients according to the formula F = AfA^T; the transform matrix A is the standard 8 × 8 DCT-II basis:

A(i, j) = c(i) cos[(2j + 1)iπ/16], where c(0) = √(1/8) and c(i) = √(2/8) = 1/2 for i > 0.
The original (3, 112, 112) picture can be divided into 14 × 14 blocks of 8 × 8 pixels; each block contains 64 pixel points, and each pixel point has values in 3 color channels, so 192 frequency-domain coefficients are obtained for each image block, yielding image frequency-domain coefficients of dimension (192, 14, 14):

[x'_i1, x'_i2, x'_i3, …, x'_iN] and [x'_j1, x'_j2, x'_j3, …, x'_jN];
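A runnable Python sketch of this frequency-domain conversion, using the standard JPEG color conversion and the orthonormal DCT-II basis reconstructed above, is given below; the ordering of the 192 coefficients per block is an assumption of the sketch.

```python
# Sketch of S2: RGB -> YCbCr, then blockwise 8x8 DCT via F = A f A^T.
import numpy as np

def dct_matrix(n=8):
    # A(i, j) = c(i) * cos((2j + 1) * i * pi / (2n)); c(0) = sqrt(1/n), c(i>0) = sqrt(2/n)
    A = np.array([[np.cos((2 * j + 1) * i * np.pi / (2 * n)) for j in range(n)]
                  for i in range(n)])
    A[0, :] *= np.sqrt(1.0 / n)
    A[1:, :] *= np.sqrt(2.0 / n)
    return A

def rgb_to_ycbcr(img):  # img: (112, 112, 3) array with values in [0, 255]
    R, G, B = img[..., 0], img[..., 1], img[..., 2]
    Y = 0.299 * R + 0.587 * G + 0.114 * B
    Cb = -0.1687 * R - 0.3313 * G + 0.5 * B + 128
    Cr = 0.5 * R - 0.4187 * G - 0.0813 * B + 128
    return np.stack([Y, Cb, Cr], axis=-1)

def frequency_coefficients(img):
    A = dct_matrix()
    ycc = rgb_to_ycbcr(img)
    coeffs = np.empty((192, 14, 14))
    for bi in range(14):
        for bj in range(14):
            for c in range(3):  # 64 DCT coefficients per channel, 192 per block
                f = ycc[8 * bi:8 * bi + 8, 8 * bj:8 * bj + 8, c]
                F = A @ f @ A.T                      # F = A f A^T
                coeffs[64 * c:64 * c + 64, bi, bj] = F.ravel()
    return coeffs  # dimension (192, 14, 14), as described above
```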
S3: passing the images transferred to the frequency domain through a deep network, and learning the network parameters through a contrastive learning task to obtain the stable features of the images; the method comprises the following sub-steps:
S301: respectively inputting the image frequency-domain coefficients [x'_i1, x'_i2, …, x'_iN] and [x'_j1, x'_j2, …, x'_jN] into the feature extraction layer of the deep network to obtain features h_i and h_j of dimension (N, 2048); the network structure adopts a residual neural network;
S302: inputting h_i and h_j into a multi-layer perceptron to obtain features z_i and z_j of dimension (N, 128) for contrastive learning training;
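One possible realization of S301 and S302 in PyTorch is sketched below. The method specifies only a residual network yielding (N, 2048) features and a multi-layer perceptron yielding (N, 128); the 1 × 1 channel adapter and the reuse of the later ResNet-50 stages for the 14 × 14 frequency input are assumptions of this sketch.

```python
# Sketch of S301-S302: residual feature extractor plus MLP projection head.
import torch.nn as nn
import torchvision

class FrequencyEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        resnet = torchvision.models.resnet50(weights=None)
        # The (192, 14, 14) frequency input replaces the usual RGB stem:
        # a 1x1 convolution maps 192 channels to the 256 expected by layer2
        # (an assumption; the text only states a residual network is used).
        self.adapter = nn.Conv2d(192, 256, kernel_size=1)
        self.body = nn.Sequential(resnet.layer2, resnet.layer3,
                                  resnet.layer4, nn.AdaptiveAvgPool2d(1))
        self.projector = nn.Sequential(nn.Linear(2048, 2048), nn.ReLU(),
                                       nn.Linear(2048, 128))

    def forward(self, x):                           # x: (N, 192, 14, 14)
        h = self.body(self.adapter(x)).flatten(1)   # h: (N, 2048) features
        z = self.projector(h)                       # z: (N, 128) for the contrastive loss
        return h, z
```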
S303: concatenating z_i and z_j along dimension 0 to obtain the features for calculating the contrastive learning loss:

z = [z_i; z_j], with dimension (2N, 128);
the loss of the contrastive learning pre-training is calculated according to the following formula, taking the two views of the same image as a positive pair (i, j) (the normalized temperature-scaled cross entropy form):

ℓ(i, j) = -log [ exp(sim(z_i, z_j)/σ) / Σ_{k≠i} exp(sim(z_i, z_k)/σ) ]

wherein the sum runs over the 2N concatenated features, sim(u, v) = u·v/(‖u‖‖v‖) is the cosine similarity, σ is a positive temperature parameter, and the total pre-training loss is the average of ℓ(i, j) over all 2N positive pairs;
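The concatenation and loss above can be implemented directly; the following PyTorch sketch computes the pre-training loss with σ as the temperature.

```python
# Sketch of S303: splice z_i and z_j along dimension 0 and compute the
# contrastive pre-training loss with temperature sigma.
import torch
import torch.nn.functional as F

def contrastive_loss(z_i, z_j, sigma=0.5):
    N = z_i.shape[0]
    z = F.normalize(torch.cat([z_i, z_j], dim=0), dim=1)  # (2N, 128), unit-norm rows
    sim = z @ z.t() / sigma                               # cosine similarities / sigma
    sim.fill_diagonal_(float('-inf'))                     # exclude k = i from the sum
    # Row r < N is the other view of row r + N, and vice versa.
    targets = torch.cat([torch.arange(N) + N, torch.arange(N)]).to(z.device)
    return F.cross_entropy(sim, targets)                  # mean over all 2N positive pairs
```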
S304: adjusting the parameters of the deep network by minimizing the contrastive learning pre-training loss, performing global parameter adjustment by the back propagation algorithm until the pre-training loss no longer decreases, at which point the model has converged and the contrastive learning pre-training step ends;
S4: predicting the classification results of the training set images by using the extracted features, so as to further learn the network parameters and perform the classification task;
inputting the 2048-dimensional features extracted in S301 into a classifier consisting of a fully connected layer and a softmax layer to obtain the predicted classification results:

ŷ = softmax(Wh + b), with dimension (N, K),

where W and b are the weight and bias of the fully connected layer, N is the number of images in a batch, and K is the number of image categories;
and then calculating the cross entropy loss function from the classification results:

L_CE = -(1/N) Σ_{n=1..N} Σ_{k=1..K} y_{n,k} log ŷ_{n,k},

where y_{n,k} = 1 if image n belongs to category k and 0 otherwise;
and finally, performing global parameter adjustment through the back propagation algorithm, optimizing the network parameters with the goal of minimizing the cross entropy loss function until the function value no longer decreases.
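A minimal training-step sketch for this classification stage is given below; FrequencyEncoder is the sketch from S301-S302 above, and K, the optimizer, and the learning rate are illustrative assumptions.

```python
# Sketch of S4: fully connected classifier over h and cross entropy training.
import torch
import torch.nn as nn

K = 10                                       # number of image categories (assumed)
encoder = FrequencyEncoder()                 # parameters pre-trained in S3
classifier = nn.Linear(2048, K)              # fully connected layer; the softmax
criterion = nn.CrossEntropyLoss()            # is folded into the cross entropy loss
optimizer = torch.optim.SGD(list(encoder.parameters()) +
                            list(classifier.parameters()), lr=1e-3)

def train_step(x, labels):                   # x: (N, 192, 14, 14) frequency coefficients
    h, _ = encoder(x)                        # 2048-dimensional features from S301
    loss = criterion(classifier(h), labels)  # cross entropy on predicted results
    optimizer.zero_grad()
    loss.backward()                          # global parameter adjustment
    optimizer.step()                         # by back propagation
    return loss.item()
```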
S5: classifying the test set images by using the deep network with optimized parameters:
performing random data enhancement on the test images according to S1, converting them to the frequency domain according to S2, and finally inputting the frequency-domain images into the feature extraction layer of the deep network and then into the classifier consisting of the fully connected layer and the softmax layer to obtain the prediction results.
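Composing the earlier sketches, test-set prediction can be illustrated as follows (augment, frequency_coefficients, encoder, and classifier are the hypothetical helpers defined above).

```python
# Sketch of S5: classify a test image with the optimized network.
import numpy as np
import torch

@torch.no_grad()
def predict(img):
    view = augment(img)                                    # random enhancement as in S1
    coeffs = frequency_coefficients(
        np.asarray(view, dtype=np.float64))                # convert to frequency domain (S2)
    x = torch.as_tensor(coeffs).float().unsqueeze(0)       # (1, 192, 14, 14)
    h, _ = encoder(x)                                      # feature extraction layer
    probs = torch.softmax(classifier(h), dim=1)            # fully connected + softmax layer
    return probs.argmax(dim=1).item()                      # predicted category
```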
In the invention, the images of each category are sub-divided according to their context information, and the relevant data sets are then constructed. The training set comprises images and their corresponding classification labels. The test set likewise comprises images and their corresponding classification labels, but the context information of the test set images differs from that of the training set.
In the invention, random data enhancement is performed twice on each batch of images, which are then converted to the frequency domain to obtain two features of the same image; the model is trained to distinguish whether two features come from the same image, thereby learning the stable features of the images and improving the classification effect on non-i.i.d. images.
In an alternative embodiment, in S2, the frequency-domain image is input into a convolutional neural network to extract features.
In an alternative embodiment, in S3, combining the frequency-domain representation with a contrastive learning framework, the model learns stable features through pre-training on the contrastive learning task.
In conclusion, the method can learn the stable features of the same object under different backgrounds, and classifies non-i.i.d. images better than traditional classification methods.
It is to be understood that the above-described embodiments of the present invention merely illustrate and explain the principles of the invention and are not to be construed as limiting it. Any modification, equivalent replacement, improvement, and the like made without departing from the spirit and scope of the present invention shall fall within its protection scope, and the appended claims are intended to cover all such variations and modifications as fall within their scope and boundaries or the equivalents thereof.