Image classification method based on frequency-domain contrastive learning
Technical Field
The invention relates to the field of image classification, and in particular to an image classification method based on frequency-domain contrastive learning.
Background
In recent years, with the continuous development of deep learning techniques, computers have achieved considerably high accuracy on various image classification tasks. However, the effectiveness of mainstream image classification methods rests on the premise that the training set and the test set are independent and identically distributed (i.i.d.). In practical applications, training data and test data rarely satisfy the i.i.d. assumption strictly; in such cases, a model trained by a traditional method performs well on the training set but fails to achieve an ideal classification effect on the test data.
Few classification methods have been proposed for non-i.i.d. images, and the existing methods are mainly based on causal inference theory. Such a method inputs images into a deep model to extract features, takes each feature dimension in turn as an intervention variable, and treats the remaining dimensions as confounding factors. The model learns a set of sample weights that minimize the association between feature dimensions, so that the causal effect of each dimension on the classification result can be estimated independently. However, the dimensions of an image feature are never completely free of mutual association, and different feature dimensions bear different causal relationships to the image class. Because this method treats all feature dimensions equally, its classification effect is still not ideal.
The non-i.i.d. nature of training and test data is mainly caused by differing context information in the images (including the background, texture, and action of the classification target), yet the classification target retains certain characteristics that do not change across scenes. Learning these stable features of the target can effectively solve the problem of classifying non-i.i.d. images. However, the diversity of the target's stable features and the abstractness of the high-dimensional features extracted by a neural network pose challenges to learning them.
Disclosure of Invention
(I) Objects of the invention
In order to solve the technical problems in the background art, the invention provides an image classification method based on frequency-domain contrastive learning.
(II) Technical scheme
In order to solve the above problems, the present invention provides an image classification method based on frequency-domain contrastive learning, which comprises the following steps:
S1: performing random data enhancement twice on each training set image, wherein the data enhancement operations comprise cropping and resizing, horizontal flipping, Gaussian blur, color jittering, and grayscale conversion;
whether each data enhancement operation is executed is determined by a preset probability, so that two rounds of random data enhancement on the same image yield two different enhanced images;
S2: performing a discrete cosine transform on the enhanced images to obtain images transferred to the frequency domain;
wherein the images are represented in RGB color coding; extracting the frequency-domain features of an image can be divided into the following two sub-steps:
S201: converting the image from the RGB color space to the YCbCr color space according to the following formula (the standard JPEG conversion):

Y = 0.299R + 0.587G + 0.114B
Cb = -0.1687R - 0.3313G + 0.5B + 128
Cr = 0.5R - 0.4187G - 0.0813B + 128
S202: dividing the picture converted to the YCbCr color space into 8 × 8 blocks, and solving, for each of the three channels of each block, the corresponding discrete cosine transform coefficients according to the formula F = AfA^T; the transform matrix A is the standard 8 × 8 DCT-II basis:

A(i, j) = c(i) cos[(2j + 1)iπ/16], where c(0) = √(1/8) and c(i) = √(2/8) = 1/2 for i > 0.
the original image is divided into 14 × 14 blocks of 8 × 8 pixels; each block contains 64 pixel points, and each pixel point has values in 3 color channels, so 192 frequency-domain coefficients are obtained for each image block, yielding image frequency-domain coefficients of dimension (192, 14, 14);
S3: passing the images transferred to the frequency domain through a deep network, and learning the network parameters through a contrastive learning task to obtain the stable features of the images; the method comprises the following sub-steps:
S301: respectively inputting the image frequency-domain coefficients into the feature extraction layer of the deep network to obtain features h_i and h_j of dimension (N, 2048); the network structure adopts a residual neural network;
S302: inputting h_i and h_j into a multi-layer perceptron to obtain features z_i and z_j of dimension (N, 128) for contrastive learning training;
S303: concatenating z_i and z_j along dimension 0 to obtain the features for calculating the contrastive learning loss:

z = [z_i; z_j], with dimension (2N, 128);
the loss of the contrastive learning pre-training is calculated according to the following formula, taking the two views of the same image as a positive pair (i, j) (the normalized temperature-scaled cross entropy form):

ℓ(i, j) = -log [ exp(sim(z_i, z_j)/σ) / Σ_{k≠i} exp(sim(z_i, z_k)/σ) ]

wherein the sum runs over the 2N concatenated features, sim(u, v) = u·v/(‖u‖‖v‖) is the cosine similarity, σ is a positive temperature parameter, and the total pre-training loss is the average of ℓ(i, j) over all 2N positive pairs;
S304: adjusting the parameters of the deep network by minimizing the contrastive learning pre-training loss, performing global parameter adjustment by the back propagation algorithm until the pre-training loss no longer decreases, at which point the model has converged and the contrastive learning pre-training step ends;
S4: predicting the classification results of the training set images by using the extracted features, so as to further learn the network parameters and perform the classification task;
wherein the 2048-dimensional features extracted in S301 are input into a classifier consisting of a fully connected layer and a softmax layer to obtain the predicted classification results:

ŷ = softmax(Wh + b), with dimension (N, K),

where W and b are the weight and bias of the fully connected layer, N is the number of images in a batch, and K is the number of image categories;
and then calculating the cross entropy loss function from the classification results:

L_CE = -(1/N) Σ_{n=1..N} Σ_{k=1..K} y_{n,k} log ŷ_{n,k},

where y_{n,k} = 1 if image n belongs to category k and 0 otherwise;
and finally, performing global parameter adjustment through the back propagation algorithm, optimizing the network parameters with the goal of minimizing the cross entropy loss function until the function value no longer decreases.
S5: classifying the test set images by using the deep network with optimized parameters:
performing random data enhancement on the test images according to S1, converting them to the frequency domain according to S2, and finally inputting the frequency-domain images into the feature extraction layer of the deep network and then into the classifier consisting of the fully connected layer and the softmax layer to obtain the prediction results.
Preferably, in S2, the frequency-domain image is input into a convolutional neural network to extract features.
Preferably, in S3, combining the frequency-domain representation with a contrastive learning framework, the model learns stable features through pre-training on the contrastive learning task.
In the invention, the images of each category are sub-divided according to their context information, and the relevant data sets are then constructed. The training set comprises images and their corresponding classification labels. The test set likewise comprises images and their corresponding classification labels, but the context information of the test set images differs from that of the training set.
In the invention, random data enhancement is performed twice on each batch of images, which are then converted to the frequency domain to obtain two features of the same image; the model is trained to distinguish whether two features come from the same image, thereby learning the stable features of the images and improving the classification effect on non-i.i.d. images.
The method can learn the stable features of the same object under different backgrounds, and classifies non-i.i.d. images better than traditional classification methods.
Drawings
FIG. 1 is a flowchart of the non-i.i.d. image classification method based on frequency-domain contrastive learning according to the present invention.
FIG. 2 is a structural diagram of the frequency-domain contrastive learning model in the image classification method according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings in conjunction with the following detailed description. It should be understood that the description is intended to be exemplary only, and is not intended to limit the scope of the present invention. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present invention.
As shown in FIGS. 1 and 2, the image classification method based on frequency-domain contrastive learning provided by the present invention comprises the following steps:
S1: performing random data enhancement twice on the training set images [x_1, x_2, x_3, …, x_N], wherein the data enhancement operations comprise cropping and resizing, horizontal flipping, Gaussian blur, color jittering, and grayscale conversion;
whether each data enhancement operation is executed is determined by a preset probability, so that two rounds of random data enhancement on the same image yield two different enhanced images;
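By way of illustration, a minimal sketch of this two-view enhancement in Python (torchvision) is given below. The individual probabilities, jitter strengths, and blur kernel size are assumptions chosen for illustration; the method itself only fixes the set of operations and the preset-probability mechanism.

```python
# Sketch of S1: two rounds of random data enhancement on the same image.
# Probabilities and magnitudes below are illustrative assumptions.
import torchvision.transforms as T

augment = T.Compose([
    T.RandomResizedCrop(112),                                    # cropping and resizing
    T.RandomHorizontalFlip(p=0.5),                               # horizontal flipping
    T.RandomApply([T.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),  # color jittering
    T.RandomGrayscale(p=0.2),                                    # grayscale conversion
    T.RandomApply([T.GaussianBlur(kernel_size=11)], p=0.5),     # Gaussian blur
])

def two_views(img):
    # Two independent passes yield two different enhanced images.
    return augment(img), augment(img)
```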
S2: performing a discrete cosine transform on the enhanced images to obtain images transferred to the frequency domain, namely:

[x_i1, x_i2, x_i3, …, x_iN] and [x_j1, x_j2, x_j3, …, x_jN];
wherein the images are represented in RGB color coding;
the enhanced image dimension is (3, 112, 112), where 3 indicates that the image has 3 color channels, namely R, G, and B, and 112 is the preset image size;
extracting the frequency-domain features of an image can be divided into the following two sub-steps:
S201: converting the image from the RGB color space to the YCbCr color space according to the following formula (the standard JPEG conversion):

Y = 0.299R + 0.587G + 0.114B
Cb = -0.1687R - 0.3313G + 0.5B + 128
Cr = 0.5R - 0.4187G - 0.0813B + 128
S202: dividing the picture converted to the YCbCr color space into 8 × 8 blocks, and solving, for each of the three channels of each block, the corresponding discrete cosine transform coefficients according to the formula F = AfA^T; the transform matrix A is the standard 8 × 8 DCT-II basis:

A(i, j) = c(i) cos[(2j + 1)iπ/16], where c(0) = √(1/8) and c(i) = √(2/8) = 1/2 for i > 0.
The original (3, 112, 112) picture can be divided into 14 × 14 blocks of 8 × 8 pixels; each block contains 64 pixel points, and each pixel point has values in 3 color channels, so 192 frequency-domain coefficients are obtained for each image block, yielding image frequency-domain coefficients of dimension (192, 14, 14):

[x'_i1, x'_i2, x'_i3, …, x'_iN] and [x'_j1, x'_j2, x'_j3, …, x'_jN];
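A runnable Python sketch of this frequency-domain conversion, using the standard JPEG color conversion and the orthonormal DCT-II basis reconstructed above, is given below; the ordering of the 192 coefficients per block is an assumption of the sketch.

```python
# Sketch of S2: RGB -> YCbCr, then blockwise 8x8 DCT via F = A f A^T.
import numpy as np

def dct_matrix(n=8):
    # A(i, j) = c(i) * cos((2j + 1) * i * pi / (2n)); c(0) = sqrt(1/n), c(i>0) = sqrt(2/n)
    A = np.array([[np.cos((2 * j + 1) * i * np.pi / (2 * n)) for j in range(n)]
                  for i in range(n)])
    A[0, :] *= np.sqrt(1.0 / n)
    A[1:, :] *= np.sqrt(2.0 / n)
    return A

def rgb_to_ycbcr(img):  # img: (112, 112, 3) array with values in [0, 255]
    R, G, B = img[..., 0], img[..., 1], img[..., 2]
    Y = 0.299 * R + 0.587 * G + 0.114 * B
    Cb = -0.1687 * R - 0.3313 * G + 0.5 * B + 128
    Cr = 0.5 * R - 0.4187 * G - 0.0813 * B + 128
    return np.stack([Y, Cb, Cr], axis=-1)

def frequency_coefficients(img):
    A = dct_matrix()
    ycc = rgb_to_ycbcr(img)
    coeffs = np.empty((192, 14, 14))
    for bi in range(14):
        for bj in range(14):
            for c in range(3):  # 64 DCT coefficients per channel, 192 per block
                f = ycc[8 * bi:8 * bi + 8, 8 * bj:8 * bj + 8, c]
                F = A @ f @ A.T                      # F = A f A^T
                coeffs[64 * c:64 * c + 64, bi, bj] = F.ravel()
    return coeffs  # dimension (192, 14, 14), as described above
```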
S3: passing the images transferred to the frequency domain through a deep network, and learning the network parameters through a contrastive learning task to obtain the stable features of the images; the method comprises the following sub-steps:
S301: respectively inputting the image frequency-domain coefficients [x'_i1, x'_i2, …, x'_iN] and [x'_j1, x'_j2, …, x'_jN] into the feature extraction layer of the deep network to obtain features h_i and h_j of dimension (N, 2048); the network structure adopts a residual neural network;
S302: inputting h_i and h_j into a multi-layer perceptron to obtain features z_i and z_j of dimension (N, 128) for contrastive learning training;
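One possible realization of S301 and S302 in PyTorch is sketched below. The method specifies only a residual network yielding (N, 2048) features and a multi-layer perceptron yielding (N, 128); the 1 × 1 channel adapter and the reuse of the later ResNet-50 stages for the 14 × 14 frequency input are assumptions of this sketch.

```python
# Sketch of S301-S302: residual feature extractor plus MLP projection head.
import torch.nn as nn
import torchvision

class FrequencyEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        resnet = torchvision.models.resnet50(weights=None)
        # The (192, 14, 14) frequency input replaces the usual RGB stem:
        # a 1x1 convolution maps 192 channels to the 256 expected by layer2
        # (an assumption; the text only states a residual network is used).
        self.adapter = nn.Conv2d(192, 256, kernel_size=1)
        self.body = nn.Sequential(resnet.layer2, resnet.layer3,
                                  resnet.layer4, nn.AdaptiveAvgPool2d(1))
        self.projector = nn.Sequential(nn.Linear(2048, 2048), nn.ReLU(),
                                       nn.Linear(2048, 128))

    def forward(self, x):                           # x: (N, 192, 14, 14)
        h = self.body(self.adapter(x)).flatten(1)   # h: (N, 2048) features
        z = self.projector(h)                       # z: (N, 128) for the contrastive loss
        return h, z
```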
S303: concatenating z_i and z_j along dimension 0 to obtain the features for calculating the contrastive learning loss:

z = [z_i; z_j], with dimension (2N, 128);
the loss of the contrastive learning pre-training is calculated according to the following formula, taking the two views of the same image as a positive pair (i, j) (the normalized temperature-scaled cross entropy form):

ℓ(i, j) = -log [ exp(sim(z_i, z_j)/σ) / Σ_{k≠i} exp(sim(z_i, z_k)/σ) ]

wherein the sum runs over the 2N concatenated features, sim(u, v) = u·v/(‖u‖‖v‖) is the cosine similarity, σ is a positive temperature parameter, and the total pre-training loss is the average of ℓ(i, j) over all 2N positive pairs;
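The concatenation and loss above can be implemented directly; the following PyTorch sketch computes the pre-training loss with σ as the temperature.

```python
# Sketch of S303: splice z_i and z_j along dimension 0 and compute the
# contrastive pre-training loss with temperature sigma.
import torch
import torch.nn.functional as F

def contrastive_loss(z_i, z_j, sigma=0.5):
    N = z_i.shape[0]
    z = F.normalize(torch.cat([z_i, z_j], dim=0), dim=1)  # (2N, 128), unit-norm rows
    sim = z @ z.t() / sigma                               # cosine similarities / sigma
    sim.fill_diagonal_(float('-inf'))                     # exclude k = i from the sum
    # Row r < N is the other view of row r + N, and vice versa.
    targets = torch.cat([torch.arange(N) + N, torch.arange(N)]).to(z.device)
    return F.cross_entropy(sim, targets)                  # mean over all 2N positive pairs
```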
S304: adjusting the parameters of the deep network by minimizing the contrastive learning pre-training loss, performing global parameter adjustment by the back propagation algorithm until the pre-training loss no longer decreases, at which point the model has converged and the contrastive learning pre-training step ends;
S4: predicting the classification results of the training set images by using the extracted features, so as to further learn the network parameters and perform the classification task;
inputting the 2048-dimensional features extracted in S301 into a classifier consisting of a fully connected layer and a softmax layer to obtain the predicted classification results:

ŷ = softmax(Wh + b), with dimension (N, K),

where W and b are the weight and bias of the fully connected layer, N is the number of images in a batch, and K is the number of image categories;
and then calculating the cross entropy loss function from the classification results:

L_CE = -(1/N) Σ_{n=1..N} Σ_{k=1..K} y_{n,k} log ŷ_{n,k},

where y_{n,k} = 1 if image n belongs to category k and 0 otherwise;
and finally, performing global parameter adjustment through the back propagation algorithm, optimizing the network parameters with the goal of minimizing the cross entropy loss function until the function value no longer decreases.
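A minimal training-step sketch for this classification stage is given below; FrequencyEncoder is the sketch from S301-S302 above, and K, the optimizer, and the learning rate are illustrative assumptions.

```python
# Sketch of S4: fully connected classifier over h and cross entropy training.
import torch
import torch.nn as nn

K = 10                                       # number of image categories (assumed)
encoder = FrequencyEncoder()                 # parameters pre-trained in S3
classifier = nn.Linear(2048, K)              # fully connected layer; the softmax
criterion = nn.CrossEntropyLoss()            # is folded into the cross entropy loss
optimizer = torch.optim.SGD(list(encoder.parameters()) +
                            list(classifier.parameters()), lr=1e-3)

def train_step(x, labels):                   # x: (N, 192, 14, 14) frequency coefficients
    h, _ = encoder(x)                        # 2048-dimensional features from S301
    loss = criterion(classifier(h), labels)  # cross entropy on predicted results
    optimizer.zero_grad()
    loss.backward()                          # global parameter adjustment
    optimizer.step()                         # by back propagation
    return loss.item()
```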
S5: classifying the test set images by using the deep network with optimized parameters:
performing random data enhancement on the test images according to S1, converting them to the frequency domain according to S2, and finally inputting the frequency-domain images into the feature extraction layer of the deep network and then into the classifier consisting of the fully connected layer and the softmax layer to obtain the prediction results.
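Composing the earlier sketches, test-set prediction can be illustrated as follows (augment, frequency_coefficients, encoder, and classifier are the hypothetical helpers defined above).

```python
# Sketch of S5: classify a test image with the optimized network.
import numpy as np
import torch

@torch.no_grad()
def predict(img):
    view = augment(img)                                    # random enhancement as in S1
    coeffs = frequency_coefficients(
        np.asarray(view, dtype=np.float64))                # convert to frequency domain (S2)
    x = torch.as_tensor(coeffs).float().unsqueeze(0)       # (1, 192, 14, 14)
    h, _ = encoder(x)                                      # feature extraction layer
    probs = torch.softmax(classifier(h), dim=1)            # fully connected + softmax layer
    return probs.argmax(dim=1).item()                      # predicted category
```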
In the invention, the images of each category are sub-divided according to their context information, and the relevant data sets are then constructed. The training set comprises images and their corresponding classification labels. The test set likewise comprises images and their corresponding classification labels, but the context information of the test set images differs from that of the training set.
In the invention, random data enhancement is performed twice on each batch of images, which are then converted to the frequency domain to obtain two features of the same image; the model is trained to distinguish whether two features come from the same image, thereby learning the stable features of the images and improving the classification effect on non-i.i.d. images.
In an alternative embodiment, in S2, the frequency-domain image is input into a convolutional neural network to extract features.
In an alternative embodiment, in S3, combining the frequency-domain representation with a contrastive learning framework, the model learns stable features through pre-training on the contrastive learning task.
In conclusion, the method can learn the stable features of the same object under different backgrounds, and classifies non-i.i.d. images better than traditional classification methods.
It is to be understood that the above-described embodiments of the present invention merely illustrate and explain the principles of the invention and are not to be construed as limiting it. Any modification, equivalent replacement, improvement, and the like made without departing from the spirit and scope of the present invention shall fall within its protection scope, and the appended claims are intended to cover all such variations and modifications as fall within their scope and boundaries or the equivalents thereof.