CN115810127A - Small sample image classification method based on supervision and self-supervision combined contrast learning - Google Patents

Small sample image classification method based on supervision and self-supervision combined contrast learning

Info

Publication number
CN115810127A
CN115810127A (application CN202211614946.8A)
Authority
CN
China
Prior art keywords
supervision
image
self-supervision
projector
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211614946.8A
Other languages
Chinese (zh)
Inventor
邹修明
翁小兰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huaiyin Normal University
Original Assignee
Huaiyin Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huaiyin Normal University
Priority to CN202211614946.8A
Publication of CN115810127A
Legal status: Pending


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a small sample image classification method based on supervision and self-supervision combined contrast learning, which comprises the following steps: S1, constructing a classification model; S2, extracting features with the backbone network; S3, transforming the supervised and self-supervised features; S4, performing joint contrastive learning; S5, carrying out the small sample image classification test: removing the supervised projector and the self-supervised projector from the classification model, fixing the parameters of the backbone convolutional neural network, collecting a small number of image samples online as training images, inputting them into the backbone network to extract features, training a support vector machine classifier, and finally using the classifier to complete the classification test on test images collected online. The method uses supervised contrastive learning to build the loss functions for both the supervised and self-supervised learning tasks, and can model the contrastive relations among samples, thereby obtaining more compact class features.

Description

Small sample image classification method based on supervision and self-supervision combined contrast learning
Technical Field
The invention relates to the technical field of image classification, in particular to a small sample image classification method based on supervision and self-supervision combined contrast learning.
Background
Image classification is a fundamental and important task in the field of computer vision, playing a key role in several areas of research such as object detection, pedestrian re-identification, and object tracking. With the rapid development of deep learning, the field has made significant breakthroughs in recent years; models based on deep neural networks have even reached human-level recognition performance on standard benchmarks such as ImageNet. However, this impressive progress relies heavily on large manually annotated datasets to optimize the enormous number of model parameters, which severely limits the applicability of deep learning in many real-world scenarios. By contrast, humans need very little supervision to recognize new objects. This large gap between humans and deep learning has therefore drawn growing research interest to the task of small sample image classification.
The purpose of small sample image classification is to learn prior knowledge from an additional auxiliary dataset, so that classification and recognition of unseen classes can be completed with only a small amount of supervision information. According to how the auxiliary dataset is used, current approaches to the small sample image classification problem can be broadly divided into meta-learning-based methods and transfer-learning-based methods. The former builds a series of small sample image tasks on top of the auxiliary dataset and thereby accumulates meta-knowledge for unseen classification tasks. Meta-metric learning methods learn the classifier and the metric simultaneously in the training stage and achieve good performance on small sample image classification tasks. For example, Li Wenbin et al. (Li Wenbin, Chen Saiyuan, Huo Jing, Gao Yang, Xu Lin, Wang Lei, Luo Jiebo. A small sample learning algorithm based on covariance metrics. Application No. 202010783893.7) disclose a small sample learning algorithm based on covariance metrics, which uses local covariance to represent each class of support samples and constructs a covariance metric layer to measure the distributional consistency between query samples and each class of support samples. Su Qinliang et al. (Su Qinliang, Chen Jiaxing. A small sample image classification method based on local feature relationship exploration. Application No. 202110287779.X) disclose a small sample image classification method based on local feature relationship exploration, which adopts multi-level graph neural networks to measure the relationships between query image samples and support image samples: a local graph neural network models the local feature relations within a single image, while a task graph neural network models the relations between image samples. Wei Shigong et al. (Wei Shigong, Liu Gongmei, Fan Sen, Zhu Longjiao. A small sample image classification method based on multi-level metrics. Application No. 202110284727.7) disclose a small sample image classification method based on multi-level metrics, which establishes a metric for each layer of convolutional features in the backbone neural network and completes classification and recognition of query samples by fusing image-level and category-level metric results. Pei Wenjiang et al. (Pei Wenjiang, Tian Weiwei, Xia Yili. A small sample image classification method based on manifold learning and high-order graph neural networks. Application No. 202110441901) disclose a small sample image classification method based on manifold learning and a high-order graph neural network.
However, when the above meta-learning-based small sample methods adopt a meta-learning strategy, they randomly sample data from the auxiliary dataset to construct small sample image classification tasks, and therefore cannot make full use of the information the auxiliary dataset provides. In contrast, recent studies have shown that classification methods based on transfer learning can achieve superior classification performance. This type of method typically combines a feature extractor pre-trained on the entire auxiliary dataset with an arbitrary conventional classifier to complete classification decisions for unseen classes. For example, Chen et al. (Chen W Y, Liu Y C, Kira Z, Wang Y C F, Huang J B. A Closer Look at Few-shot Classification [C]// Proceedings of the 7th International Conference on Learning Representations. New Orleans, USA: ICLR, 2019) showed that such a simple pre-training baseline is competitive with meta-learning methods. Subsequently, much work has been devoted to optimizing pre-training with a weighted sum of a cross-entropy loss and a self-supervised loss, thereby improving the generalization ability of the backbone network; for instance, Rizve et al. (Rizve M N, Khan S, Khan F S, et al. Exploring Complementary Strengths of Invariant and Equivariant Representations for Few-Shot Learning [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition) introduced rotation prediction as a self-supervised task into the pre-training process, improving the transferability of the backbone network. However, existing methods still use the cross-entropy loss function to optimize the network parameters for the supervised and self-supervised tasks during pre-training; the cross-entropy loss only concerns the accuracy of the prediction probability of the correct label and ignores the differences among the other, incorrect labels, so scattered features are learned. To solve these problems, the invention discloses a small sample image classification method based on supervision and self-supervision joint contrastive learning.
Disclosure of Invention
The invention aims to provide a small sample image classification method based on supervision and self-supervision combined contrast learning, so as to solve the problems raised in the background art above.
In order to achieve the above purpose, the invention provides the following technical scheme: a small sample image classification method based on joint contrastive learning of supervised and self-supervised tasks, comprising the following steps:
S1, constructing a classification model; the classification model mainly comprises an image enhancement module, a backbone convolutional neural network, a supervised projector and a self-supervised projector;
S2, extracting features with the backbone network; inputting any image in the extra data set into the classification model, acquiring self-supervised image information through the image enhancement module, and inputting it into the backbone network to extract features;
S3, transforming the supervised and self-supervised features; the image features extracted in step S2 are input into the supervised projector and the self-supervised projector respectively to transform the features;
S4, performing joint contrastive learning; computing a supervised contrastive loss function for each set of transformed features from step S3, completing the joint contrastive learning of the supervised and self-supervised tasks;
S5, carrying out the small sample image classification test: removing the supervised projector and the self-supervised projector from the classification model, fixing the parameters of the backbone convolutional neural network, collecting a small number of image samples online as training images, inputting them into the backbone network to extract features, training a support vector machine classifier, and finally using the classifier to complete the classification test on test images collected online.
As a preferred embodiment of the present invention, the classification model construction in step S1 is described as follows:
S11: the image enhancement module in the classification model is denoted $T(\cdot)$;
S12: the backbone convolutional neural network in the classification model is denoted $B_\theta(\cdot)$, where $\theta$ denotes its parameters;
S13: the supervised projector in the classification model is denoted $P_h(\cdot)$, where $h$ denotes its parameters;
S14: the self-supervised projector in the classification model is denoted $P_v(\cdot)$, where $v$ denotes its parameters (a minimal sketch of such a model is given below).
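By way of non-limiting example, the following is a minimal PyTorch sketch of such a classification model. The class name, the choice of a ResNet-18 backbone, and the projection dimensions are illustrative assumptions of the example, not features of the invention.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

class JointContrastModel(nn.Module):
    """Backbone B_theta with a supervised projector P_h and a self-supervised projector P_v."""
    def __init__(self, proj_dim_h=128, proj_dim_v=128):
        super().__init__()
        # Backbone B_theta(.): here a ResNet-18 with its classification head
        # removed, so it outputs Z-dimensional features (Z = 512 for ResNet-18).
        resnet = models.resnet18(weights=None)
        resnet.fc = nn.Identity()
        self.backbone = resnet
        feat_dim = 512
        # Supervised projector P_h(.): maps z_i to the H-dimensional space.
        self.proj_h = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.ReLU(inplace=True),
            nn.Linear(feat_dim, proj_dim_h))
        # Self-supervised projector P_v(.): maps z_i to the V-dimensional space.
        self.proj_v = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.ReLU(inplace=True),
            nn.Linear(feat_dim, proj_dim_v))

    def forward(self, x):
        z = self.backbone(x)                      # z_i, reused by the SVM at test time
        h = F.normalize(self.proj_h(z), dim=1)    # h_i, supervised projection
        v = F.normalize(self.proj_v(z), dim=1)    # v_i, self-supervised projection
        return z, h, v
```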
As a preferred embodiment of the present invention, the backbone network feature extraction described in step S2 is computed as follows:
S21: the method selects rotation prediction as the self-supervised task; the image enhancement module $T(\cdot)$ performs a rotation operation on any image in the extra data set, rotating it by 0°, 90°, 180° and 270° respectively, and labels each rotated image with its rotation, completing the construction of the self-supervised task (sketched in code below); any image is denoted $x_i$ with original class label $y_i$, and after the enhancement operation its rotation label is denoted $\tilde{y}_i \in \{0, 1, 2, 3\}$;
S22: features are extracted for any image in the extra data set; for any image $x_i$, its $Z$-dimensional feature representation is obtained as:

$$z_i = B_\theta(T(x_i)) \in \mathbb{R}^Z$$
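By way of non-limiting example, a minimal sketch of the rotation-based enhancement module $T(\cdot)$ of step S21, assuming batched PyTorch image tensors; the function name and tensor layout are assumptions of the example.

```python
import torch

def rotation_augment(images, labels):
    """Image enhancement module T(.): rotate each image by 0/90/180/270 degrees.

    images: (B, C, H, W) tensor; labels: (B,) original class labels y_i.
    Returns the 4x-enlarged batch, the repeated class labels, and the
    rotation labels (0..3) used as self-supervised targets.
    """
    rotated = torch.cat([torch.rot90(images, k, dims=(2, 3)) for k in range(4)], dim=0)
    class_labels = labels.repeat(4)                              # y_i, repeated per rotation
    rot_labels = torch.arange(4).repeat_interleave(len(images))  # self-supervised labels
    return rotated, class_labels, rot_labels
```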
As a preferred embodiment of the present invention, the supervised and self-supervised feature transformation described in step S3 proceeds as follows:
S31: the features extracted by the backbone convolutional neural network are input into the supervised projector to obtain the $H$-dimensional supervised projection features, namely:

$$h_i = P_h(z_i) \in \mathbb{R}^H$$

S32: the features extracted by the backbone convolutional neural network are input into the self-supervised projector to obtain the $V$-dimensional self-supervised projection features, namely:

$$v_i = P_v(z_i) \in \mathbb{R}^V$$
As a preferred embodiment of the present invention, the joint contrastive learning described in step S4 is described as follows:
S41: the supervised projection feature $h_i$ of any image is taken as the anchor; samples in the extra data set whose original label equals $y_i$ provide the positive-example features, the set of all positive examples is denoted $P(h_i)$, and the set of all samples is denoted $A(h_i)$; the supervised contrastive loss for the supervised task, with temperature $\tau$, is then computed as:

$$L_{SC1}(\theta, h) = \sum_i \frac{-1}{|P(h_i)|} \sum_{p \in P(h_i)} \log \frac{\exp(h_i \cdot h_p / \tau)}{\sum_{a \in A(h_i)} \exp(h_i \cdot h_a / \tau)}$$

S42: the self-supervised projection feature $v_i$ of any image is taken as the anchor; samples in the extra data set whose self-supervised label equals $\tilde{y}_i$ provide the positive-example features, the set of all positive examples is denoted $P(v_i)$, and the set of all samples is denoted $A(v_i)$; the supervised contrastive loss for the self-supervised task is then computed as:

$$L_{SC2}(\theta, v) = \sum_i \frac{-1}{|P(v_i)|} \sum_{p \in P(v_i)} \log \frac{\exp(v_i \cdot v_p / \tau)}{\sum_{a \in A(v_i)} \exp(v_i \cdot v_a / \tau)}$$

S43: the loss function of the joint contrastive learning of the supervised and self-supervised tasks is computed as:

$$L_{SC}(\theta, h, v) = L_{SC1}(\theta, h) + \alpha L_{SC2}(\theta, v)$$

where $\alpha$ is a weighting hyperparameter balancing the two terms;
S44: based on this loss function, the parameters $\theta$, $h$ and $v$ in the network are optimized by gradient descent (an illustrative sketch of this loss computation is given below).
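By way of non-limiting example, a compact PyTorch sketch of the loss computation of steps S41 to S43, assuming the standard supervised contrastive formulation above with L2-normalized features; the function name and the temperature value are assumptions of the example.

```python
import torch

def supcon_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss over L2-normalized projection features.

    features: (N, D) projections (h_i or v_i); labels: (N,) original class
    labels y_i for L_SC1, or rotation labels for L_SC2.
    """
    sim = features @ features.T / temperature                 # pairwise similarities
    self_mask = torch.eye(len(features), dtype=torch.bool, device=features.device)
    sim.masked_fill_(self_mask, float('-inf'))                # exclude the anchor itself
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask  # positive set P(.)
    # Mean log-probability over the positive set for each anchor.
    loss = -(log_prob * pos_mask).sum(1) / pos_mask.sum(1).clamp(min=1)
    return loss.mean()

# Joint loss of step S43: L_SC = L_SC1 + alpha * L_SC2
# loss = supcon_loss(h, class_labels) + alpha * supcon_loss(v, rot_labels)
```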
As a preferred embodiment of the present invention, the small sample image classification test described in step S5 proceeds as follows:
S51: the supervised projector and the self-supervised projector are removed from the classification model, and the parameter $\theta$ of the backbone convolutional neural network is fixed (the whole test procedure is sketched in code below);
S52: 5 image samples are acquired online as training images; any such image is denoted $x_j$ with class label $y_j$, and its feature from the backbone network is expressed as

$$z_j = B_\theta(x_j) \in \mathbb{R}^Z$$

S53: all training image features are input into a support vector machine classifier, denoted $C_w(\cdot)$ with parameters $w$, and the classifier parameters are obtained from the training sample features;
S54: a test image $x$ is collected online and input into the backbone network to extract the feature $z = B_\theta(x)$; the class prediction for this image is then:

$$\hat{y} = C_w(z)$$
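By way of non-limiting example, a minimal sketch of the test procedure of step S5, assuming the model sketched above and a linear SVM from scikit-learn; the variable names (`model`, `support_images`, `support_labels`, `test_images`) and the choice of a linear kernel are assumptions of the example.

```python
import torch
from sklearn.svm import LinearSVC

@torch.no_grad()
def extract_features(backbone, images):
    """Frozen backbone B_theta (projectors removed) used as the feature extractor."""
    backbone.eval()
    return backbone(images).cpu().numpy()

# Steps S51-S53: freeze theta, embed the 5 online support images, fit the SVM C_w.
z_train = extract_features(model.backbone, support_images)   # (N, Z) features z_j
clf = LinearSVC().fit(z_train, support_labels.numpy())       # learns parameters w

# Step S54: embed the online test image(s) and predict their classes.
z_test = extract_features(model.backbone, test_images)
y_pred = clf.predict(z_test)                                 # y_hat = C_w(z)
```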
Compared with the prior art, the invention has the following beneficial effects: the method uses supervised contrastive learning to build the loss functions for both the supervised and self-supervised learning tasks, and can model the contrastive relations among samples, thereby obtaining more compact class features.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, the present invention provides the following technical solution: a small sample image classification method based on supervision and self-supervision combined contrast learning, comprising the following steps:
Step 1: constructing the classification model; the image enhancement module $T(\cdot)$ in the classification model performs rotation operations on any image in the extra data set and labels it with its rotation; the backbone convolutional neural network in the classification model is denoted $B_\theta(\cdot)$, where $\theta$ denotes its parameters; the supervised projector is denoted $P_h(\cdot)$, where $h$ denotes its parameters; the self-supervised projector is denoted $P_v(\cdot)$, where $v$ denotes its parameters;
Step 2: extracting features with the backbone network; any image $x_i$ in the extra data set, after passing through the image enhancement module, is input into the backbone convolutional neural network, and its $Z$-dimensional feature is expressed as $z_i = B_\theta(T(x_i)) \in \mathbb{R}^Z$;
Step 3: transforming the supervised and self-supervised features; the feature $z_i$ produced by the backbone convolutional neural network is input into the supervised projector and the self-supervised projector respectively, yielding the $H$-dimensional supervised projection feature $h_i$ and the $V$-dimensional self-supervised projection feature $v_i$;
Step 4: performing joint contrastive learning; taking the supervised projection feature $h_i$ of any image as the anchor, the positive-example set $P(h_i)$ and the full sample set $A(h_i)$ are constructed and the supervised contrastive loss $L_{SC1}(\theta, h)$ of the supervised task is computed; taking the self-supervised projection feature $v_i$ of any image as the anchor, the positive-example set $P(v_i)$ and the full sample set $A(v_i)$ are constructed and the supervised contrastive loss $L_{SC2}(\theta, v)$ of the self-supervised task is computed; the joint contrastive loss $L_{SC}(\theta, h, v)$ of the supervised and self-supervised tasks is then computed, and the parameters $\theta$, $h$ and $v$ in the network are optimized by gradient descent;
Step 5: classifying and testing the small sample images; the supervised projector $P_h(\cdot)$ and the self-supervised projector $P_v(\cdot)$ are removed from the classification model and the parameter $\theta$ of the backbone convolutional neural network is fixed; 5 image samples are acquired online as training images and input into the backbone network to obtain the features $z_j$, on which the support vector machine classifier $C_w(\cdot)$ is trained; a test image $x$ is collected online and input into the backbone network to extract the feature $z$, and finally the trained support vector machine classifier predicts the test image as $\hat{y} = C_w(z)$ (an end-to-end sketch of the pre-training loop of steps 1 to 4 follows).
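Tying the sketches above together, a non-limiting outline of the pre-training loop over the extra data set; the optimizer settings, `aux_loader`, and the `alpha` value are illustrative assumptions of the example.

```python
import torch

# Non-limiting sketch of the pre-training loop (steps 1-4), combining the
# JointContrastModel, rotation_augment and supcon_loss sketches above.
model = JointContrastModel()
optimizer = torch.optim.SGD(model.parameters(), lr=0.05,
                            momentum=0.9, weight_decay=5e-4)
alpha = 1.0  # weight of the self-supervised term in L_SC; a tunable hyperparameter

for images, labels in aux_loader:                    # loader over the extra data set (assumed)
    x, y, y_rot = rotation_augment(images, labels)   # step 2: T(.) builds the rotation task
    _, h, v = model(x)                               # steps 2-3: B_theta, then P_h and P_v
    loss = supcon_loss(h, y) + alpha * supcon_loss(v, y_rot)  # step 4: L_SC
    optimizer.zero_grad()
    loss.backward()                                  # gradient descent on theta, h, v
    optimizer.step()
```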
In summary: the method uses supervised contrastive learning to build the loss functions for both the supervised and self-supervised learning tasks, and can model the contrastive relations among samples, thereby obtaining more compact class features.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (6)

1. A small sample image classification method based on supervision and self-supervision combined contrast learning, characterized in that it comprises the following steps:
S1, constructing a classification model; the classification model mainly comprises an image enhancement module, a backbone convolutional neural network, a supervised projector and a self-supervised projector;
S2, extracting features with the backbone network; inputting any image in the extra data set into the classification model, acquiring self-supervised image information through the image enhancement module, and inputting it into the backbone network to extract features;
S3, transforming the supervised and self-supervised features; inputting the image features extracted in step S2 into the supervised projector and the self-supervised projector respectively to transform the features;
S4, performing joint contrastive learning; computing a supervised contrastive loss function for each set of transformed features from step S3, completing the joint contrastive learning of the supervised and self-supervised tasks;
S5, carrying out the small sample image classification test: removing the supervised projector and the self-supervised projector from the classification model, fixing the parameters of the backbone convolutional neural network, collecting a small number of image samples online as training images, inputting them into the backbone network to extract features, training a support vector machine classifier, and finally using the classifier to complete the classification test on test images collected online.
2. The small sample image classification method based on supervision and self-supervision combined contrast learning of claim 1, characterized in that step S1 is specifically described as follows:
S11, the image enhancement module in the classification model is denoted $T(\cdot)$;
S12, the backbone convolutional neural network in the classification model is denoted $B_\theta(\cdot)$, where $\theta$ denotes its parameters;
S13, the supervised projector in the classification model is denoted $P_h(\cdot)$, where $h$ denotes its parameters;
S14, the self-supervised projector in the classification model is denoted $P_v(\cdot)$, where $v$ denotes its parameters.
3. The small sample image classification method based on supervision and self-supervision combined contrast learning of claim 1, characterized in that the backbone network feature extraction in step S2 is computed as follows:
S21, rotation prediction is selected as the self-supervised task; the image enhancement module $T(\cdot)$ performs a rotation operation on any image in the extra data set, rotating it by 0°, 90°, 180° and 270° respectively, and labels each rotated image with its rotation, completing the construction of the self-supervised task; any image is denoted $x_i$ with original class label $y_i$, and after the enhancement operation its rotation label is denoted $\tilde{y}_i \in \{0, 1, 2, 3\}$;
S22, features are extracted for any image in the extra data set; for any image $x_i$, its $Z$-dimensional feature representation is obtained as $z_i = B_\theta(T(x_i)) \in \mathbb{R}^Z$.
4. The small sample image classification method based on supervision and self-supervision combined contrast learning of claim 1, characterized in that the supervised and self-supervised feature transformation in step S3 is computed as follows:
S31, the features extracted by the backbone convolutional neural network are input into the supervised projector to obtain the $H$-dimensional supervised projection features, namely: $h_i = P_h(z_i) \in \mathbb{R}^H$;
S32, the features extracted by the backbone convolutional neural network are input into the self-supervised projector to obtain the $V$-dimensional self-supervised projection features, namely: $v_i = P_v(z_i) \in \mathbb{R}^V$.
5. The small sample image classification method based on supervision and self-supervision combined contrast learning of claim 1, characterized in that the joint contrastive learning in step S4 proceeds as follows:
S41, the supervised projection feature $h_i$ of any image is taken as the anchor; samples in the extra data set whose original label equals $y_i$ provide the positive-example features, the set of all positive examples is denoted $P(h_i)$, and the set of all samples is denoted $A(h_i)$; the supervised contrastive loss for the supervised task, with temperature $\tau$, is then computed as:

$$L_{SC1}(\theta, h) = \sum_i \frac{-1}{|P(h_i)|} \sum_{p \in P(h_i)} \log \frac{\exp(h_i \cdot h_p / \tau)}{\sum_{a \in A(h_i)} \exp(h_i \cdot h_a / \tau)}$$

S42, the self-supervised projection feature $v_i$ of any image is taken as the anchor; samples in the extra data set whose self-supervised label equals $\tilde{y}_i$ provide the positive-example features, the set of all positive examples is denoted $P(v_i)$, and the set of all samples is denoted $A(v_i)$; the supervised contrastive loss for the self-supervised task is then computed as:

$$L_{SC2}(\theta, v) = \sum_i \frac{-1}{|P(v_i)|} \sum_{p \in P(v_i)} \log \frac{\exp(v_i \cdot v_p / \tau)}{\sum_{a \in A(v_i)} \exp(v_i \cdot v_a / \tau)}$$

S43, the loss function of the joint contrastive learning of the supervised and self-supervised tasks is computed as:

$$L_{SC}(\theta, h, v) = L_{SC1}(\theta, h) + \alpha L_{SC2}(\theta, v)$$

S44, based on this loss function, the parameters $\theta$, $h$ and $v$ in the network are optimized by gradient descent.
6. The small sample image classification method based on supervision and self-supervision combined contrast learning of claim 1, characterized in that the small sample image classification test in step S5 proceeds as follows:
S51, the supervised projector and the self-supervised projector are removed from the classification model, and the parameter $\theta$ of the backbone convolutional neural network is fixed;
S52, 5 image samples are acquired online as training images; any such image is denoted $x_j$ with class label $y_j$, and its feature from the backbone network is expressed as $z_j = B_\theta(x_j) \in \mathbb{R}^Z$;
S53, all training image features are input into a support vector machine classifier $C_w(\cdot)$ with parameters $w$, and the classifier parameters are obtained from the training sample features;
S54, a test image $x$ is collected online and input into the backbone network to extract the feature $z = B_\theta(x)$; the class prediction for this image is then $\hat{y} = C_w(z)$.
CN202211614946.8A 2022-12-14 2022-12-14 Small sample image classification method based on supervision and self-supervision combined contrast learning Pending CN115810127A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211614946.8A CN115810127A (en) 2022-12-14 2022-12-14 Small sample image classification method based on supervision and self-supervision combined contrast learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211614946.8A CN115810127A (en) 2022-12-14 2022-12-14 Small sample image classification method based on supervision and self-supervision combined contrast learning

Publications (1)

Publication Number Publication Date
CN115810127A 2023-03-17

Family

ID=85486018

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211614946.8A Pending CN115810127A (en) 2022-12-14 2022-12-14 Small sample image classification method based on supervision and self-supervision combined contrast learning

Country Status (1)

Country Link
CN (1) CN115810127A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116758562A (en) * 2023-08-22 2023-09-15 杭州实在智能科技有限公司 Universal text verification code identification method and system
CN116758562B (en) * 2023-08-22 2023-12-08 杭州实在智能科技有限公司 Universal text verification code identification method and system

Similar Documents

Publication Publication Date Title
CN108960140B (en) Pedestrian re-identification method based on multi-region feature extraction and fusion
WO2022037233A1 (en) Small sample visual target identification method based on self-supervised knowledge transfer
Yang et al. A survey of DNN methods for blind image quality assessment
CN109063565B (en) Low-resolution face recognition method and device
CN112069940B (en) Cross-domain pedestrian re-identification method based on staged feature learning
CN112507901B (en) Unsupervised pedestrian re-identification method based on pseudo tag self-correction
CN110263666B (en) Action detection method based on asymmetric multi-stream
Duong et al. Shrinkteanet: Million-scale lightweight face recognition via shrinking teacher-student networks
WO2022160772A1 (en) Person re-identification method based on view angle guidance multi-adversarial attention
CN115375951B (en) Small sample hyperspectral image classification method based on primitive migration network
CN112529678B (en) Financial index time sequence anomaly detection method based on self-supervision discriminant network
CN113283282B (en) Weak supervision time sequence action detection method based on time domain semantic features
CN113920472A (en) Unsupervised target re-identification method and system based on attention mechanism
CN116206327A (en) Image classification method based on online knowledge distillation
CN113657267A (en) Semi-supervised pedestrian re-identification model, method and device
CN112668438A (en) Infrared video time sequence behavior positioning method, device, equipment and storage medium
CN115810127A (en) Small sample image classification method based on supervision and self-supervision combined contrast learning
Du et al. Convolutional neural network-based data anomaly detection considering class imbalance with limited data
CN114897085A (en) Clustering method based on closed subgraph link prediction and computer equipment
CN112613474B (en) Pedestrian re-identification method and device
CN116561562B (en) Sound source depth optimization acquisition method based on waveguide singular points
CN113343123A (en) Training method and detection method for generating confrontation multiple relation graph network
CN115661539A (en) Less-sample image identification method embedded with uncertainty information
CN115664970A (en) Network abnormal point detection method based on hyperbolic space
CN113688879B (en) Generalized zero sample learning classification method based on confidence distribution external detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination