CN112633382A - Mutual-neighbor-based few-sample image classification method and system - Google Patents


Info

Publication number
CN112633382A
CN112633382A (application CN202011561516.5A)
Authority
CN
China
Prior art keywords
image
local feature
sample
neighbor
query
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011561516.5A
Other languages
Chinese (zh)
Other versions
CN112633382B (en)
Inventor
Yang Liu (刘洋)
Deng Cai (蔡登)
Tu Zheng (郑途)
Yi Zhang (张毅)
Xiaofei He (何晓飞)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202011561516.5A priority Critical patent/CN112633382B/en
Publication of CN112633382A publication Critical patent/CN112633382A/en
Application granted granted Critical
Publication of CN112633382B publication Critical patent/CN112633382B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147 Distances to closest patterns, e.g. nearest neighbour classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a mutual-neighbor-based few-sample image classification method and system. The few-sample image classification method comprises the following steps: (1) a neural network model forward-infers visual feature representations of the query image and the support images; (2) a mutual-neighbor algorithm screens the local feature descriptors relevant to the few-sample classification task; (3) the screened descriptors are used to calculate the similarity between the query image and each class in the support image set; (4) a neural network model is trained after dividing a labelled image data set into few-sample tasks; (5) the classes are ranked by their similarity to the query image, and the class with the highest similarity is selected as the class prediction for the image. With this method, the interference of large numbers of background local feature descriptors on similarity calculation can be eliminated during few-sample image classification training, making the classification result more robust.

Description

Mutual-neighbor-based few-sample image classification method and system
Technical Field
The invention belongs to the technical field of image classification, and particularly relates to a mutual-neighbor-based few-sample image classification method and system.
Background
In recent years, object classification, as an important branch of the field of computer vision, has received attention from researchers in industry and academia. Benefiting from the rapid development of deep learning techniques, the supervised object classification task has advanced greatly; at the same time, this supervised training approach has a limitation: in supervised classification, each class requires enough labeled training samples. In practical applications, however, each class may not have enough training samples, and labeling image data requires a degree of expertise and often takes a great deal of manpower.
The objective of few-sample image classification is to learn a machine learning model for image classification such that, after learning image classification tasks over classes with abundant data, it can quickly classify new image classes from only a small number of samples. Few-sample image classification has become a rapidly developing direction in the field of machine learning and has achieved notable results in the classification of medical images, satellite pictures and some rare species.
At present, the mainstream few-sample image classification methods mainly adopt classification based on a global image representation and metric learning. The derivation proceeds in two stages: given an input image, the model first derives a global feature vector representation of the image; various metric-learning-based methods are then used to measure the distance between the query image and the support images, which consist of a small number of samples. For example, the MatchingNet model proposed in "Matching Networks for One Shot Learning" by Oriol Vinyals et al., published at the Conference and Workshop on Neural Information Processing Systems in 2016, classifies images by training a bidirectional long short-term memory network for global feature encoding together with an end-to-end nearest-neighbor classifier. In 2017, the prototypical network proposed in "Prototypical Networks for Few-shot Learning" by Jake Snell et al., also at the Conference and Workshop on Neural Information Processing Systems, regards the classification problem as finding a prototype center for each class in a semantic space. For the task definition of few-sample image classification, the prototypical network learns during training how to fit such a center, learning a metric function under which the prototype center of a class can be found in the metric space from a small number of samples. The article "Learning to Compare: Relation Network for Few-Shot Learning" by Flood Sung et al., published at The Conference on Computer Vision and Pattern Recognition in 2018, argues that a hand-designed metric is always imperfect and may fail in some situations; to solve this problem, they propose a relation network combining an embedding module and a relation module, letting the neural network itself select a suitable metric for each few-sample classification task.
Recent advances in few-sample image classification do not use a single feature vector as a global characterization of the entire image, but instead learn local-descriptor-based representations that preserve as much local information as possible. The DN4 model proposed in "Revisiting Local Descriptor based Image-to-Class Measure for Few-shot Learning", published at The Conference on Computer Vision and Pattern Recognition in 2019, uses a naive-Bayes nearest-neighbor measure to aggregate the similarity values between the local feature representations of the query image and those of the support image set. The dense classification method proposed in "Dense Classification and Implanting for Few-Shot Learning", also at The Conference on Computer Vision and Pattern Recognition in 2019, performs a classification prediction for each local feature representation of an image and averages these predictions to obtain the classification result of the whole image. The article "DeepEMD: Few-Shot Image Classification with Differentiable Earth Mover's Distance", published at The Conference on Computer Vision and Pattern Recognition, proposes splitting an image into several image blocks, introducing the Earth Mover's Distance as the distance measure between image blocks, and computing the optimal matching cost between the blocks of the query image and the blocks of a support image to represent the similarity between the two images.
In addition to local feature representations of images, some recent work has also adopted subspace learning methods. For example, the TapNet model proposed in "TapNet: Neural Network Augmented with Task-Adaptive Projection for Few-Shot Learning" at the International Conference on Machine Learning in 2019 learns task-specific subspace projections and classifies based on the projections of the query image features and the support image features. The article "Adaptive Subspaces for Few-Shot Learning", published at The Conference on Computer Vision and Pattern Recognition in 2020, constructs a class-specific subspace from the few samples of each class and uses the projection distances of the query image to the respective class subspaces for classification.
Disclosure of Invention
The invention provides a mutual-neighbor-based few-sample image classification method and system that establish a one-to-one nearest-neighbor mapping between the local feature descriptors of the query image and those of the support images, and pre-screen the discriminative, task-relevant local feature descriptors, thereby achieving a better few-sample image classification effect.
A mutual-neighbor-based few-sample image classification method comprises the following steps:
(1) constructing a pre-trained deep neural network model, so that when the image is subjected to forward derivation, a local feature descriptor representation of the image can be obtained;
(2) carrying out training data preparation, and dividing a few-sample training data set for model training into a plurality of few-sample image classification tasks;
(3) constructing a mutual-neighbor mapping relation of local feature descriptors in the query image and the support image set for the query image and the support image set in each less-sample task, and screening out the local feature descriptors of the query image with the mutual-neighbor relation;
(4) calculating the similarity score of naive Bayes nearest neighbor for the local feature descriptor of the screened query image and the local feature descriptor among the support image sets;
(5) according to the similarity score between the query image and each support class, using the cross entropy as a loss function to train the parameters of the deep network model;
(6) after the model training is finished, apply the model: input the image to be queried, use the trained model to calculate the similarity between the image to be queried and each class in the support image set, sort the per-class similarities, and select the class with the highest similarity in the support image set as the prediction class of the query image.
In the step (1), the specific process of obtaining the local feature descriptor representation of the image is as follows:
A pre-trained deep neural network model extracts the deep visual features θ ∈ R^{C×H×W} of the query image input x, which are converted into the local feature set representation q = {q_1, …, q_M} of the image, where M = H × W denotes the number of local features of a single image and q_m ∈ R^C denotes one of the local feature vectors; C denotes the number of feature channels of the deep visual feature map θ, H the height of the feature map, and W its width.
The same deep neural network model extracts, from the input x_{c,k} of the k-th support image of class c, the deep features θ_{c,k} ∈ R^{C×H×W}, which are converted into the local feature descriptor set representation s_{c,k} = {s_{c,k,1}, …, s_{c,k,M}} of the image, wherein s_{c,k,m} ∈ R^C denotes one of the local feature descriptor vectors of a support image; the set union of the local feature descriptors of all K support images from the same class c gives the local feature descriptor set of the whole support class c: s_c = s_{c,1} ∪ … ∪ s_{c,K}.
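The conversion from a deep feature map to a local feature descriptor set, and the union over the K support images of a class, can be sketched as follows (a minimal NumPy illustration; the helper names and random feature maps are illustrative assumptions, not part of the invention, and the union is realised as a concatenation of descriptor rows):

```python
import numpy as np

def feature_map_to_descriptors(theta):
    """Flatten a C x H x W deep feature map theta into M = H*W
    local feature descriptors, each of dimension C."""
    C, H, W = theta.shape
    return theta.reshape(C, H * W).T          # shape (M, C)

def support_class_descriptors(feature_maps):
    """Union s_c of the descriptor sets of the K support feature
    maps of one class c, realised as a row-wise concatenation."""
    return np.concatenate(
        [feature_map_to_descriptors(t) for t in feature_maps], axis=0)

# toy example: K = 2 support images, 64-channel 5x5 feature maps
rng = np.random.default_rng(0)
s_c = support_class_descriptors(
    [rng.standard_normal((64, 5, 5)) for _ in range(2)])
print(s_c.shape)  # (50, 64)
```

With H = W = 5 each image contributes M = 25 descriptors, so the class set s_c holds 2 × 25 = 50 rows.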
In the step (2), during data preparation, for the N-way K-shot few-sample classification task (N classes with K samples per class), the training data set is randomly sampled into a set consisting of E few-sample tasks D_train = {(S^(i), x^(i), y^(i))}_{i=1}^{E}, wherein S^(i) = {x^(i)_{j,k} | j = 1, …, N; k = 1, …, K}, x^(i)_{j,k} denotes the k-th image of the j-th class in the support image set of the i-th few-sample task, x^(i) denotes the query image of the i-th few-sample task, and y^(i) the category of the query image; each few-sample task thus involves the N × K support images S^(i) and one query image (x^(i), y^(i)).
The specific process of the step (3) is as follows:
(3-1) first construct the local feature descriptor set P of all classes in the support image set: P = s_1 ∪ s_2 ∪ … ∪ s_N, wherein N is the number of classes in a few-sample classification task;
(3-2) for any local feature descriptor q ∈ q of the query image, find its nearest local feature descriptor s in the set P: s = NN_P(q), wherein NN_P(q) denotes the nearest-neighbor function in the set P with respect to the vector q;
(3-3) for the nearest local feature descriptor s corresponding to q in (3-2), find its nearest feature descriptor q̂ in the set q: q̂ = NN_q(s);
(3-4) if q in (3-2) and q̂ in (3-3) are the same query local feature descriptor, then q and s are called mutually adjacent local feature descriptors;
(3-5) repeat steps (3-2) to (3-4) to find all query feature descriptors that have the mutual-neighbor relation, recorded as the new query local feature descriptor set q*.
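Steps (3-1) to (3-5) can be sketched as follows (a minimal NumPy illustration; nearest neighbors are taken here under cosine similarity, matching the measure used in step (4), which is an assumption since the patent does not fix the metric inside the NN function):

```python
import numpy as np

def mutual_neighbor_filter(q, P):
    """Keep the rows of q (query descriptors, shape (M, C)) whose
    nearest neighbour s in P (pooled support descriptors, shape
    (L, C)) has that same row of q as its own nearest neighbour,
    i.e. the mutual-neighbour condition q_hat == q."""
    qn = q / np.linalg.norm(q, axis=1, keepdims=True)
    Pn = P / np.linalg.norm(P, axis=1, keepdims=True)
    sim = qn @ Pn.T                  # (M, L) cosine similarities
    nn_in_P = sim.argmax(axis=1)     # s     = NN_P(q_i)
    nn_in_q = sim.argmax(axis=0)     # q_hat = NN_q(s_j)
    keep = [i for i in range(q.shape[0]) if nn_in_q[nn_in_P[i]] == i]
    return q[keep]

q = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
P = np.array([[1.0, 0.1], [0.1, 1.0], [0.9, 0.05]])
print(mutual_neighbor_filter(q, P))
```

In this toy case the first two query descriptors satisfy the mutual-neighbor condition and are kept, while the third points at a support descriptor whose own nearest query descriptor is a different row, so it is screened out.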
The specific process of the step (4) is as follows:
(4-1) for any descriptor q ∈ q* of the query local feature descriptor set q* obtained by the screening in step (3), compute its maximum similarity to the local feature descriptor set s_c of the class-c images in the support image set: φ(q, c) = cos(q, NN_{s_c}(q)), wherein cos(·,·) is the cosine similarity between vectors and NN_{s_c}(q) denotes the nearest-neighbor function in the set s_c with respect to the vector q;
(4-2) accumulate, in the naive Bayes manner, the maximum cosine similarities of all local feature descriptors q ∈ q* of the query set with respect to the c-th class of the support image set, giving the similarity of the query image x to the class-c images of the support image set: Φ(x, c) = Σ_{q∈q*} φ(q, c).
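Steps (4-1) and (4-2) can be sketched as follows (a minimal NumPy illustration; the function names are illustrative assumptions):

```python
import numpy as np

def phi(q_vec, s_c):
    """phi(q, c): maximum cosine similarity between one query
    descriptor and any support descriptor of class c (rows of s_c)."""
    s_n = s_c / np.linalg.norm(s_c, axis=1, keepdims=True)
    return float(np.max(s_n @ (q_vec / np.linalg.norm(q_vec))))

def class_similarity(q_star, s_c):
    """Phi(x, c): naive-Bayes accumulation of phi(q, c) over the
    screened query descriptors q_star."""
    return sum(phi(q_vec, s_c) for q_vec in q_star)

q_star = np.array([[1.0, 0.0], [0.0, 1.0]])
s_c = np.array([[2.0, 0.0], [0.0, 3.0]])
print(class_similarity(q_star, s_c))  # 2.0: each descriptor matches perfectly
```

Since cosine similarity ignores vector length, each toy query descriptor attains its maximum of 1 against a collinear support descriptor, and the accumulated similarity is 2.0.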
in the step (5), in the training process, images in each small sample task are subjected to image matching
Figure BDA0002860563590000061
And query image label y(i)Training the neural network by using a loss function L, wherein the specific calculation formula of the loss function L is as follows:
Figure BDA0002860563590000062
wherein, δ (y)(i)C) is an indicator function, i.e. when y(i)When the condition is satisfied, the value is 1, and when the condition is not satisfied, the value is 0.
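The loss of step (5) can be sketched as follows (a hedged illustration; normalising the class similarities with a softmax is an assumption implied by the use of cross entropy, and the indicator function reduces the sum to the log-probability of the true class):

```python
import math

def few_shot_loss(similarities, label):
    """Cross-entropy over the N class-similarity scores Phi(x, c):
    returns -log softmax(similarities)[label]."""
    m = max(similarities)  # subtract max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in similarities))
    return log_z - similarities[label]

scores = [3.0, 1.0, 0.5]
print(few_shot_loss(scores, 0) < few_shot_loss(scores, 1))  # True
```

The loss is smaller when the true class already has the highest similarity score, which is exactly what gradient descent on the feature extractor encourages.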
Before applying the model, the method of the invention can test it. The specific process is as follows:
During test data preparation, to verify the robustness of the model learned on the few-sample tasks of step (5), the test data set is randomly sampled into a set consisting of E′ few-sample test tasks D_test = {(S^(i), x^(i), y^(i))}_{i=1}^{E′}, wherein D_test differs from the aforementioned D_train in two respects: (a) the image data sources are different, i.e. the classes of the training set and the test set are disjoint; (b) the classification information y^(i) of the query image x^(i) in D_test serves only as the standard for measuring the quality of the few-sample classification neural network model and does not participate in the calculation.
In the test prediction phase, for the images (S^(i), x^(i)) of each few-sample task, the predicted class ŷ^(i) of the query image x^(i) is calculated as: ŷ^(i) = argmax_c Φ(x^(i), c).
In the test evaluation phase, if the calculated ŷ^(i) is equal to y^(i), the few-sample task is considered to be predicted successfully. For the test set D_test consisting of E′ few-sample tasks, the average prediction accuracy measures the robustness of the few-sample neural network model trained in step (5).
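The test prediction and evaluation phases can be sketched as follows (an illustrative sketch; the function names are assumptions):

```python
def predict_class(similarities):
    """y_hat = argmax_c Phi(x, c) over the per-class scores."""
    return max(range(len(similarities)), key=lambda c: similarities[c])

def average_accuracy(all_scores, labels):
    """Average prediction accuracy over E' few-sample test tasks:
    each task contributes its per-class similarity scores and the
    true label of its query image."""
    hits = sum(predict_class(s) == y for s, y in zip(all_scores, labels))
    return hits / len(labels)

# 4 toy tasks, 2 classes each
scores = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
print(average_accuracy(scores, [0, 1, 1, 1]))  # 0.75
```

Three of the four toy tasks are predicted correctly, so the average accuracy over this miniature test set is 0.75.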
The mutual-neighbor-based few-sample image classification method provided by the invention has all the advantages of local-descriptor-based few-sample image classification methods and can correctly eliminate the interference caused by large numbers of repeated background local feature descriptors, for example the interference caused by the sky background in a few-sample bird classification task. In practice it is found that the proposed mutual-neighbor algorithm can usually extract the feature descriptors related to the foreground target, thus greatly improving the accuracy of few-sample classification methods based on local feature descriptors.
The invention also provides a mutual-neighbor-based few-sample image classification system, which comprises a computer system, wherein the computer system comprises:
the visual feature module captures the depth visual features of the input image by utilizing a depth convolution neural network;
the mutual-neighbor screening module screens out local feature descriptors which are most closely related to the task and have discriminability by using a mutual-neighbor algorithm;
the similarity calculation module is used for calculating the similarity between the query image and each support image class in the less-sample classification task by using the local feature descriptors obtained by screening through the mutual neighbor algorithm;
the classification retrieval module is used for classifying the few-sample images by utilizing the similarity between the query image and each support image class;
and the classification generation module is used for outputting a classification result to the outside after the model classification is finished.
Compared with the prior art, the invention has the following beneficial effects:
1. By screening the local feature descriptors most relevant to each few-sample task, the mutual-neighbor algorithm proposed by the invention lets the more discriminative foreground target features determine the classification result, reduces the influence of background local feature descriptors, and avoids wrong classifications on query images with a high proportion of background.
2. A large number of experiments demonstrate model performance superior to other baseline algorithms, proving the superiority of the model.
Drawings
FIG. 1 is a general framework layout of the method of the present invention;
FIG. 2 is a block diagram of a system according to the present invention;
FIG. 3 is a diagram illustrating a field of view of a local descriptor screened in a set of few-sample classification tasks according to an embodiment of the present invention.
Detailed Description
The invention will be described in further detail below with reference to the drawings and examples, which are intended to facilitate the understanding of the invention without limiting it in any way.
As shown in FIG. 1, the main model of the invention is divided into a visual feature module, a mutual-neighbor screening module and a similarity calculation module, and the final similarity score is used in the optimization process of the whole model. The specific steps are as follows:
step 1, a visual feature module learns depth visual features theta of an input image x in a few-sample image classification training process, and the basic steps are as follows:
1-1. query image in ith task of the Low sample image Classification task first crop image x scaled to 84X 84 size(i)As a query input to the network; supporting image set is also used to scale 84 × 84 image set
Figure BDA0002860563590000081
As a support input to the network.
1-2. For the query image x^(i), extract the feature map of the last non-classification layer of the neural network as the deep visual feature θ^(i) ∈ R^{C×H×W} of the image, where C denotes the number of channels of the feature map and H and W denote its height and width respectively. In this local-feature-descriptor-based few-sample image classification method, the feature map θ^(i) of the query image is converted into the local feature set representation q = {q_1, …, q_M} of the query image, where M = H × W denotes the number of local features of a single image and q_m ∈ R^C denotes one of the local feature vectors.
1-3. For the input x^(i)_{c,k} of the k-th support image of class c in the support set, extract the feature map θ_{c,k} of the last non-classification layer of the same neural network and convert it into the local feature descriptor set representation s_{c,k} of the image, wherein s_{c,k,m} ∈ R^C denotes one of the local feature descriptor vectors of a support image; the set union of the local feature descriptors of the K support images of the same class c gives the local feature descriptor set s_c = s_{c,1} ∪ … ∪ s_{c,K} of the whole support image set with respect to class c.
Step 2, the mutual neighbor screening module screens out local feature descriptors related to the task for the few-sample image classification, and the basic steps are as follows:
2-1. Construct the local feature descriptor set P of all classes in the support image set: P = s_1 ∪ … ∪ s_N, where N is the number of classes of the support image set in a few-sample classification task.
2-2. Initialize the screened local feature descriptor set q* = {}.
2-3. For any local feature descriptor q ∈ q of the query image, find its nearest local feature descriptor s in the set P: s = NN_P(q), where NN_P(q) denotes the nearest-neighbor function in the set P with respect to the vector q.
2-4. For the nearest local feature descriptor s corresponding to q found in the previous step, find its nearest feature descriptor q̂ in the set q: q̂ = NN_q(s), where NN_q(s) denotes the nearest-neighbor function in the set q with respect to the vector s.
2-5. If q and q̂ are the same query local feature descriptor, q and s are called mutually adjacent local feature descriptors, and q is recorded into the screened local feature descriptor set: q* = q* ∪ {q}. Traverse all q ∈ q, repeating steps 2-3 and 2-4, to obtain the set q* of all local feature descriptors satisfying the mutual-neighbor condition.
Step 3, the similarity calculation module provides a similarity calculation function of the query image and the support image set category based on the local feature descriptors for the few-sample image classification process, and the basic steps are as follows:
3-1. For any descriptor q ∈ q* of the query local feature descriptor set q* obtained by the screening in step 2-5, compute its maximum similarity to the local feature descriptor set s_c of the class-c images in the support image set: φ(q, c) = cos(q, NN_{s_c}(q)), where cos(·,·) is the cosine similarity between vectors and NN_{s_c}(q) denotes the nearest-neighbor function in the set s_c with respect to the vector q.
3-2. Accumulate, in the naive Bayes manner, the maximum cosine similarities φ(q, c) of all local feature descriptors q ∈ q* of the query set with respect to the c-th class of the support image set, obtaining the similarity of the query image x^(i) to the class-c images of the support image set: Φ(x^(i), c) = Σ_{q∈q*} φ(q, c).
step 4, the training steps of the mutual-neighbor-based few-sample image classification method are as follows:
4-1. For the N-way K-shot few-sample image classification task, initialize, by random sampling from a standard classification training data set, a training data set consisting of E few-sample tasks D_train = {(S^(i), x^(i), y^(i))}_{i=1}^{E}, where x^(i)_{j,k} denotes the k-th image of the j-th class in the support image set S^(i) of the i-th few-sample task, x^(i) denotes the query image of the i-th few-sample task, and y^(i) the category of the query image in the i-th few-sample task. The network training process of each few-sample task involves the N × K support images S^(i) and one query image (x^(i), y^(i)).
4-2. Select the images (S^(i), x^(i)) of a few-sample task as the input to the network model, with the query image label y^(i) as the correct classification result of the task. Extract the local feature descriptors of the images through the visual feature module; screen the task-relevant local feature descriptors through the mutual-neighbor screening module; and calculate the similarity between the query image and each class of the support image set with the similarity calculation module.
4-3. Use a cross-entropy-based loss function to maximize the classification probability; the specific calculation formula of the loss function L is L = −Σ_{c=1}^{N} δ(y^(i) = c) · log p(c | x^(i)), where p(c | x^(i)) is the softmax of the class similarities Φ(x^(i), ·) and δ(y^(i) = c) is an indicator function taking the value 1 when the condition y^(i) = c is satisfied and the value 0 otherwise.
4-4. Repeat steps 4-2 to 4-3 with a gradient descent method to train the parameters of the visual feature module.
Step 5, the sample classification steps of the mutual-neighbor-based few-sample image classification method are as follows:
5-1. For an input query image x^(i), use the trained model to calculate its similarity Φ(x^(i), c) to each class in the support image set.
5-2. Sort the similarities of all classes and select the class with the highest similarity in the support image set as the prediction class of the query image.
As shown in fig. 2, a mutual-neighbor-based few-sample classification system is divided into five modules, which are a visual feature module, a mutual-neighbor screening module, a similarity calculation module, a classification retrieval module, and a classification generation module.
The following embodiment illustrates the technical effects of applying the method of the invention; the detailed steps already described above are not repeated in the embodiment.
This embodiment compares the invention with other current front-line few-sample image classification methods on three large public data sets: miniImageNet, tieredImageNet and CUB. miniImageNet is the best-known evaluation data set for the few-sample image classification task, containing 100 classes randomly selected from the large-scale image data set ImageNet, with 600 images per class; on miniImageNet, 64 classes are used to train the few-sample classification neural network, 16 classes to cross-validate the robustness of the network, and 20 classes to evaluate the generalization ability of the network. tieredImageNet, like miniImageNet, is a subset of the large-scale image data set ImageNet and contains a broader range of classes; 351 subclasses from 20 superclasses are used for training, 97 subclasses from 6 superclasses for cross-validation, and the pictures of 160 subclasses from 8 superclasses for testing. In tieredImageNet, a challenging data set, the information overlap between the training, cross-validation and test sets is very small. CUB is a fine-grained classification data set containing 11788 pictures of 200 bird species; 100 classes are used for network training, 50 classes for network validation, and 50 classes for network model testing.
The evaluation index of this embodiment is the average classification accuracy over 6000 randomly sampled N-way K-shot tasks (N classes with K samples each) from the test set, covering the two settings N = 5, K = 1 and N = 5, K = 5. In total, 4 current mainstream few-sample image classification algorithms are compared, with the visual feature module instantiated with each of two mainstream few-sample classification backbone networks (Conv4 and ResNet12); the overall comparison results are shown in Tables 1 and 2. Table 1 shows the classification results with Conv4 as the backbone of the visual feature module (N = 5); Table 2 shows the classification results with ResNet12 as the backbone of the visual feature module (N = 5).
TABLE 1
[Table 1 appears as an image in the original patent document and is not reproduced here.]
TABLE 2
[Table 2 appears as an image in the original patent document and is not reproduced here.]
As can be seen from Tables 1 and 2, the mutual-neighbor-based few-sample image classification framework provided by the invention achieves the best results under every evaluation index, fully demonstrating the superiority of the algorithm of the invention.
To further show that the proposed algorithm indeed screens out the local descriptors of the foreground object through the mutual-neighbor algorithm, the invention visualizes the receptive fields of the local descriptors screened out in a group of few-sample classification tasks; the results are shown in FIG. 3. It can be seen that, in the few-sample classification task on a bird data set with N = 5 and K = 1, the main parts of the birds are selected for similarity calculation by constructing the mutual-neighbor relation, while the repeated backgrounds are excluded from the similarity consideration.
The embodiments described above are intended to illustrate the technical solutions and advantages of the present invention, and it should be understood that the above-mentioned embodiments are only specific embodiments of the present invention, and are not intended to limit the present invention, and any modifications, additions and equivalents made within the scope of the principles of the present invention should be included in the scope of the present invention.

Claims (7)

1. A mutual-neighbor-based few-sample image classification method is characterized by comprising the following steps:
(1) constructing a pre-trained deep neural network model, so that a local feature descriptor representation of an image can be obtained when the image is forward-propagated through the model;
(2) preparing training data by dividing the few-sample training data set used for model training into a plurality of few-sample image classification tasks;
(3) for the query image and the support image set in each few-sample task, constructing the mutual-neighbor mapping relation between the local feature descriptors of the query image and those of the support image set, and screening out the local feature descriptors of the query image that have the mutual-neighbor relation;
(4) calculating the naive Bayes nearest-neighbor similarity score between the screened local feature descriptors of the query image and the local feature descriptors of the support image set;
(5) training the parameters of the deep network model with cross entropy as the loss function, according to the similarity score between the query image and each support class;
(6) after model training is finished, applying the model: inputting the image to be queried, calculating the similarity between the image to be queried and each class in the support image set with the trained model, ranking the class similarities, and selecting the class with the highest similarity in the support image set as the predicted class of the query image.
2. The mutual-neighbor-based few-sample image classification method according to claim 1, wherein in the step (1), the specific process of obtaining the local feature descriptor representation of an image is as follows:
extracting the deep visual features θ ∈ R^(C×H×W) of the query image input x using the pre-trained deep neural network model, and converting them into the local feature set representation q = {q_1, ..., q_M} of the image, where M = H×W denotes the number of local features of a single image and q ∈ R^C denotes one of the local feature vectors; C denotes the number of channels of the feature map, and H and W denote the height and width of the feature map, respectively;
extracting the deep features θ_{c,k} ∈ R^(C×H×W) of the input x_{c,k}, i.e. the k-th support image of class c, using the same deep neural network model, and converting them into the local feature descriptor set representation s_{c,k} = {s_{c,k,1}, ..., s_{c,k,M}} of that image, where each element s ∈ R^C of s_{c,k} represents one of the local feature descriptor vectors of the support image;
obtaining the local feature descriptor set of the whole support class c as the union of all local feature descriptors of the K support images from the same class c:
s_c = s_{c,1} ∪ s_{c,2} ∪ ... ∪ s_{c,K}
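The conversion from a C×H×W deep feature map to M = H×W local feature descriptors described in claim 2 amounts to a reshape. A minimal NumPy sketch for illustration (the function name and array layout are assumptions, not part of the patent):

```python
import numpy as np

def to_local_descriptors(feature_map):
    """Flatten a C x H x W deep feature map into M = H*W local
    feature descriptors, each a C-dimensional vector."""
    C, H, W = feature_map.shape
    # move channels last, then flatten the H*W spatial positions
    return feature_map.transpose(1, 2, 0).reshape(H * W, C)

fmap = np.arange(2 * 2 * 3, dtype=float).reshape(2, 2, 3)  # C=2, H=2, W=3
q = to_local_descriptors(fmap)
print(q.shape)  # (6, 2)
```

Each row of `q` is the feature vector of one spatial position, matching the q = {q_1, ..., q_M} representation with q_i ∈ R^C.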
3. The mutual-neighbor-based few-sample image classification method according to claim 2, wherein in the step (2), during data preparation, for the few-sample classification task with N classes and K samples per class, the training data set is randomly sampled into a set consisting of E few-sample tasks
{(S^(i), x^(i), y^(i))}, i = 1, ..., E
wherein s_{j,k}^(i) ∈ S^(i) denotes the k-th image of the j-th class in the support image set of the i-th few-sample task, x^(i) denotes the query image of the i-th few-sample task, and y^(i) denotes the class of the query image; each few-sample task consists of a support image set S^(i) = {s_{j,k}^(i) : j = 1, ..., N; k = 1, ..., K} of N×K images together with one query image (x^(i), y^(i)).
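The episode construction in claim 3 can be sketched as follows. The dataset structure (a mapping from class label to a list of image identifiers) and the function name are assumptions made for illustration:

```python
import random

def sample_episode(dataset, n_way, k_shot, rng=random):
    """Sample one N-way K-shot few-sample task: N classes, K support
    images per class, plus one query image drawn from one of the N
    classes (and excluded from its own support set)."""
    classes = rng.sample(sorted(dataset), n_way)
    support = {c: rng.sample(dataset[c], k_shot) for c in classes}
    query_class = rng.choice(classes)
    # the query image must not appear among its class's support images
    pool = [x for x in dataset[query_class]
            if x not in support[query_class]]
    query = rng.choice(pool)
    return support, query, query_class
```

Repeating this E times yields the set of training tasks described in the claim.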
4. The mutual-neighbor-based few-sample image classification method according to claim 3, wherein the specific process of step (3) is as follows:
(3-1) first, the set P of local feature descriptors of all classes in the support image set is constructed:
P = s_1 ∪ s_2 ∪ ... ∪ s_N
wherein N is the number of classes in one few-sample classification task;
(3-2) for any local feature descriptor q ∈ q of the query image, its nearest local feature descriptor s in the set P is found:
s = NN_P(q)
wherein NN_P(q) denotes the nearest-neighbor function over the set P with respect to the vector q;
(3-3) for the nearest local feature descriptor s corresponding to q in (3-2), its nearest feature descriptor q̂ in the set q is found:
q̂ = NN_q(s)
(3-4) if q in (3-2) and q̂ in (3-3) are the same query local feature descriptor, q and s are said to be mutually neighboring local feature descriptors;
(3-5) steps (3-2) to (3-4) are repeated to find all query feature descriptors that have the mutual-neighbor relation, which are recorded as the new query local feature descriptor set q*.
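Steps (3-1) to (3-5) can be sketched as follows, assuming cosine similarity as the nearest-neighbor metric (consistent with the similarity used in claim 5; the claim itself does not fix the metric for this step), with the class pools already stacked into one matrix P:

```python
import numpy as np

def cosine_nn(a, B):
    """Index of the row of B with the highest cosine similarity to a."""
    a = a / np.linalg.norm(a)
    Bn = B / np.linalg.norm(B, axis=1, keepdims=True)
    return int(np.argmax(Bn @ a))

def mutual_neighbor_filter(q, P):
    """Return the subset q* of query descriptors whose nearest
    neighbour s in the support pool P has that same descriptor as
    its own nearest neighbour back in q (steps (3-2) to (3-5))."""
    kept = []
    for i, qi in enumerate(q):
        j = cosine_nn(qi, P)          # s = NN_P(q)
        if cosine_nn(P[j], q) == i:   # is q itself NN_q(s)?
            kept.append(qi)
    return np.array(kept)
```

A descriptor survives only if the nearest-neighbor relation holds in both directions, which is what removes background descriptors that many images share.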
5. The mutual-neighbor-based few-sample image classification method according to claim 4, wherein the specific process of step (4) is as follows:
(4-1) for any descriptor q ∈ q* in the query local feature descriptor set q* obtained by screening in step (3), its maximum similarity to the local feature descriptor set s_c of the class-c images in the support image set is calculated as follows:
sim(q, s_c) = cos(q, NN_{s_c}(q))
wherein cos(·, ·) is the cosine similarity between vectors, and NN_{s_c}(q) denotes the nearest-neighbor function over the set s_c with respect to the vector q;
(4-2) the maximum cosine similarities of all local feature descriptors q ∈ q* in the query set q* with respect to the c-th class of the support image set are accumulated according to the naive Bayes method, and the similarity between the query image x and the class-c images of the support image set is obtained as:
Sim(x, c) = Σ_{q ∈ q*} cos(q, NN_{s_c}(q))
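The accumulation in (4-1) and (4-2) reduces to a row-wise maximum over a cosine-similarity matrix followed by a sum. A minimal NumPy sketch (the function name is an assumption for illustration):

```python
import numpy as np

def class_similarity(q_star, S_c):
    """Similarity of a query image to support class c: for every
    screened query descriptor (row of q_star), take its maximum
    cosine similarity against the class-c descriptor pool S_c,
    then sum over descriptors (naive-Bayes nearest neighbour)."""
    qn = q_star / np.linalg.norm(q_star, axis=1, keepdims=True)
    Sn = S_c / np.linalg.norm(S_c, axis=1, keepdims=True)
    # (n_query, n_support) cosine matrix; max over support, sum over query
    return float((qn @ Sn.T).max(axis=1).sum())
```

Computing this score for every class c and taking the argmax gives the prediction step of claim 1, step (6).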
6. The mutual-neighbor-based few-sample image classification method according to claim 5, wherein in the step (5), during training, the neural network is trained with a loss function L over the support image set S^(i) and the query image label y^(i) of each few-sample task, the loss function L being calculated as:
L = −Σ_{c=1}^{N} δ(y^(i) = c) · log( exp(Sim(x^(i), c)) / Σ_{c′=1}^{N} exp(Sim(x^(i), c′)) )
wherein δ(y^(i) = c) is an indicator function, i.e. it takes the value 1 when y^(i) = c holds and 0 otherwise.
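A numerical sketch of the cross-entropy loss for a single few-sample task, treating the N class-similarity scores as logits of a softmax (the softmax normalisation is an assumption for illustration; the claim specifies only cross entropy over the similarity scores):

```python
import numpy as np

def episode_loss(sims, label):
    """Cross-entropy loss for one task: `sims` is the length-N vector
    of Sim(x, c) scores, `label` the index of the true class y."""
    z = np.exp(sims - np.max(sims))  # numerically stabilised softmax
    p = z / z.sum()
    return float(-np.log(p[label]))
```

With three equal scores the loss equals log 3, and the loss drops as the true class's similarity score grows relative to the others, which is the gradient signal used to train the feature extractor.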
7. A mutual-neighbor-based few-sample image classification system, comprising a computer system, the computer system comprising:
a visual feature module, which captures the deep visual features of the input image using a deep convolutional neural network;
a mutual-neighbor screening module, which uses the mutual-neighbor algorithm to screen out the local feature descriptors that are most task-relevant and discriminative;
a similarity calculation module, which uses the local feature descriptors screened by the mutual-neighbor algorithm to calculate the similarity between the query image and each support image class in the few-sample classification task;
a classification retrieval module, which classifies few-sample images using the similarity between the query image and each support image class;
and a classification generation module, which outputs the classification result after the model classification is finished.
CN202011561516.5A 2020-12-25 2020-12-25 Method and system for classifying few sample images based on mutual neighbor Active CN112633382B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011561516.5A CN112633382B (en) 2020-12-25 2020-12-25 Method and system for classifying few sample images based on mutual neighbor


Publications (2)

Publication Number Publication Date
CN112633382A true CN112633382A (en) 2021-04-09
CN112633382B CN112633382B (en) 2024-02-13

Family

ID=75324881

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011561516.5A Active CN112633382B (en) 2020-12-25 2020-12-25 Method and system for classifying few sample images based on mutual neighbor

Country Status (1)

Country Link
CN (1) CN112633382B (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190005069A1 (en) * 2017-06-28 2019-01-03 Google Inc. Image Retrieval with Deep Local Feature Descriptors and Attention-Based Keypoint Descriptors
CN109961089A (en) * 2019-02-26 2019-07-02 中山大学 Small sample and zero sample image classification method based on metric learning and meta learning
CN110163258A (en) * 2019-04-24 2019-08-23 浙江大学 A kind of zero sample learning method and system reassigning mechanism based on semantic attribute attention
CN110490227A (en) * 2019-07-09 2019-11-22 武汉理工大学 A kind of few sample image classification method based on Feature Conversion
US20200210774A1 (en) * 2017-06-12 2020-07-02 Institut Mines Telecom Descriptor learning method for the detection and location of objects in a video


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LAN Zeying; LIU Yang: "Land use classification of remote sensing images based on multi-scale and principal-direction texture assisted by domain knowledge", Acta Geodaetica et Cartographica Sinica (测绘学报), no. 08 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113239800A (en) * 2021-05-12 2021-08-10 上海善索智能科技有限公司 Target detection method and target detection device
CN113408605A (en) * 2021-06-16 2021-09-17 西安电子科技大学 Hyperspectral image semi-supervised classification method based on small sample learning
CN113326892A (en) * 2021-06-22 2021-08-31 浙江大学 Relation network-based few-sample image classification method
CN113537305A (en) * 2021-06-29 2021-10-22 复旦大学 Image classification method based on matching network less-sample learning
CN114092747A (en) * 2021-11-30 2022-02-25 南通大学 Small sample image classification method based on depth element metric model mutual learning
CN114723998A (en) * 2022-05-05 2022-07-08 兰州理工大学 Small sample image classification method and device based on large-boundary Bayes prototype learning
CN116597419A (en) * 2023-05-22 2023-08-15 宁波弗浪科技有限公司 Vehicle height limiting scene identification method based on parameterized mutual neighbors
CN116597419B (en) * 2023-05-22 2024-02-02 宁波弗浪科技有限公司 Vehicle height limiting scene identification method based on parameterized mutual neighbors
CN116612335A (en) * 2023-07-18 2023-08-18 贵州大学 Few-sample fine-granularity image classification method based on contrast learning
CN116612335B (en) * 2023-07-18 2023-09-19 贵州大学 Few-sample fine-granularity image classification method based on contrast learning

Also Published As

Publication number Publication date
CN112633382B (en) 2024-02-13

Similar Documents

Publication Publication Date Title
CN112633382B (en) Method and system for classifying few sample images based on mutual neighbor
CN110414368B (en) Unsupervised pedestrian re-identification method based on knowledge distillation
CN107609601B (en) Ship target identification method based on multilayer convolutional neural network
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN106096561B (en) Infrared pedestrian detection method based on image block deep learning features
CN110163258B (en) Zero sample learning method and system based on semantic attribute attention redistribution mechanism
CN107133569B (en) Monitoring video multi-granularity labeling method based on generalized multi-label learning
CN108520226B (en) Pedestrian re-identification method based on body decomposition and significance detection
CN111126482B (en) Remote sensing image automatic classification method based on multi-classifier cascade model
CN103761531A (en) Sparse-coding license plate character recognition method based on shape and contour features
CN110909643B (en) Remote sensing ship image small sample classification method based on nearest neighbor prototype representation
CN111368690A (en) Deep learning-based video image ship detection method and system under influence of sea waves
CN109033944B (en) Method and system for classifying all-sky aurora images and positioning key local structure
CN111860106B (en) Unsupervised bridge crack identification method
CN111652273B (en) Deep learning-based RGB-D image classification method
CN109492610B (en) Pedestrian re-identification method and device and readable storage medium
Li et al. A review of deep learning methods for pixel-level crack detection
CN109919215B (en) Target detection method for improving characteristic pyramid network based on clustering algorithm
CN112613474B (en) Pedestrian re-identification method and device
CN113486902A (en) Three-dimensional point cloud classification algorithm automatic selection method based on meta-learning
CN111144469B (en) End-to-end multi-sequence text recognition method based on multi-dimensional associated time sequence classification neural network
Wetzer et al. Towards automated multiscale imaging and analysis in TEM: Glomerulus detection by fusion of CNN and LBP maps
CN113326892A (en) Relation network-based few-sample image classification method
CN112200093B (en) Pedestrian re-identification method based on uncertainty estimation
CN115100694A (en) Fingerprint quick retrieval method based on self-supervision neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant