CN112633382B

CN112633382B - Few-shot image classification method and system based on mutual nearest neighbors

Info

Publication number: CN112633382B
Application number: CN202011561516.5A
Authority: CN
Inventors: 刘洋; 蔡登; 郑途; 张毅; 何晓飞
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2020-12-25
Filing date: 2020-12-25
Publication date: 2024-02-13
Anticipated expiration: 2040-12-25
Also published as: CN112633382A

Abstract

The invention discloses a few-shot image classification method and system based on mutual nearest neighbors. The method comprises the following steps: (1) obtaining visual feature representations of the query image and the support images by forward inference through a neural network model; (2) screening the task-relevant local feature descriptors in few-shot classification with a mutual nearest neighbor algorithm; (3) computing the similarity between the query image and each class in the support image set from the screened descriptors; (4) dividing a labeled image dataset into few-shot tasks and training the neural network model on them; (5) ranking the similarities between the query image and the classes in the support image set, and selecting the class with the highest similarity as the predicted class of the query image. With the invention, the interference that the large number of local feature descriptors from the background causes in similarity computation can be eliminated during few-shot image classification training, making the classification result more robust.

Description

Few-shot image classification method and system based on mutual nearest neighbors

Technical Field

The invention belongs to the technical field of image classification, and particularly relates to a few-shot image classification method and system based on mutual nearest neighbors.

Background

In recent years, object classification, an important branch of computer vision, has attracted attention from researchers in both industry and academia. Supervised object classification has made great progress thanks to the rapid development of deep learning, but training under such supervision has a limitation: every class requires enough labeled training samples. In practical applications, a class may not have enough training samples, and labeling image data requires a certain expertise and often a great deal of manual effort.

The goal of few-shot image classification is to learn an image classification model that, after learning classification tasks on a large amount of data from certain classes, can quickly classify new image classes from only a small number of samples. Few-shot image classification has become a rapidly developing direction in machine learning and has achieved some results in classifying medical images, satellite images and rare species.

Current mainstream few-shot image classification methods are mainly based on global image representations and metric learning. Inference proceeds in two stages: in the first stage, an image is input and the model derives a global feature vector representation of it; in the second stage, a metric-learning method measures the distance between the query image and the support images, which consist of a small number of samples. For example, the Matching Networks model proposed by Oriol Vinyals et al. in the paper "Matching Networks for One Shot Learning", published at the Conference on Neural Information Processing Systems, classifies images by training a bidirectional long short-term memory network for global feature encoding together with an end-to-end nearest neighbor classifier. Prototypical networks, proposed by Jake Snell et al. in "Prototypical Networks for Few-shot Learning", published at the Conference on Neural Information Processing Systems in 2017, treat classification as finding the prototype center of each class in a semantic space. Following the task definition of few-shot image classification, a prototypical network learns during training how to fit these centers: it learns a metric function under which the prototype center of a class can be found in the metric space from a few samples.

The paper "Learning to Compare: Relation Network for Few-Shot Learning" by Flood Sung et al., published at the Conference on Computer Vision and Pattern Recognition in 2018, argues that a hand-designed metric is always imperfect and may fail in some cases. To address this, the authors propose the relation network, which combines an embedding unit with a relation unit and lets the neural network itself select a suitable metric for each few-shot classification task.

Recent advances in few-shot image classification no longer use a single feature vector to characterize the whole image globally; instead they learn representations based on local features, which preserve as much local information as possible. The DN4 model, proposed in the paper "Revisiting Local Descriptor based Image-to-Class Measure for Few-shot Learning" at the Conference on Computer Vision and Pattern Recognition in 2019, uses a naive Bayes nearest neighbor measure to aggregate the similarity values between each local feature representation of the query image and the support image set. The paper "Dense Classification and Implanting for Few-Shot Learning", published at the Conference on Computer Vision and Pattern Recognition in 2019, provides a dense classification method: a classification prediction is made for each local feature representation of an image, and these predictions are averaged to obtain the prediction for the whole image. The paper "DeepEMD: Few-Shot Image Classification with Differentiable Earth Mover's Distance", published at the Conference on Computer Vision and Pattern Recognition in 2019, proposes splitting an image into multiple patches, introducing the Earth Mover's Distance as the distance metric between patches, and computing the optimal matching cost between the patches of the query image and those of the support image to represent the similarity between the two.

In addition to local feature representations, recent few-shot work has often involved subspace learning. For example, the paper "TapNet: Neural Network Augmented with Task-Adaptive Projection for Few-Shot Learning", published at the International Conference on Machine Learning in 2019, proposes the TapNet model, which learns a task-specific subspace projection and classifies by the distance between query image features and support image features after projection into that subspace. The paper "Adaptive Subspaces for Few-Shot Learning", published at the Conference on Computer Vision and Pattern Recognition in 2020, proposes learning a class-specific subspace from the few samples of each class and classifying by the projection distance of the query image to each class subspace.

Disclosure of Invention

The invention provides a few-shot image classification method and system based on mutual nearest neighbors. A one-to-one nearest neighbor mapping is established between the local feature descriptors of the query image and those of the support images, and the discriminative, task-relevant local feature descriptors are screened out in advance, which improves few-shot image classification.

A few-shot image classification method based on mutual nearest neighbors comprises the following steps:

(1) constructing a pre-trained deep neural network model such that a local feature descriptor representation of an image is obtained when the image is passed forward through the model;

(2) preparing training data by dividing the training dataset used for model training into a plurality of few-shot image classification tasks;

(3) for the query image and the support image set of each few-shot task, constructing the mutual nearest neighbor mapping between the local feature descriptors of the query image and those of the support image set, and screening out the local feature descriptors of the query image that have a mutual nearest neighbor relation;

(4) computing a naive Bayes nearest neighbor similarity score between the screened local feature descriptors of the query image and the local feature descriptors of the support image set;

(5) training the parameters of the deep network model with cross entropy as the loss function, according to the similarity scores between the query image and each support class;

(6) after training, applying the model: an image to be queried is input, the trained model computes its similarity to each class in the support image set, the class similarities are ranked, and the class with the highest similarity in the support image set is selected as the predicted class of the query image.

In step (1), the local feature descriptor representation of an image is obtained as follows:

the deep visual features $\theta \in \mathbb{R}^{C \times H \times W}$ of an input query image $x$ are extracted with the pre-trained deep neural network model and converted into the local feature set representation $\mathbf{q} = \{q_1, \dots, q_M\}$ of the image, where $M = H \times W$ is the number of local features of a single image and $q \in \mathbb{R}^C$ is one of the local feature vectors; $C$ is the number of channels of the deep visual feature map $\theta$, $H$ is its height and $W$ is its width;

using the same deep neural network model, the deep features of the $k$-th support image of class $c$ are extracted and converted into the local feature descriptor set of that image, where $s^{c,k} \in \mathbb{R}^C$ denotes one of the local feature descriptor vectors of a support image;

the local feature descriptors of all $K$ support images of the same class $c$ are combined to obtain the local feature descriptor set $\mathbf{s}^c$ of the whole support class $c$.
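The conversion from a feature map to a set of local descriptors can be sketched as follows. This is a minimal, dependency-free illustration rather than the implementation of the invention: the function names `feature_map_to_descriptors` and `class_descriptor_set` are hypothetical, and nested Python lists stand in for the tensors a deep learning framework would provide.

```python
from typing import List

Vector = List[float]
FeatureMap = List[List[List[float]]]  # indexed as [channel][row][column]


def feature_map_to_descriptors(theta: FeatureMap) -> List[Vector]:
    """Flatten a C x H x W feature map into M = H * W descriptors of length C."""
    C, H, W = len(theta), len(theta[0]), len(theta[0][0])
    # The descriptor at position (h, w) collects that position across all channels.
    return [[theta[c][h][w] for c in range(C)] for h in range(H) for w in range(W)]


def class_descriptor_set(support_maps: List[FeatureMap]) -> List[Vector]:
    """Pool the descriptors of all K support images of one class into one set."""
    pooled: List[Vector] = []
    for theta in support_maps:
        pooled.extend(feature_map_to_descriptors(theta))
    return pooled


# Toy feature map with C = 2 channels on a 2 x 2 spatial grid (hypothetical values).
theta = [[[1.0, 2.0], [3.0, 4.0]],
         [[5.0, 6.0], [7.0, 8.0]]]
q = feature_map_to_descriptors(theta)       # M = 4 descriptors, each of length 2
s_c = class_descriptor_set([theta, theta])  # K = 2 support images -> 8 descriptors
```

Each spatial position of the feature map yields one descriptor, so an image contributes $H \times W$ descriptors and a class with $K$ support images contributes $K \times H \times W$.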

In step (2), during data preparation, for the few-shot classification task with $N$ classes and $K$ samples per class, the training dataset is randomly sampled into a set $D_{train}$ of $E$ few-shot tasks. Each task $i$ contains the support images, indexed by class $j$ and by position $k$ within the support set of that class, together with a query image $x^{(i)}$ and its class $y^{(i)}$; that is, each few-shot task comprises $N \times K$ support images and one labeled query image $(x^{(i)}, y^{(i)})$, which take part in the computation jointly.
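The task sampling of step (2) can be sketched as follows. The sketch assumes that the dataset is given as a mapping from class name to image identifiers and that the query image is drawn from one of the N sampled classes without also appearing in the support set; both are assumptions made for illustration, and `sample_task` is a hypothetical name.

```python
import random
from typing import Dict, List, Tuple


def sample_task(dataset: Dict[str, List[str]], n_way: int, k_shot: int,
                rng: random.Random) -> Tuple[Dict[str, List[str]], str, str]:
    """Draw one N-way K-shot task: N * K support images and one labeled query."""
    classes = rng.sample(sorted(dataset), n_way)
    query_class = rng.choice(classes)
    support: Dict[str, List[str]] = {}
    query_image = ""
    for c in classes:
        # Assumption: draw one extra image for the query class so that the
        # query image is never one of the support images.
        extra = 1 if c == query_class else 0
        picked = rng.sample(dataset[c], k_shot + extra)
        support[c] = picked[:k_shot]
        if extra:
            query_image = picked[k_shot]
    return support, query_image, query_class


# Hypothetical dataset: 8 classes with 10 image identifiers each.
rng = random.Random(0)
toy = {f"class{c}": [f"img{c}_{i}" for i in range(10)] for c in range(8)}
train_tasks = [sample_task(toy, n_way=5, k_shot=1, rng=rng) for _ in range(3)]
```

Repeating the call E times yields a task set of the kind described above; a test set would be sampled the same way from classes disjoint from the training classes.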

The specific process of step (3) is as follows:

(3-1) first construct the local feature descriptor set $P$ of all classes in the support image set:

$$P = \bigcup_{c=1}^{N} \mathbf{s}^c$$

where $N$ is the number of classes in a few-shot classification task;

(3-2) for any local feature descriptor $q \in \mathbf{q}$ of the query image, find its nearest local feature descriptor $s$ in the set $P$:

$$s = NN_P(q)$$

where $NN_P(q)$ denotes the nearest neighbor of the vector $q$ in the set $P$;

(3-3) for the nearest local feature descriptor $s$ found for $q$ in (3-2), find its nearest feature descriptor $\hat{q}$ in the set $\mathbf{q}$:

$$\hat{q} = NN_{\mathbf{q}}(s)$$

(3-4) if $q$ in (3-2) and $\hat{q}$ in (3-3) are the same query local feature descriptor, $q$ and $s$ are mutual nearest neighbor local feature descriptors;

(3-5) repeat steps (3-2) to (3-4) to find all query feature descriptors that have a mutual nearest neighbor relation, and record them as the new query local feature descriptor set $\mathbf{q}^*$.
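Steps (3-2) to (3-5) can be condensed into a single set definition. Writing $\mathbf{q}$ for the set of query descriptors and $P$ for the pooled support descriptors, the following is a compact restatement of those steps under that notation, offered as a reading aid rather than as additional subject matter:

```latex
\mathbf{q}^{*} = \left\{\, q \in \mathbf{q} \;:\; NN_{\mathbf{q}}\!\left( NN_{P}(q) \right) = q \,\right\}
```

A query descriptor is kept only when the support descriptor it selects selects it back, so several background descriptors that all point at the same support descriptor cannot all survive the screening.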

The specific process of step (4) is as follows:

(4-1) for any descriptor $q \in \mathbf{q}^*$ in the query local feature descriptor set $\mathbf{q}^*$ obtained by the screening in step (3), compute its maximum similarity to the local feature descriptor set $\mathbf{s}^c$ of the class $c$ images in the support image set:

$$\phi(q, c) = \cos\left(q, NN_{\mathbf{s}^c}(q)\right)$$

where $\cos(\cdot)$ is the cosine similarity between vectors and $NN_{\mathbf{s}^c}(q)$ denotes the nearest neighbor of the vector $q$ in the set $\mathbf{s}^c$;

(4-2) the maximum cosine similarities to class $c$ of all local feature descriptors $q \in \mathbf{q}^*$ are accumulated following the naive Bayes approach, giving the similarity between the query image $x$ and class $c$ of the support image set:

$$\Phi(x, c) = \sum_{q \in \mathbf{q}^*} \phi(q, c)$$

In step (5), during training, the images of each few-shot task are used as input and the query image label $y^{(i)}$ as the target; the neural network is trained with the loss function $L$, which is calculated as follows:

where $\delta(y^{(i)} = c)$ is an indicator function that equals 1 when the condition $y^{(i)} = c$ is satisfied and 0 otherwise.
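One cross-entropy form consistent with step (5) and with the indicator function defined above is sketched below, where $\Phi(x^{(i)}, c)$ denotes the similarity score between the query image of task $i$ and class $c$. The softmax normalization over the $N$ support classes and the averaging over the $E$ training tasks are assumptions made for this sketch rather than details stated in the text:

```latex
L = -\frac{1}{E} \sum_{i=1}^{E} \sum_{c=1}^{N}
    \delta\left(y^{(i)} = c\right)
    \log \frac{\exp\left(\Phi(x^{(i)}, c)\right)}
              {\sum_{c'=1}^{N} \exp\left(\Phi(x^{(i)}, c')\right)}
```

Under this form, only the term of the true class contributes for each task, so minimizing $L$ raises the similarity score of the correct class relative to the other support classes.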

Before the model is applied, it can be tested as follows:

During test data preparation, to verify the robustness of the model learned from the few-shot tasks in step (5), the test dataset is randomly sampled into a set $D_{test}$ of $E'$ few-shot test tasks. $D_{test}$ differs from the aforementioned $D_{train}$ in two respects: (a) the image data come from different sources, that is, the classes of the training set and the test set do not intersect; (b) the label $y^{(i)}$ of the query image $x^{(i)}$ in $D_{test}$ is not provided to the model, that is, this information is used only as the criterion for measuring the quality of the few-shot classification neural network model and does not take part in the computation.

In the test prediction stage, for the images in each few-shot task, the predicted class $\hat{y}^{(i)}$ of the query image $x^{(i)}$ is computed as:

$$\hat{y}^{(i)} = \arg\max_{c} \Phi(x^{(i)}, c)$$

where $\Phi(x^{(i)}, c)$ is the similarity between the query image and class $c$ of the support image set.

In the test evaluation stage, if the computed $\hat{y}^{(i)}$ equals $y^{(i)}$, the prediction for this few-shot task is considered successful. For the test set $D_{test}$ of $E'$ few-shot tasks, the average prediction accuracy is used to measure the robustness of the few-shot neural network model trained in step (5).
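The test prediction and evaluation stages can be sketched as follows. The sketch assumes that the similarity scores of each test task are already available as a mapping from class name to score; the names `predict_class` and `mean_accuracy` and the example scores are hypothetical.

```python
from typing import Dict, List, Tuple


def predict_class(scores: Dict[str, float]) -> str:
    """Return the support class with the highest similarity score."""
    return max(scores, key=lambda c: scores[c])


def mean_accuracy(results: List[Tuple[Dict[str, float], str]]) -> float:
    """Fraction of test tasks whose predicted class equals the true query label."""
    correct = sum(1 for scores, label in results if predict_class(scores) == label)
    return correct / len(results)


# Hypothetical similarity scores for three test tasks, paired with the true labels.
results = [
    ({"a": 3.1, "b": 2.0}, "a"),  # predicted "a": correct
    ({"a": 0.5, "b": 1.7}, "b"),  # predicted "b": correct
    ({"a": 2.2, "b": 1.9}, "b"),  # predicted "a": wrong
]
accuracy = mean_accuracy(results)  # two of the three tasks are predicted correctly
```

The true label enters only in `mean_accuracy`, matching the requirement that it serves as an evaluation criterion and takes no part in the prediction itself.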

The few-shot image classification method based on mutual nearest neighbors retains the advantages of few-shot image classification methods and, in addition, accurately eliminates the interference caused by a large number of repeated background local feature descriptors, such as the sky background in a few-shot bird classification task. In practice, the proposed mutual nearest neighbor algorithm is found to extract the feature descriptors of the foreground object, which greatly improves the accuracy of few-shot classification based on local feature descriptors.

The invention also provides a few-shot image classification system based on mutual nearest neighbors, comprising a computer system that includes:

a visual feature module, which captures the deep visual features of an input image with a deep convolutional neural network;

a mutual nearest neighbor screening module, which uses the mutual nearest neighbor algorithm to screen out the discriminative local feature descriptors most closely related to the task;

a similarity calculation module, which uses the local feature descriptors screened by the mutual nearest neighbor algorithm to compute the similarity between the query image and each support image class in a few-shot classification task;

a classification retrieval module, which classifies few-shot images using the similarity between the query image and each support image class;

and a classification generation module, which outputs the classification result once the model has completed classification.

Compared with the prior art, the invention has the following beneficial effects:

1. The mutual nearest neighbor algorithm provided by the invention screens the local feature descriptors most relevant to each few-shot task, so that the classification result is determined mainly by discriminative foreground object features, the influence of background local features is reduced, and misclassification of query images dominated by background is avoided.

2. Extensive experiments show that the model outperforms the other baseline algorithms.

Drawings

FIG. 1 is a block diagram of a method of the present invention;

FIG. 2 is a schematic flow diagram of the modules of the system of the present invention;

FIG. 3 is a visualization of the receptive fields of the local descriptors selected in one few-shot classification task according to an embodiment of the invention.

Detailed Description

The invention is described in further detail below with reference to the drawings and examples. The examples are intended to facilitate understanding of the invention and do not limit it in any way.

As shown in FIG. 1, the main model of the invention is divided into a visual feature module, a mutual nearest neighbor screening module and a similarity calculation module, and the final similarity score is used to optimize the whole model. The specific steps are as follows:

Step 1. The visual feature module learns the deep visual features $\theta$ of an input image $x$ during few-shot image classification training. The basic steps are as follows:

1-1. The query image of the $i$-th few-shot image classification task is first cropped and scaled to 84 × 84, giving the image $x^{(i)}$ used as the query input to the network; the images of the support image set are cropped and scaled to 84 × 84 in the same way and used as the support input to the network.

1-2. For the query image $x^{(i)}$, the feature map of the last non-classification layer of the neural network is extracted as the deep visual feature $\theta^{(i)} \in \mathbb{R}^{C \times H \times W}$ of the image, where $C$ is the number of channels of the feature map and $H$ and $W$ are its height and width. In few-shot image classification based on local feature representations, the feature map $\theta^{(i)}$ of the query image is converted into the local feature set representation $\mathbf{q} = \{q_1, \dots, q_M\}$ of the query image, where $M = H \times W$ is the number of local features of a single image and $q \in \mathbb{R}^C$ is one of the local feature vectors.

1-3. For the $k$-th support image of support set class $c$, the feature map of the last non-classification layer of the same neural network is extracted and converted into the local feature descriptor set of that image, each element of which is one of the local feature descriptor vectors of a support image; the local feature descriptors of all $K$ support images of the same class $c$ are combined to obtain the local feature descriptor set $\mathbf{s}^c$ of the whole class-$c$ support image set.

Step 2. The mutual nearest neighbor screening module screens out the task-relevant local feature descriptors for few-shot image classification. The basic steps are as follows:

2-1. Construct the local feature descriptor set $P$ of all classes in the support image set:

$$P = \bigcup_{c=1}^{N} \mathbf{s}^c$$

where $N$ is the number of classes of the support image set in a few-shot classification task.

2-2. Initialize the screened local feature descriptor set $\mathbf{q}^* = \{\}$.

2-3. For any local feature descriptor $q \in \mathbf{q}$ of the query image, find its nearest local feature descriptor $s$ in the set $P$:

$$s = NN_P(q)$$

where $NN_P(q)$ denotes the nearest neighbor of the vector $q$ in the set $P$.

2-4. For the nearest local feature descriptor $s$ found for $q$ in the previous step, find its nearest feature descriptor $\hat{q}$ in the set $\mathbf{q}$:

$$\hat{q} = NN_{\mathbf{q}}(s)$$

where $NN_{\mathbf{q}}(s)$ denotes the nearest neighbor of the vector $s$ in the set $\mathbf{q}$.

2-5. If $q$ and $\hat{q}$ are the same query local feature descriptor, $q$ and $s$ are called mutual nearest neighbor local feature descriptors, and $q$ is recorded in the screened local descriptor set:

$$\mathbf{q}^* = \mathbf{q}^* \cup \{q\}$$

Traverse all $q \in \mathbf{q}$, repeating steps 2-3 to 2-5, to obtain the set $\mathbf{q}^*$ of all local feature descriptors that satisfy the mutual nearest neighbor condition.
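The screening of Step 2 can be sketched in a few lines. The text does not state which metric the nearest neighbor function uses; the sketch below assumes cosine similarity, in line with the similarity measure of Step 3, and the names `cosine`, `nearest_index` and `mutual_neighbor_filter` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def nearest_index(x: Vector, candidates: List[Vector]) -> int:
    """Index of the candidate nearest to x (assumed metric: cosine similarity)."""
    return max(range(len(candidates)), key=lambda i: cosine(x, candidates[i]))


def mutual_neighbor_filter(q: List[Vector], P: List[Vector]) -> List[Vector]:
    """Keep each query descriptor whose nearest support descriptor points back to it."""
    kept: List[Vector] = []                  # step 2-2: start from the empty set
    for i, descriptor in enumerate(q):
        s = P[nearest_index(descriptor, P)]  # step 2-3: s = NN_P(q)
        back = nearest_index(s, q)           # step 2-4: nearest query descriptor to s
        if back == i:                        # step 2-5: q and s are mutual neighbors
            kept.append(descriptor)
    return kept


# Toy data: the first two query descriptors compete for the same support descriptor.
q = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
P = [[1.0, 0.05], [0.1, 1.0]]
q_star = mutual_neighbor_filter(q, P)
```

In this toy example the second query descriptor is dropped: its nearest support descriptor is closer to the first query descriptor, so the relation is not mutual. This is the mechanism by which many similar background descriptors are reduced to at most one survivor per support descriptor.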

Step 3. The similarity calculation module provides, for the few-shot image classification process, a function that computes the similarity between the query image and each class of the support image set from local feature descriptors. The basic steps are as follows:

3-1. For any descriptor $q \in \mathbf{q}^*$ in the query local feature descriptor set $\mathbf{q}^*$ obtained by the screening in step 2-5, compute its maximum similarity to the local feature descriptor set $\mathbf{s}^c$ of the class $c$ images in the support image set:

$$\phi(q, c) = \cos\left(q, NN_{\mathbf{s}^c}(q)\right)$$

where $\cos(\cdot)$ is the cosine similarity between vectors and $NN_{\mathbf{s}^c}(q)$ denotes the nearest neighbor of the vector $q$ in the set $\mathbf{s}^c$.

3-2. For all local feature descriptors $q \in \mathbf{q}^*$ in the query set $\mathbf{q}^*$, the maximum cosine similarities $\phi(q, c)$ to class $c$ of the support image set are accumulated following the naive Bayes approach, giving the similarity between the query image $x^{(i)}$ and class $c$ of the support image set:

$$\Phi(x^{(i)}, c) = \sum_{q \in \mathbf{q}^*} \phi(q, c)$$
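The two levels of the similarity computation can be sketched as follows. The sketch assumes that the nearest neighbor within a class is the descriptor of highest cosine similarity, so the per-descriptor score is simply the maximum cosine similarity over the class; the function names and the toy descriptors are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def descriptor_similarity(q: Vector, s_c: List[Vector]) -> float:
    """Step 3-1: cosine similarity between q and its nearest descriptor in class c."""
    return max(cosine(q, s) for s in s_c)


def image_to_class_similarity(q_star: List[Vector], s_c: List[Vector]) -> float:
    """Step 3-2: sum of the per-descriptor similarities over the screened set."""
    return sum(descriptor_similarity(q, s_c) for q in q_star)


# Screened query descriptors and two hypothetical support classes.
q_star = [[1.0, 0.0], [0.0, 1.0]]
support: Dict[str, List[Vector]] = {
    "bird": [[1.0, 0.0], [0.0, 2.0]],
    "plane": [[1.0, 1.0]],
}
scores = {c: image_to_class_similarity(q_star, s_c) for c, s_c in support.items()}
```

Because the score is a sum over the screened descriptors only, descriptors removed in Step 2 contribute nothing to any class, which is how background descriptors are kept out of the similarity.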

Step 4. The training steps of the few-shot image classification method based on mutual nearest neighbors are as follows:

4-1. For the few-shot image classification task with $N$ classes and $K$ support samples per class, a training dataset $D_{train}$ of $E$ few-shot tasks is built from a standard classification training dataset by random sampling. Each task $i$ contains the support images, indexed by class $j$ and by position $k$ within the support set of that class, the query image $x^{(i)}$, and the class $y^{(i)}$ of that query image. In the network training for each few-shot task, the $N \times K$ support images and the one query image take part jointly.

4-2. The images of one few-shot task are selected as the input of the network model, and the query image label $y^{(i)}$ serves as the correct classification result of the task. The visual feature module extracts the local feature descriptors of the images; the mutual nearest neighbor screening module screens the task-relevant local feature descriptors; and the similarity calculation module computes the similarity between the query image and each class in the support image set.

4-3. A cross-entropy loss function is used to maximize the classification probability; the loss function $L$ is calculated as follows:

4-4. Steps 4-2 and 4-3 are repeated using gradient descent to train the parameters of the visual feature module.
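The per-task loss of step 4-3 can be sketched as follows, assuming that the class similarity scores are turned into probabilities with a softmax before the cross entropy is taken; that normalization is an assumption of the sketch, and `cross_entropy_from_scores` is a hypothetical name. The gradient descent of step 4-4 would normally be left to the automatic differentiation of a deep learning framework and is not shown.

```python
import math
from typing import Dict


def cross_entropy_from_scores(scores: Dict[str, float], label: str) -> float:
    """Negative log softmax probability of the true class, given similarity scores."""
    m = max(scores.values())  # subtract the maximum for numerical stability
    log_norm = m + math.log(sum(math.exp(v - m) for v in scores.values()))
    return log_norm - scores[label]


# Hypothetical similarity scores of one query image against three support classes.
scores = {"bird": 2.0, "plane": 1.0, "dog": 0.0}
loss_if_bird = cross_entropy_from_scores(scores, "bird")  # true class scores highest
loss_if_dog = cross_entropy_from_scores(scores, "dog")    # true class scores lowest
```

The loss is small when the true class has the highest similarity score and grows as the true class falls behind the others, so minimizing it pushes the visual feature module toward descriptors that separate the classes.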

Step 5. The classification steps of the few-shot image classification method based on mutual nearest neighbors are as follows:

5-1. For an input query image $x^{(i)}$, the trained model is used to compute its similarity $\Phi(x^{(i)}, c)$ to each class in the support image set.

5-2. The class similarities are ranked, and the class with the highest similarity in the support image set is selected as the predicted class of the query image.

As shown in FIG. 2, the few-shot classification system based on mutual nearest neighbors is divided into five modules: a visual feature module, a mutual nearest neighbor screening module, a similarity calculation module, a classification retrieval module and a classification generation module.

The method of the invention is applied in the following example to demonstrate its technical effects; the specific steps already described above are not repeated here.

This example compares the invention with other state-of-the-art few-shot image classification methods on three large public datasets: miniImageNet, tieredImageNet and CUB. miniImageNet is the best-known evaluation dataset for few-shot image classification; it contains 100 classes randomly selected from the large-scale image dataset ImageNet, with 600 images per class. On miniImageNet, 64 classes are used to train the few-shot classification neural network, 16 classes are used for validation of the network, and 20 classes are used to evaluate its generalization ability. Like miniImageNet, tieredImageNet is a subset of ImageNet, but it contains a broader range of classes: 351 subclasses from 20 superclasses are used for training, 97 subclasses from 6 superclasses for validation, and 160 subclasses from 8 superclasses for testing. In this challenging dataset, the information overlap between the training, validation and test sets is very small. CUB is a fine-grained classification dataset containing 11788 images of 200 bird species in total, of which 100 classes are used for training, 50 for validation and 50 for testing the network model.

The evaluation metric of this example is the average classification accuracy over 6000 few-shot classification tasks randomly sampled from the test set, each with $N$ classes and $K$ samples per class, in two settings: $N = 5, K = 1$ and $N = 5, K = 5$. The overall comparison results are shown in Table 1 and Table 2. Table 1 gives the classification results ($N = 5$) with Conv4 as the backbone network of the visual feature module; Table 2 gives the classification results ($N = 5$) with ResNet12 as the backbone network.

TABLE 1

TABLE 2

As Tables 1 and 2 show, the few-shot image classification framework based on mutual nearest neighbors achieves the best results on every evaluation metric, which demonstrates the superiority of the algorithm of the invention.

To further show that the proposed mutual nearest neighbor algorithm screens out the local descriptors of the foreground object, the receptive fields of the local descriptors screened in one few-shot classification task are visualized in FIG. 3. In a few-shot classification task on the bird dataset with $N = 5, K = 1$, constructing the mutual nearest neighbor relation selects the main body of the bird for similarity computation, while the repeated background regions are excluded from it.

The foregoing embodiments describe the technical solution and advantages of the invention in detail. They are merely illustrative and are not intended to limit the invention; any modification, addition or equivalent made within the principles of the invention falls within its scope of protection.

Claims

1. A few-shot image classification method based on mutual nearest neighbors, characterized by comprising the following steps:

2. The few-shot image classification method based on mutual nearest neighbors according to claim 1, wherein in step (1) the local feature descriptor representation of an image is obtained as follows:

the deep visual features $\theta \in \mathbb{R}^{C \times H \times W}$ of an input query image $x$ are extracted with the pre-trained deep neural network model and converted into the local feature set representation $\mathbf{q} = \{q_1, \dots, q_M\}$ of the image, where $M = H \times W$ is the number of local features of a single image and $q \in \mathbb{R}^C$ is one of the local feature vectors; $C$ is the number of channels of the feature map, and $H$ and $W$ are the height and width of the feature map respectively;

using the same deep neural network model, the deep features of the $k$-th support image of class $c$ are extracted and converted into the local feature descriptor set of that image, where $s^{c,k} \in \mathbb{R}^C$ denotes one of the local feature descriptor vectors of a support image;

3. The few-shot image classification method based on mutual nearest neighbors according to claim 2, wherein in step (2), during data preparation, for the few-shot classification task with $N$ classes and $K$ samples per class, the training dataset is randomly sampled into a set of $E$ few-shot tasks. Each task $i$ contains the support images, indexed by class $j$ and by position $k$ within the support set of that class, together with a query image $x^{(i)}$ and its class $y^{(i)}$; each few-shot task comprises $N \times K$ support images and one labeled query image $(x^{(i)}, y^{(i)})$, which take part in the computation jointly.

4. The few-shot image classification method based on mutual nearest neighbors according to claim 3, wherein the specific process of step (3) is as follows:

where $N$ is the number of classes in a few-shot classification task;

(3-2) for any local feature descriptor $q \in \mathbf{q}$ of the query image, find its nearest local feature descriptor $s$ in the set $P$:

$$s = NN_P(q)$$

5. The few-shot image classification method based on mutual nearest neighbors according to claim 4, wherein the specific process of step (4) is as follows:

6. The few-shot image classification method based on mutual nearest neighbors according to claim 5, wherein in step (5), during training, the images of each few-shot task are used as input and the query image label $y^{(i)}$ as the target; the neural network is trained with the loss function $L$, which is calculated as follows:

7. A few-shot image classification system based on mutual nearest neighbors, comprising a computer system, the computer system comprising: