CN114444600A - Small sample image classification method based on memory enhanced prototype network - Google Patents

Small sample image classification method based on memory enhanced prototype network

Info

Publication number
CN114444600A
CN114444600A (application number CN202210105376.3A)
Authority
CN
China
Prior art keywords
sample
image
prototype
query
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202210105376.3A
Other languages
Chinese (zh)
Inventor
杨赛
杨慧
周伯俊
胡彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nantong University
Original Assignee
Nantong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nantong University filed Critical Nantong University
Priority to CN202210105376.3A priority Critical patent/CN114444600A/en
Publication of CN114444600A publication Critical patent/CN114444600A/en
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Abstract

The invention discloses a small sample image classification method based on a memory enhanced prototype network. Meta-learning is currently the mainstream learning paradigm for the small sample image classification task, and the prototype network is its most classical model, achieving excellent classification performance on this task. The prototype network uses an episodic learning strategy that samples both tasks and data from the meta-training data set to obtain prior knowledge that adapts quickly to new tasks. However, this random sampling prevents the prototype network from fully exploiting all the information in the meta-training data set. A memory module is therefore added to the prototype network to memorize typical sample representations in the meta-training data set, so that the prior information in the meta-training data set can be fully exploited end-to-end to correct the prototypes.

Description

Small sample image classification method based on memory enhanced prototype network
Technical Field
The invention relates to a small sample image classification method based on a memory enhanced prototype network, and belongs to the field of small sample image classification.
Background
In recent years, with the continuous development of deep learning, great breakthroughs have been made in many research fields of artificial intelligence. This success, however, relies on training high-capacity deep convolutional neural networks with massive amounts of labeled data. Such a training regime greatly limits the application of deep learning in many practical situations, where only a limited number of labeled samples are available. Against this background, small sample learning has become a new research focus in computer vision and machine learning. It is a very challenging topic that aims to classify images of new categories using only a small number of samples.
Meta-learning, also called learning to learn, decomposes a data set into different tasks in the meta-training stage, takes the generalization performance on test samples as the learning objective, and learns what is common across the different tasks; it has gradually become the mainstream approach to small sample learning. Metric-based deep meta-learning methods achieve good performance on the small sample image classification task: they use a deep neural network to project image samples into an embedding space, compute similarities between samples in that space, and assign similar samples to the same category. The classical model is the prototypical network proposed by Snell et al. (Snell J, Swersky K, Zemel R. Prototypical networks for few-shot learning [C]// Proceedings of the 31st Annual Conference on Neural Information Processing Systems. Long Beach, CA, USA: NIPS, 2017: 4077-4087.), which takes the mean of the features of all support samples of a class as the class prototype and assigns each query sample to the category of its nearest prototype. Subsequent work has studied how to compute better prototypes for each category. For example, Fort (Fort S. Gaussian prototypical networks for few-shot learning on Omniglot [J]. arXiv preprint arXiv:1708.02735, 2017.) proposed using the centroid of each class of samples as the prototype representation and a Mahalanobis distance function to compute the similarity between samples, with the covariance of each class of samples estimated as the variance matrix in the distance function. Hilliard et al. (Hilliard N, Phillips L, Howland S, et al. Few-shot learning with metric-agnostic conditional embeddings [J]. arXiv preprint arXiv:1802.04376, 2018.) added a relation network module composed of fully connected layers behind the feature extraction network to further map each class of samples into a 128-dimensional vector, obtaining a prototype representation of each class through parametric learning. Ren et al. (Ren M, Triantafillou E, Ravi S, et al. Meta-learning for semi-supervised few-shot classification [J]. arXiv preprint arXiv:1803.00676, 2018.) proposed refining the prototype representation of each class with unlabeled samples, but the confidence of the prototypes produced by this method is greatly reduced when a large number of the unlabeled samples come from other classes.
Although these improved methods can correct prototypes computed from a small number of support samples, they still use the episodic meta-learning paradigm and cannot sufficiently mine the information in the base class sample data. The invention therefore discloses a small sample image classification method based on a memory enhanced prototype network.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: in the small sample image classification method based on the memory enhanced prototype network, a memory module is added to the prototype network to memorize typical sample representations in the base class data set, so that the prior information in the base class data set can be fully exploited end-to-end to correct the prototypes.
The invention adopts the following technical scheme for solving the technical problems:
the small sample image classification method based on the memory enhanced prototype network comprises the following steps:
step 1, processing an input image data set. The input image data set is denoted as I and is randomly divided into two image subsets, which are respectively meta-training image data sets ItrainAnd meta test image dataset Itest.. In data set ItrainZhongrandAnd extracting images of C categories, randomly extracting K image samples as support samples in all the images in each category, randomly extracting Q image samples as query samples in the rest images, and finishing the construction of a C-way-K-shot classification task on a meta-training data set. The set of c-th class-supported image samples is denoted as
Figure BDA0003493771450000021
Wherein
Figure BDA0003493771450000022
Representing the k-th supported sample image,
Figure BDA0003493771450000023
indicating the category label to which the image corresponds. The set of query image samples of the c-th category is represented as
Figure BDA0003493771450000024
Wherein
Figure BDA0003493771450000025
Representing the q-th query sample image,
Figure BDA0003493771450000026
indicating the category label to which the image corresponds. C-way-K-shot classification tasks are also constructed on the meta-test data set, and the set of the C-th class supporting image samples is expressed as
Figure BDA0003493771450000027
Wherein
Figure BDA0003493771450000028
Representing the k-th supported sample image,
Figure BDA0003493771450000029
indicating the category label to which the image corresponds. The set of query image samples of the c-th category is represented as
Figure BDA00034937714500000210
Wherein
Figure BDA00034937714500000211
Representing the q-th query sample image.
Step 2, extracting features of the image samples and initializing the memory module during meta-training. Assume the backbone of the model is a convolutional neural network $f_\theta$ with parameters $\theta$. The $k$-th training support sample is fed into the backbone to obtain its $D$-dimensional feature $f_\theta(x_k^c)\in\mathbb{R}^D$, and the $q$-th query sample image is fed into the backbone to obtain its $D$-dimensional feature $f_\theta(\tilde{x}_q^c)\in\mathbb{R}^D$. The external memory module is represented as a matrix $M$ whose $j$-th element $M(j)$ is a vector of the same dimension as the features; these elements are initialized with random numbers in the range $[0,1]$.
Step 3, computing the initial prototypes. After the feature extraction of step 2, the set of support image sample features of the $c$-th class is $\{f_\theta(x_k^c)\}_{k=1}^{K}$. During meta-training, the initial prototype of each class is computed from its support set; the initial prototype of the $c$-th class is $p_c^0=\frac{1}{K}\sum_{k=1}^{K}f_\theta(x_k^c)$.
Step 4, the read and write operations of the memory module $M$. The read operation matches the initial prototype $p_c^0$ with the memory matrix $M$ and obtains a read weight vector by computing the similarity between the initial prototype and the matrix elements. At the same time, the matching between the initial prototype $p_c^0$ of the $c$-th class and the elements of the memory matrix $M$ is used to update $M$, completing the write operation.
Step 5, linear synthesis of the corrected prototype. The matrix element matched with the initial prototype $p_c^0$ in the memory module is obtained from the read weight vector and denoted $r_c$. The corrected prototype of the $c$-th class, $\hat{p}_c$, is computed as a linear weighted sum of the initial prototype $p_c^0$ and $r_c$.
Step 6, computing the training loss function. The query image samples are fed into the backbone $f_\theta$; the feature of the $q$-th query sample is $f_\theta(\tilde{x}_q^c)$. Similarity scores between this feature and the corrected prototype of each class are computed with the Euclidean distance and converted by a softmax function into the probability $P(y=c\mid\tilde{x}_q^c)$ of belonging to the $c$-th support class. The cross-entropy loss between the probability outputs and the true label values $\tilde{y}_q^c$ is computed, and the parameters $\theta$ of the backbone are optimized with a gradient descent algorithm to complete meta-training.
Step 7, fixing the backbone $f_\theta$ obtained in meta-training. The $k$-th test support sample is fed into the backbone to obtain its $D$-dimensional feature $f_\theta(x'^c_k)$, and the $q$-th test query sample image is fed into the backbone to obtain its $D$-dimensional feature $f_\theta(\tilde{x}'^c_q)$. The initial prototype of each support class is corrected using steps 3 to 5, the similarity between the query sample feature and each class is computed, and the query sample is assigned to the most similar class.
As a preferred embodiment of the present invention, the details of step 1 are as follows:
(1) The input image data set is denoted as $I$ and is randomly divided into two image subsets, a meta-training image data set $I_{train}$ and a meta-test image data set $I_{test}$.
(2) From the data set $I_{train}$, images of $C$ categories are randomly extracted; within each category, $K$ image samples are randomly extracted as support samples and $Q$ image samples are randomly extracted from the remaining images as query samples, completing the construction of a C-way K-shot classification task on the meta-training data set. The support set of the $c$-th class is denoted as $S_c=\{(x_k^c,y_k^c)\}_{k=1}^{K}$, where $x_k^c$ is the $k$-th support sample image and $y_k^c$ its category label; the query set of the $c$-th class is denoted as $Q_c=\{(\tilde{x}_q^c,\tilde{y}_q^c)\}_{q=1}^{Q}$, where $\tilde{x}_q^c$ is the $q$-th query sample image and $\tilde{y}_q^c$ its category label.
(3) From the data set $I_{test}$, images of $C$ categories are randomly extracted; within each category, $K$ image samples are randomly extracted as support samples and $Q$ image samples are randomly extracted from the remaining images as query samples, completing the construction of a C-way K-shot classification task on the meta-test data set. The support set of the $c$-th class is denoted as $S'_c=\{(x'^c_k,y'^c_k)\}_{k=1}^{K}$, where $x'^c_k$ is the $k$-th support sample image and $y'^c_k$ its category label; the query set of the $c$-th class is denoted as $Q'_c=\{\tilde{x}'^c_q\}_{q=1}^{Q}$, where $\tilde{x}'^c_q$ is the $q$-th query sample image.
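To make the episode construction of step 1 concrete, the following is a minimal Python sketch; the dataset layout (a dict mapping each class to its list of images) and all function names are illustrative assumptions, not part of the patent.

```python
# Minimal C-way K-shot episode sampler (step 1); layout and names are assumptions.
import random

def sample_episode(dataset, C=5, K=1, Q=15):
    """Sample one C-way K-shot episode: K support and Q query images per class."""
    classes = random.sample(list(dataset.keys()), C)    # C random classes
    support, query = [], []
    for label, cls in enumerate(classes):
        images = random.sample(dataset[cls], K + Q)     # disjoint support/query
        support += [(img, label) for img in images[:K]]
        query   += [(img, label) for img in images[K:]]
    return support, query

# Usage: one episode from a hypothetical meta-training split I_train,
# e.g. I_train = {"dog": [...image paths...], "cat": [...], ...}
# support_set, query_set = sample_episode(I_train, C=5, K=5, Q=15)
```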
As a preferred embodiment of the present invention, the details of step 2 are as follows:
(1) The backbone of the model is a convolutional neural network $f_\theta$ with parameters $\theta$. The $k$-th training support sample is fed into the backbone to obtain its $D$-dimensional feature $f_\theta(x_k^c)\in\mathbb{R}^D$; the $q$-th query sample image is fed into the backbone to obtain its $D$-dimensional feature $f_\theta(\tilde{x}_q^c)\in\mathbb{R}^D$.
(2) An empty matrix $M$ is created as the memory module; each element of the matrix is a $D$-dimensional vector of the same dimension as the image sample features, and the $j$-th element $M(j)$ takes random values in the range $[0,1]$, that is:
$$M(j)=\mathrm{rand}[0,1]$$
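As an illustration of step 2, the sketch below builds a Conv-4 style backbone for $f_\theta$ (a common choice in few-shot learning; the patent only requires a convolutional network with parameters $\theta$) and a memory matrix initialized uniformly in $[0,1]$. The slot count $J$, the feature dimension $D$, and the 84 x 84 input size are assumptions.

```python
# A minimal sketch of the backbone f_theta and memory initialization (step 2).
import torch
import torch.nn as nn

def conv_block(cin, cout):
    # Conv-BN-ReLU-MaxPool, the usual Conv-4 building block
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                         nn.BatchNorm2d(cout), nn.ReLU(), nn.MaxPool2d(2))

class Backbone(nn.Module):
    """f_theta: maps an image batch to D-dimensional features."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(conv_block(3, hidden), conv_block(hidden, hidden),
                                 conv_block(hidden, hidden), conv_block(hidden, hidden))
    def forward(self, x):               # x: (N, 3, 84, 84)
        return self.net(x).flatten(1)   # (N, D) with D = 64 * 5 * 5 = 1600

f_theta = Backbone()
J, D = 128, 1600                        # number of memory slots is an assumption
M = torch.rand(J, D)                    # memory matrix, each slot ~ U[0, 1]^D
```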
As a preferred embodiment of the present invention, the details of step 3 are as follows:
(1) The set of support image sample features of the $c$-th class is $\{f_\theta(x_k^c)\}_{k=1}^{K}$, and the initial prototype of the $c$-th class training support set is computed as:
$$p_c^0=\frac{1}{K}\sum_{k=1}^{K}f_\theta(x_k^c)$$
(2) The initial prototypes of the $C$ training support sets are computed in turn according to this formula.
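Step 3 reduces to a single mean over the support features, as in the short sketch below (the array shapes are assumptions):

```python
# Step 3 as code: p_c^0 = (1/K) * sum_k f_theta(x_k^c) for every class at once.
import numpy as np

def initial_prototypes(support_feats):
    """support_feats: (C, K, D) array of support features -> (C, D) prototypes."""
    return support_feats.mean(axis=1)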
As a preferred embodiment of the present invention, the details of step 4 are as follows:
(1) The similarity between the initial prototype $p_c^0$ of the $c$-th class and the $j$-th matrix element is computed to obtain the read weight vector $w_c(j)$:
$$w_c(j)=\frac{\exp\left(\mathrm{sim}\left(p_c^0,M(j)\right)\right)}{\sum_{j'}\exp\left(\mathrm{sim}\left(p_c^0,M(j')\right)\right)}$$
where $\mathrm{sim}(\cdot,\cdot)$ denotes the similarity measure between vectors.
(2) The initial prototype $p_c^0$ of the $c$-th class is matched with the $j$-th element of the memory matrix $M$, and the memory element is updated to complete the write operation $W$, i.e.:
$$M_{t+1}(j)=W\left(p_c^0,M_t(j)\right)$$
where $W$ denotes the write operation, for which an average calculation may be employed, i.e. $W\left(p_c^0,M_t(j)\right)=\frac{1}{2}\left(p_c^0+M_t(j)\right)$.
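The following sketch illustrates step 4 under stated assumptions: the patent specifies only "similarity", so the dot product and the softmax normalization are illustrative choices, and the write averages the prototype into its best-matching slot, one of the options the text mentions.

```python
# A sketch of the memory read and write operations (step 4), under assumptions.
import numpy as np

def read_memory(p0, M):
    """p0: (D,) initial prototype; M: (J, D) memory. Returns read weights (J,)."""
    sim = M @ p0                        # dot-product similarity to every slot
    w = np.exp(sim - sim.max())         # numerically stable softmax
    return w / w.sum()                  # read weight vector w(j)

def write_memory(p0, M):
    """Update the best-matching slot: M_{t+1}(j*) = (M_t(j*) + p0) / 2."""
    j = int(np.argmax(M @ p0))
    M[j] = 0.5 * (M[j] + p0)
    return M
```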
As a preferred embodiment of the present invention, the details of step 5 are as follows:
The matrix element matched with the initial prototype $p_c^0$ in the memory module is obtained from the read weight vector and denoted $r_c$. The corrected prototype of class $c$ is the linear weighted sum of the initial prototype $p_c^0$ and $r_c$, expressed as:
$$\hat{p}_c=\alpha\,p_c^0+(1-\alpha)\,r_c$$
where $\alpha$ is an adjustable parameter.
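Step 5 then becomes a one-line combination. Reading the full weighted sum over slots (rather than a single matched slot) is an assumption consistent with the read-weight vector of step 4; alpha is the adjustable parameter.

```python
# Step 5 as code: corrected prototype from initial prototype and memory read-out.
def corrected_prototype(p0, M, w, alpha=0.5):
    r = w @ M                       # r_c: weighted read-out, shape (D,)
    return alpha * p0 + (1.0 - alpha) * r
```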
As a preferred embodiment of the present invention, the details of step 6 are as follows:
(1) The query image samples are fed into the backbone $f_\theta$; the $D$-dimensional feature of the $q$-th query sample is $f_\theta(\tilde{x}_q^c)\in\mathbb{R}^D$.
(2) The similarity between the $q$-th query sample feature and the $c$-th corrected prototype $\hat{p}_c$ is computed as:
$$s_{q,c}=-d\left(f_\theta(\tilde{x}_q^c),\hat{p}_c\right)$$
where $d(\cdot)$ denotes the Euclidean distance function.
(3) The similarity values are converted into output values with a softmax function:
$$P\left(y=c\mid\tilde{x}_q^c\right)=\frac{\exp\left(-d\left(f_\theta(\tilde{x}_q^c),\hat{p}_c\right)\right)}{\sum_{c'=1}^{C}\exp\left(-d\left(f_\theta(\tilde{x}_q^c),\hat{p}_{c'}\right)\right)}$$
(4) The cross-entropy loss between the probability outputs and the true label values $\tilde{y}_q^c$ is computed as:
$$L(\theta)=-\frac{1}{CQ}\sum_{c=1}^{C}\sum_{q=1}^{Q}\log P\left(y=\tilde{y}_q^c\mid\tilde{x}_q^c\right)$$
(5) The parameters $\theta$ of the network are optimized iteratively as:
$$\theta_{t+1}=\theta_t-\beta\,\nabla_\theta L\left(\theta_t\right)$$
where $\beta$ is the learning rate.
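A NumPy sketch of the step 6 loss follows; it evaluates the softmax-over-negative-distances probabilities and the cross entropy, while the gradient step with learning rate beta would be supplied by an autodiff framework in practice.

```python
# Step 6 as code: Euclidean scores, softmax over classes, cross-entropy loss.
import numpy as np

def episode_loss(query_feats, query_labels, prototypes):
    """query_feats: (N, D); query_labels: (N,); prototypes: (C, D)."""
    # negative Euclidean distances as logits: -d(f(x_q), p_c)
    d = np.linalg.norm(query_feats[:, None, :] - prototypes[None, :, :], axis=2)
    logits = -d
    logits -= logits.max(axis=1, keepdims=True)       # stable softmax
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return -np.log(probs[np.arange(len(query_labels)), query_labels]).mean()
```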
As a preferred embodiment of the present invention, the details of step 7 are as follows:
(1) The backbone $f_\theta$ and its parameters $\theta$ obtained in meta-training are fixed, and the test support image sample set and query image sample set are fed into the backbone to extract features. The $k$-th test support sample gives the $D$-dimensional feature $f_\theta(x'^c_k)\in\mathbb{R}^D$; the $q$-th query sample image gives the $D$-dimensional feature $f_\theta(\tilde{x}'^c_q)\in\mathbb{R}^D$.
(2) The initial prototype of the $c$-th class test support set is computed as:
$$p_c^0=\frac{1}{K}\sum_{k=1}^{K}f_\theta(x'^c_k)$$
(3) The similarity between the initial prototype $p_c^0$ of the $c$-th class and the $j$-th matrix element is computed to obtain the read weight vector $w_c(j)$; the matrix element matched with the initial prototype $p_c^0$ is determined from the read weights and denoted $r_c$. The corrected prototype of the $c$-th test support set is then:
$$\hat{p}_c=\alpha\,p_c^0+(1-\alpha)\,r_c$$
(4) The similarity between the $q$-th query sample feature and the $c$-th corrected prototype $\hat{p}_c$ is computed as:
$$s_{q,c}=-d\left(f_\theta(\tilde{x}'^c_q),\hat{p}_c\right)$$
where $d(\cdot)$ denotes the Euclidean distance function.
(5) The similarity values are converted into output values with a softmax function:
$$P\left(y=c\mid\tilde{x}'^c_q\right)=\frac{\exp\left(-d\left(f_\theta(\tilde{x}'^c_q),\hat{p}_c\right)\right)}{\sum_{c'=1}^{C}\exp\left(-d\left(f_\theta(\tilde{x}'^c_q),\hat{p}_{c'}\right)\right)}$$
and the query sample is assigned to the class with the highest output value.
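Step 7's decision rule, nearest corrected prototype under the Euclidean distance, can be sketched as follows (shapes are assumptions):

```python
# Step 7 as code: assign each query to the class of its nearest corrected prototype.
import numpy as np

def classify(query_feats, corrected_prototypes):
    """query_feats: (N, D); corrected_prototypes: (C, D) -> predicted labels (N,)."""
    d = np.linalg.norm(query_feats[:, None, :] -
                       corrected_prototypes[None, :, :], axis=2)
    return d.argmin(axis=1)         # nearest prototype in Euclidean distance
```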
Compared with the prior art, the technical scheme adopted by the invention has the following technical effects:
A memory module is added to the prototype network to memorize typical sample representations in the base class data set, so that the prior information in the base class data set can be fully exploited end-to-end to correct the prototypes and improve their representativeness.
Drawings
FIG. 1 is a flow chart of a small sample image classification method based on a memory enhanced prototype network according to the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
According to the invention, a memory module is added to the prototype network to memorize typical sample representations in the base class data set, so that the prior information in the base class data set can be fully exploited end-to-end to correct the prototypes.
Fig. 1 is a flowchart of the small sample image classification method based on a memory enhanced prototype network according to the present invention; as shown in Fig. 1, the specific steps are as follows:
step 1: the input image dataset is processed. Assume an input image data set tableShown as I, which is randomly divided into two image subsets, respectively meta-training image data sets ItrainAnd meta test image dataset Itest.. In data set ItrainAnd ItestAnd randomly extracting images of C categories, randomly extracting K image samples as support samples in all the images in each category, randomly extracting Q image samples as query samples in the rest images, and finishing the construction of a C-way-K-shot classification task on a meta-training data set and a meta-test set.
Step 2: feature extraction of the image samples and initialization of the memory module during meta-training. The training support sample set and query sample set are fed into the convolutional neural network $f_\theta$ with parameters $\theta$ to extract features; the features of the $k$-th training support sample and the $q$-th query sample image are denoted $f_\theta(x_k^c)$ and $f_\theta(\tilde{x}_q^c)$, respectively. A blank matrix is created as the memory module $M$, and each of its elements is set to random numbers in the range $[0,1]$.
Step 3: computation of the initial prototypes. The mean of the support image sample features within each category is computed as the initial prototype; the initial prototype of the $c$-th class is denoted $p_c^0$.
Step 4: the read and write operations of the memory module $M$. The read and write operations are completed by matching the initial prototype $p_c^0$ with the elements of the memory matrix $M$.
Step 5: linear synthesis of the corrected prototype. The matrix element matched with the initial prototype $p_c^0$ in the memory module is obtained from the read weight vector and denoted $r_c$; the corrected prototype of the $c$-th class is computed as the linear weighted sum of the initial prototype $p_c^0$ and $r_c$.
Step 6: computation of the training loss function. Similarity scores between the query image sample features $f_\theta(\tilde{x}_q^c)$ and the corrected prototype of each class are computed with the Euclidean distance and converted by a softmax function into the probability $P(y=c\mid\tilde{x}_q^c)$ of belonging to the $c$-th support class. The cross-entropy loss between the probability outputs and the true label values $\tilde{y}_q^c$ is computed, and the parameters $\theta$ of the backbone are optimized with a gradient descent algorithm to complete meta-training.
Step 7: the meta-test procedure is completed. The backbone $f_\theta$ obtained in meta-training is fixed, and the features of the test support samples and query image samples are extracted; the features of the $k$-th support sample and the $q$-th query sample are denoted $f_\theta(x'^c_k)$ and $f_\theta(\tilde{x}'^c_q)$, respectively. The initial prototype of each support class is corrected using steps 3 to 5, the similarity between the query sample feature and each class is computed, and the query sample is assigned to the most similar class, as the sketch below illustrates.
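Putting the pieces together, a single meta-training episode could be composed as below; this driver reuses the helper functions from the earlier sketches and is an illustrative composition, not the patent's reference implementation.

```python
# A sketch of one meta-training episode tying steps 3-6 together.
import numpy as np

def train_episode(support_feats, query_feats, query_labels, M, alpha=0.5):
    """support_feats: (C, K, D); query_feats: (N, D); M: (J, D) memory."""
    p0 = initial_prototypes(support_feats)          # step 3: (C, D)
    corrected = []
    for c in range(p0.shape[0]):                    # steps 4-5, per class
        w = read_memory(p0[c], M)                   # read weights
        corrected.append(corrected_prototype(p0[c], M, w, alpha))
        M = write_memory(p0[c], M)                  # write update
    loss = episode_loss(query_feats, query_labels, np.stack(corrected))  # step 6
    return loss, M
```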

Claims (8)

1. The small sample image classification method based on the memory enhanced prototype network is characterized by comprising the following steps:
step 1, processing an input image data set, wherein the input image data set is denoted as $I$ and is randomly divided into two image subsets, a meta-training image data set $I_{train}$ and a meta-test image data set $I_{test}$; from the data sets $I_{train}$ and $I_{test}$, images of $C$ categories are randomly extracted, $K$ image samples are randomly extracted from each category as support samples and $Q$ image samples are randomly extracted from the remaining images as query samples, completing the construction of C-way K-shot classification tasks on the meta-training and meta-test sets;
step 2, extracting features of the image samples and initializing the memory module during meta-training; the training support sample set and query sample set are fed into the convolutional neural network $f_\theta$ with parameters $\theta$ to extract features, the features of the $k$-th training support sample and the $q$-th query sample image being denoted $f_\theta(x_k^c)$ and $f_\theta(\tilde{x}_q^c)$; a blank matrix is created as the memory module $M$, each of its elements being set to random numbers in the range $[0,1]$;
step 3, computing the initial prototypes; the mean of the support image sample features within each category is computed as the initial prototype, the initial prototype of the $c$-th class being denoted $p_c^0$;
step 4, performing the read and write operations of the memory module $M$; the read and write operations are completed by matching the initial prototype $p_c^0$ with the elements of the memory matrix $M$;
step 5, linear synthesis of the corrected prototype; the matrix element matched with the initial prototype $p_c^0$ in the memory module is obtained from the read weight vector and denoted $r_c$, and the corrected prototype of the $c$-th class is the linear weighted sum of the initial prototype $p_c^0$ and $r_c$;
step 6, computing the training loss function; similarity scores between the query image sample features $f_\theta(\tilde{x}_q^c)$ and the corrected prototype of each class are computed with the Euclidean distance and converted by a softmax function into the probability $P(y=c\mid\tilde{x}_q^c)$ of belonging to the $c$-th support class; the cross-entropy loss between the probability outputs and the true label values $\tilde{y}_q^c$ is computed, and the parameters $\theta$ of the backbone network are optimized with a gradient descent algorithm to complete meta-training;
step 7, completing the meta-test procedure; the backbone $f_\theta$ obtained in meta-training is fixed, and the features of the test support samples and query image samples are extracted, the features of the $k$-th support sample and the $q$-th query sample being denoted $f_\theta(x'^c_k)$ and $f_\theta(\tilde{x}'^c_q)$, respectively; the initial prototype of each support class is corrected using steps 3 to 5, the similarity between the query sample feature and each class is computed, and the query sample is assigned to the most similar class.
2. The small sample image classification method based on the memory enhanced prototype network according to claim 1, wherein step 1 is detailed as follows:
(1) the input image data set is denoted as $I$ and is randomly divided into two image subsets, a meta-training image data set $I_{train}$ and a meta-test image data set $I_{test}$;
(2) from the data set $I_{train}$, images of $C$ categories are randomly extracted, $K$ image samples are randomly extracted from each category as support samples and $Q$ image samples are randomly extracted from the remaining images as query samples, completing the construction of a C-way K-shot classification task on the meta-training data set; the support set of the $c$-th class is denoted $S_c=\{(x_k^c,y_k^c)\}_{k=1}^{K}$, where $x_k^c$ is the $k$-th support sample image and $y_k^c$ its category label; the query set of the $c$-th class is denoted $Q_c=\{(\tilde{x}_q^c,\tilde{y}_q^c)\}_{q=1}^{Q}$, where $\tilde{x}_q^c$ is the $q$-th query sample image and $\tilde{y}_q^c$ its category label;
(3) from the data set $I_{test}$, images of $C$ categories are randomly extracted, $K$ image samples are randomly extracted from each category as support samples and $Q$ image samples are randomly extracted from the remaining images as query samples, completing the construction of a C-way K-shot classification task on the meta-test data set; the support set of the $c$-th class is denoted $S'_c=\{(x'^c_k,y'^c_k)\}_{k=1}^{K}$, where $x'^c_k$ is the $k$-th support sample image and $y'^c_k$ its category label; the query set of the $c$-th class is denoted $Q'_c=\{\tilde{x}'^c_q\}_{q=1}^{Q}$, where $\tilde{x}'^c_q$ is the $q$-th query sample image.
3. The small sample image classification method based on the memory enhanced prototype network according to claim 1, wherein step 2 is detailed as follows:
(1) the backbone of the model is a convolutional neural network $f_\theta$ with parameters $\theta$; the $k$-th training support sample is fed into the backbone to obtain its $D$-dimensional feature $f_\theta(x_k^c)\in\mathbb{R}^D$, and the $q$-th query sample image is fed into the backbone to obtain its $D$-dimensional feature $f_\theta(\tilde{x}_q^c)\in\mathbb{R}^D$;
(2) an empty matrix $M$ is created as the memory module; each element of the matrix is a $D$-dimensional vector of the same dimension as the image sample features, and the $j$-th element $M(j)$ takes random values in the range $[0,1]$, that is: $M(j)=\mathrm{rand}[0,1]$.
4. The small sample image classification method based on the memory enhanced prototype network according to claim 1, wherein step 3 is detailed as follows:
(1) the set of support image sample features of the $c$-th class is $\{f_\theta(x_k^c)\}_{k=1}^{K}$, and the initial prototype of the $c$-th class training support set is computed as:
$$p_c^0=\frac{1}{K}\sum_{k=1}^{K}f_\theta(x_k^c)$$
(2) the initial prototypes of the $C$ training support sets are computed in turn according to this formula.
5. The small sample image classification method based on the memory enhanced prototype network according to claim 1, wherein step 4 is detailed as follows:
(1) the similarity between the initial prototype $p_c^0$ of the $c$-th class and the $j$-th matrix element is computed to obtain the read weight vector $w_c(j)$:
$$w_c(j)=\frac{\exp\left(\mathrm{sim}\left(p_c^0,M(j)\right)\right)}{\sum_{j'}\exp\left(\mathrm{sim}\left(p_c^0,M(j')\right)\right)}$$
where $\mathrm{sim}(\cdot,\cdot)$ denotes the similarity measure between vectors;
(2) the initial prototype $p_c^0$ of the $c$-th class is matched with the $j$-th element of the memory matrix $M$, and the memory element is updated to complete the write operation $W$, i.e.:
$$M_{t+1}(j)=W\left(p_c^0,M_t(j)\right)$$
where $W$ denotes the write operation, for which an average calculation may be employed.
6. The small sample image classification method based on the memory enhanced prototype network according to claim 1, wherein step 5 is detailed as follows:
the matrix element matched with the initial prototype $p_c^0$ in the memory module is obtained from the read weight vector and denoted $r_c$; the corrected prototype of class $c$ is the linear weighted sum of the initial prototype $p_c^0$ and $r_c$, expressed as:
$$\hat{p}_c=\alpha\,p_c^0+(1-\alpha)\,r_c$$
where $\alpha$ is an adjustable parameter.
7. The small sample image classification method based on the memory enhanced prototype network according to claim 1, wherein step 6 is detailed as follows:
(1) the query image samples are fed into the backbone $f_\theta$; the $D$-dimensional feature of the $q$-th query sample is $f_\theta(\tilde{x}_q^c)\in\mathbb{R}^D$;
(2) the similarity between the $q$-th query sample feature and the $c$-th corrected prototype $\hat{p}_c$ is computed as:
$$s_{q,c}=-d\left(f_\theta(\tilde{x}_q^c),\hat{p}_c\right)$$
where $d(\cdot)$ denotes the Euclidean distance function;
(3) the similarity values are converted into output values with a softmax function:
$$P\left(y=c\mid\tilde{x}_q^c\right)=\frac{\exp\left(-d\left(f_\theta(\tilde{x}_q^c),\hat{p}_c\right)\right)}{\sum_{c'=1}^{C}\exp\left(-d\left(f_\theta(\tilde{x}_q^c),\hat{p}_{c'}\right)\right)}$$
(4) the cross-entropy loss between the probability outputs and the true label values $\tilde{y}_q^c$ is computed as:
$$L(\theta)=-\frac{1}{CQ}\sum_{c=1}^{C}\sum_{q=1}^{Q}\log P\left(y=\tilde{y}_q^c\mid\tilde{x}_q^c\right)$$
(5) the parameters $\theta$ of the network are optimized iteratively as:
$$\theta_{t+1}=\theta_t-\beta\,\nabla_\theta L\left(\theta_t\right)$$
where $\beta$ is the learning rate.
8. The small sample image classification method based on the memory enhanced prototype network according to claim 1, wherein step 7 is detailed as follows:
(1) the backbone $f_\theta$ and its parameters $\theta$ obtained in meta-training are fixed, and the test support image sample set and query image sample set are fed into the backbone to extract features; the $k$-th test support sample gives the $D$-dimensional feature $f_\theta(x'^c_k)\in\mathbb{R}^D$, and the $q$-th query sample image gives the $D$-dimensional feature $f_\theta(\tilde{x}'^c_q)\in\mathbb{R}^D$;
(2) the initial prototype of the $c$-th class test support set is computed as:
$$p_c^0=\frac{1}{K}\sum_{k=1}^{K}f_\theta(x'^c_k)$$
(3) the similarity between the initial prototype $p_c^0$ of the $c$-th class and the $j$-th matrix element is computed to obtain the read weight vector $w_c(j)$; the matrix element matched with the initial prototype $p_c^0$ is determined from the read weights and denoted $r_c$; the corrected prototype of the $c$-th test support set is then:
$$\hat{p}_c=\alpha\,p_c^0+(1-\alpha)\,r_c$$
where $\alpha$ is an adjustable parameter;
(4) the similarity between the $q$-th query sample feature and the $c$-th corrected prototype $\hat{p}_c$ is computed as:
$$s_{q,c}=-d\left(f_\theta(\tilde{x}'^c_q),\hat{p}_c\right)$$
where $d(\cdot)$ denotes the Euclidean distance function;
(5) the similarity values are converted into output values with a softmax function:
$$P\left(y=c\mid\tilde{x}'^c_q\right)=\frac{\exp\left(-d\left(f_\theta(\tilde{x}'^c_q),\hat{p}_c\right)\right)}{\sum_{c'=1}^{C}\exp\left(-d\left(f_\theta(\tilde{x}'^c_q),\hat{p}_{c'}\right)\right)}$$
and the query sample is assigned to the class with the highest output value.
CN202210105376.3A 2022-01-28 2022-01-28 Small sample image classification method based on memory enhanced prototype network Withdrawn CN114444600A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210105376.3A CN114444600A (en) 2022-01-28 2022-01-28 Small sample image classification method based on memory enhanced prototype network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210105376.3A CN114444600A (en) 2022-01-28 2022-01-28 Small sample image classification method based on memory enhanced prototype network

Publications (1)

Publication Number Publication Date
CN114444600A true CN114444600A (en) 2022-05-06

Family

ID=81370238

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210105376.3A Withdrawn CN114444600A (en) 2022-01-28 2022-01-28 Small sample image classification method based on memory enhanced prototype network

Country Status (1)

Country Link
CN (1) CN114444600A (en)


Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115984621A (en) * 2023-01-09 2023-04-18 宁波拾烨智能科技有限公司 Small sample remote sensing image classification method based on restrictive prototype comparison network
CN115984621B (en) * 2023-01-09 2023-07-11 宁波拾烨智能科技有限公司 Small sample remote sensing image classification method based on restrictive prototype comparison network
CN115830401A (en) * 2023-02-14 2023-03-21 泉州装备制造研究所 Small sample image classification method
CN115830401B (en) * 2023-02-14 2023-05-09 泉州装备制造研究所 Small sample image classification method
CN116168255A (en) * 2023-04-10 2023-05-26 武汉大学人民医院(湖北省人民医院) Retina OCT (optical coherence tomography) image classification method with robust long tail distribution
CN116168255B (en) * 2023-04-10 2023-12-08 武汉大学人民医院(湖北省人民医院) Retina OCT (optical coherence tomography) image classification method with robust long tail distribution
CN116168257A (en) * 2023-04-23 2023-05-26 安徽大学 Small sample image classification method, device and storage medium based on sample generation
CN116168257B (en) * 2023-04-23 2023-07-04 安徽大学 Small sample image classification method, device and storage medium based on sample generation
CN116521875A (en) * 2023-05-09 2023-08-01 江南大学 Prototype enhanced small sample dialogue emotion recognition method for introducing group emotion infection
CN116521875B (en) * 2023-05-09 2023-10-31 江南大学 Prototype enhanced small sample dialogue emotion recognition method for introducing group emotion infection
CN116563638A (en) * 2023-05-19 2023-08-08 广东石油化工学院 Image classification model optimization method and system based on scene memory
CN116563638B (en) * 2023-05-19 2023-12-05 广东石油化工学院 Image classification model optimization method and system based on scene memory

Similar Documents

Publication Publication Date Title
CN107480261B (en) Fine-grained face image fast retrieval method based on deep learning
CN114444600A (en) Small sample image classification method based on memory enhanced prototype network
CN110298037B (en) Convolutional neural network matching text recognition method based on enhanced attention mechanism
CN111695467B (en) Spatial spectrum full convolution hyperspectral image classification method based on super-pixel sample expansion
CN110321967B (en) Image classification improvement method based on convolutional neural network
CN111125411B (en) Large-scale image retrieval method for deep strong correlation hash learning
CN110942091B (en) Semi-supervised few-sample image classification method for searching reliable abnormal data center
CN113657450B (en) Attention mechanism-based land battlefield image-text cross-modal retrieval method and system
CN109063112B (en) Rapid image retrieval method, model and model construction method based on multitask learning deep semantic hash
CN114241273B (en) Multi-modal image processing method and system based on Transformer network and hypersphere space learning
CN108875933B (en) Over-limit learning machine classification method and system for unsupervised sparse parameter learning
Hasan An application of pre-trained CNN for image classification
CN112733866A (en) Network construction method for improving text description correctness of controllable image
CN105894050A (en) Multi-task learning based method for recognizing race and gender through human face image
CN112949740B (en) Small sample image classification method based on multilevel measurement
Bansal et al. mRMR-PSO: a hybrid feature selection technique with a multiobjective approach for sign language recognition
CN112232395B (en) Semi-supervised image classification method for generating countermeasure network based on joint training
Ren et al. Convolutional neural network based on principal component analysis initialization for image classification
CN114299362A (en) Small sample image classification method based on k-means clustering
CN110991500A (en) Small sample multi-classification method based on nested integrated depth support vector machine
CN116110089A (en) Facial expression recognition method based on depth self-adaptive metric learning
CN114579794A (en) Multi-scale fusion landmark image retrieval method and system based on feature consistency suggestion
CN113032613B (en) Three-dimensional model retrieval method based on interactive attention convolution neural network
Xu et al. Shape retrieval using deep autoencoder learning representation
CN113095229B (en) Self-adaptive pedestrian re-identification system and method for unsupervised domain

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20220506