CN113688878A - Small sample image classification method based on memory mechanism and graph neural network - Google Patents

Small sample image classification method based on memory mechanism and graph neural network

Info

Publication number: CN113688878A
Authority: CN (China)
Prior art keywords: memory, meta, node, knowledge, training
Legal status: Granted (the legal status is an assumption and is not a legal conclusion)
Application number: CN202110872087.1A
Other languages: Chinese (zh)
Other versions: CN113688878B (en)
Inventors: 张志忠, 谢源, 刘勋承, 田旭东, 马利庄
Current Assignee: East China Normal University (the listed assignees may be inaccurate)
Original Assignee: East China Normal University
Application filed by East China Normal University
Priority to CN202110872087.1A
Publication of CN113688878A
Application granted
Publication of CN113688878B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Abstract

The invention discloses a small sample image classification method based on a memory mechanism and a graph neural network, which helps the small sample model perform inference and prediction with the aid of well-learned conceptual knowledge. The method comprises three stages: pre-training, meta-training, and meta-testing. In pre-training, the trained feature extractor and classifier serve as the initialization weights of the encoder and the memory bank. In meta-training, the encoder extracts the features of the support set and query set samples, the information relevant to each class is mined from the memory bank as meta-knowledge, and the similarity between the task-related nodes and the meta-knowledge is propagated through the graph neural network. In meta-testing, the classification result is obtained from the task-related nodes and the meta-knowledge nodes. Compared with the prior art, the invention draws on the human cognition process: an information-bottleneck-based memory graph augmentation network helps the model perform inference and prediction with the aid of well-learned conceptual knowledge. The method is simple and convenient, highly practical, and has promising prospects for popularization and application.

Description

Small sample image classification method based on memory mechanism and graph neural network
Technical Field
The invention relates to the technical field of small sample image classification, in particular to a small sample image classification method based on a memory mechanism and a graph neural network.
Background
The success of deep learning comes from large amounts of labeled data, whereas humans can generalize well from only a few samples; this gap has motivated research on small sample learning. Unlike the traditional deep learning setting, small sample learning does not merely aim at classifying unknown samples, but rather at adapting quickly to new tasks with very limited labeled data and past knowledge.
Recently, approaches that combine meta-learning with episode-based training have shown clear advantages in addressing this problem. Intuitively, transferring knowledge from known classes (i.e., classes with enough training samples) to new classes (i.e., classes with only a few samples) through an episode sampling strategy that simulates the human learning process is a promising direction. Although meta-learning and episode training strategies have achieved notable success in small sample learning, most of them ignore a critical issue, namely how the knowledge learned in the past should be brought to bear on new tasks as new training episodes arrive.
In the prior art, when facing an unknown task, a model cannot use the concepts it has already learned to assist its inference and prediction.
Disclosure of Invention
The invention aims to provide a small sample image classification method based on a memory mechanism and a graph neural network that overcomes the defects of the prior art. The method adopts an information-bottleneck-based memory graph augmentation network and uses the graph neural network together with the memory mechanism so that, when facing an unknown classification task, the learned concepts help the small sample model perform inference and prediction.
The specific technical scheme for realizing the purpose of the invention is as follows: a small sample image classification method based on a memory mechanism and a graph neural network comprises the following steps:
step 1: pre-training phase
1.1) learning a supervised feature extractor and linear classifier on the whole training set;
1.2) use the trained feature extractor and classifier as the initialization weights of the meta-training stage encoder and memory bank, respectively.
The pre-training phase helps to extract generalized feature representations.
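As an illustration of the pre-training in step 1, the sketch below shows one supervised training iteration of the feature extractor and linear classifier; the module and optimizer objects are assumed to be supplied by the caller, and the function name is a placeholder.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def pretrain_step(batch_images, batch_labels, feature_extractor: nn.Module,
                  classifier: nn.Module, optimizer: torch.optim.Optimizer):
    """One supervised pre-training iteration over a sampled batch."""
    features = feature_extractor(batch_images)          # batch features
    logits = classifier(features)                       # per-class scores
    loss = F.cross_entropy(logits, batch_labels)        # cross-entropy against the true labels
    optimizer.zero_grad()
    loss.backward()                                     # backpropagate through both modules
    optimizer.step()
    predictions = logits.argmax(dim=-1)                 # class with the maximum probability
    return loss.item(), predictions
```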
Step 2: meta training phase
2.1) use the episodic training strategy commonly applied in small sample learning. Specifically, consider an N-way K-shot T-query task that contains support set samples S and query set samples Q. The encoder extracts the feature representations of the support set samples and query set samples, which serve as the task-related nodes.
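As an illustration of step 2.1, the following sketch shows one way an N-way K-shot T-query episode could be sampled and encoded into task-related nodes. The per-class image dictionary, the function name, and the encoder module are hypothetical placeholders, not the patent's implementation.

```python
import random
import torch
import torch.nn as nn

def sample_and_encode_episode(images_by_class, encoder: nn.Module,
                              n_way=5, k_shot=1, t_query=15):
    """Sample an N-way K-shot T-query episode and encode it into task-related nodes.

    images_by_class: dict mapping class id -> tensor [num_images, C, H, W] (assumed layout).
    Returns support features [N*K, d], support labels, query features [N*T, d], query labels.
    """
    classes = random.sample(list(images_by_class.keys()), n_way)
    support_imgs, query_imgs, s_labels, q_labels = [], [], [], []
    for episode_label, c in enumerate(classes):
        imgs = images_by_class[c]
        idx = torch.randperm(imgs.size(0))[: k_shot + t_query]
        support_imgs.append(imgs[idx[:k_shot]])
        query_imgs.append(imgs[idx[k_shot:]])
        s_labels += [episode_label] * k_shot
        q_labels += [episode_label] * t_query
    support = encoder(torch.cat(support_imgs))   # task-related nodes of the support set
    query = encoder(torch.cat(query_imgs))       # task-related nodes of the query set
    return support, torch.tensor(s_labels), query, torch.tensor(q_labels)
```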
2.2) to facilitate quick adaptation, the invention maintains a memory bank that stores the feature representations of the support set samples. The intra-class mean is used to compute the center point f_cen ∈ R^[N,d] of each class in the support set, which is concatenated with the prototype point f_p ∈ R^[N,d] of the same class stored in the memory bank; the concatenated feature is denoted f_cat ∈ R^[N,2d]. This iterative update process can be regarded as a special knowledge distillation technique that refines the semantic information using the Information Bottleneck (IB) principle. To ensure that the IB works well, i.e., to suppress task-irrelevant interference while preserving the semantic label information, the semantic information is constrained by the following formula 1:
max I(f_p, Y) − β·I(f_cen, f_p)   (1);
wherein: I(·,·) denotes mutual information, Y denotes the label, and β denotes a Lagrange coefficient.
In practice, the constraint of formula 2 (rendered as an image in the original document) is enforced to purify the information and further refine the memory bank: the concatenated feature is passed through a fully connected layer that reduces its dimensionality and purifies the semantic information.
The purified feature is denoted f_B ∈ R^[N,d]. Together with the prototype point of the same class in the memory bank, it further optimizes the memory bank through a momentum update, as given by the following formula 3:
f_p ← λ·f_p + (1 − λ)·f_B   (3);
wherein: λ is the momentum coefficient.
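A minimal sketch of the memory purification in step 2.2 follows, assuming the memory bank is a [num_classes, d] tensor of prototype points and standing in for the IB refinement with a plain 2d -> d linear layer; the variational objective behind formulas 1 and 2 is not reproduced, and all names and shapes are assumptions.

```python
import torch
import torch.nn as nn

class MemoryRefiner(nn.Module):
    """Sketch of step 2.2: concatenate class centers with same-class prototypes,
    reduce 2d -> d (stand-in for the IB layer), then momentum-update the memory bank."""
    def __init__(self, feat_dim, num_classes, momentum=0.9):
        super().__init__()
        self.ib_layer = nn.Linear(2 * feat_dim, feat_dim)                        # f_cat (2d) -> f_B (d)
        self.register_buffer("prototypes", torch.zeros(num_classes, feat_dim))   # memory bank f_p
        self.momentum = momentum                                                 # lambda in formula 3

    def forward(self, support_feats, support_labels, class_ids):
        # f_cen: intra-class mean of the support features of the N episode classes
        centers = torch.stack([support_feats[support_labels == i].mean(0)
                               for i in range(class_ids.numel())])
        f_p = self.prototypes[class_ids]                 # same-class prototype points
        f_cat = torch.cat([centers, f_p], dim=-1)        # concatenated feature, shape [N, 2d]
        f_b = self.ib_layer(f_cat)                       # purified feature f_B, shape [N, d]
        # formula 3: f_p <- lambda * f_p + (1 - lambda) * f_B
        self.prototypes[class_ids] = (self.momentum * f_p + (1 - self.momentum) * f_b).detach()
        return centers, f_b
```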
2.3) the refined prototype representations are further combined with meta-knowledge mining. In this process, for each class center point, the cosine similarity between the center point and every prototype point in the memory bank is first computed, and the k prototype points MK = {m_1, m_2, ..., m_k} nearest to the center point are selected. The k prototype points and the center point are concatenated and fed into an aggregation network, which aggregates the information of the k prototype points; the output serves as the meta-knowledge node of that class and further expands the support set as a pseudo sample of the class, as given by formula 4 (rendered as an image in the original document);
wherein: [·,·] is the concatenation operation; f(·; θ_agg) performs a transformation R^{2d} → R^d and consists of a fully connected layer parameterized by θ_agg.
In addition, a_j is the correlation coefficient between m_j and f_cen[i], given by formula 5 (image in the original);
wherein: τ is a temperature coefficient; ⟨·,·⟩ is cosine similarity.
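The sketch below illustrates step 2.3 under assumptions: the k nearest prototypes are retrieved by cosine similarity and combined with a softmax-weighted sum using temperature τ (consistent with the weighted summation described in step S340, but not confirmed, since formulas 4 and 5 are images), then concatenated with the class center and reduced by an assumed 2d -> d aggregation network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def mine_meta_knowledge(centers, memory_bank, agg_net: nn.Module, k=5, tau=0.1):
    """Sketch of step 2.3: retrieve the k nearest prototypes of each class center and
    aggregate them into one meta-knowledge node per class (assumed weighting scheme).

    centers:     [N, d] class center points f_cen
    memory_bank: [M, d] prototype points f_p stored in the memory bank
    agg_net:     assumed 2d -> d fully connected aggregation network (theta_agg)
    """
    sims = F.cosine_similarity(centers.unsqueeze(1), memory_bank.unsqueeze(0), dim=-1)  # [N, M]
    top_sim, top_idx = sims.topk(k, dim=-1)                        # k nearest prototype points MK
    weights = F.softmax(top_sim / tau, dim=-1)                     # assumed softmax weights with temperature tau
    neighbours = memory_bank[top_idx]                              # [N, k, d]
    aggregated = (weights.unsqueeze(-1) * neighbours).sum(dim=1)   # weighted summation of the k prototypes
    meta_nodes = agg_net(torch.cat([aggregated, centers], dim=-1)) # concatenate with the center, reduce to d
    return meta_nodes                                              # one pseudo (meta-knowledge) node per class
```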
2.4) a fully connected graph G = (V, E) is constructed from the task-related nodes and the meta-knowledge nodes (the definitions of the node set V and edge set E are rendered as images in the original document). Each node represents the feature of one sample, and an edge represents the similarity of two nodes: it is 1 if the two nodes come from the same class and 0 otherwise. Because the label information of the query set samples is unknown, edges connected to the query set are initialized to 0.5, as specified by formula 6 (image in the original);
wherein: the set referenced in formula 6 is the support set after expansion with the meta-knowledge nodes.
2.5) each layer of the memory-augmented graph neural network updates the node features and the edge features. Given the node features and edge features of the previous layer, the node features are updated through a neighborhood aggregation process, and the edge features are then recomputed based on the updated node features. The node features are updated according to formula 7 (rendered as an image in the original document);
wherein: [·,·] is the concatenation operation; the layer index refers to the corresponding layer of the memory augmentation module; f_node(·; θ_node) is the node update network parameterized by θ_node.
The update rule of the edge features is given by formula 8 (image in the original).
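Because formulas 7 and 8 are available only as images, the layer sketch below assumes a common edge-labeling GNN form: an edge-weighted neighborhood aggregation concatenated with the node feature and passed through f_node, followed by a small MLP that recomputes edge similarities from the updated nodes. It is an assumed instantiation, not the patent's exact update rule.

```python
import torch
import torch.nn as nn

class MemoryAugmentedGNNLayer(nn.Module):
    """Sketch of one layer of step 2.5 (assumed update forms for formulas 7 and 8)."""
    def __init__(self, feat_dim):
        super().__init__()
        self.f_node = nn.Linear(2 * feat_dim, feat_dim)    # node update network f_node(.; theta_node)
        self.f_edge = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU(),
                                    nn.Linear(feat_dim, 1), nn.Sigmoid())

    def forward(self, nodes, edges):
        # neighborhood aggregation: edge-weighted mean of the neighbouring node features
        weights = edges / edges.sum(dim=-1, keepdim=True).clamp(min=1e-6)
        aggregated = torch.matmul(weights, nodes)
        nodes = self.f_node(torch.cat([nodes, aggregated], dim=-1))   # assumed form of formula 7
        # recompute the edge features from the updated node features (assumed form of formula 8)
        pairwise = (nodes.unsqueeze(1) - nodes.unsqueeze(0)).abs()
        edges = self.f_edge(pairwise).squeeze(-1)
        return nodes, edges
```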
2.6) after the updates of the multi-layer memory-augmented graph neural network, the probability that each query set node belongs to a given class can be computed as the sum of the values of the edges between that query set node and all support set nodes of the same class, as given by formula 9 (rendered as an image in the original document);
wherein: δ(y_i = C_k) is the Kronecker delta function, which takes the value 1 when y_i = C_k and 0 otherwise.
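The edge-summation read-out of step 2.6 can be sketched as follows: the Kronecker delta is expressed as a one-hot class mask, and the final normalization to probabilities is an assumption.

```python
import torch
import torch.nn.functional as F

def classify_queries(edges, expanded_support_labels, n_way):
    """Sketch of step 2.6: score each query node by summing its edge values to all
    support (and meta-knowledge) nodes of each class, then normalize the scores."""
    num_support = expanded_support_labels.numel()
    query_to_support = edges[num_support:, :num_support]            # [num_query, num_support]
    class_mask = F.one_hot(expanded_support_labels, n_way).float()  # Kronecker delta as a one-hot mask
    scores = query_to_support @ class_mask                          # sum of same-class edge values
    probs = scores / scores.sum(dim=-1, keepdim=True).clamp(min=1e-6)  # assumed normalization
    return probs
```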
2.7) in the meta-training phase, to ensure accurate prediction on the query set, the optimization target is a minimized binary cross-entropy loss function defined by formula 10 (rendered as an image in the original document);
wherein: e_i and its ground-truth counterpart (shown as an image) denote the predicted and real query set edge labels, respectively; the per-layer weight coefficient (shown as an image) weights the loss of each layer; BCE denotes the binary cross-entropy loss. To keep the meta-knowledge nodes consistent with the predicted labels, another binary cross-entropy loss function is introduced to estimate the difference between the ground truth and the prediction of the meta-knowledge node edge labels.
This second binary cross-entropy loss function is defined by formula 11 (image in the original);
wherein: e_i and its ground-truth counterpart (shown as an image) denote the predicted and real meta-knowledge edge labels, respectively; the per-layer weight coefficient (shown as an image) weights the loss of each layer; BCE denotes the binary cross-entropy loss.
The final optimization objective is the loss function defined by formula 12 (image in the original);
wherein: α and β are balancing coefficients, with α = 0.2 and β = 0.01.
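Formulas 10 to 12 are available only as images, so the sketch below is a heavily hedged illustration: a per-layer weighted binary cross-entropy over edge predictions, plus an assumed composition of the final objective in which α weights the meta-knowledge edge loss and β is reserved for a further regularization term that the text does not reproduce.

```python
import torch
import torch.nn.functional as F

def weighted_edge_bce(edge_preds_per_layer, edge_targets, mask, layer_weights):
    """Assumed form of formulas 10/11: per-layer weighted BCE between predicted and
    ground-truth edge labels, evaluated only on the edges selected by `mask`
    (for example, edges that touch query nodes or meta-knowledge nodes)."""
    loss = edge_preds_per_layer[0].new_zeros(())
    for w, pred in zip(layer_weights, edge_preds_per_layer):
        loss = loss + w * F.binary_cross_entropy(pred[mask], edge_targets[mask])
    return loss

def total_loss(loss_query, loss_meta, loss_reg=None, alpha=0.2, beta=0.01):
    """Assumed composition of formula 12: query edge loss plus alpha-weighted
    meta-knowledge edge loss, with beta reserved for an optional extra term."""
    total = loss_query + alpha * loss_meta
    if loss_reg is not None:
        total = total + beta * loss_reg
    return total
```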
Step 3: Meta test phase
3.1) the meta-test phase is similar to the meta-training phase, except that none of the modules or the memory bank are updated, and the samples for the episode sampling strategy come from the test set.
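As a small illustration of this frozen-evaluation behaviour, the helper below puts the modules (encoder, memory refiner, GNN layers) into evaluation mode and disables their gradients so that nothing, including the memory bank, is updated during meta-testing; the function name is a placeholder.

```python
def freeze_for_meta_test(modules):
    """Disable all updates during meta-testing: eval mode and no gradients,
    so neither the network weights nor the memory bank prototypes change."""
    for m in modules:
        m.eval()
        for p in m.parameters():
            p.requires_grad_(False)
```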
In step 1, a supervised feature extractor and linear classifier are learned on the whole training set, and the trained feature extractor and classifier are used respectively as the initialization of the encoder and memory bank in the meta-training phase, which helps to extract generalized feature representations.
In step 2, the episodic training strategy commonly applied in small sample learning is used; specifically, an N-way K-shot T-query task is considered, and the feature representations of the support set samples and query set samples are extracted by the encoder as task-related nodes.
In step 2, to facilitate quick adaptation, a memory bank is built to store the feature representations of the support set samples. The center point of each class in the support set is computed using the intra-class mean and concatenated with the prototype point of the same class stored in the memory bank; the concatenated feature is fed into a fully connected layer that reduces the dimensionality to purify the semantic information. This iterative update process can be regarded as a special knowledge distillation technique, and the purified information and the corresponding class entry of the memory bank are then further optimized with a momentum update.
In step 2, during meta-knowledge mining, the cosine similarity between each class center point and every prototype point in the memory bank is first computed, and the k prototype points MK = {m_1, m_2, ..., m_k} nearest to the center point are selected; the k prototype points and the center point are concatenated and fed into an aggregation network, which aggregates the information of the k prototype points, and the output serves as the meta-knowledge node of that class, further expanding the support set as a pseudo sample of the class.
In step 2, a fully connected graph is constructed from the task-related nodes and the meta-knowledge nodes; each node represents the feature of one sample, and an edge represents the similarity of two nodes, being 1 if the two nodes come from the same class and 0 otherwise; because the label information of the query set samples is unknown, edges connected to the query set are initialized to 0.5.
In step 2, each layer of the memory-augmented graph neural network includes a node feature update and an edge feature update: given the node features and edge features of the previous layer, the node features are updated through a neighborhood aggregation process, and the edge features are recomputed based on the updated node features.
In step 2, to keep the meta-knowledge nodes consistent with the predicted labels, another binary cross-entropy loss function is introduced to estimate the difference between the ground truth and the prediction of the meta-knowledge node edge labels.
Compared with the prior art, the invention draws on the human cognition process: the information-bottleneck-based memory graph augmentation network better helps the model perform inference and prediction with the aid of well-learned conceptual knowledge. The method is simple and convenient, highly practical, and has promising prospects for popularization and application.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a pre-training flow diagram;
FIG. 3 is a flow chart of memory bank purification.
Detailed Description
The invention comprises three steps. The first step is pre-training: a supervised feature extractor and a linear classifier are trained on the training set, and the trained feature extractor and classifier are used as the initialization weights of the subsequent encoder and memory bank. The second step is meta-training: the encoder first extracts the features of the support set and query set samples as task-related nodes; to enable the model to adapt quickly to a new task, a memory bank is built to store the support set sample features and is optimized with a new update scheme that gradually purifies the discriminative information; finally, the information relevant to each class is mined from the memory bank as meta-knowledge, and the similarity between the task-related nodes and the meta-knowledge is propagated through the graph neural network. The third step is meta-testing, which is similar to the second step: the classification result is obtained from the task-related nodes and the meta-knowledge nodes, and the memory bank and the other modules are not updated during meta-testing.
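To make this three-stage flow concrete, the sketch below wires hypothetical components, matching the earlier sketches in the disclosure, into one meta-training episode; every callable name is a placeholder rather than the patent's actual interface, and the detailed steps S300 to S380 below describe the same flow.

```python
import torch

def meta_training_episode(support, s_labels, query, class_ids,
                          refiner, mine_meta, init_edges, gnn_layers, classify):
    """One meta-training episode composed from the hypothetical component sketches above."""
    centers, _ = refiner(support, s_labels, class_ids)      # class centers; memory bank updated inside
    meta_nodes = mine_meta(centers)                          # one meta-knowledge node per class
    nodes = torch.cat([support, meta_nodes, query])          # graph node initialization
    edges = init_edges(s_labels, meta_nodes.size(0), query.size(0))
    edge_history = []
    for layer in gnn_layers:                                 # per-layer node and edge updates
        nodes, edges = layer(nodes, edges)
        edge_history.append(edges)
    expanded_labels = torch.cat([s_labels, torch.arange(meta_nodes.size(0))])
    probs = classify(edges, expanded_labels, meta_nodes.size(0))  # probabilities from same-class edge sums
    return probs, edge_history
```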
The present invention will be described in further detail with reference to the following drawings and detailed description.
Example 1
Referring to FIG. 1, the invention uses a graph neural network and a memory mechanism to help the small sample model perform inference and prediction with the aid of learned conceptual knowledge. The specific operation steps are as follows:
S300: obtain the trained encoder and classifier from the pre-training stage, and use them respectively as the initialization of the feature extractor and the memory bank in the meta-training stage.
S310: adopt the episode sampling strategy: sample N classes from the training set, take K samples of each class as the support set, and take 1 sample of each class from the same N classes as the query set.
S320: input the support set and query set sampled in step S310 into the feature extractor to obtain the feature representation of each sample.
S330: take the intra-class mean of the support set sample features to obtain the class center points of the N classes, and dynamically update the memory bank with reference to steps S200-S240.
S340: using the class center points obtained in step S330, retrieve the K nearest neighbors of each class center point from the memory bank, compute the similarity between each center point and each of its K neighbors using cosine distance, concatenate the center point with the neighbors and feed the result into a dimension-reduction network, and finally aggregate the K neighbors by weighted summation to obtain the meta-knowledge node of each class.
S350: construct a fully connected graph from the sample features obtained in step S320 and the meta-knowledge nodes obtained in step S340, and use them as the initialization of the nodes of the memory-augmented graph neural network; an edge represents the similarity of two nodes, being 1 if the two nodes come from the same class and 0 otherwise, and because the label information of the query set samples is unknown, edges connected to the query set are initialized to 0.5.
S360 to S370: after the initialization of step S350 is completed, perform the node feature update and the edge feature update; each layer of the memory-augmented graph neural network includes a node feature update and an edge feature update, where, given the node features and edge features of the previous layer, the node features are updated through a neighborhood aggregation process and the edge features are recomputed based on the updated node features.
S380: after the node feature and edge feature updates of steps S360 to S370, predict the labels of the query set samples using the edge features; the probability that each query set node belongs to a given class is computed by summing the values of the edges between that query set node and all support set nodes of the same class.
Referring to FIG. 2, the pre-training in the first step trains a supervised feature extractor and classifier by sampling from the training set. The specific operation steps are as follows:
S100: randomly sample a batch from the training set, where the batch comprises N samples and their labels;
S110: input the N samples into the feature extractor to obtain the features of the N samples;
S120: input the N sample features into the classifier to obtain the probability distribution over categories for each sample;
S130: compute the cross-entropy loss from the predicted probability distributions and the labels, take the category with the maximum probability as the predicted label, and backpropagate to update the weight parameters of the feature extractor and the classifier.
Referring to FIG. 3, the memory bank purification in the second step further refines and updates the memory bank using the support set of the current task. The specific operation steps are as follows:
S200: input the support set samples into the encoder to obtain the sample features, and take the intra-class mean to obtain the center point of each class in the support set;
S210: find the prototype point of the corresponding class in the memory bank according to the class labels of the support set samples;
S220: concatenate the prototype point and the center point of each class to obtain a higher-dimensional feature;
S230: input the high-dimensional feature of each class into the IB layer for dimensionality reduction, and then update the prototype point of the corresponding class in the memory bank with the output of the IB layer through a momentum update.
The invention has been described above in further detail; this description is not intended to limit the scope of the invention, and all equivalent embodiments are intended to be included within the scope of the following claims.

Claims (5)

1. A small sample image classification method based on a memory mechanism and a graph neural network, characterized in that the graph neural network and the memory mechanism are utilized so that the learned conceptual knowledge helps the small sample model perform inference and prediction, the method specifically comprising the following steps:
step 1: pre-training
Learning a supervised feature extractor and linear classifier on the entire training set, and using them as the initialization weights of the meta-training stage encoder and memory bank;
step 2: Meta-training
Extracting the features of the support set samples and the query set samples through the encoder and using them as task-related nodes, wherein the features of the support set samples are stored in a constructed memory bank; the memory bank is optimized through an update scheme that gradually purifies the discriminative information, and finally the information relevant to each class is mined from the memory bank as meta-knowledge, and the similarity between the task-related nodes and the meta-knowledge is propagated through the graph neural network;
step 3: Meta test
Obtaining a classification result through the task-related nodes and the meta-knowledge nodes, wherein during the meta-testing process the memory bank and the other modules are not updated, and the samples of the episode sampling strategy come from the test set.
2. The method for classifying small sample images based on a memory mechanism and a graph neural network as claimed in claim 1, wherein the step 1 specifically comprises the following steps:
1.1: training a supervised feature extractor and linear classifier on the whole training set;
1.2: using trained feature extractors and linear classifiers as meta-training stage encoders and memory banks, respectively
Figure FDA0003189159030000015
Figure FDA0003189159030000016
The initialization weight of (1).
3. The method for classifying small sample images based on a memory mechanism and a graph neural network as claimed in claim 1, wherein the step 2 specifically comprises the following steps:
2.1: using an N-way K-shot T-query task that contains support set samples S and query set samples Q, and extracting the feature representations of the support set samples S and the query set samples Q through the encoder as task-related nodes;
2.2: computing the center point f_cen ∈ R^[N,d] of each class in the support set samples using the intra-class mean, concatenating it with the prototype point f_p ∈ R^[N,d] of the same class stored in the memory bank, denoting the concatenated feature as f_cat ∈ R^[N,2d], and inputting it into a fully connected layer that reduces the dimensionality to purify the semantic information; the semantic information is constrained and purified by the following formula 1:
max I(f_p, Y) − β·I(f_cen, f_p)   (1);
wherein: I(·,·) denotes mutual information; Y denotes the label; β denotes a Lagrange coefficient;
the memory bank is purified and optimized by formula 2 (rendered as an image in the original document);
the purified feature of the memory bank is denoted f_B ∈ R^[N,d], and together with the prototype point of the same class it further optimizes the memory bank through the momentum update of formula 3:
f_p ← λ·f_p + (1 − λ)·f_B   (3);
wherein: λ is the momentum coefficient;
2.3: computing the cosine similarity between each class center point and every prototype point in the memory bank, selecting the k prototype points MK = {m_1, m_2, ..., m_k} nearest to the center point, concatenating the k prototype points with the center point and inputting them into an aggregation network, aggregating the information of the k prototype points by formula 4 (rendered as an image in the original document), and using the output as the meta-knowledge node of the class to expand the support set as a pseudo sample of the class;
wherein: [·,·] is the concatenation operation; f(·; θ_agg) performs a transformation R^{2d} → R^d and consists of a fully connected layer parameterized by θ_agg; a_k is the correlation coefficient between m_k and f_cen[i], given by formula 5 (image in the original);
wherein: τ is a temperature coefficient; ⟨·,·⟩ is cosine similarity;
2.4: constructing a fully connected graph G = (V, E) from the task-related nodes and the meta-knowledge nodes (the definitions of the node set V and edge set E are rendered as images in the original document), wherein each node represents the feature of one sample and an edge represents the similarity of two nodes, being 1 if the two nodes come from the same class and 0 otherwise, and edges connected to the query set are initialized to 0.5 by formula 6 (image in the original);
wherein: the set referenced in formula 6 is the support set after expansion with the meta-knowledge nodes;
2.5: updating the node features and edge features in each layer of the memory-augmented graph neural network; given the node features and edge features of the previous layer, the node features are updated through a neighborhood aggregation process according to formula 7 (rendered as an image in the original document);
wherein: [·,·] is the concatenation operation; the layer index refers to the corresponding layer of the memory augmentation module; f_node(·; θ_node) is the node update network parameterized by θ_node;
the edge features are recomputed by formula 8 (image in the original);
2.6: after the updates of the multi-layer memory-augmented graph neural network, the probability that each query set node belongs to a given class is computed by formula 9 (rendered as an image in the original document) as the sum of the values of the edges between that query set node and all support set nodes of the same class;
wherein: δ(y_i = C_k) is the Kronecker delta function, which takes the value 1 when y_i = C_k and 0 otherwise;
2.7: for accurate prediction on the query set, the optimization target is the minimized binary cross-entropy loss function defined by formula 10 (rendered as an image in the original document);
wherein: e and its ground-truth counterpart (shown as an image) denote the predicted and real query set edge labels, respectively; the per-layer weight coefficient (shown as an image) weights the loss of each layer; BCE denotes the binary cross-entropy loss;
2.8: to keep the meta-knowledge nodes consistent with the predicted labels, another minimized binary cross-entropy loss function is introduced, defined by formula 11 (image in the original);
wherein: e and its ground-truth counterpart (shown as an image) denote the predicted and real meta-knowledge edge labels, respectively; the per-layer weight coefficient (shown as an image) weights the loss of each layer; BCE denotes the binary cross-entropy loss;
2.9: the final optimization objective is the minimized loss function defined by formula 12 (image in the original);
wherein: α and β are balancing coefficients, with α = 0.2 and β = 0.01.
4. The method for classifying small sample images based on a memory mechanism and a graph neural network as claimed in claim 3, wherein the memory bank aggregates discriminative information while diluting task-irrelevant information, and preserves useful information by associating new information with the old information in the memory bank through momentum updates, thereby further purifying the memory bank.
5. The method for classifying small sample images based on a memory mechanism and a graph neural network as claimed in claim 3 or claim 4, wherein the memory bank establishes relations between tasks during training, and when facing an unknown task, further assists the model in prediction with the learned knowledge.
CN202110872087.1A 2021-07-30 2021-07-30 Small sample image classification method based on memory mechanism and graph neural network Active CN113688878B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110872087.1A CN113688878B (en) 2021-07-30 2021-07-30 Small sample image classification method based on memory mechanism and graph neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110872087.1A CN113688878B (en) 2021-07-30 2021-07-30 Small sample image classification method based on memory mechanism and graph neural network

Publications (2)

Publication Number Publication Date
CN113688878A true CN113688878A (en) 2021-11-23
CN113688878B CN113688878B (en) 2022-08-19

Family

ID=78578422

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110872087.1A Active CN113688878B (en) 2021-07-30 2021-07-30 Small sample image classification method based on memory mechanism and graph neural network

Country Status (1)

Country Link
CN (1) CN113688878B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114333064A (en) * 2021-12-31 2022-04-12 江南大学 Small sample behavior identification method and system based on multidimensional prototype reconstruction reinforcement learning
CN114580484A (en) * 2022-04-28 2022-06-03 西安电子科技大学 Small sample communication signal automatic modulation identification method based on incremental learning
CN115100532A (en) * 2022-08-02 2022-09-23 北京卫星信息工程研究所 Small sample remote sensing image target detection method and system
CN115861720A (en) * 2023-02-28 2023-03-28 人工智能与数字经济广东省实验室(广州) Small sample subclass image classification and identification method
WO2023240779A1 (en) * 2022-06-15 2023-12-21 中国科学院微电子研究所 In-memory computing method and apparatus for graph few-shot learning, and electronic device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0342017A2 (en) * 1988-05-12 1989-11-15 Kabushiki Kaisha Toshiba Inference processor using meta knowledge
CN111858991A (en) * 2020-08-06 2020-10-30 南京大学 Small sample learning algorithm based on covariance measurement
CN112633403A (en) * 2020-12-30 2021-04-09 复旦大学 Graph neural network classification method and device based on small sample learning
CN113052263A (en) * 2021-04-23 2021-06-29 东南大学 Small sample image classification method based on manifold learning and high-order graph neural network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0342017A2 (en) * 1988-05-12 1989-11-15 Kabushiki Kaisha Toshiba Inference processor using meta knowledge
CN111858991A (en) * 2020-08-06 2020-10-30 南京大学 Small sample learning algorithm based on covariance measurement
CN112633403A (en) * 2020-12-30 2021-04-09 复旦大学 Graph neural network classification method and device based on small sample learning
CN113052263A (en) * 2021-04-23 2021-06-29 东南大学 Small sample image classification method based on manifold learning and high-order graph neural network

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114333064A (en) * 2021-12-31 2022-04-12 江南大学 Small sample behavior identification method and system based on multidimensional prototype reconstruction reinforcement learning
CN114333064B (en) * 2021-12-31 2022-07-26 江南大学 Small sample behavior identification method and system based on multidimensional prototype reconstruction reinforcement learning
CN114580484A (en) * 2022-04-28 2022-06-03 西安电子科技大学 Small sample communication signal automatic modulation identification method based on incremental learning
WO2023240779A1 (en) * 2022-06-15 2023-12-21 中国科学院微电子研究所 In-memory computing method and apparatus for graph few-shot learning, and electronic device
CN115100532A (en) * 2022-08-02 2022-09-23 北京卫星信息工程研究所 Small sample remote sensing image target detection method and system
CN115861720A (en) * 2023-02-28 2023-03-28 人工智能与数字经济广东省实验室(广州) Small sample subclass image classification and identification method

Also Published As

Publication number Publication date
CN113688878B (en) 2022-08-19

Similar Documents

Publication Publication Date Title
CN113688878B (en) Small sample image classification method based on memory mechanism and graph neural network
CN108984724B (en) Method for improving emotion classification accuracy of specific attributes by using high-dimensional representation
CN109902145B (en) Attention mechanism-based entity relationship joint extraction method and system
Najafabadi et al. Deep learning applications and challenges in big data analytics
CN112966127A (en) Cross-modal retrieval method based on multilayer semantic alignment
CN109299341A (en) One kind confrontation cross-module state search method dictionary-based learning and system
CN111666406B (en) Short text classification prediction method based on word and label combination of self-attention
WO2024032096A1 (en) Reactant molecule prediction method and apparatus, training method and apparatus, and electronic device
CN112015868A (en) Question-answering method based on knowledge graph completion
US20210005183A1 (en) Orthogonally constrained multi-head attention for speech tasks
CN112733866A (en) Network construction method for improving text description correctness of controllable image
CN110941734A (en) Depth unsupervised image retrieval method based on sparse graph structure
CN114817673A (en) Cross-modal retrieval method based on modal relation learning
CN113255366B (en) Aspect-level text emotion analysis method based on heterogeneous graph neural network
Sinha et al. Audio classification using braided convolutional neural networks
CN113515632A (en) Text classification method based on graph path knowledge extraction
Furht et al. Deep learning techniques in big data analytics
CN116071715A (en) Automatic driving automobile real-time semantic segmentation model construction method
Phaphuangwittayakul et al. Few-shot image generation based on contrastive meta-learning generative adversarial network
CN117131933A (en) Multi-mode knowledge graph establishing method and application
CN116562286A (en) Intelligent configuration event extraction method based on mixed graph attention
CN112905599B (en) Distributed deep hash retrieval method based on end-to-end
CN114510569A (en) Chemical emergency news classification method based on Chinesebert model and attention mechanism
CN115481246A (en) Text detection model training method and device
Shetty et al. Comparative analysis of different classification techniques

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant