CN111858991A - Small sample learning algorithm based on covariance measurement - Google Patents

Small sample learning algorithm based on covariance measurement

Info

Publication number
CN111858991A
CN111858991A (application CN202010783893.7A)
Authority
CN
China
Prior art keywords
covariance
local
representation
small sample
query
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010783893.7A
Other languages
Chinese (zh)
Inventor
Wenbin Li (李文斌)
Siyuan Chen (陈思远)
Jing Huo (霍静)
Yang Gao (高阳)
Jinglin Xu (徐婧林)
Lei Wang (王雷)
Jiebo Luo (罗杰波)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Wanwei Aisi Network Intelligent Industry Innovation Center Co ltd
Nanjing University
Original Assignee
Jiangsu Wanwei Aisi Network Intelligent Industry Innovation Center Co ltd
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Wanwei Aisi Network Intelligent Industry Innovation Center Co ltd and Nanjing University
Priority to CN202010783893.7A
Publication of CN111858991A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval of still image data
    • G06F16/53 Querying
    • G06F16/55 Clustering; Classification
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Pure & Applied Mathematics (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Algebra (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a small sample learning algorithm based on covariance measurement, belonging to the field of computer vision. The algorithm comprises the following steps: (1) introducing an episode training mechanism to learn transferable knowledge; (2) designing a local covariance representation and embedding it into a deep network to learn and express each concept; and (3) constructing a covariance measurement layer on the basis of the local covariance representation to measure the distribution consistency between a query sample and each concept. The invention provides a novel and simple end-to-end covariance metric network, CovaMNet, designs a second-order local covariance representation to replace the traditional first-order concept representation, and proposes a new covariance metric function. Comparative experiments on multiple benchmark data sets show that the proposed CovaMNet framework achieves competitive results on both the generic small sample classification task and the fine-grained small sample classification task.

Description

Small sample learning algorithm based on covariance measurement
Technical Field
The invention relates to a small sample learning algorithm based on covariance measurement, and belongs to the field of computer vision.
Background
Human beings can learn new concepts from very few examples and show a generalization ability that current machine learning algorithms lack: a person can learn a new concept from one or a few examples, whereas a standard machine learning algorithm needs far more examples to reach comparable performance. At present, machine learning algorithms depend too much on labeled data, while in practical applications the cost of data labeling is often high. How to learn from a limited amount of labeled data while giving the model strong generalization capability has therefore become an important research subject.
In response to the above problems, it is necessary to utilize more a priori knowledge to assist in characterization and learning.
The currently available research schemes, as shown in fig. 1, can be roughly divided into the following three categories according to their starting points and motivations:
(1) small sample learning based on data expansion;
(2) small sample learning based on meta-learning;
(3) metric-based small sample learning.
The starting point of small sample learning based on data expansion is to generate new data to augment the training data. Once there is enough data, traditional machine learning algorithms can be applied directly to the recognition task. Work proposed so far is mainly based on auto-encoders or generative adversarial networks (GANs). For example, Antoniou et al propose a data augmentation GAN (DAGAN) to generate new samples that help improve the performance of small sample learning, especially single sample learning tasks. Bharath Hariharan et al propose a non-parametric method that learns analogy mappings between samples by constructing quadruples in the training data to generate new samples. Eli Schwartz et al propose a delta-encoder model that learns transferable deformations between pairs of samples with an encoder, applies these deformations to other samples to synthesize new ones, and finally trains the classifier on the synthesized samples. Yu-Xiong Wang et al propose a one-stage method that integrates data expansion with the final classification task, jointly learning the process of generating new samples with a generator and the process of learning a classifier on the expanded samples. Yongqin Xian et al propose a unified feature generation framework that combines the respective advantages of a variational auto-encoder (VAE) and a GAN, performing data augmentation in the feature space. Zitian Chen et al also synthesize new samples in the feature space, using a novel auto-encoder to directly synthesize instance features. Bo Liu et al propose a feature transfer network, FATTEN, to model the feature trajectory space induced by changes in target pose, so that along one trajectory in the manifold space the input features and the transferred features can be mapped to the same point in the appearance space.
Small sample learning based on meta-learning uses the idea of meta-learning to help the base learner from the perspective of task distribution. The difference between meta-learning and traditional base learning is that meta-learning learns how to dynamically select the correct bias, whereas in base learning the bias is fixed a priori or user defined. As shown in fig. 2, base learning can only learn a single fixed hypothesis for a specific task data set, while meta-learning can use a meta-learner to guide base learners toward different hypotheses for different tasks. Much related work has been proposed in recent years. For example, the memory-augmented neural network (MANN) solves the problem of quickly encoding the important information of a new task by introducing an additional memory module. The meta-learning long short-term memory network (Meta-LSTM) learns how to initialize a model and how to update it quickly for a new task. Model-agnostic meta-learning (MAML) attempts to learn a good initialization so that the model can achieve good performance on a new task after only a few gradient update steps. Qianru Sun et al propose a meta-transfer learning method (MTL): a feature extractor is pre-trained on the auxiliary set, fine-tuned with a small amount of training data of the new task to obtain the base learner, and the base learner is then adapted to the new task by learning scaling and shifting parameters. Task-agnostic meta-learning (TAML) proposes an entropy-based approach to learn an initial model with maximal uncertainty. Task-aware feature embedding networks (TAFE-Net for short) learn how to adapt image representations to new tasks through a meta-learning paradigm.
Metric-based small sample learning leverages the idea of metric learning, using a metric-learning loss to learn a feature embedding network with transfer capability, as shown in fig. 3. Such methods are simple and effective, and many related works have been proposed in recent years. In 2015, Gregory Koch et al first learned a Siamese network from the source data (auxiliary set) using a metric loss, and then reused the feature representation learned by the Siamese network to solve the target small sample classification task. Oriol Vinyals et al propose matching networks, which directly compare the similarity between the query pictures and the support set pictures for classification, and also propose the episode training mechanism. Jake Snell et al propose the prototypical network, which takes the mean vector as a prototype to characterize each category in the support set, and then classifies each query picture by finding its nearest prototype (category). Flood Sung et al propose the relation network to learn a deep metric function in place of the fixed metric functions (e.g., Euclidean distance or cosine similarity) used in conventional small sample classification. Kelsey R. Allen et al propose infinite mixture prototypes to characterize each class, on the grounds that in the prototypical network each class is characterized by only a single prototype, which can hardly express a complex class distribution. Other work includes using target detection methods, graph models, and dense classification networks to help solve small sample problems.
Although the starting points of the above methods differ, all three categories approach the problem from the perspective of transfer learning, using an additional auxiliary data set to help solve the small sample learning problem. The present invention focuses mainly on the third category, namely metric-based small sample learning methods.
Disclosure of Invention
The purpose of the invention is as follows: a small sample learning algorithm based on covariance measurement is provided to solve the above problems in the prior art.
The technical scheme is as follows: a small sample learning algorithm based on covariance measurement specifically comprises the following steps:
step (1): dividing a data set adopted by an experiment to obtain a training set, a verification set and a test set;
step (2): scaling the pictures in the data set to ensure that the resolution is 84 x 84;
step (3): constructing a classification task on the data set, adopting the episode training mechanism, and randomly constructing episodes in the training stage and the testing stage respectively;
step (4): embedding the local covariance representation into a deep network to learn the feature representation of each concept, and extracting deep local descriptor features for the query image;
step (5): on the basis of the local covariance representation, calculating the distribution consistency between the query image and each category using the covariance measurement layer;
step (6): fusing the two modules, the local covariance representation and the covariance measurement layer, into one framework and performing end-to-end training.
In a further embodiment, in the data set partitioning process of step (1), 5 data sets are involved, namely 2 small sample image classification data sets, miniImageNet and tieredImageNet, and 3 fine-grained benchmark data sets, Stanford Dogs, Stanford Cars and CUB Birds. Each data set is divided; taking miniImageNet and Stanford Dogs as examples, 64 of the 100 classes of the miniImageNet data set are used as the training set, 16 classes as the verification set and 20 classes as the test set, while of the 120 classes of the Stanford Dogs data set, 70 classes are used as the training set (auxiliary set), 20 classes as the verification set and 30 classes as the test set.
In a further embodiment, in the episode construction process of step (3), in the training phase each constructed episode comprises a support set and a query set. In the 5-way 1-shot classification task, each of the 5 categories contains 1 support picture and 15 query pictures; in the 5-way 5-shot classification task, each of the 5 categories contains 5 support pictures and 15 query pictures. The number of episodes constructed in the training stage differs from that in the testing stage; one way such an episode could be sampled is sketched below.
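As an illustration of this episode construction, the following is a minimal Python sketch of how one N-way K-shot episode could be sampled; the `dataset` structure (a mapping from class label to a list of images) and all names are assumptions of this sketch, not the patent's implementation.

```python
# Minimal sketch of episode sampling for N-way K-shot training.
# Assumes `dataset` maps each class label to a list of images; illustrative only.
import random

def sample_episode(dataset, n_way=5, k_shot=1, n_query=15):
    """Randomly build one episode: a support set and a query set."""
    classes = random.sample(list(dataset.keys()), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        images = random.sample(dataset[cls], k_shot + n_query)
        support += [(img, label) for img in images[:k_shot]]
        query += [(img, label) for img in images[k_shot:]]
    return support, query
```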
In a further embodiment, the local covariance representation of step (4) differs from the covariance matrices widely used in other tasks. A sample-based covariance matrix $\Sigma$ is first defined, whose calculation is shown in formula (1); richer deep local descriptor features are then adopted in place of global features to represent each picture, and the local descriptors of all pictures in a category are computed together to obtain the local covariance representation $\Sigma_c$ of that category, as shown in formula (2):

$$\Sigma = \frac{1}{K-1}\sum_{i=1}^{K}(x_i-\mu)(x_i-\mu)^{T} \tag{1}$$

$$\Sigma_c = \frac{1}{MK-1}\sum_{i=1}^{K}(X_i-\bar{M})(X_i-\bar{M})^{T} \tag{2}$$

where $\mu$ represents the mean of the K samples and $\bar{M}$ represents a matrix of mean vectors.
In a further embodiment, the covariance measurement layer of step (5) is used to measure the consistency relation between samples and classes. The pictures in the query set are represented by deep local descriptors (M d-dimensional deep local descriptors), $X = [x_1, \dots, x_M] \in \mathbb{R}^{d \times M}$. The relation between a sample and a class is computed by the covariance metric formula $f(x, \Sigma) = x^{T}\Sigma x$, where $x \in \mathbb{R}^{d}$ represents a query sample and $\Sigma_c \in \mathbb{R}^{d \times d}$ represents the covariance matrix representation of a specific class. The local covariance measure between X and $\Sigma_c$ is computed as shown in formula (3):

$$z = f(X, \Sigma_c) = \operatorname{diag}(X^{T}\Sigma_c X) \tag{3}$$

where $z \in \mathbb{R}^{M}$ contains the M local similarity values between the query picture and the specific category, and diag(·) represents the column vector formed by the elements on the main diagonal of the matrix.
In a further embodiment, the framework of step (6) comprises two key modules: a convolution embedding module and a covariance measurement module. The convolution embedding module comprises four convolution blocks (Conv64F for short), each consisting of a convolution layer, a batch normalization layer and a Leaky ReLU layer; a 2 × 2 max-pooling layer is added after each of the first two convolution blocks. The query picture is compared against the C categories to obtain all local similarity values, which are concatenated in series; the mapping operation is then realized by a one-dimensional convolution with stride M, as sketched below, and the final classification result is computed with softmax and a cross-entropy loss function.
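As an illustration of the mapping operation just described, the sketch below shows how a one-dimensional convolution with kernel size and stride M can turn the C × M concatenated local similarity values into C global scores; PyTorch is assumed and the shapes are illustrative, not taken from the patent's code.

```python
# Sketch of the similarity-mapping step: C*M concatenated local similarities
# are reduced to C global scores by a 1-d convolution with kernel and stride M.
import torch
import torch.nn as nn

C, M = 5, 441                                       # 5-way task, 441 descriptors
z_all = torch.randn(1, 1, C * M)                    # concatenated local similarities
conv1d = nn.Conv1d(1, 1, kernel_size=M, stride=M)   # learnable weighting of the M values
logits = conv1d(z_all).view(C)                      # one global score per class
probs = torch.softmax(logits, dim=0)                # classification distribution
```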
Beneficial effects:
(1) the invention provides a novel and simple end-to-end covariance measurement network CovaMNet;
(2) the invention designs a second-order local covariance representation to replace the traditional first-order conceptual representation;
(3) the invention proposes a new covariance measurement function.
Comparative experiments on multiple benchmark data sets show that the proposed CovaMNet framework achieves competitive results on both the generic small sample classification task and the fine-grained small sample classification task.
Drawings
FIG. 1 is a diagram of the categorization of small sample learning methods and corresponding representative methods of the present invention.
FIG. 2 is a schematic diagram of meta learning and base learning of the present invention.
FIG. 3 is a schematic diagram of metric learning comparing the relationship between the query picture and the category pictures according to the present invention.
Fig. 4 is a block diagram of the CovaMNet of the present invention for the 5-way 1-shot classification task.
Fig. 5 is a diagram of a four-layer embedded network Conv64F according to the present invention.
Detailed Description
To more fully illustrate the objects, features and advantages of the present invention, the following detailed description of the invention is given in conjunction with the accompanying drawings and specific examples.
The present invention is directed at learning and understanding new concepts (categories) from one or a few instances. Since each concept has only one or a few labeled samples, directly applying machine learning models (e.g., support vector machines (SVMs) or convolutional neural networks (CNNs)) easily leads to overfitting. With so little data, a category can hardly be expressed adequately through these few samples, so how to efficiently use the auxiliary data set to aid learning, and how to learn transferable knowledge from it, become critical. The present invention mainly faces three problems to be solved:
(1) how to leverage the auxiliary data set to learn and store transferable knowledge;
(2) how to express each concept (category) robustly;
(3) how to reasonably compute the relationship between the test query sample and the concept.
For the first problem, meta-learning based small sample methods mainly rely on meta-learners, while metric-based small sample methods mainly use an episode training mechanism (episodic training mechanism) to capture transferable knowledge. The present invention employs the episode training mechanism to solve this problem.
For the second problem, existing work such as Prototypical Nets directly regards the mean of each class as a prototype and represents the class by that prototype, while GNNs use a graph to represent categories. These methods all rely on first-order statistics; the present invention instead uses second-order statistics as the representation of a class.
For the third problem, most current work measures the similarity between query samples and categories using simple Euclidean distance or cosine similarity, while Relation Net adopts a deep nonlinear relation network for the measurement. The present invention newly defines a deep covariance metric function to measure the consistency between samples and classes.
The invention provides a small sample learning algorithm based on covariance measurement and designs a novel and effective covariance-metric-based framework for small sample learning, which takes into account three important aspects of the small sample learning problem: knowledge transfer, concept representation and relation measurement. The proposed CovaMNet is an end-to-end method mainly comprising two key modules: the local covariance representation and the covariance metric. The first module learns the feature representation of pictures, extracts the deep local descriptor features of the query picture, and computes the local covariance representation of each concept category; the second module mainly comprises a covariance measurement layer which, on the basis of the first module, computes the distribution consistency between the query picture and each category. The two modules are integrated into a unified framework for end-to-end training, so that the feature representation and the similarity measure can complement and reinforce each other.
The invention provides a small sample learning algorithm based on covariance measurement, which specifically comprises the following steps:
step (1): dividing a data set adopted by an experiment to obtain a training set, a verification set and a test set;
step (2): scaling the pictures in the data set to ensure that the resolution is 84 x 84;
step (3): constructing a classification task on the data set, adopting the episode training mechanism, and randomly constructing episodes in the training stage and the testing stage respectively;
step (4): embedding the local covariance representation into a deep network to learn the feature representation of each concept, and extracting deep local descriptor features for the query image;
step (5): on the basis of the local covariance representation, calculating the distribution consistency between the query image and each category using the covariance measurement layer;
step (6): fusing the two modules, the local covariance representation and the covariance measurement layer, into one framework and performing end-to-end training.
1. Local covariance representation
Covariance matrices have been widely used in many tasks as region descriptors or general representations, and they enjoy many good properties, such as capturing second-order statistics and being symmetric positive semi-definite.
Let $X = [x_1, x_2, \dots, x_K] \in \mathbb{R}^{d \times K}$ denote a data matrix with $x_i \in \mathbb{R}^{d}$. The sample-based covariance matrix $\Sigma \in \mathbb{R}^{d \times d}$ can be defined as shown in formula (1):

$$\Sigma = \frac{1}{K-1}\sum_{i=1}^{K}(x_i-\mu)(x_i-\mu)^{T} \tag{1}$$

where $\mu = \frac{1}{K}\sum_{i=1}^{K} x_i$ is the mean of the K samples.
Unlike the covariance representation of a general picture set, in small sample learning there are only a few pictures per class (for example, K = 1 to 5), which makes it difficult to compute a covariance matrix that is effective and guaranteed to be non-singular. Furthermore, when a class has too few samples, the resulting covariance matrix can hardly describe the data distribution of that class accurately.
To solve these problems, the invention adopts richer deep local descriptor features in place of global features to represent the pictures, and then computes the local covariance representation of a class from the local descriptors of all pictures in that class. Given the picture set $D_c = \{X_1, X_2, \dots, X_K\}$ of the c-th category, where $D_c$ contains K pictures and each $X_i$ can be expressed as M deep local descriptors of dimension d (M = 441 and d = 64 in the experiments), the local covariance representation $\Sigma_c \in \mathbb{R}^{d \times d}$ of the c-th class can be defined as shown in formula (2):

$$\Sigma_c = \frac{1}{MK-1}\sum_{i=1}^{K}(X_i-\bar{M})(X_i-\bar{M})^{T} \tag{2}$$

where $\bar{M} \in \mathbb{R}^{d \times M}$ is a matrix composed of mean vectors, all of whose columns are the mean vector of the MK deep local descriptors.
In general, the number of deep local descriptors in a class is far larger than the feature dimension, which guarantees the non-singularity of the class covariance matrix. Computing the covariance matrix from deep local descriptors can also capture important local detail information of each class. Meanwhile, the local covariance matrix is embedded in the deep neural network and can be iteratively learned as the network is updated. Furthermore, the method does not need a fixed number of samples (shots) per class, which means that in the training and testing phases, classes with different numbers of samples can all be represented by a covariance.
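The following is a minimal sketch of the local covariance representation of formula (2), assuming PyTorch and assuming the K support images of a class have already been embedded into a (K, d, M) tensor of local descriptors; the small ridge term added at the end is a numerical-safety assumption of this sketch, whereas the patent relies on MK being much larger than d for non-singularity.

```python
# Sketch of the local covariance representation, formula (2).
# descriptors: (K, d, M) tensor, i.e. M d-dimensional descriptors per image.
import torch

def local_covariance(descriptors, eps=1e-5):
    """Return the (d, d) local covariance of one class."""
    K, d, M = descriptors.shape
    X = descriptors.permute(1, 0, 2).reshape(d, K * M)  # all MK descriptors
    mu = X.mean(dim=1, keepdim=True)                    # mean vector, (d, 1)
    Xc = X - mu                                         # subtract the mean matrix
    sigma = Xc @ Xc.t() / (K * M - 1)                   # (d, d), as in formula (2)
    return sigma + eps * torch.eye(d)                   # small ridge (sketch-only)
```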
2. Covariance measure
To measure the relationship between samples and classes, a covariance metric function is defined as shown in formula (3):

$$f(x, \Sigma) = x^{T}\Sigma x \tag{3}$$

where $x \in \mathbb{R}^{d}$ represents a query sample and $\Sigma \in \mathbb{R}^{d \times d}$ represents the covariance matrix representation of a specific class. According to Theorem 1, the value of $f(x, \Sigma)$ is maximized when the direction of the sample x is consistent with the directions of the eigenvectors corresponding to the top k eigenvalues of $\Sigma$.

Theorem 1. Suppose $\Sigma \in \mathbb{R}^{d \times d}$ is the covariance matrix representation corresponding to a specific class in the support set S and satisfies $\Sigma = V \Lambda V^{T}$, where the diagonal matrix $\Lambda \in \mathbb{R}^{d \times d}$ contains the d eigenvalues arranged in descending order, and the corresponding eigenvectors form the orthogonal matrix $V \in \mathbb{R}^{d \times d}$. Then for any non-zero sample $x \in \mathbb{R}^{d}$, the value of $x^{T}\Sigma x$ is largest when the direction of x is consistent with the directions of the eigenvectors corresponding to the top k eigenvalues of $\Sigma$.
Likewise, each query picture in the query set Q is represented by deep local descriptors (M d-dimensional deep local descriptors), $X = [x_1, \dots, x_M] \in \mathbb{R}^{d \times M}$. The local covariance measure between X and $\Sigma_c$ can then be formalized as shown in formula (4):

$$z = f(X, \Sigma_c) = \operatorname{diag}(X^{T}\Sigma_c X) \tag{4}$$

where $z \in \mathbb{R}^{M}$ contains the M local similarity values between the query picture and the category, $\Sigma_c$ is the local covariance representation of the class, and diag(·) represents the column vector of the elements on the main diagonal of the matrix. Finally, the global similarity value Z is calculated by a linear weighting of the M local similarity values, i.e., $Z = \omega^{T} z$, where $\omega$ is a weight vector.
The CovaMNet structure is shown in fig. 4 and mainly consists of two modules: a convolution embedding module and a covariance measurement module. Following previous work, the convolution embedding module, shown in fig. 5, is composed of four convolution blocks (Conv64F for short), each consisting of a convolution layer (containing 64 filters of size 3 × 3), a batch normalization layer and a Leaky ReLU layer. In addition, a 2 × 2 max-pooling layer is added after each of the first two convolution blocks. A sketch of the covariance metric computation follows.
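Before walking through the training steps below, the following is a minimal sketch of the covariance measurement of formulas (3) and (4), assuming PyTorch; note that the patent learns the weight vector ω (via the mapping layer described earlier), whereas this sketch falls back to uniform weights when none are supplied.

```python
# Sketch of the covariance metric layer, formulas (3)-(4).
# X: (d, M) local descriptors of one query; sigma: (d, d) class covariance.
import torch

def covariance_metric(X, sigma, w=None):
    """Return M local similarities z = diag(X^T Sigma X) and global score Z."""
    z = torch.einsum('dm,de,em->m', X, sigma, X)     # diag(X^T @ sigma @ X), shape (M,)
    if w is None:                                    # patent: w is learned; here uniform
        w = torch.full((X.shape[1],), 1.0 / X.shape[1])
    Z = w @ z                                        # linear weighting Z = w^T z
    return z, Z
```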
As shown in fig. 4, the present invention provides a small sample learning algorithm based on covariance measure. The method comprises the following steps in a model training phase:
step (1) inputting a query picture to the embedded network to obtain a tensor of hxwxd. This tensor contains M units (M ═ hw), and each unit represents a d-dimensional depth local descriptor;
step (2) inputting support set to embedded network
Figure BDA0002621217900000096
Wherein DcCalculating K pictures containing the category c to obtain the c category DcLocal covariance representation of
Figure BDA0002621217900000097
Step (3) in the covariance measurement module, calculating the query picture X and each category
Figure BDA0002621217900000098
Local covariance similarity values z between;
mapping Z into a global similarity value Z through a full connected layer (full connected layer);
and (5) obtaining a final classification result through softmax and cross entropy loss.
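Putting the pieces together, the following sketch walks a single query through these five steps for one episode, reusing the `local_covariance` and `covariance_metric` sketches above; the layer details follow the Conv64F description in the text, but this is an illustration under stated assumptions, not the authors' released code.

```python
# Hedged end-to-end sketch of a CovaMNet-style forward pass (5-way episode).
import torch
import torch.nn as nn

def conv_block(in_ch, pool):
    layers = [nn.Conv2d(in_ch, 64, 3, padding=1),
              nn.BatchNorm2d(64), nn.LeakyReLU(0.2)]
    if pool:
        layers.append(nn.MaxPool2d(2))              # 2x2 max-pooling
    return nn.Sequential(*layers)

class Conv64F(nn.Module):
    """Four conv blocks: 3x84x84 input -> 64x21x21, i.e. M=441 descriptors of d=64."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(conv_block(3, True), conv_block(64, True),
                                 conv_block(64, False), conv_block(64, False))
    def forward(self, x):                           # x: (B, 3, 84, 84)
        f = self.net(x)                             # (B, 64, 21, 21)
        return f.flatten(2)                         # (B, d=64, M=441)

def classify(embed, support, query_img):
    """support: dict class -> (K, 3, 84, 84) images; query_img: (1, 3, 84, 84)."""
    X = embed(query_img)[0]                              # (d, M) query descriptors
    scores = []
    for imgs in support.values():
        sigma = local_covariance(embed(imgs))            # per-class covariance
        scores.append(covariance_metric(X, sigma)[1])    # global similarity Z
    return torch.softmax(torch.stack(scores), dim=0)     # class probabilities
```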
To evaluate the performance of the proposed CovaMNet model, comparisons were made on five data sets against one baseline model and seven state-of-the-art small sample learning models. The data sets consist of the two common small sample picture classification data sets miniImageNet and tieredImageNet, and the three fine-grained benchmark data sets Stanford Dogs, Stanford Cars and CUB Birds, while the compared models consist of a baseline k-nearest-neighbor classifier (Baseline k-NN), matching networks (Matching Nets FCE), Meta-Learner LSTM, model-agnostic meta-learning (MAML), the prototypical network (Prototypical Net), the relation network (Relation Net), graph neural networks (GNN) and the simple neural attentive learner (SNAIL). In the baseline method Baseline k-NN, a 64-class classification network is trained using the same embedding convolutional neural network (Conv64F) plus three fully connected layers; after the training stage, the testing stage directly uses the trained network to extract features and a k-nearest-neighbor classifier to obtain the final classification result. For the other compared models, the experimental setup and results are consistent with those of their proposers. Some methods use different network architectures for the embedding module: metric-based methods (such as Prototypical Nets) often employ a four-layer convolutional neural network with Conv64F in each convolution layer, whereas meta-learning based methods (such as MAML) use the same four-layer convolutional neural network but with Conv32F in each convolution layer to mitigate overfitting. Since the proposed CovaMNet belongs to the metric-based category, for fairness CovaMNet uses Conv64F as the embedding module. In addition, the GNN results were retrained after replacing the embedding module with Conv64F, and the SNAIL results were trained using the shallower embedding module Conv32F.
Table 1 shows the results of the comparative experiments on the miniImageNet data set. The second column indicates which embedding module is used, the third column indicates the type of method, the fourth column indicates whether the method performs model fine-tuning, and the last two columns give the classification results under the 5-way 1-shot and 5-way 5-shot task settings. As the table shows, the Baseline k-NN method performs much worse than the other methods because it does not employ the episode training mechanism. The remaining seven methods fall into two broad categories: meta-learning based and metric-based. Compared with the meta-learning based small sample learning methods, the CovaMNet results are more competitive under a similar network structure. Compared with the metric-based small sample learning methods, CovaMNet performs better under the same experimental settings: the local covariance representation adopted in CovaMNet is a second-order concept representation, and the information it captures is more effective than that obtained with first-order concept representations. Table 2 shows the results of the comparative experiments on the tieredImageNet data set.
TABLE 1 average accuracy of classification tasks on miniImageNet dataset
TABLE 2 mean accuracy of classification tasks on the tieredImageNet dataset
The experimental setup on the fine-grained data sets remained the same as on miniImageNet. Unlike the generic picture classification task, the fine-grained picture classification task is more challenging because in fine-grained data sets the inter-class distances are small while the intra-class distances are large. This problem becomes particularly acute in small sample scenarios. Existing work has rarely been tested on these three fine-grained data sets; the invention formally introduces fine-grained data sets into the small sample learning problem for the first time. Table 3 shows the results of the comparative experiments on the Stanford Dogs data set, Table 4 on the Stanford Cars data set, and Table 5 on the CUB Birds data set.
TABLE 3 average accuracy of classification tasks on the Stanford Dogs dataset
TABLE 4 average accuracy of classification tasks on the Stanford Cars dataset
TABLE 5 average accuracy of classification tasks on CUB Birds dataset

Claims (6)

1. A small sample learning algorithm based on covariance measurement specifically comprises the following steps:
step (1): dividing a data set adopted by an experiment to obtain a training set, a verification set and a test set;
step (2): scaling the pictures in the data set so that the resolution reaches an expected value;
step (3): constructing a classification task on the data set, adopting the episode training mechanism, and randomly constructing episodes in the training stage and the testing stage respectively;
step (4): embedding the local covariance representation into a deep network to learn the feature representation of each concept, and extracting deep local descriptor features for the query image;
step (5): on the basis of the local covariance representation, calculating the distribution consistency between the query image and each category using the covariance measurement layer;
step (6): fusing the two modules, the local covariance representation and the covariance measurement layer, into one framework and performing end-to-end training.
2. The small sample learning algorithm based on covariance measurement as claimed in claim 1, wherein: in the data set partitioning process of step (1), 5 data sets are involved, namely 2 small sample image classification data sets, miniImageNet and tieredImageNet, and 3 fine-grained benchmark data sets, Stanford Dogs, Stanford Cars and CUB Birds;
each data set is divided; taking miniImageNet and Stanford Dogs as examples, 64 of the 100 classes of the miniImageNet data set are used as the training set, 16 classes as the verification set and 20 classes as the test set;
of the 120 classes of the Stanford Dogs data set, 70 classes are used as the training set, 20 classes as the verification set and 30 classes as the test set.
3. The small sample learning algorithm based on covariance measurement as claimed in claim 1, wherein: in the episode construction process of step (3), in the training phase each constructed episode comprises a support set and a query set; in the 5-way 1-shot classification task, each of the 5 categories contains 1 support picture and 15 query pictures; in the 5-way 5-shot classification task, each of the 5 categories contains 5 support pictures and 15 query pictures; and the number of episodes built in the training stage differs from that in the testing stage.
4. The small sample learning algorithm based on covariance measurement as claimed in claim 1, wherein: the local covariance representation of step (4) differs from the covariance matrices widely used in other tasks; a sample-based covariance matrix $\Sigma$ is first defined, whose calculation is shown in formula (1); each picture is represented with richer deep local descriptor features in place of global features, and the local descriptors of all pictures in a category are computed together to obtain the local covariance representation $\Sigma_c$ of that category, whose calculation is shown in formula (2):

$$\Sigma = \frac{1}{K-1}\sum_{i=1}^{K}(x_i-\mu)(x_i-\mu)^{T} \tag{1}$$

$$\Sigma_c = \frac{1}{MK-1}\sum_{i=1}^{K}(X_i-\bar{M})(X_i-\bar{M})^{T} \tag{2}$$

where $\mu$ represents the mean of the K samples and $\bar{M}$ represents a matrix of mean vectors.
5. The small sample learning algorithm based on covariance measurement as claimed in claim 1, wherein: the covariance measurement layer of step (5) is used to measure the consistency relation between samples and classes; the pictures in the query set are represented by deep local descriptors $X \in \mathbb{R}^{d \times M}$; the relation between a sample and a class is computed by the covariance metric formula $f(x, \Sigma) = x^{T}\Sigma x$, where $x \in \mathbb{R}^{d}$ represents a query sample and $\Sigma_c \in \mathbb{R}^{d \times d}$ represents the covariance matrix representation of a specific class; the local covariance measure between X and $\Sigma_c$ is computed as shown in formula (3):

$$z = f(X, \Sigma_c) = \operatorname{diag}(X^{T}\Sigma_c X) \tag{3}$$

where $z \in \mathbb{R}^{M}$ contains the M local similarity values between the query picture and the specific category, and diag(·) represents the column vector formed by the elements on the main diagonal of the matrix.
6. The small sample learning algorithm based on covariance measurement as claimed in claim 1, wherein: the framework of step (6) comprises two key modules: a convolution embedding module and a covariance measurement module;
the convolution embedding module comprises four convolution blocks, each consisting of a convolution layer, a batch normalization layer and a Leaky ReLU layer; a 2 × 2 max-pooling layer is added after each of the first two convolution blocks; the query picture is first compared against the C categories to obtain all local similarity values, which are concatenated in series; the mapping operation is then realized by a one-dimensional convolution with stride M, and the final classification result is computed with softmax and a cross-entropy loss function.
CN202010783893.7A 2020-08-06 2020-08-06 Small sample learning algorithm based on covariance measurement Pending CN111858991A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010783893.7A CN111858991A (en) 2020-08-06 2020-08-06 Small sample learning algorithm based on covariance measurement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010783893.7A CN111858991A (en) 2020-08-06 2020-08-06 Small sample learning algorithm based on covariance measurement

Publications (1)

Publication Number Publication Date
CN111858991A true CN111858991A (en) 2020-10-30

Family

ID=72971492

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010783893.7A Pending CN111858991A (en) 2020-08-06 2020-08-06 Small sample learning algorithm based on covariance measurement

Country Status (1)

Country Link
CN (1) CN111858991A (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070133878A1 (en) * 2005-12-14 2007-06-14 Porikli Fatih M Method for constructing covariance matrices from data features
US20100111396A1 (en) * 2008-11-06 2010-05-06 Los Alamos National Security Object and spatial level quantitative image analysis
CN107748871A (en) * 2017-10-27 2018-03-02 东南大学 A kind of three-dimensional face identification method based on multiple dimensioned covariance description with the sparse classification of local sensitivity Riemann's core
CN108804784A (en) * 2018-05-25 2018-11-13 江南大学 A kind of instant learning soft-measuring modeling method based on Bayes's gauss hybrid models
CN109376578A (en) * 2018-08-27 2019-02-22 杭州电子科技大学 A kind of small sample target identification method based on depth migration metric learning
CN110188864A (en) * 2019-05-06 2019-08-30 南京大学 The small-sample learning method of measurement is indicated and is distributed based on distribution
CN110532911A (en) * 2019-08-19 2019-12-03 南京邮电大学 Covariance measurement drives the short-sighted frequency emotion identification method of small sample GIF and system
CN111488951A (en) * 2020-05-22 2020-08-04 南京大学 Countermeasure metric learning algorithm based on RGB-D image classification problem

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
SHINICHI SHIRAKAWA et al: "Sample Reuse in the Covariance Matrix Adaptation Evolution Strategy Based on Importance Sampling", GECCO '15: Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation, pages 305-312
WENBIN LI et al: "Distribution Consistency Based Covariance Metric Networks for Few-Shot Learning", The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19), vol. 33, no. 1, pages 8642-8649
LI WENBIN: "Metric-based Few-shot Image Classification" (基于度量的小样本图像分类), China Doctoral Dissertations Full-text Database, Information Science and Technology Series, no. 1, pages 138-157
LI TAOTAO: "Research on Few-shot Learning Methods Based on Local Descriptors" (基于局部描述子的小样本学习方法研究), China Master's Theses Full-text Database, Information Science and Technology Series, no. 7, pages 138-1121

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112487805A (en) * 2020-11-30 2021-03-12 武汉大学 Small sample Web service classification method based on meta-learning framework
CN112487805B (en) * 2020-11-30 2024-02-02 武汉大学 Small sample Web service classification method based on meta-learning framework
CN112487169A (en) * 2020-12-11 2021-03-12 北京邮电大学 Meta-learning-based personalized dialogue rewriting method
CN112487169B (en) * 2020-12-11 2023-02-07 北京邮电大学 Meta-learning-based personalized dialogue rewriting method
CN112786030A (en) * 2020-12-30 2021-05-11 中山大学 Countersampling training method and device based on meta-learning
CN112949740A (en) * 2021-03-17 2021-06-11 重庆邮电大学 Small sample image classification method based on multilevel measurement
CN113076976A (en) * 2021-03-17 2021-07-06 中山大学 Small sample image classification method based on local feature relation exploration
CN113076976B (en) * 2021-03-17 2023-08-18 中山大学 Small sample image classification method based on local feature relation exploration
CN112949740B (en) * 2021-03-17 2022-11-25 重庆邮电大学 Small sample image classification method based on multilevel measurement
CN113221964A (en) * 2021-04-22 2021-08-06 华南师范大学 Single sample image classification method, system, computer device and storage medium
CN113221964B (en) * 2021-04-22 2022-06-24 华南师范大学 Single sample image classification method, system, computer device and storage medium
CN113177521A (en) * 2021-05-26 2021-07-27 电子科技大学 Intelligent radiation source identification method based on combined twin network
CN113177521B (en) * 2021-05-26 2022-07-01 电子科技大学 Intelligent radiation source identification method based on combined twin network
CN113344729A (en) * 2021-06-04 2021-09-03 中国石油大学(华东) Residual oil submergence digging method based on small sample learning
CN113344729B (en) * 2021-06-04 2022-09-09 中国石油大学(华东) Residual oil submergence digging method based on small sample learning
CN113255824A (en) * 2021-06-15 2021-08-13 京东数科海益信息科技有限公司 Method and device for training classification model and data classification
CN113255824B (en) * 2021-06-15 2023-12-08 京东科技信息技术有限公司 Method and apparatus for training classification model and data classification
CN113435509A (en) * 2021-06-28 2021-09-24 山东力聚机器人科技股份有限公司 Small sample scene classification and identification method and system based on meta-learning
CN113435509B (en) * 2021-06-28 2022-03-25 山东力聚机器人科技股份有限公司 Small sample scene classification and identification method and system based on meta-learning
CN113688878A (en) * 2021-07-30 2021-11-23 华东师范大学 Small sample image classification method based on memory mechanism and graph neural network
CN113688878B (en) * 2021-07-30 2022-08-19 华东师范大学 Small sample image classification method based on memory mechanism and graph neural network
CN114202028A (en) * 2021-12-13 2022-03-18 四川大学 Rolling bearing life stage identification method based on MAMTL
CN114202028B (en) * 2021-12-13 2023-04-28 四川大学 MAMTL-based rolling bearing life stage identification method
CN114387524A (en) * 2022-03-24 2022-04-22 军事科学院系统工程研究院网络信息研究所 Image identification method and system for small sample learning based on multilevel second-order representation
CN114782779A (en) * 2022-05-06 2022-07-22 兰州理工大学 Small sample image feature learning method and device based on feature distribution migration
CN114998956A (en) * 2022-05-07 2022-09-02 北京科技大学 Small sample image data expansion method and device based on intra-class difference
CN115527269A (en) * 2022-10-10 2022-12-27 动自由(北京)科技有限公司 Intelligent human body posture image identification method and system
CN116091867A (en) * 2023-01-12 2023-05-09 北京邮电大学 Model training and image recognition method, device, equipment and storage medium
CN116091867B (en) * 2023-01-12 2023-09-29 北京邮电大学 Model training and image recognition method, device, equipment and storage medium
CN117557840A (en) * 2023-11-10 2024-02-13 中国矿业大学 Fundus lesion grading method based on small sample learning
CN117557840B (en) * 2023-11-10 2024-05-24 中国矿业大学 Fundus lesion grading method based on small sample learning

Similar Documents

Publication Publication Date Title
CN111858991A (en) Small sample learning algorithm based on covariance measurement
Chrysos et al. Deep polynomial neural networks
Shang et al. SAR targets classification based on deep memory convolution neural networks and transfer parameters
Jia et al. Factorized latent spaces with structured sparsity
Yu et al. On combining multiple features for cartoon character retrieval and clip synthesis
Bo et al. Fast algorithms for large scale conditional 3D prediction
Chen et al. Model Metric Co-Learning for Time Series Classification.
Zhang et al. Unsupervised nonnegative adaptive feature extraction for data representation
Wei et al. Robotic grasping recognition using multi-modal deep extreme learning machine
Zheng et al. Principal characteristic networks for few-shot learning
CN104392250A (en) Image classification method based on MapReduce
Zhou et al. Exploiting operation importance for differentiable neural architecture search
Park et al. Fast and scalable approximate spectral matching for higher order graph matching
Zeng et al. Accelerating convolutional neural networks by removing interspatial and interkernel redundancies
Bai et al. Correlative channel-aware fusion for multi-view time series classification
Lin et al. Visual feature coding based on heterogeneous structure fusion for image classification
Ahuja et al. Deterministic Multi-kernel based extreme learning machine for pattern classification
Shen et al. Person re-identification with deep kronecker-product matching and group-shuffling random walk
Li et al. Unsupervised transfer learning via low-rank coding for image clustering
Ge et al. Stacked denoising extreme learning machine autoencoder based on graph embedding for feature representation
Yu et al. 3D object representation learning: A set-to-set matching perspective
Terefe et al. Time series averaging using multi-tasking autoencoder
Mishne et al. Co-manifold learning with missing data
Wu et al. Semisupervised feature learning by deep entropy-sparsity subspace clustering
CN115017366B (en) Unsupervised video hash retrieval method based on multi-granularity contextualization and multi-structure preservation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination