Disclosure of Invention
Aiming at the technical problem of high-discrimination feature learning in small sample image classification, the invention provides a small sample image classification method and device based on feature migration and orthogonal prior.
In order to achieve the above purpose, the invention provides the following technical scheme:
the invention firstly provides a small sample image classification method based on feature migration and orthogonal prior, which comprises the following steps:
S1, data preparation: pre-training on images to obtain an embedding module f_θ for extracting image features, wherein the images comprise a training set and a test set;
S2, introducing the orthogonal prior idea into the convolutional neural network model, and constructing a feature learning network model based on feature migration and orthogonal prior;
S3, training and optimizing the objective function of the orthogonal prior feature learning network model;
and S4, classifying the images in the test set by using the optimized image orthogonal prior feature learning network model.
Further, step S1 includes:
S11, dividing the data set D into two parts, D_train and D_test, whose class spaces are mutually exclusive; D_train serves as base class data for training the model, and D_test serves as new class data for testing the model;
S12, for a C-way K-shot classification task, randomly selecting C classes from D_train, and randomly selecting M samples in each class, wherein K samples serve as support samples S_i and the remaining M−K samples serve as query samples Q_i; S_i and Q_i form a task T_i; tasks are formed from D_test in the same way;
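As an illustrative sketch (not part of the claimed invention), the C-way K-shot task construction of step S12 can be written as follows; the dictionary-based `dataset` layout and the name `sample_task` are assumptions made here for illustration:

```python
import random

def sample_task(dataset, C, K, M):
    # dataset: dict mapping class label -> list of samples.
    # Randomly pick C classes and M samples per class; the first K
    # samples of each class form the support set S_i, the remaining
    # M - K samples form the query set Q_i.
    classes = random.sample(list(dataset), C)
    support, query = [], []
    for c in classes:
        samples = random.sample(dataset[c], M)
        support += [(c, s) for s in samples[:K]]
        query += [(c, s) for s in samples[K:]]
    return support, query
```

A task T_i is then simply the pair (support, query) returned by one call.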
S13, first training stage: pre-training the embedding module f_θ with base class data. f_θ contains 4 convolution blocks, each containing a convolution layer, a pooling layer and a nonlinear activation function; each convolution block uses a 3×3 convolution kernel with one batch normalization; the input has three RGB channels; each pooling layer is a 2×2 max pooling layer, and the max pooling layers of the last two blocks are removed; the nonlinear activation layer uses the ReLU activation function.
Further, in the feature learning network model based on feature migration and orthogonal prior of step S2, the orthogonalized feature adaptation network consists of three parts: the embedding module f_θ, the orthogonal adaptation module and the metric module. The orthogonal adaptation module consists of two convolution layers with 5×5 convolution kernels, and is used for transforming new class sample features and learning an orthogonalized feature subspace.
Further, step S3 includes:
S31, in the second training stage, performing a classification task on the new class data, and inputting all support samples into the embedding module f_θ with fixed parameters to obtain the corresponding support sample features f_θ(S_ck);
S32, performing feature transformation with the orthogonal adaptation module to obtain the transformed features;
S33, the transformed features correspond to the masks M of each class c Multiplying to enable the features between different classes to be orthogonal pairwise;
S34, calculating the cosine distance C(P_ci, P_cj) (i, j ∈ [0, K), i ≠ j) between same-class features with the metric module;
S35, optimizing the orthogonal adaptation module with the mean square error loss function.
Further, step S33 is calculated as:

P_ck = g(f_θ(S_ck)) ⊙ M_c (1)

wherein S_ck is the kth support sample of class c, g denotes the transformation of the orthogonal adaptation module, ⊙ represents element-wise multiplication of same-order matrices, and M_cijh is the value at row i, column j of the hth channel of the class-c mask M_c; the elements of M_c are constituted as follows:

M_cijh = 1 when cH/C ≤ h < (c+1)H/C, and M_cijh = 0 otherwise (2)

wherein C is the total number of classes under the current task, H is the number of feature channels, H is an integral multiple of C, and the channel index h starts from 0.
Further, step S34 is calculated as follows:

C(P_ci, P_cj) = Σ(P_ci ⊙ P_cj) / (‖P_ci‖ · ‖P_cj‖) (3)

wherein C(P_ci, P_cj) is the cosine distance between same-class features, K is the number of support samples, c denotes the cth class, P_ci represents the ith support sample feature in class c, P_cj represents the jth support sample feature in class c, ⊙ represents element-wise multiplication of the corresponding matrix elements, and ‖P_ci‖ represents the two-norm of the matrix P_ci.
Further, the mean square error loss function of step S35 is calculated as follows:

L = (1/N) Σ_c Σ_{i≠j} MSE[cos(P_ci, P_cj), 1] (4)

wherein N is the total number of classes under the current task, cos(P_ci, P_cj) is the cosine distance between same-class features, and MSE[cos(P_ci, P_cj), 1] = [cos(P_ci, P_cj) − 1]².
After the loss over the support samples is calculated, gradient descent is performed, and the orthogonal adaptation module is updated with mini-batches and the Adam optimizer; training is repeated over multiple tasks until the network converges.
Further, the specific steps of the Adam adaptive optimization algorithm are as follows:
Initialization: v_dW = 0, S_dW = 0, v_db = 0, S_db = 0, which represent the biased first and second moment estimates respectively, and dW, db represent the differentials of W and b respectively;
Calculating the Momentum exponentially weighted averages:

v_dW = β1·v_dW + (1 − β1)·dW (5)
v_db = β1·v_db + (1 − β1)·db (6)

Calculating the exponentially weighted averages of the squared gradients of the RMSprop algorithm:

S_dW = β2·S_dW + (1 − β2)·(dW)² (7)
S_db = β2·S_db + (1 − β2)·(db)² (8)

Calculating the bias corrections of the Momentum and RMSprop algorithms.
Bias correction of the Momentum algorithm:

v̂_dW = v_dW / (1 − β1^t) (9)
v̂_db = v_db / (1 − β1^t) (10)

Bias correction of the RMSprop algorithm:

Ŝ_dW = S_dW / (1 − β2^t) (11)
Ŝ_db = S_db / (1 − β2^t) (12)

Performing gradient descent and updating the weights:

W = W − α·v̂_dW / (√Ŝ_dW + ε) (13)
b = b − α·v̂_db / (√Ŝ_db + ε) (14)

In equations (5)–(14), t denotes the tth iteration, α denotes the learning rate, which controls the update rate of the weights, ε denotes a very small constant, β1 and β2 denote the exponential decay rates of the first and second moment estimates respectively, and v̂_dW, v̂_db, Ŝ_dW, Ŝ_db denote the first and second moment estimates after bias correction.
Further, step S4 includes:
S41, testing process: each task T_i consists of a support set S_i and a query set Q_i; the query set of the test set is input into the embedding module f_θ and the fine-tuned orthogonal adaptation module to obtain the features;
S42, multiplying the features output by the orthogonal adaptation module by the masks M_c corresponding to the different classes respectively, the specific operation being shown in formula (1), wherein the input is the kth query sample, ⊙ represents element-wise multiplication of same-order matrices, and M_cijh is the value at row i, column j of the hth channel of the class-c mask M_c; the elements of M_c equal 1 when cH/C ≤ h < (c+1)H/C and 0 otherwise, wherein C is the total number of classes under the current task, H is the number of feature channels, H is an integral multiple of C, and the channel index h starts from 0;
S43, sending the products into the metric module, and calculating the cosine distances between the query sample and all the support samples;
S44, taking the class of the closest support sample as the predicted class of the query sample.
On the other hand, the invention also provides a small sample image classification device based on feature migration and orthogonal prior, which is used for realizing the method and comprises the following functional modules:
a pre-training module, for pre-training on images to obtain an embedding module f_θ for extracting image features, wherein the images comprise a training set and a test set;
a processing module, for introducing the orthogonal prior idea and constructing a feature learning network model based on feature migration and orthogonal prior;
a calculation module, for solving the model parameters by training and optimizing the objective function of the orthogonal prior feature learning network model; and
a classification module, for classifying the images of the test set with the optimized orthogonal prior feature learning network model.
Compared with the prior art, the invention has the beneficial effects that:
the invention discloses a small sample image classification method and device based on feature migration and orthogonal prior, which are based on Deep Convolutional Neural Networks (DCNN), and are used for researching a small sample classification framework of high-resolution feature extraction on the basis of small sample image classification research based on depth measurement. The method comprises the steps of learning an orthogonal feature subspace by introducing feature migration and orthogonal prior small sample image feature learning, assuming a new class and base class shared feature extraction mode and assuming feature orthogonality of new class data among different classes, wherein mutual correlation does not exist, and establishing an orthogonalized feature adaptation network to enable different classes of features to be mutually orthogonal, so that different classes are easily distinguished, and the identification degree of the features is improved. The method has very important significance for theoretical research of small sample learning and promotion of wide application of machine identification technology. Meanwhile, the advanced technology for breaking through the theoretical bottleneck of small sample learning and mastering artificial intelligence in China firstly plays a role in adding bricks and tiles.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort belong to the protection scope of the present invention.
The invention provides a small sample image classification method based on feature migration and orthogonal prior; the flow is shown in figure 2 and comprises the following steps:
S1, data preparation: pre-training on images to obtain an embedding module f_θ for extracting image features, wherein the images comprise a training set and a test set;
specifically, step S1 includes:
S11, dividing the data set D into two parts, D_train and D_test, whose class spaces are mutually exclusive; D_train serves as base class data for training the model, and D_test serves as new class data for testing the model;
S12, for a C-way K-shot classification task, randomly selecting C classes from D_train, and randomly selecting M samples in each class, wherein K samples serve as support samples S_i and the remaining M−K samples serve as query samples Q_i; S_i and Q_i form a task T_i; tasks are formed from D_test in the same way.
S13, first training stage: pre-training the embedding module f_θ with base class data. f_θ contains 4 convolution blocks, each containing a convolution layer, a pooling layer and a nonlinear activation function; each convolution block uses a 3×3 convolution kernel with one batch normalization; the input has three RGB channels; each pooling layer is a 2×2 max pooling layer, and the max pooling layers of the last two blocks are removed; the nonlinear activation layer uses the ReLU activation function. For example, for an 84×84×3 RGB image, each block uses 3×3 convolution kernels with 64 filters. Each block consists of 1 convolution, 1 ReLU and one pooling, as shown in fig. 3. The pre-trained embedding module can be reused in different scenarios.
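The feature-map sizes produced by the four convolution blocks above can be checked with a small sketch. Note that 'same' padding for the 3×3 convolutions is an assumption made here, since the description does not state the padding:

```python
def embed_output_shape(hw=(84, 84), filters=64):
    """Trace the feature-map size through the 4 convolution blocks.

    Each block: 3x3 conv (assumed 'same' padding, so spatial size is
    preserved), batch norm, ReLU; the first two blocks end with a 2x2
    max pooling, which is removed in the last two blocks.
    """
    pools = [True, True, False, False]
    h, w = hw
    for has_pool in pools:
        if has_pool:
            h, w = h // 2, w // 2
    return (filters, h, w)

print(embed_output_shape())  # (64, 21, 21) for an 84x84x3 input
```

Under these assumptions, f_θ maps an 84×84×3 image to a 64×21×21 feature tensor.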
S2, introducing the orthogonal prior idea into the convolutional neural network model, and constructing a feature learning network model based on feature migration and orthogonal prior, as shown in fig. 4.
Specifically, in the feature learning network model based on feature migration and orthogonal prior of step S2, the orthogonalized feature adaptation network consists of three parts: the embedding module f_θ, the orthogonal adaptation module and the metric module. The orthogonal adaptation module consists of two convolution layers with 5×5 convolution kernels, and is used for transforming new class sample features and learning an orthogonalized feature subspace, as shown in fig. 5.
S3, training and optimizing the objective function of the orthogonal prior feature learning network model.
specifically, step S3 includes:
S31, in the second training stage, performing a classification task on the new class data, and inputting all support samples into the embedding module f_θ with fixed parameters to obtain the corresponding support sample features f_θ(S_ck);
S32, performing feature transformation with the orthogonal adaptation module to obtain the transformed features;
S33, multiplying the transformed features by the mask M_c of each class c, so that features of different classes are pairwise orthogonal.
Step S33 is calculated as:

P_ck = g(f_θ(S_ck)) ⊙ M_c (1)

wherein S_ck is the kth support sample of class c, g denotes the transformation of the orthogonal adaptation module, ⊙ represents element-wise multiplication of same-order matrices, and M_cijh is the value at row i, column j of the hth channel of the class-c mask M_c; the elements of M_c are constituted as follows:

M_cijh = 1 when cH/C ≤ h < (c+1)H/C, and M_cijh = 0 otherwise (2)

wherein C is the total number of classes under the current task, H is the number of feature channels, H is an integral multiple of C, and the channel index h starts from 0.
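A minimal sketch of this mask construction: each class c is assigned the channel band [cH/C, (c+1)H/C), so masked features of different classes occupy disjoint channels and are therefore orthogonal. The function name and array layout are illustrative assumptions:

```python
import numpy as np

def class_mask(c, C, H, height, width):
    # M_c: 1 on the channel band assigned to class c, 0 elsewhere.
    # H (number of feature channels) must be an integral multiple of C.
    assert H % C == 0
    band = H // C
    m = np.zeros((H, height, width))
    m[c * band:(c + 1) * band, :, :] = 1.0
    return m

# Masked features of two different classes have zero inner product:
feat = np.random.rand(10, 5, 5)                      # H=10 channels, 5x5 maps
p0 = feat * class_mask(0, C=5, H=10, height=5, width=5)
p1 = feat * class_mask(1, C=5, H=10, height=5, width=5)
print(np.sum(p0 * p1))  # 0.0 -> pairwise orthogonal
```

The disjoint channel bands are what make the orthogonality hold exactly, regardless of the feature values.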
S34, calculating the cosine distance C(P_ci, P_cj) (i, j ∈ [0, K), i ≠ j) between same-class features with the metric module, thereby obtaining the cosine distances of same-class features under multiple classes.
Step S34 is calculated as follows:

C(P_ci, P_cj) = Σ(P_ci ⊙ P_cj) / (‖P_ci‖ · ‖P_cj‖) (3)

wherein C(P_ci, P_cj) is the cosine distance between same-class features, K is the number of support samples, c denotes the cth class, P_ci represents the ith support sample feature in class c, P_cj represents the jth support sample feature in class c, ⊙ represents element-wise multiplication of the corresponding matrix elements, and ‖P_ci‖ represents the two-norm of the matrix P_ci.
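Formula (3) can be sketched directly; summing the element-wise product over all positions of the feature maps, and the small `eps` guard against division by zero, are assumptions made here for numerical safety:

```python
import numpy as np

def cosine_distance(p, q, eps=1e-12):
    # C(P_ci, P_cj): element-wise product summed over all positions,
    # divided by the product of the two-norms of the two features.
    num = np.sum(p * q)
    return num / (np.linalg.norm(p) * np.linalg.norm(q) + eps)

a = np.ones((4, 3, 3))
print(round(cosine_distance(a, 2 * a), 6))  # 1.0 for parallel features
```

Parallel features give a value near 1 and orthogonal features give 0, which is what the loss in step S35 exploits.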
S35, optimizing the orthogonal adaptation module with the mean square error loss function.
Step S35 is calculated with the mean square error loss function as follows:

L = (1/N) Σ_c Σ_{i≠j} MSE[cos(P_ci, P_cj), 1] (4)

wherein N is the total number of classes under the current task, cos(P_ci, P_cj) is the cosine distance between same-class features, and MSE[cos(P_ci, P_cj), 1] = [cos(P_ci, P_cj) − 1]².
After the loss over the support samples is calculated, gradient descent is performed, and the orthogonal adaptation module is updated with mini-batches and the Adam optimizer; training is repeated over multiple tasks until the network converges.
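The support-sample loss of formula (4) can be sketched as below; averaging over all ordered same-class pairs i ≠ j is an assumption made here, since the description states only the per-class pairwise terms and the factor N:

```python
import numpy as np

def cosine(p, q, eps=1e-12):
    return np.sum(p * q) / (np.linalg.norm(p) * np.linalg.norm(q) + eps)

def support_loss(features_by_class):
    # features_by_class: {class label: [P_c0, ..., P_c(K-1)]} masked
    # support features; the loss drives cos(P_ci, P_cj) toward 1 for
    # every same-class pair with i != j.
    total, count = 0.0, 0
    for feats in features_by_class.values():
        K = len(feats)
        for i in range(K):
            for j in range(K):
                if i != j:
                    total += (cosine(feats[i], feats[j]) - 1.0) ** 2
                    count += 1
    return total / max(count, 1)
```

Identical same-class features yield a loss of 0; orthogonal same-class features yield the maximum penalty of 1 per pair.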
The specific steps of the Adam adaptive optimization algorithm are as follows:
Initialization: v_dW = 0, S_dW = 0, v_db = 0, S_db = 0, which represent the biased first and second moment estimates respectively, and dW, db represent the differentials of W and b respectively;
Calculating the Momentum exponentially weighted averages:

v_dW = β1·v_dW + (1 − β1)·dW (5)
v_db = β1·v_db + (1 − β1)·db (6)

Calculating the exponentially weighted averages of the squared gradients of the RMSprop algorithm:

S_dW = β2·S_dW + (1 − β2)·(dW)² (7)
S_db = β2·S_db + (1 − β2)·(db)² (8)

Calculating the bias corrections of the Momentum and RMSprop algorithms.
Bias correction of the Momentum algorithm:

v̂_dW = v_dW / (1 − β1^t) (9)
v̂_db = v_db / (1 − β1^t) (10)

Bias correction of the RMSprop algorithm:

Ŝ_dW = S_dW / (1 − β2^t) (11)
Ŝ_db = S_db / (1 − β2^t) (12)

Performing gradient descent and updating the weights:

W = W − α·v̂_dW / (√Ŝ_dW + ε) (13)
b = b − α·v̂_db / (√Ŝ_db + ε) (14)

In equations (5)–(14), t denotes the tth iteration, α denotes the learning rate, which controls the update rate of the weights, ε denotes a very small constant, β1 and β2 denote the exponential decay rates of the first and second moment estimates respectively, and v̂_dW, v̂_db, Ŝ_dW, Ŝ_db denote the first and second moment estimates after bias correction.
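Equations (5)-(14) for a single parameter tensor W can be sketched as follows (the same update applies to b); the hyperparameter defaults are the common choices, not values stated in this description:

```python
import numpy as np

def adam_step(W, dW, v, S, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    v = beta1 * v + (1 - beta1) * dW                 # eq. (5): momentum average
    S = beta2 * S + (1 - beta2) * dW ** 2            # eq. (7): RMSprop average
    v_hat = v / (1 - beta1 ** t)                     # eq. (9): bias correction
    S_hat = S / (1 - beta2 ** t)                     # eq. (11): bias correction
    W = W - alpha * v_hat / (np.sqrt(S_hat) + eps)   # eq. (13): weight update
    return W, v, S

# Minimize (w - 3)^2, whose gradient is 2(w - 3):
w, v, S = np.array(0.0), 0.0, 0.0
for t in range(1, 2001):
    w, v, S = adam_step(w, 2 * (w - 3.0), v, S, t, alpha=0.01)
print(float(w))  # approaches 3.0
```

Note that t starts from 1, otherwise the bias-correction denominators would be zero.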
And S4, classifying the images of the test set by using the optimized image orthogonal prior feature learning network model.
Step S4 comprises the following steps:
S41, testing process: each task T_i consists of a support set S_i and a query set Q_i; the query set of the test set is input into the embedding module f_θ and the fine-tuned orthogonal adaptation module to obtain the features;
S42, multiplying the features output by the orthogonal adaptation module by the masks M_c corresponding to the different classes respectively, the specific operation being shown in formula (1), wherein the input is the kth query sample, ⊙ represents element-wise multiplication of same-order matrices, and M_cijh is the value at row i, column j of the hth channel of the class-c mask M_c; the elements of M_c equal 1 when cH/C ≤ h < (c+1)H/C and 0 otherwise, wherein C is the total number of classes under the current task, H is the number of feature channels, H is an integral multiple of C, and the channel index h starts from 0;
S43, sending the products into the metric module, and calculating the cosine distances between the query sample and all the support samples; note that in the training stage the metric module only calculates cosine distances between same-class features, not between different classes, so the metric module is used differently in the training and testing stages;
S44, taking the class of the closest support sample as the predicted class of the query sample. Unlike conventional training, the model is fine-tuned with the support samples of the new classes, and the query samples are tested directly after the optimization is completed.
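Steps S43-S44 amount to nearest-neighbor classification under the cosine metric; the list-of-pairs layout chosen for the support features is an assumption made here for illustration:

```python
import numpy as np

def cosine(p, q, eps=1e-12):
    return np.sum(p * q) / (np.linalg.norm(p) * np.linalg.norm(q) + eps)

def predict_class(query_feat, support_feats):
    # support_feats: list of (class label, masked support feature);
    # the class of the support sample with the largest cosine
    # similarity to the query wins.
    best_label, best_cos = None, -2.0
    for label, feat in support_feats:
        c = cosine(query_feat, feat)
        if c > best_cos:
            best_label, best_cos = label, c
    return best_label

support = [(0, np.array([1.0, 0.0])), (1, np.array([0.0, 1.0]))]
print(predict_class(np.array([0.9, 0.1]), support))  # 0
```

Because the masks of step S42 place different classes on disjoint channel bands, a query masked for class c can only score highly against class-c support samples.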
On the other hand, the invention also provides a small sample image classification device based on feature migration and orthogonal prior, which is used for implementing the method, and as shown in fig. 6, the device comprises the following functional modules:
a pre-training module, for pre-training on images to obtain an embedding module f_θ for extracting image features, wherein the images comprise a training set and a test set;
a processing module, for introducing the orthogonal prior idea and constructing a feature learning network model based on feature migration and orthogonal prior;
a calculation module, for solving the model parameters by training and optimizing the objective function of the orthogonal prior feature learning network model; and
a classification module, for classifying the images of the test set with the optimized orthogonal prior feature learning network model.
The invention learns an orthogonal feature subspace by introducing feature migration and orthogonal prior into small sample image feature learning, assuming that the new classes and base classes share a feature extraction mode and that the features of new class data are orthogonal between different classes, and constructs an orthogonalized feature adaptation network that makes the features of different classes mutually orthogonal, thereby improving the discriminability of the features.
The proposed small sample image classification method and device based on feature migration and orthogonal prior have been described in detail above with reference to the accompanying drawings. From the above description of the embodiments, those skilled in the art will clearly understand how to implement the method and device.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein, and the structure required to construct such a system will be apparent from the description above. Moreover, this disclosure is not directed to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the disclosure described herein, and any references to specific languages above are provided for disclosure of the best mode of the invention.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding the understanding of one or more of the various disclosed aspects. However, this method of disclosure is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, the disclosed aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this disclosure.
The above-mentioned embodiments are only specific embodiments of the present application, used to illustrate rather than limit its technical solutions, and the protection scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or make equivalent substitutions of some technical features within the technical scope disclosed herein; such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the embodiments of the present application, and are intended to be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.