CN114882267A

CN114882267A - Small sample image classification method and system based on relevant region

Info

Publication number: CN114882267A
Application number: CN202210342541.7A
Authority: CN
Inventors: 王蕊; 施璠
Original assignee: Institute of Information Engineering of CAS
Current assignee: Institute of Information Engineering of CAS
Priority date: 2022-03-31
Filing date: 2022-03-31
Publication date: 2022-08-09

Abstract

The invention discloses a small sample image classification method and system based on relevant regions. First, the regional autocorrelation of each sample is considered and the features are enhanced. Then, the cross-sample regional correlation among the support set samples is considered, commonalities are extracted from the features of the current category, and the features are further enhanced. Finally, the cross-sample regional correlation between the support set and the query set is considered, the target region in which the query set and the support set share similar features is found, and the similarity is calculated to obtain the final classification result. Qualitative experiments show that the method focuses better on the region common to samples of the same category and enhances that region, making the features more representative. In quantitative analysis, ablation and comparison experiments show that the cross-sample attention module improves classification results both among support set samples and between the support set and the query set.

Description

Small sample image classification method and system based on relevant region

Technical Field

The invention belongs to the technical field of computer vision, and particularly relates to a method and a system capable of classifying images in a small sample scene.

Background

In the field of computer vision, image understanding is an important problem, and image classification, as a basic task of image understanding, plays a crucial role. An image classification task takes images as input and outputs predictions of their categories; it usually requires training a model on a large-scale dataset to acquire the ability to analyze and process images. On the one hand, depending on how many training labels are available, the training process can be divided into supervised, semi-supervised, and unsupervised training. Because labeling data is time-consuming and labor-intensive, it is desirable for a model to reach good classification accuracy with as little labeled data as possible, which is a challenging task. On the other hand, large-scale datasets are often difficult to acquire in real scenes. For rare species in nature, the images that can be collected are limited; for rare cases in medical scenes, identifying related diseases from only a few samples would be of great significance for medical progress; and for emerging dangerous goods or weapons, quickly establishing a complete security inspection process from a small number of samples would better guarantee national security.

The small sample image classification task aims to solve these problems: it reduces the dependence of the model on large-scale labeled data and raises the classification accuracy on new classes that have only a small number of samples.

Current methods for small sample learning fall mainly into the following four types:

1) Optimization-based methods. These use the concept of meta-learning, i.e. learning to learn, with the ultimate goal of converging as quickly as possible when the model faces a new set of learning tasks. Training with such methods usually consists of two loops. For example, MAML is composed of a base learner and a meta learner: in the inner loop the base learner is trained on each independent task, and in the outer loop the meta learner is optimized according to the validation performance of the resulting base learners, finally yielding initialization parameters from which the base learner can adapt quickly to a new task. However, this approach has limitations. On the one hand, it is effective only on shallow networks, whose representation learning capability is limited; on the other hand, training such a model requires a large number of similar tasks, which are difficult to obtain. In addition, ANIL indicates that the inner loop is not necessary in the MAML model structure, and that learning a better feature representation matters more to the network than faster adaptation.

2) Augmentation methods based on generated data. At the data level, the problem of too little original data can be alleviated by generating new data; at the feature level, generative methods can learn not only the boundaries between specific categories but also, by introducing the concept of data distribution, the complete boundary of each category's distribution, so that the problem of category combination can be handled. However, generative methods usually rely on a large amount of data and perform poorly in small sample scenes where little data is available.

3) Metric-learning-based methods. A feature extractor is first obtained through training; images are mapped by the feature extractor to feature vectors in a feature space, the distances between different images are computed with a suitable metric function (such as Euclidean distance or cosine distance), and the images are classified according to these distances. Prototypical Networks use Euclidean distance as the metric function; Matching Networks use cosine distance; Negative-cosine improves the cosine classifier by adding a margin to the classification result, thereby improving the performance of the classification model in small sample learning.
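As an illustration of the metric-based idea described above, the following is a minimal sketch, not taken from any of the cited methods. It assumes that feature vectors have already been produced by some feature extractor and that each class is represented by the mean of its support vectors; the function names `class_prototypes` and `classify` are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def class_prototypes(support):
    """support: dict label -> list of feature vectors. Returns label -> mean vector."""
    protos = {}
    for label, vecs in support.items():
        dim = len(vecs[0])
        protos[label] = [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]
    return protos

def classify(query, protos, metric="euclidean"):
    """Assign the query to the closest (Euclidean) or most similar (cosine) class."""
    if metric == "euclidean":
        return min(protos, key=lambda c: euclidean(query, protos[c]))
    return max(protos, key=lambda c: cosine_similarity(query, protos[c]))

# Toy 2-D features standing in for the output of a trained feature extractor.
support = {"cat": [[1.0, 0.1], [0.9, 0.0]], "dog": [[0.0, 1.0], [0.2, 0.8]]}
protos = class_prototypes(support)
print(classify([0.8, 0.2], protos))            # Euclidean metric
print(classify([0.1, 0.9], protos, "cosine"))  # cosine metric
```

The only difference between the two variants is the metric function; the feature extractor and the class representatives are shared.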

4) Feature-embedding-based methods. In essence, a sufficiently good feature extractor is needed, so that the extracted features of different classes are more discriminative in the feature space. RFS likewise points out the importance of obtaining a better feature space in small sample learning. Some approaches improve the representation capability of the feature extractor by constructing a self-supervised learning task; others obtain better classification boundaries by post-processing the features.

The above methods start from different angles and improve the classification accuracy of the small sample image classification task to a certain extent. The present invention starts from the angle of metric learning, fully mines the regional similarity relations among image samples, and enhances the features. In the small sample image classification task, the final classification targets are provided by the small number of samples in the support set, and these samples serve as labels in the final classification prediction. On this premise, the classification performance of a small sample task is strongly influenced by the quality of the support set: a small number of samples may not represent a category well, and the model may attend to the wrong target region. For example, when a support set image showing a "person riding a bicycle" is used as a classification target for the "bicycle" category, the network may erroneously focus on the "person", so that other images containing a "person" are misclassified in the final prediction.

To address this problem, some current small sample image classification methods use an image attention mechanism to search for a region of interest based on structural or semantic features, and some recent studies improve the accuracy of small sample image classification based on region similarity, from the perspective of cross-sample region attention.

Disclosure of Invention

The invention provides a cross-sample attention mechanism that finds related regions among different samples of the same class, obtains a correlation matrix, and from it obtains weighted features carrying correlation information. Similarly, the correlation between the query set and the support set can be constructed to obtain a correlation matrix of the query set relative to the support set, which assists the classification of small sample images.

In the invention, each group of training tasks is divided into a support set and a query set: the support set provides the target categories of the current classification task together with a small number of training samples, and the query set is used for calculating the classification accuracy. Model training proceeds in three steps. First, the autocorrelation regions of each sample are considered and the features are enhanced. Then, the cross-sample regional correlation among the support set samples is considered, commonalities are extracted from the features of the current category, and the features are further enhanced. Finally, the cross-sample regional correlation between the support set and the query set is considered, the target region in which the query set and the support set share similar features is found, and the similarity is calculated to obtain the final classification result.

In order to achieve the purpose, the technical scheme adopted by the invention is as follows:

a small sample image classification method based on a relevant region comprises the following steps:

sequentially inputting the support set into a feature extractor and a regional autocorrelation module to obtain autocorrelation enhanced support set features;

inputting the query set into a feature extractor and a regional autocorrelation module in sequence to obtain autocorrelation enhanced query set features;

inputting the autocorrelation enhanced support set features into a cross-sample correlation module for feature enhancement to obtain cross-sample enhanced support set features;

and inputting the cross-sample enhanced support set features and the autocorrelation enhanced query set features into a cross-sample correlation module, and calculating similarity of the support set features and the query set features output by the cross-sample correlation module, thereby realizing classification of the query set.

Further, the feature extractor, the area autocorrelation module and the cross-sample correlation module are trained by adopting the following steps:

(1) All images of the support set and the query set are input into a feature extractor based on a convolutional neural network, from which the last-layer pooling operation is removed, to obtain image features that preserve positional relationships.

(2) To help the model train better and improve the representation capability of the feature extractor, the classification accuracy on the base classes is calculated during training: after a pooling operation, the extracted features are classified directly by a fully connected layer, and a classification loss function is calculated.

(3) The image features obtained in step (1) are input into the regional autocorrelation module, which enhances the positional features of the image and finds regions with spatial similarity; this facilitates the extraction of better spatial features and assists the subsequent cross-sample correlation calculation.

(4) Samples of the same category in the support set are input into the cross-sample correlation module, which considers the spatial similarity between different samples of the same category. A region with higher similarity is regarded as the target region of the class, i.e. the classification target; it is used as an attention coefficient and applied to all samples of the class, thereby realizing feature enhancement. The enhanced features are averaged to form the class center of the class.

(5) The feature-enhanced support set features obtained in step (4) and the query set features obtained in step (3) are then input into a cross-sample correlation module that shares parameters with step (4). The module searches for well-matched regions between the query set sample and the support set sample, regards the region with higher similarity as the region of interest of the query set sample, and uses it as the attention coefficient for feature enhancement. On this basis, the similarity between the query set features and the support set features is calculated to obtain a correlation loss function.

(6) The classification loss function of step (2), computed over all samples, and the correlation loss function of step (5), computed per task, are combined; iterative optimization is performed, and training yields the parameters of the feature extractor, the regional autocorrelation module, and the cross-sample correlation module required for testing.

A small sample image classification system based on a relevant region by adopting the method comprises a feature extractor, a region autocorrelation module, a cross-sample correlation module and a classification module;

the feature extractor is used for extracting features from the input query set and the support set;

the regional autocorrelation module is used for enhancing regional correlation of a single sample to obtain autocorrelation enhanced support set characteristics and autocorrelation enhanced query set characteristics;

the cross-sample correlation module is used for calculating the correlation among the support set samples and between the support set and the query set;

the classification module is used for calculating similarity of the support set features and the query set features output by the cross-sample correlation module, so that classification of the query set is realized.

The invention has the beneficial effects that:

The method is based on research into small sample learning with attention regions and improves the accuracy of the small sample image classification task by enhancing the features of the attended region. First, the regional autocorrelation of each sample is considered and the features are enhanced. Then, the cross-sample regional correlation among the support set samples is considered, commonalities are extracted from the features of the current category, and the features are further enhanced. Finally, the cross-sample regional correlation between the support set and the query set is considered, the target region in which the query set and the support set share similar features is found, and the similarity is calculated to obtain the final classification result. Experiments were performed on the small sample image classification dataset miniImageNet; the method improves on the baseline model and can also be applied to other methods as a post-processing step. Qualitative experiments show that the method focuses better on the region common to samples of the same category and enhances that region, making the features more representative. In quantitative analysis, ablation and comparison experiments show that the cross-sample attention module improves classification results both among support set samples and between the support set and the query set.

Drawings

FIG. 1 is a schematic diagram of the small sample correlation study with the cross-sample attention mechanism of the present invention, where $A_1$ denotes the attention matrix of the 1st support set sample, $A_2$ denotes the attention matrix of the 2nd support set sample, and $A$ denotes the attention matrix of the query set sample;

FIG. 2 is a schematic diagram of a comparison of the visualization results of the correlation area according to the present invention, which is sequentially from top to bottom an input original image, a feature activation image obtained by directly using a feature extractor, and a feature activation image obtained by using the method of the present invention;

FIG. 3 is a comparison experiment of ReNet and the invention under 5-way-k-shot setting, comparing the effect difference under different shots.

Detailed Description

In order to make the aforementioned and other features and advantages of the invention more comprehensible, embodiments accompanied with figures are described in detail below.

The present embodiment establishes the image classification network structure shown in FIG. 1. It is mainly divided into three parts: regional correlation enhancement of a single sample, correlation among support set samples, and correlation between the support set and the query set. The cross-sample correlation module can be applied as a post-processing operation to a pre-trained feature extractor or to one that is still to be trained, to improve the final classification accuracy.

1. Feature extraction

After the query set and the support set are input into a feature extractor based on a convolutional neural network (the encoder in FIG. 1), the feature matrices $F_q$ and $F_s$ are obtained, respectively. In this step, all images of the support set and the query set are passed through the feature extractor with its last-layer pooling operation removed, yielding image features that preserve positional relationships.

2. Computing a classification loss function

A pooling operation is performed on the obtained features to obtain a C-dimensional vector, which is input into a fully connected network h to predict the category of the current image, and the classification loss function is calculated:

$$L_{cls} = \mathbb{E}_{x,y}\left[L(h(F(x)), y)\right],$$

where x is the input image, y is the class label of x, F(x) denotes the pooled features of the feature matrices $F_q$ and $F_s$ (the support set and the query set are not distinguished here), L is the cross-entropy loss function, and $\mathbb{E}_{x,y}$ denotes averaging the loss over all possible pairs (x, y).
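The classification loss above can be illustrated with a small sketch. It assumes global average pooling as the pooling operation and a single linear layer as the fully connected network h; both choices, and the names `global_average_pool`, `linear` and `classification_loss`, are assumptions made for illustration and are not specified in the text.

```python
import math

def global_average_pool(feature_map):
    """feature_map: C x H x W nested lists -> C-dimensional vector."""
    return [sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
            for channel in feature_map]

def linear(vec, weights, bias):
    """Fully connected layer h; weights has shape num_classes x C."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]

def cross_entropy(logits, label):
    """Cross-entropy of softmax(logits) against an integer label (stable log-sum-exp)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def classification_loss(batch, weights, bias):
    """Mean cross-entropy over (feature_map, label) pairs: an empirical E_{x,y}."""
    losses = [cross_entropy(linear(global_average_pool(f), weights, bias), y)
              for f, y in batch]
    return sum(losses) / len(losses)

# Toy example: C = 2 channels, 2 x 2 spatial size, 2 base classes.
fmap = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
weights, bias = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
loss = classification_loss([(fmap, 0)], weights, bias)
print(loss)
```

Support set and query set images are treated identically here, matching the statement that the two sets are not distinguished for this loss.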

3. Regional correlation enhancement for individual samples

The feature matrices $F_q$ and $F_s$ are each input into the autocorrelation region calculation model of the regional autocorrelation module, which calculates the correlation of each pixel with its surrounding neighbors. The correlation result is added to the original feature matrix as a supplement, realizing feature enhancement:

$$F_q = F_q + SR(F_q)$$

$$F_s = F_s + SR(F_s)$$

where SR denotes the autocorrelation calculation function, which comprises a function computing the correlation with surrounding neighbors followed by a per-channel convolution operation, so that feature information is preserved.
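The neighbor-correlation part of SR can be sketched as follows. This is only an illustration under stated assumptions: cosine similarity is used as the correlation, positions outside the feature map contribute zero, and the learned convolution that maps the correlation tensor back to C channels is omitted, so `residual_enhance` simply takes an already computed SR output. The names `self_correlation` and `residual_enhance` are hypothetical.

```python
import math

def l2_normalize(vec, eps=1e-8):
    """Scale a vector to (approximately) unit length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]

def self_correlation(features, radius=1):
    """features: H x W x C nested lists.
    Returns an H x W x (2r+1) x (2r+1) tensor holding the cosine similarity of
    each position with its neighbours; out-of-map neighbours give 0."""
    H, W = len(features), len(features[0])
    normed = [[l2_normalize(features[i][j]) for j in range(W)] for i in range(H)]
    out = []
    for i in range(H):
        row = []
        for j in range(W):
            window = []
            for di in range(-radius, radius + 1):
                line = []
                for dj in range(-radius, radius + 1):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < H and 0 <= nj < W:
                        line.append(sum(a * b for a, b in
                                        zip(normed[i][j], normed[ni][nj])))
                    else:
                        line.append(0.0)
                window.append(line)
            row.append(window)
        out.append(row)
    return out

def residual_enhance(features, sr_output):
    """F <- F + SR(F), element-wise; sr_output must have the same H x W x C shape."""
    return [[[f + s for f, s in zip(fv, sv)] for fv, sv in zip(frow, srow)]
            for frow, srow in zip(features, sr_output)]

features = [[[1.0, 0.0], [0.0, 1.0]]]   # H = 1, W = 2, C = 2
corr = self_correlation(features)
enhanced = residual_enhance([[[1.0, 2.0]]], [[[0.5, 0.5]]])
```

The residual form keeps the original features intact and adds the correlation information on top, which is how the text describes the preservation of feature information.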

4. Correlation between support sets

The features obtained from the regional autocorrelation module are taken as the input of the cross-sample correlation module. In this module, the correlation matrix Cor of the input support set features $F_s$ with respect to position is obtained by calculating the similarity between each pair of inputs with a similarity calculation function sim(·). The matrix $\mathrm{Cor}(s_i, s_j)$ has size H × W × H × W, where H is the height and W the width of the feature, and it represents the correlation of the two inputs $s_i$ and $s_j$ at each pair of spatial locations. From it, an attention matrix $A_{ij}$ of the i-th input relative to the j-th input is obtained, in which m, n denote the spatial location coordinates of a region block in input i and τ is the temperature coefficient; the computation ranges over the spatial location coordinates of the region blocks in input j and of the region blocks in input i.

Averaging the attention matrices of the i-th input relative to the other images of the support set gives the attention matrix $A_i$ of the i-th support set image. The attention matrix $A_i$ is used as a weight and multiplied with the original support set features $F_s$ to obtain the correlation-weighted feature matrix within the support set class.

A weighted average over spatial positions then gives the C-dimensional feature vector $f_s$ of the category of the support set sample, i.e. the cross-sample enhanced support set feature vector.
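The steps of this section can be sketched in code. The exact formulas are not reproduced in the text, so the sketch below rests on explicit assumptions: sim(·) is taken to be cosine similarity, the attention is taken to be a softmax over the positions of input i with the correlation divided by τ, averaged over the positions of input j, and the H × W grid is flattened to a list of N = H·W position vectors. All function names are hypothetical.

```python
import math

def cosine(u, v, eps=1e-8):
    """Cosine similarity; eps guards against zero-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + eps)

def correlation(fa, fb):
    """fa, fb: lists of position vectors. Returns an N_a x N_b similarity matrix."""
    return [[cosine(p, q) for q in fb] for p in fa]

def attention(fa, fb, tau=5.0):
    """Attention over the positions of fa relative to fb (assumed softmax form)."""
    cor = correlation(fa, fb)
    n_a, n_b = len(fa), len(fb)
    att = [0.0] * n_a
    for q in range(n_b):
        col = [math.exp(cor[p][q] / tau) for p in range(n_a)]
        z = sum(col)
        for p in range(n_a):
            att[p] += col[p] / (z * n_b)   # average of per-position softmaxes
    return att

def enhanced_vector(features, att):
    """Attention-weighted average over positions -> C-dimensional vector."""
    dim = len(features[0])
    return [sum(a * f[d] for a, f in zip(att, features)) for d in range(dim)]

def class_center(samples, tau=5.0):
    """samples: same-class support samples (at least two), each a list of
    position vectors. Returns the mean of the attention-enhanced vectors."""
    vectors = []
    for i, fi in enumerate(samples):
        others = [attention(fi, fj, tau) for j, fj in enumerate(samples) if j != i]
        a_i = [sum(col) / len(others) for col in zip(*others)]
        vectors.append(enhanced_vector(fi, a_i))
    return [sum(col) / len(vectors) for col in zip(*vectors)]

s1 = [[1.0, 0.0], [0.0, 1.0]]   # two positions, C = 2
s2 = [[1.0, 0.0], [1.0, 0.1]]
a12 = attention(s1, s2)
center = class_center([s1, s2])
```

Under these assumptions the attention weights of one sample sum to 1, so multiplying by the features and summing over positions is a weighted average, as the text describes. With a single support sample per class there is no other sample to attend to, which is consistent with the remark that this step is meaningful only for more than one sample.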

5. Correlation between the support set and the query set

The cross-sample enhanced support set features and the feature matrix $F_q$ of the query set are input into the cross-sample correlation module, and the above steps are repeated to obtain the correlation matrix between the query set and the support set. From it the attention matrix A is obtained, where m, n denote spatial position coordinates and τ is the temperature coefficient. A is used as a weight and multiplied with the original query set features $F_q$ to obtain the correlation-weighted feature matrix between the query set and the support set. After an average pooling operation, the C-dimensional feature vector $f_q$ of the class, i.e. the query set feature vector, is obtained.

The cosine similarity between the query set feature vector $f_q$ and the support set feature vector $f_s$ is computed, and the correlation loss function is calculated from it. Here cos(·) denotes the cosine similarity function, T denotes a temperature coefficient, and s' and q' denote, respectively, the correlation-processed support set and query set features of any two different categories in the input batch.

6. Calculating an overall loss function

The overall loss function comprises the two parts above: the classification loss function of step 2 constrains the training of the feature extractor, and the correlation loss function of step 5 constrains the enhancement of the region of interest. The two are added to realize end-to-end training, and the overall loss function can be defined as:

$$L_{total} = L_{cls} + \gamma L_{relation}$$

where γ is the coefficient of the correlation loss function.
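The two loss terms can be combined as in the following sketch. The form of the correlation loss is an assumption: a softmax cross-entropy over cosine similarities divided by the temperature T, with one (support vector, query vector) pair per candidate class, since the query vector depends on the class it was attended against. The names `relation_loss` and `total_loss` are hypothetical; the default values T = 0.2 and γ = 0.25 are the ones reported in the experimental settings.

```python
import math

def cosine(u, v, eps=1e-8):
    """Cosine similarity; eps guards against zero-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + eps)

def relation_loss(pairs, target, T=0.2):
    """pairs: one (support_vector, query_vector) tuple per candidate class.
    target: index of the true class of the query.
    Assumed form: cross-entropy over cos(s, q) / T."""
    logits = [cosine(s, q) / T for s, q in pairs]
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits)) - logits[target]

def total_loss(l_cls, l_relation, gamma=0.25):
    """L_total = L_cls + gamma * L_relation."""
    return l_cls + gamma * l_relation

pairs = [([1.0, 0.0], [1.0, 0.0]),   # query matches class 0
         ([0.0, 1.0], [1.0, 0.0])]   # query is orthogonal to class 1
good = relation_loss(pairs, target=0)
bad = relation_loss(pairs, target=1)
print(good, bad, total_loss(1.0, good))
```

A small T sharpens the softmax, so a correct match yields a loss close to zero while a mismatch is penalized heavily.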

7. Obtaining classification results

In the testing stage, the cosine similarities between the query set sample and all support set classes are compared, and the category of the support set class with the highest similarity is the predicted category of the query set sample.
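The prediction rule can be written in a few lines. In this sketch the inputs are assumed to be the C-dimensional vectors produced by the cross-sample correlation module, stored per candidate class; the name `predict` and the dictionary layout are hypothetical.

```python
import math

def cosine(u, v, eps=1e-8):
    """Cosine similarity; eps guards against zero-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + eps)

def predict(query_by_class, support_by_class):
    """Return the class whose support vector is most similar to the query vector
    computed against that class, together with all similarity scores."""
    scores = {c: cosine(query_by_class[c], support_by_class[c])
              for c in support_by_class}
    return max(scores, key=scores.get), scores

support_by_class = {"curtain": [1.0, 0.0], "lion": [0.0, 1.0]}
query_by_class = {"curtain": [0.9, 0.1], "lion": [0.9, 0.1]}
label, scores = predict(query_by_class, support_by_class)
print(label)
```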

The test environment and experimental results of the small sample image classification method provided by the invention are as follows:

(1) Test environment:

System environment: CentOS 7;

Hardware environment: memory: 64 GB; GPU: TITAN XP; hard disk: 2 TB;

(2) Experimental data:

The experiments are carried out on the small sample image classification dataset miniImageNet. Because the method mainly addresses the attention region of a support set containing more than one sample, considering cross-sample correlation among support set samples is meaningful only when each category has more than one support sample. In line with current mainstream experimental settings, the invention selects the setting of five categories with five samples per category (5-way-5-shot) and performs qualitative and quantitative analysis under it. In addition, to further study the effect of the model under different sample numbers, the comparison experiments test each number of samples per category from 1 to 10 in turn.

The miniImageNet dataset is a subset of ImageNet. It is much smaller than ImageNet in scale and image size, requires fewer resources for training, and is often used for small sample classification tasks. It contains 100 classes, of which 64 form the training set, 16 the validation set, and 20 the test set; each class has 600 images, each of size 84 × 84. In the training stage, each image is padded with a border of size 4 and randomly cropped to 84 × 84, color jitter and horizontal flipping are applied as data augmentation, and a normalization operation is performed; the test stage performs only the normalization operation.
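The padding, random cropping and horizontal flipping described above can be sketched on a single-channel image stored as nested lists. Zero padding and a flip probability of 0.5 are assumptions, since the text does not state the padding mode or the probability; color jitter and normalization are omitted. The function names are hypothetical.

```python
import random

def pad_and_random_crop(image, pad=4, size=84, rng=random):
    """image: H x W nested list. Zero-pads by `pad` on every side, then crops a
    size x size window at a uniformly random offset."""
    h, w = len(image), len(image[0])
    padded = [[0] * (w + 2 * pad) for _ in range(pad)]
    padded += [[0] * pad + list(row) + [0] * pad for row in image]
    padded += [[0] * (w + 2 * pad) for _ in range(pad)]
    top = rng.randint(0, h + 2 * pad - size)    # randint is inclusive on both ends
    left = rng.randint(0, w + 2 * pad - size)
    return [row[left:left + size] for row in padded[top:top + size]]

def horizontal_flip(image, p=0.5, rng=random):
    """Mirror the image left-to-right with probability p."""
    return [row[::-1] for row in image] if rng.random() < p else image

image = [[1] * 84 for _ in range(84)]
cropped = pad_and_random_crop(image, rng=random.Random(0))
flipped = horizontal_flip([[1, 2, 3]], p=1.0)
```

Passing a seeded `random.Random` instance as `rng` makes the augmentation reproducible, which is convenient when comparing runs.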

In training, the temperature coefficient τ for the correlation calculation was 5, the classification temperature coefficient T was 0.2, the batch size was 128, and the coefficient γ of the correlation loss function was 0.25. Training used the SGD optimizer with an initial learning rate of 0.1, decayed by a factor of 0.1 at the 50th and 70th epochs, for 80 epochs in total, with a momentum of 0.9. During testing, 2000 tasks were randomly sampled, each with 5 categories, support samples for each category, and 15 query samples per category. The average classification accuracy of the model is output as the quantitative result, and the magnitude of the cross-sample regional correlation is output as the qualitative result.
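The learning-rate schedule and the sampling of test tasks can be sketched as below. Two points are assumptions: the epochs are indexed from 0, so the decay applies from index 50 and from index 70 onward, and the number of support samples per class is taken as 5 to match the 5-way-5-shot setting. The names `learning_rate` and `sample_episode` are hypothetical.

```python
import random

def learning_rate(epoch, base_lr=0.1, milestones=(50, 70), gamma=0.1):
    """Step decay: multiply by gamma once for every milestone already reached."""
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed

def sample_episode(images_by_class, n_way=5, k_shot=5, n_query=15, rng=random):
    """Sample one task: n_way classes, k_shot support and n_query query images each.
    Support and query images of a class are disjoint."""
    classes = rng.sample(sorted(images_by_class), n_way)
    support, query = {}, {}
    for c in classes:
        picked = rng.sample(images_by_class[c], k_shot + n_query)
        support[c], query[c] = picked[:k_shot], picked[k_shot:]
    return support, query

# 20 test classes with 600 image ids each, as in the miniImageNet test split.
data = {c: list(range(600)) for c in range(20)}
support, query = sample_episode(data, rng=random.Random(0))
print(learning_rate(0), learning_rate(50), learning_rate(70))
```

Repeating `sample_episode` 2000 times and averaging the per-task accuracy gives the kind of quantitative result reported in the experiments.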

(3) Visualization experiment results:

To demonstrate the effect of the cross-sample attention mechanism on the region of interest, the learned cross-sample relevant regions are presented through visualizations of feature activation maps and attention coefficients. As shown in FIG. 2, in each group of samples the first row is the input original image, the second row is the visualization of the feature activation map extracted by the feature extractor, and the third row is the visualization of the attention map obtained after applying the cross-sample attention module. The features extracted by the feature extractor alone do not focus effectively on the target region. After the cross-sample correlation operation, the network finds the commonalities of the input images, so that the target region is extracted more accurately and feature enhancement is realized.

Taking the upper half of FIG. 2 as an example, the classification target is "curtain". When the feature activation map is used directly (the second row), the "curtain" is easily treated as background, and foreground objects such as "person", "instrument", and "seat" receive more attention, which causes classification to fail. After the cross-sample attention module, the resulting attention map focuses on the feature common to the pictures, namely the curtain, and attends to it more strongly, so that a better feature representation is obtained and the classification accuracy is improved. In the lower half of FIG. 2, the classification target is "lion". When the feature activation map is used directly, the attended region of the "lion" is incomplete, or the "tree branch" region in the image is attended to by mistake. After the cross-sample attention module, the region of the lion attended to by the network is more complete, feature enhancement is realized, and the classification accuracy is improved.

(4) Quantitative experimental results:

To show the experimental effect quantitatively, a series of ablation experiments and comparison experiments was carried out.

The results of the ablation experiments are shown in Table 1. In the table, S2-CA denotes cross-sample attention among the support set samples and SQ-CA denotes cross-sample attention between the query set and the support set; the experiments study the effect of using or not using each of the two modules. The experiments were performed on the small sample learning dataset miniImageNet with five samples per category, because the correlation among support set samples can be computed only when the number of samples is greater than 1. As shown in Table 1, using cross-sample attention among the support set samples alone brings an improvement of 0.51 percentage points, and using cross-sample attention between the query set and the support set alone brings an improvement of 0.48 percentage points. Using both cross-sample attention mechanisms together is better than using either alone and further improves the classification accuracy, for an improvement of 0.76 percentage points. The ablation results show that cross-sample attention, both among support set samples and between the query set and the support set, enhances the features, and that using both together gives the best result.

Table 1. Ablation experiment results

No.	S2-CA	SQ-CA	ACC (%)
1	×	×	82.10 ± 0.30
2	√	×	82.61 ± 0.29
3	×	√	82.58 ± 0.30
4	√	√	82.86 ± 0.30
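The improvements quoted in the discussion of Table 1 follow directly from the accuracy column. The short check below recomputes them from the table values; the dictionary layout, keyed by whether S2-CA and SQ-CA are used, is only an illustrative choice.

```python
# Mean accuracies from Table 1, keyed by (S2-CA used, SQ-CA used).
results = {
    (False, False): 82.10,
    (True, False): 82.61,
    (False, True): 82.58,
    (True, True): 82.86,
}

baseline = results[(False, False)]
# Gain over the baseline in percentage points, rounded to two decimals.
gains = {key: round(acc - baseline, 2) for key, acc in results.items()}

for key, gain in gains.items():
    print(key, gain)
```

Note that the confidence intervals in the table (about ±0.30) are of the same order as the individual gains, which is worth keeping in mind when reading the comparison.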

To demonstrate that cross-sample attention among the support set samples improves small sample image classification, experiments were carried out with different numbers of support set samples. ReNet does not consider the cross-sample correlation among support set samples, so it can serve as the comparison experiment for studying that correlation. The experimental results are shown in FIG. 3 as line graphs, with the abscissa representing the number of support set samples (shot); the solid lines represent the average classification accuracy, the dashed lines represent the average loss, red shows the model of the present invention, and blue shows the comparison experiment ReNet. FIG. 3 shows that when the number of shots is greater than 1, the proposed model is always superior to the ReNet model, which does not use cross-sample attention within the support set. In addition, the larger the number of support set samples (shot), the higher the average classification accuracy and the lower the loss. This is because more samples describe a class more comprehensively, which benefits the final classification; the cross-sample correlation calculation among support set samples likewise aims to describe a class better, by searching for correlated regions within the same class and enhancing the common target region.

Based on the same inventive concept, another embodiment of the present invention provides a small sample image classification system based on a relevant region using the above method, which includes a feature extractor, a regional autocorrelation module, a cross-sample correlation module, and a classification module.

Based on the same inventive concept, another embodiment of the present invention provides an electronic device (computer, server, smartphone, etc.) comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the steps of the inventive method.

Based on the same inventive concept, another embodiment of the present invention provides a computer-readable storage medium (e.g., ROM/RAM, magnetic disk, optical disk) storing a computer program, which when executed by a computer, performs the steps of the inventive method.

Although the present invention has been described with reference to the above embodiments, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A small sample image classification method based on a relevant region is characterized by comprising the following steps:

2. The method of claim 1, wherein the feature extractor, the regional autocorrelation module, and the cross-sample correlation module are trained using the following steps:

(1) respectively inputting all images of the support set and the query set into a feature extractor based on a convolutional neural network, and removing the pooling operation of the last layer to obtain image features with reserved position relations;

(2) after pooling operation is carried out on the features extracted by the feature extractor, classification is carried out through a full connection layer, and a classification loss function is calculated;

(3) inputting the image features obtained in the step (1) into a regional autocorrelation module, and enhancing the position features of the image to obtain autocorrelation enhanced support set features and autocorrelation enhanced query set features;

(4) inputting samples of the same category in a support set into a cross-sample correlation module, considering the spatial similarity relation between different samples of the same category, regarding a region with higher similarity as a target region of the category, namely a classification target, and applying the target region as an attention coefficient to all samples of the category, so as to realize feature enhancement and obtain a cross-sample enhanced support set feature;

(5) inputting the cross-sample enhanced support set features obtained in the step (4) and the autocorrelation enhanced query set features obtained in the step (3) into the cross-sample correlation module, searching for matched regions in the query set sample and the support set sample, and calculating the similarity of the query set features and the support set features to obtain a correlation loss function;

(6) combining the classification loss function in the step (2) and the correlation loss function in the step (5), performing iterative optimization, and training to obtain parameters of the feature extractor, the regional autocorrelation module and the cross-sample correlation module.

3. The method according to claim 1 or 2, wherein the regional autocorrelation module calculates the correlation relationship of each pixel point relative to the surrounding neighbors, and adds the correlation relationship to the feature matrix as a supplement to realize feature enhancement.
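
The neighborhood correlation described in claim 3 can be illustrated with a minimal NumPy sketch. It is only an illustration under stated assumptions: cosine similarity is assumed as the correlation measure, the neighborhood is a square window of hypothetical `radius`, and the correlation maps are simply appended as extra channels, whereas the actual module may transform them with learned layers before supplementing the feature matrix. The function name `regional_autocorrelation` is hypothetical.

```python
import numpy as np

def regional_autocorrelation(feat, radius=1, eps=1e-8):
    """feat: (C, H, W) feature map. Returns (C + K, H, W), K = (2*radius + 1)**2.

    Assumption: for every spatial position, the cosine similarity to each
    neighbour in a square window is computed and appended as extra channels.
    """
    c, h, w = feat.shape
    # L2-normalise channel vectors so dot products become cosine similarities.
    normed = feat / (np.linalg.norm(feat, axis=0, keepdims=True) + eps)
    # Zero padding: neighbours outside the image get similarity 0.
    padded = np.pad(normed, ((0, 0), (radius, radius), (radius, radius)))
    maps = []
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            shifted = padded[:, dy:dy + h, dx:dx + w]
            maps.append((normed * shifted).sum(axis=0))
    corr = np.stack(maps, axis=0)  # (K, H, W) neighbourhood correlation maps
    # Supplement the original features with the correlation maps.
    return np.concatenate([feat, corr], axis=0)
```

With `radius=1`, a feature map of shape (C, H, W) yields an output of shape (C + 9, H, W); the middle correlation map is the similarity of each position with itself.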

4. The method of claim 1 or 2, wherein the cross-sample correlation module obtains the cross-sample enhanced support set features by:

taking the features obtained by the regional autocorrelation module as the input of the cross-sample correlation module, and obtaining, by similarity calculation in the cross-sample correlation module, the correlation matrix Cor of the input support set features F_s with respect to position:

where sim(·) denotes a similarity calculation function, and Cor(s_i, s_j), of size H × W × H × W, is the correlation matrix of the two inputs s_i and s_j over spatial positions; further obtaining the attention matrix A_ij of the i-th input relative to the j-th input:

where m, n denote spatial position coordinates, τ is a temperature coefficient, and the remaining two symbols denote the spatial position coordinates of the region block in input j and in input i, respectively; averaging the attention matrices of the other support set images with respect to this input to obtain the attention matrix A_i of the i-th support set image:

taking the attention matrix A_i as weights and multiplying it with the original support set features F_s to obtain the intra-class correlation-weighted feature matrix of the support set;

and further performing a weighted average over spatial positions to obtain the cross-sample enhanced support set features:
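
The steps of claim 4 can be sketched as follows. The exact formulas are not reproduced in this text, so the sketch relies on assumptions: sim(·) is taken to be cosine similarity, the attention A_ij is taken to be a temperature-scaled softmax over the positions of input i averaged over the positions of input j (a ReNet-style form), and the one-shot case falls back to uniform attention. The name `support_cross_attention` and the default `tau` are hypothetical.

```python
import numpy as np

def _unit(x, axis, eps=1e-8):
    """L2-normalise x along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def support_cross_attention(f_s, tau=0.1):
    """f_s: (K, C, H, W) features of the K support images of ONE category.

    Returns (attn, enhanced): attn is (K, H, W) with each map summing to 1,
    enhanced is (K, C), the attention-weighted spatial average per image.
    """
    k, c, h, w = f_s.shape
    flat = _unit(f_s.reshape(k, c, h * w), axis=1)        # (K, C, HW)
    # cor[i, j, m, n]: cosine similarity between position m of image i
    # and position n of image j (the H*W x H*W correlation matrix).
    cor = np.einsum("icm,jcn->ijmn", flat, flat)
    # Assumed attention: softmax over positions m of image i for every
    # position n of image j, then averaged over n.
    logits = cor / tau
    logits -= logits.max(axis=2, keepdims=True)           # numerical stability
    soft = np.exp(logits)
    soft /= soft.sum(axis=2, keepdims=True)
    a_ij = soft.mean(axis=3)                              # (K, K, HW)
    if k > 1:
        # Average over the OTHER support images (j != i) to get A_i.
        mask = 1.0 - np.eye(k)
        a_i = (a_ij * mask[:, :, None]).sum(axis=1) / (k - 1)
    else:
        # One-shot case: no other image exists, so use uniform attention.
        a_i = np.full((1, h * w), 1.0 / (h * w))
    # Weight the original features by A_i and sum over spatial positions.
    enhanced = np.einsum("km,kcm->kc", a_i, f_s.reshape(k, c, h * w))
    return a_i.reshape(k, h, w), enhanced
```

Because each attention map sums to 1, the final weighted sum is a weighted average over spatial positions, so regions that recur across samples of the same category contribute more to the enhanced feature.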

5. The method of claim 4, wherein the cross-sample correlation module computes the correlation between the support set and the query set using the following steps:

inputting the cross-sample enhanced support set features and the feature matrix F_q of the query set into the cross-sample correlation module to obtain the correlation matrix between the query set and the support set:

then obtaining the attention matrix A:

taking the attention matrix A as weights and multiplying it with the original query set features F_q to obtain the correlation-weighted feature matrix between the query set and the support set;

and obtaining the query set feature vector after an average pooling operation:
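
The query-side computation of claim 5 can be sketched in the same way. This is a sketch under assumptions rather than the claimed formulas: the cross-sample enhanced support feature is assumed to be a C-dimensional vector per category, cosine similarity is assumed for both the correlation and the final score, and the attention is assumed to be a temperature-scaled softmax over query positions. The names `query_support_similarity` and `classify` are hypothetical.

```python
import numpy as np

def query_support_similarity(f_q, proto, tau=0.1, eps=1e-8):
    """f_q: (C, H, W) query feature map; proto: (C,) enhanced support feature.

    Returns (attn, q_vec, score): the (H, W) attention over query positions,
    the attention-pooled query vector, and its cosine similarity to proto.
    """
    c, h, w = f_q.shape
    flat = f_q.reshape(c, h * w)
    q_unit = flat / (np.linalg.norm(flat, axis=0, keepdims=True) + eps)
    p_unit = proto / (np.linalg.norm(proto) + eps)
    # Correlation between the support feature and every query position.
    cor = p_unit @ q_unit                                  # (HW,)
    # Assumed attention: temperature-scaled softmax over query positions.
    logits = cor / tau
    attn = np.exp(logits - logits.max())
    attn /= attn.sum()
    # Weight the original query features and pool over spatial positions.
    q_vec = flat @ attn                                    # (C,)
    denom = np.linalg.norm(q_vec) * np.linalg.norm(proto) + eps
    return attn.reshape(h, w), q_vec, float(q_vec @ proto / denom)

def classify(f_q, protos, tau=0.1):
    """Return the index of the most similar category and all scores."""
    scores = [query_support_similarity(f_q, p, tau)[2] for p in protos]
    return int(np.argmax(scores)), scores
```

The query is pooled separately for each candidate category, so the query vector compared with a category emphasizes the regions that match that category's support feature.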

6. The method of claim 2, wherein the classification loss function and the correlation loss function form an overall loss function, wherein the classification loss function provides a constraint for training of the feature extractor, wherein the correlation loss function provides a constraint for enhancement of the region of interest, and wherein the two loss functions are added to achieve end-to-end training.
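
The addition of the two loss functions in claim 6 can be sketched as below. The claim only states that the two losses are added; the use of cross-entropy for both terms, the temperature `tau` applied to the similarity scores, and the names `cross_entropy` and `total_loss` are assumptions made for illustration.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross-entropy; logits: (B, N), labels: (B,) integer class ids."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(len(labels)), labels].mean())

def total_loss(fc_logits, global_labels, sim_scores, episode_labels, tau=0.2):
    """Sum of the classification loss and the correlation loss.

    fc_logits:  (B, num_train_classes) outputs of the fully connected layer.
    sim_scores: (B, N_way) query-to-category similarities within an episode.
    """
    cls = cross_entropy(fc_logits, global_labels)          # constrains the extractor
    cor = cross_entropy(sim_scores / tau, episode_labels)  # constrains the attention
    return cls + cor, cls, cor
```

Since both terms are computed from the same extracted features, minimizing their sum updates the feature extractor and the correlation modules jointly in one backward pass when implemented in an automatic differentiation framework.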

7. A small sample image classification system based on a relevant region by adopting the method of any one of claims 1 to 6, which is characterized by comprising a feature extractor, a region autocorrelation module, a cross-sample correlation module and a classification module;

8. An electronic apparatus, comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the method of any of claims 1 to 6.

9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a computer, implements the method of any one of claims 1 to 6.