CN114882267A - Small sample image classification method and system based on relevant region - Google Patents

Small sample image classification method and system based on relevant region

Info

Publication number
CN114882267A
CN114882267A (application CN202210342541.7A)
Authority
CN
China
Prior art keywords
sample
correlation
cross
module
support set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210342541.7A
Other languages
Chinese (zh)
Inventor
王蕊 (Wang Rui)
施璠 (Shi Fan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS filed Critical Institute of Information Engineering of CAS
Priority to CN202210342541.7A
Publication of CN114882267A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V 10/764: Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V 10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/82: Image or video recognition or understanding using neural networks
    • G06F 17/16: Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06F 17/18: Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G06F 18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/22: Matching criteria, e.g. proximity measures
    • G06F 18/24133: Distances to prototypes
    • G06F 18/24137: Distances to cluster centroids
    • G06F 18/2414: Smoothing the distance, e.g. radial basis function networks [RBFN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Computing Systems (AREA)
  • Pure & Applied Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Algebra (AREA)
  • Probability & Statistics with Applications (AREA)
  • Operations Research (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a few-shot (small sample) image classification method and system based on relevant regions. First, the regional self-correlation of each sample is considered to enhance its features; next, cross-sample regional correlations within the support set are used to extract the commonalities of the current category, further enhancing the features; finally, cross-sample regional correlations between the support set and the query set locate target regions where the query and support features are similar, and the similarity is computed to obtain the final classification result. Qualitative experimental results show that with the proposed method the regions common to a category are attended to better and enhanced, making the features more representative; in quantitative analysis, ablation and comparison experiments show that the cross-sample attention modules, both within the support set and between the support and query sets, improve the classification results.

Description

Small sample image classification method and system based on relevant region
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a method and system for classifying images in few-shot (small sample) scenarios.
Background
In the field of computer vision, image understanding is an important problem to be solved; image classification is a basic task of image understanding and plays a crucial role in it. An image classification task takes images as input and outputs predictions of their categories, and usually requires training a model on a large-scale dataset to acquire the ability to analyze and process images. On one hand, training for image classification can be divided into supervised, semi-supervised, and unsupervised regimes according to the number of training labels; because labeling data is time- and labor-consuming, a model that achieves good classification accuracy with as little labeled data as possible is desirable, which is a challenging goal. On the other hand, large-scale datasets are difficult to acquire in real scenarios: for rare species in nature, the images that can be collected are limited; for rare cases in medical settings, identifying related diseases from only a few samples would be of great significance to medical progress; and for emerging dangerous goods or weapons, quickly establishing a complete security-inspection process from a small number of samples would better safeguard national security.
The few-shot image classification task aims to solve these problems, reducing the model's dependence on large-scale labeled data and raising classification accuracy on new classes for which only a small number of samples are available.
Current approaches to few-shot learning fall into four main categories:
1) Optimization-based methods. These use the idea of meta-learning, i.e. learning to learn, with the ultimate goal of converging as quickly as possible when the model faces a new set of learning tasks. Training under this approach usually consists of two loops. MAML, for example, is composed of a base learner and a meta learner: in the inner loop the base learner is trained on each individual task, while in the outer loop the meta learner is optimized according to the validation performance of the resulting base learners, finally yielding initialization parameters from which the base learner can quickly adapt to new tasks. The approach has limitations, however: on one hand, this form of meta-learning is only effective on shallow networks, whose representation-learning capacity is limited; on the other, training such a model requires a large number of similar tasks, which are difficult to obtain. Moreover, ANIL shows that the inner loop in the MAML architecture is not necessary, and that learning a better feature representation matters more to the network than faster adaptation.
2) Augmentation methods based on generated data. At the data level, generating new samples alleviates the scarcity of original data; at the feature level, generative methods can learn not only the boundaries between specific categories but, by introducing the concept of data distribution, also the complete boundary of a category's distribution, which helps handle novel category combinations. However, generative methods are usually built on large amounts of data and perform poorly in few-shot scenarios where data is scarce.
3) Metric-learning methods. A feature extractor is first trained; images are mapped to feature vectors in a feature space, distances between images are computed with a suitable metric function (e.g. Euclidean or cosine distance), and images are classified by these distance relations. Prototypical networks use Euclidean distance as the metric function; matching networks use cosine distance; Negative-cosine improves the performance of few-shot classification by adding a margin to the cosine classifier's decision.
4) Feature-embedding methods. The essence is to obtain a sufficiently good feature extractor so that, in the feature space, the extracted features of different classes are more discriminative. RFS likewise points out the importance of obtaining a better feature space for few-shot learning. Some approaches improve the representation capability of the feature extractor by constructing self-supervised learning tasks; others obtain better classification boundaries by post-processing the features.
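The metric-learning idea in 3) can be illustrated with a minimal nearest-prototype sketch (illustrative code, not the patent's implementation; the prototype averaging follows the prototypical-network setup, and both distance functions are the ones named above):

```python
import numpy as np

def euclidean_dist(u, v):
    # metric used by prototypical networks
    return float(np.linalg.norm(u - v))

def cosine_dist(u, v):
    # metric used by matching networks: 1 - cosine similarity
    return 1.0 - float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

def nearest_prototype(query, support_feats, support_labels, metric=euclidean_dist):
    """Average the support features of each class into a prototype,
    then assign the query to the class with the nearest prototype."""
    classes = sorted(set(support_labels))
    protos = {c: np.mean([f for f, y in zip(support_feats, support_labels) if y == c],
                         axis=0)
              for c in classes}
    return min(classes, key=lambda c: metric(query, protos[c]))
```

Swapping `metric=cosine_dist` switches the sketch from the prototypical-network to the matching-network style of comparison.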
The above methods improve the accuracy of few-shot image classification from different angles. The present invention starts from the metric-learning perspective, fully mining the regional similarity relations among image samples to enhance the features. In the few-shot classification task, the final classification targets are defined by the small number of samples in the support set, and these samples serve as the labels in the final prediction. On this premise, the classification result is strongly affected by the quality of the support set: a small number of samples may not represent a category well, and attention may fall on the wrong target region. For example, if an image of a "person riding a bicycle" serves as a support sample for the category "bicycle", the network may wrongly focus on the "person" in it, causing other images containing a "person" to be misclassified in the final prediction.
To address this problem, some current few-shot classification methods search for regions of interest based on structural or semantic features via image attention mechanisms, and some research improves few-shot classification accuracy from the perspective of cross-sample regional attention based on region similarity.
Disclosure of Invention
The invention provides a cross-sample attention mechanism that finds the related regions among different samples of the same class, obtains a correlation matrix, and from it derives weighted features carrying the correlation information. In the same way, the correlation between the query set and the support set can be constructed to obtain a correlation matrix of the query set relative to the support set, assisting few-shot image classification.
In the invention, each training task is split into a support set and a query set: the support set provides the target categories of the current classification task together with a small number of training samples, and the query set is used to compute classification accuracy. Model training proceeds in three steps: first, the self-correlated regions of each sample are considered to enhance its features; next, cross-sample regional correlations within the support set are used to extract the commonalities of the current category, further enhancing the features; finally, cross-sample regional correlations between the support set and the query set are used to find target regions where query and support features are similar, and the similarity is computed to obtain the final classification result.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
a small sample image classification method based on a relevant region comprises the following steps:
sequentially inputting the support set into a feature extractor and a regional autocorrelation module to obtain autocorrelation enhanced support set features;
inputting the query set into a feature extractor and a regional autocorrelation module in sequence to obtain autocorrelation enhanced query set features;
inputting the support set characteristics of the autocorrelation enhancement into a cross-sample correlation module for characteristic enhancement to obtain the support set characteristics of the cross-sample enhancement;
inputting the cross-sample enhanced support set features and the autocorrelation enhanced query set features into the cross-sample correlation module, and computing the similarity between the support set features and the query set features output by the module, thereby classifying the query set.
Further, the feature extractor, the area autocorrelation module and the cross-sample correlation module are trained by adopting the following steps:
(1) Input all images of the support set and the query set into a feature extractor based on a convolutional neural network, removing the final pooling layer so as to obtain image features that preserve the spatial position relationships.
(2) To help the model train better and improve the representation capability of the feature extractor, compute the base-class classification accuracy during training: after the extracted features are pooled, classify directly through a fully connected layer and compute a classification loss function.
(3) Input the image features from step (1) into the regional autocorrelation module to enhance the positional features of the image and find regions with spatial similarity, which facilitates extracting better spatial features and assists the subsequent cross-sample correlation computation.
(4) Input the same-category samples of the support set into the cross-sample correlation module, considering the spatial similarity between different samples of the same category. Regions of high similarity can be regarded as the target region of the class, i.e. the classification target; this is used as an attention coefficient applied to all samples of the class, achieving feature enhancement. The enhanced features are averaged to form the class center.
(5) Input the feature-enhanced support features from step (4) and the query features from step (3) into a cross-sample correlation module sharing parameters with that of step (4), search for the matching regions between query samples and support samples, and regard regions of high similarity as the attended regions of the query sample; use these as attention coefficients for feature enhancement. On this basis, compute the similarity between query and support features to obtain a correlation loss function.
(6) Combine the classification loss over all samples from step (2) with the per-task correlation loss from step (5), iterate the optimization, and train the parameters of the feature extractor, the regional autocorrelation module, and the cross-sample correlation module required for testing.
A small sample image classification system based on relevant regions adopting the above method comprises a feature extractor, a regional autocorrelation module, a cross-sample correlation module, and a classification module;
the feature extractor is used for extracting features from the input query set and the support set;
the regional autocorrelation module is used for enhancing regional correlation of a single sample to obtain autocorrelation enhanced support set characteristics and autocorrelation enhanced query set characteristics;
the cross-sample correlation module is used for calculating the correlation between the support sets and the query sets;
the classification module is used for calculating similarity of the support set features and the query set features output by the cross-sample correlation module, so that classification of the query set is realized.
The invention has the beneficial effects that:
the method is based on the study of the small sample learning of the attention area, and improves the accuracy of the small sample image classification task from the aspect of enhancing the characteristics of the attention area. Firstly, considering the regional autocorrelation relation of a sample, and enhancing the characteristics; then, considering the cross-sample area correlation relationship among the support sets, extracting commonalities from the features of the current category, and further realizing feature enhancement; and finally, considering the cross-sample area correlation relationship between the support set and the query set, finding out a target area with similar characteristics of the query set and the support set, and calculating the similarity to obtain a final classification result. Experiments are performed on a small sample image classification dataset miniImageNet, which improves the baseline model and can be used in other methods as a post-processing method. According to the qualitative analysis experiment result, after the method provided by the invention is used, the common region in the same category can be better focused, and the part of the region is enhanced, so that the characteristics are more representative; in quantitative analysis, through ablation experiments and comparison experiments, it can be found that the attention module across samples brings promotion to classification results in support set, support set and query set.
Drawings
FIG. 1 is a schematic diagram of the small sample correlation study of the cross-sample attention mechanism of the present invention, wherein A_1 denotes the attention matrix of the 1st support set sample, A_2 denotes the attention matrix of the 2nd support set sample, and A denotes the attention matrix of the query set sample.
FIG. 2 is a schematic diagram of a comparison of the visualization results of the correlation area according to the present invention, which is sequentially from top to bottom an input original image, a feature activation image obtained by directly using a feature extractor, and a feature activation image obtained by using the method of the present invention;
FIG. 3 is a comparison experiment between ReNet and the invention under the 5-way-k-shot setting, comparing the performance difference under different numbers of shots.
Detailed Description
In order to make the aforementioned and other features and advantages of the invention more comprehensible, embodiments accompanied with figures are described in detail below.
The present embodiment establishes the image classification network structure shown in FIG. 1, which is divided into three parts: regional correlation enhancement of a single sample, correlation within the support set, and correlation between the support set and the query set. The cross-sample correlation module can be used as a post-processing operation on a pre-trained or retrained feature extractor to improve the final classification accuracy.
1. Feature extraction
After the query set and the support set are input into the convolutional-neural-network feature extractor (the encoder in FIG. 1), the feature matrices F_q and F_s are obtained, respectively. In this step, after all images of the support and query sets pass through the feature extractor, the final pooling layer is removed so that image features preserving the spatial position relationships are obtained.
2. Computing a classification loss function
Performing pooling operation on the obtained features to obtain a C-dimensional vector, inputting the C-dimensional vector into a full-connection network h, predicting the category of the current image, and calculating a classification loss function:
L_cls = E_{x,y}[L(h(F(x)), y)],
where x is the input image, y is the class label of x, F(x) denotes the pooled feature obtained from the feature matrix F_q or F_s (support and query sets are not distinguished here), L is the cross-entropy loss function, and E_{x,y} denotes averaging the loss over all possible pairs (x, y).
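As a minimal numerical sketch of this auxiliary base-class head (all shapes and the weight matrix of the fully connected layer h are illustrative, not the patent's values):

```python
import numpy as np

def global_avg_pool(feat):
    # feat: (C, H, W) feature map with positions preserved -> C-dimensional vector
    return feat.mean(axis=(1, 2))

def cross_entropy(logits, label):
    # numerically stable -log softmax, picked at the true label
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return float(-log_probs[label])

rng = np.random.default_rng(0)
feat = rng.normal(size=(4, 3, 3))      # C=4 channels, 3x3 spatial map
W = rng.normal(size=(5, 4))            # fully connected layer h, 5 base classes
logits = W @ global_avg_pool(feat)
loss = cross_entropy(logits, label=2)  # one (x, y) pair's contribution to L_cls
```

Averaging such per-pair losses over the batch realizes the expectation E_{x,y}.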
3. Regional correlation enhancement for individual samples
The feature matrices F_q and F_s are each input to the self-correlation computation of the regional autocorrelation module, which computes the correlation of every spatial position with its surrounding neighbours and adds the result to the original feature matrix as a complement, achieving feature enhancement:
F_q = F_q + SR(F_q)
F_s = F_s + SR(F_s)
where SR denotes the self-correlation function, consisting of the neighbourhood-correlation computation followed by a per-channel convolution; the original feature information is preserved.
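The neighbourhood self-correlation at the heart of SR can be sketched as follows (a hedged illustration: cosine similarity over a 3×3 neighbourhood is assumed, and the subsequent per-channel convolution that maps the correlation tensor back into feature space is omitted):

```python
import numpy as np

def self_correlation(feat, radius=1):
    """Cosine similarity of each spatial position with each of its
    (2*radius+1)**2 neighbours (zero padding at the border).
    feat: (C, H, W) -> (H, W, (2*radius+1)**2)."""
    C, H, W = feat.shape
    unit = feat / (np.linalg.norm(feat, axis=0, keepdims=True) + 1e-8)
    pad = np.pad(unit, ((0, 0), (radius, radius), (radius, radius)))
    side = 2 * radius + 1
    out = np.zeros((H, W, side * side))
    k = 0
    for dy in range(side):
        for dx in range(side):
            # correlation with the neighbour at offset (dy-radius, dx-radius)
            out[:, :, k] = (unit * pad[:, dy:dy + H, dx:dx + W]).sum(axis=0)
            k += 1
    return out
```

The central channel of the output (offset 0, 0) is each position's similarity with itself, i.e. 1 everywhere; the surrounding channels encode how locally coherent the features are.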
4. Dependencies between support sets
The features obtained from the regional autocorrelation module serve as input to the cross-sample correlation module. Within the module, the correlation matrix Cor of the input support features F_s over spatial positions is obtained by computing similarities:
Cor(s_i, s_j) = sim(F_{s_i}, F_{s_j})
where sim(·) denotes a similarity function. Cor(s_i, s_j) has size H × W × H × W, with H the height and W the width of the feature map, and records the correlation of the two inputs s_i, s_j at every pair of spatial positions. From it, the attention matrix A_{ij} of the i-th input with respect to the j-th input can be obtained:
A_{ij}(m, n) = (1/(HW)) Σ_{(m', n')} exp(Cor(s_i, s_j)(m, n, m', n')/τ) / Σ_{(u, v)} exp(Cor(s_i, s_j)(u, v, m', n')/τ)
where (m, n) and (u, v) denote spatial coordinates of region blocks in input i, (m', n') denotes spatial coordinates of region blocks in input j, and τ is the temperature coefficient. Averaging the attention matrices of the i-th support image with respect to the other images of the support set yields the attention matrix A_i of the i-th support set image:
A_i = (1/(K - 1)) Σ_{j ≠ i} A_{ij}
where K is the number of support samples of the class.
The attention matrix A_i is used as a weight and multiplied with the original support features F_{s_i} to obtain the within-class correlation-weighted feature matrix F̂_{s_i}:
F̂_{s_i}(m, n) = A_i(m, n) · F_{s_i}(m, n)
A weighted average over spatial positions then yields the C-dimensional feature vector f_s of the category to which the support samples belong, i.e. the cross-sample enhanced support feature vector:
f_s = (1/K) Σ_{i=1}^{K} Σ_{(m, n)} F̂_{s_i}(m, n)
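The support-set cross-sample attention of this section can be sketched as follows (a minimal sketch: position-wise cosine similarity as the sim function and the exact softmax normalization are assumptions, since the patent's formula details are partially illegible):

```python
import numpy as np

def cross_correlation(fi, fj):
    # fi, fj: (C, H, W); cosine similarity between every position pair -> (HW_i, HW_j)
    a = fi.reshape(fi.shape[0], -1)
    b = fj.reshape(fj.shape[0], -1)
    a = a / (np.linalg.norm(a, axis=0) + 1e-8)
    b = b / (np.linalg.norm(b, axis=0) + 1e-8)
    return a.T @ b

def support_attention(feats, tau=5.0):
    """feats: K same-class support feature maps (C, H, W).
    Returns, per sample, an attention vector over its H*W positions,
    averaged over all other samples of the class."""
    K = len(feats)
    attn = []
    for i in range(K):
        per_j = []
        for j in range(K):
            if j == i:
                continue
            cor = cross_correlation(feats[i], feats[j]) / tau
            # softmax over sample i's positions, averaged over sample j's positions
            e = np.exp(cor - cor.max(axis=0, keepdims=True))
            per_j.append((e / e.sum(axis=0, keepdims=True)).mean(axis=1))
        attn.append(np.mean(per_j, axis=0))
    return attn  # each entry sums to 1 over the H*W positions

def class_center(feats, attn):
    # attention-weighted spatial average, then mean over the K samples -> C-dim f_s
    vecs = [f.reshape(f.shape[0], -1) @ a for f, a in zip(feats, attn)]
    return np.mean(vecs, axis=0)
```

Regions that correlate strongly across the class's samples receive high attention, so the resulting class center f_s emphasizes the shared target region rather than sample-specific background.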
5. Correlations between the support set and the query set
The enhanced support feature matrices F̂_s and the query feature matrix F_q are input to the cross-sample correlation module, and the above steps are repeated to obtain the correlation matrix between the query set and the support set:
Cor(q, s) = sim(F_q, F̂_s)
From this, the attention matrix a can be obtained:
a(m, n) = (1/(HW)) Σ_{(m', n')} exp(Cor(q, s)(m, n, m', n')/τ) / Σ_{(u, v)} exp(Cor(q, s)(u, v, m', n')/τ)
where (m, n) and (u, v) denote spatial coordinates in the query feature map, (m', n') denotes spatial coordinates in the support feature map, and τ is the temperature coefficient. Taking a as a weight and multiplying it with the original query features F_q gives the correlation-weighted feature matrix F̂_q between the query set and the support set:
F̂_q(m, n) = a(m, n) · F_q(m, n)
After the averaging and pooling operation over spatial positions, the C-dimensional query feature vector f_q is obtained:
f_q = Σ_{(m, n)} F̂_q(m, n)
The cosine similarity between the query feature vector f_q and the support feature vector f_s is computed, and from it the correlation loss function:
L_relation = -log( exp(cos(f_q, f_s)/T) / Σ_{(q', s')} exp(cos(f_{q'}, f_{s'})/T) )
where cos(·) denotes the cosine similarity function, T is a temperature coefficient, and s' and q' denote the correlation-enhanced support and query features of any two different categories in the input batch.
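A numerical sketch of this correlation loss (assuming, for illustration, one query scored against a set of class centers with a single positive; the patent's batch construction may differ):

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

def relation_loss(f_q, centers, target, T=0.2):
    """-log softmax over temperature-scaled cosine similarities between
    the query vector and all class centers, picked at the true class."""
    sims = np.array([cosine(f_q, c) for c in centers]) / T
    z = sims - sims.max()
    log_probs = z - np.log(np.exp(z).sum())
    return float(-log_probs[target])
```

The loss is small when f_q is most similar to its own class center and grows when a wrong center dominates, which is exactly the constraint the enhanced regions are trained to satisfy.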
6. Calculating an overall loss function
The overall loss function comprises the two parts above: the classification loss of step 2 constrains the training of the feature extractor, and the correlation loss of step 5 constrains the enhancement of the region of interest. Adding the two enables end-to-end training, and the overall loss function is defined as:
L_total = L_cls + γ · L_relation
where γ is the coefficient of the correlation loss function.
7. Obtaining classification results
In the testing stage, cosine similarity between the query set and all support sets is compared, and the category corresponding to the support set with the highest similarity is the prediction category of the query set sample.
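The test-stage decision rule can be sketched as follows (illustrative helper names; `class_centers` stands for the enhanced support feature vectors f_s of the candidate categories):

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

def predict(f_q, class_centers):
    # the category whose enhanced support center is most cosine-similar wins
    sims = [cosine(f_q, c) for c in class_centers]
    return int(np.argmax(sims))
```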
The invention provides a method for classifying images in a few-shot scenario; its experimental validation comprises the following parts:
(1) and (3) testing environment:
the system environment is as follows: centos 7;
hardware environment: memory: 64GB, GPU: TITAN XP, hard disk: 2 TB;
(2) experimental data:
the method is carried out on a small sample image classification data set miniImageNet, and considering that the method mainly solves the problem of a support set attention area of more than one sample, when the sample of the support set is more than 1, the method is meaningful only by considering cross-sample correlation among the support sets. In combination with the current mainstream experimental setup, the invention selects a setup of five categories and five samples (5-way-5-shot) in each category, and performs qualitative analysis and quantitative analysis under the setup. In addition, in the comparative experiment, in order to further study the model effect under different sample numbers, the sample number of each category is sequentially tested from 1 to 10.
miniImageNet is a subset of ImageNet that is much smaller in scale and image size; training on it requires fewer resources, and it is widely used in few-shot classification tasks. It contains 100 classes, of which 64 form the training set, 16 the validation set, and 20 the test set; each class has 600 images of size 84 × 84. During training, images are padded with a 4-pixel border and randomly cropped back to 84 × 84, color jitter and horizontal flipping are added for data augmentation, and normalization is applied; the test phase applies only normalization.
In training, the temperature coefficient of the correlation computation is 5, the classification temperature coefficient T is 0.2, the batch size is 128, and the coefficient of the correlation loss function is 0.25. Optimization uses the SGD optimizer with momentum 0.9 and an initial learning rate of 0.1, decayed by a factor of 0.1 at the 50th and 70th epochs, for 80 epochs in total. At test time, 2000 tasks are randomly sampled, each with 5 categories, 5 support samples per category, and 15 query samples per category; the average classification accuracy of the model is output as the quantitative result, and the cross-sample regional correlation magnitudes are output as the qualitative result.
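The stepwise learning-rate schedule described above can be written as a small helper (illustrative; applying each decay from its milestone epoch onward is an assumption about the exact schedule semantics):

```python
def learning_rate(epoch, base_lr=0.1, decay=0.1, milestones=(50, 70)):
    """Step schedule matching the text: lr starts at 0.1 and is
    multiplied by 0.1 at epochs 50 and 70 of the 80-epoch SGD run."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= decay
    return lr
```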
(3) Visualization experiment results:
in order to demonstrate the effect of the attention mechanism across samples on the region of interest, the results of the study across sample-related regions are presented by feature activation maps and attention coefficient visualizations. As shown in fig. 2, in each group of samples, the first behavior is the input original image, the second behavior is the visualization result of the feature activation map extracted by the feature extractor, and the third behavior is the visualization result of the attention map obtained after the cross-sample attention module is used. It has been found that the features extracted by the feature extractor do not focus effectively on the target region. After cross-sample correlation operation is used, the network can find the commonalities of the input images, so that the target area is extracted more accurately, and feature enhancement is realized.
Taking the upper half of fig. 2 as an example, the classification target is "curtain". With the feature activation map alone (the second row), the "curtain" is easily treated as background in the image, and foreground objects such as "person", "instrument", and "seat" receive more attention, causing misclassification. After the cross-sample attention module, the obtained attention map focuses on the feature common to the images, namely the "curtain", and attends to it more strongly, yielding a better feature representation and thus improving classification accuracy. In the lower half of fig. 2, the classification target is "lion". With the feature activation map alone, the attended "lion" region is incomplete, or the "tree branch" region in the image is attended by mistake. After the cross-sample attention module, the network attends to the "lion" more completely, achieving feature enhancement and improving classification accuracy.
(4) Quantitative experimental results:
in order to quantitatively show the experimental effect, the invention carries out a series of ablation experiments and comparison experiments.
The results of the ablation experiments are shown in table 1. In the table, S2-CA denotes cross-sample attention between support set samples, and SQ-CA denotes cross-sample attention between the query set and the support set; the effect of using each module is studied. The experiments were performed on the small sample learning dataset miniImageNet with five support samples per category, since correlation between support set samples is only meaningful when the number of samples per class is greater than 1. As shown in table 1, cross-sample attention between support set samples alone brings a 0.51% improvement; cross-sample attention between the query set and the support set alone brings a 0.48% improvement; and using both cross-sample attention mechanisms simultaneously outperforms either used alone, bringing a 0.76% improvement. The ablation results show that cross-sample attention, whether between support set samples or between the query set and the support set, provides feature enhancement, and that using both together yields the best results.
TABLE 1 Ablation experiment results

Serial number  S2-CA  SQ-CA  ACC (%)
1              ×      ×      82.10±0.30
2              √      ×      82.61±0.29
3              ×      √      82.58±0.30
4              √      √      82.86±0.30
In order to demonstrate the improvement that cross-sample attention between support set samples brings to small sample image classification, experiments were carried out with different numbers of support set samples. ReNet does not consider the cross-sample correlation relationship within the support set and therefore serves as the baseline for this comparison. The results are shown in fig. 3 as line charts, with the abscissa giving the number of support set samples (shots): the solid lines represent the average classification accuracy, the dashed lines the average loss, red the model of the present invention, and blue the ReNet baseline. As seen in fig. 3, when the number of shots is greater than 1, the proposed method consistently outperforms the ReNet model, which does not use cross-sample attention within the support set. Moreover, the larger the number of support set samples, the higher the average classification accuracy and the lower the loss: as the number of samples increases, a class is described more comprehensively, which benefits the final classification. The cross-sample correlation calculation within the support set likewise aims at describing a class better, enhancing the common target region by searching for correlated regions among samples of the same class.
Based on the same inventive concept, another embodiment of the present invention provides a small sample image classification system based on a correlation region using the above method, which includes a feature extractor, a region autocorrelation module, a cross-sample correlation module, and a classification module;
the feature extractor is used for extracting features from the input query set and the support set;
the regional autocorrelation module is used for enhancing regional correlation of a single sample to obtain autocorrelation enhanced support set characteristics and autocorrelation enhanced query set characteristics;
the cross-sample correlation module is used for calculating the correlation between the support sets and the query sets;
the classification module is used for calculating similarity of the support set features and the query set features output by the cross-sample correlation module, so that classification of the query set is realized.
Based on the same inventive concept, another embodiment of the present invention provides an electronic device (computer, server, smartphone, etc.) comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the steps of the inventive method.
Based on the same inventive concept, another embodiment of the present invention provides a computer-readable storage medium (e.g., ROM/RAM, magnetic disk, optical disk) storing a computer program, which when executed by a computer, performs the steps of the inventive method.
Although the present invention has been described with reference to the above embodiments, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (9)

1. A small sample image classification method based on a relevant region is characterized by comprising the following steps:
sequentially inputting the support set into a feature extractor and a regional autocorrelation module to obtain autocorrelation enhanced support set features;
inputting the query set into a feature extractor and a regional autocorrelation module in sequence to obtain autocorrelation enhanced query set features;
inputting the support set characteristics of the autocorrelation enhancement into a cross-sample correlation module for characteristic enhancement to obtain the support set characteristics of the cross-sample enhancement;
and inputting the cross-sample enhanced support set features and the autocorrelation enhanced query set features into a cross-sample correlation module, and calculating similarity of the support set features and the query set features output by the cross-sample correlation module, thereby realizing classification of the query set.
2. The method of claim 1, wherein the feature extractor, the regional autocorrelation module, and the cross-sample correlation module are trained using the following steps:
(1) respectively inputting all images of the support set and the query set into a feature extractor based on a convolutional neural network, and removing the pooling operation of the last layer to obtain image features with reserved position relations;
(2) after pooling operation is carried out on the features extracted by the feature extractor, classification is carried out through a full connection layer, and a classification loss function is calculated;
(3) inputting the image features obtained in the step (1) into a regional autocorrelation module, and enhancing the position features of the image to obtain autocorrelation enhanced support set features and autocorrelation enhanced query set features;
(4) inputting samples of the same category in a support set into a cross-sample correlation module, considering the spatial similarity relation between different samples of the same category, regarding a region with higher similarity as a target region of the category, namely a classification target, and applying the target region as an attention coefficient to all samples of the category, so as to realize feature enhancement and obtain a cross-sample enhanced support set feature;
(5) inputting the cross-sample enhanced support set characteristics obtained in the step (4) and the self-correlation enhanced query set characteristics obtained in the step (3) into a cross-sample correlation module, searching a matched area in the query set sample and the support set sample, and calculating the similarity of the query set characteristics and the support set characteristics to obtain a correlation loss function;
(6) and (3) combining the classification loss function in the step (2) and the correlation loss function in the step (5), performing iterative optimization, and training to obtain parameters of the feature extractor, the region autocorrelation module and the cross-sample correlation module.
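Step (6) combines the two loss functions into one objective. A minimal sketch of this combination, assuming a simple weighted sum with the correlation-loss coefficient of 0.25 mentioned in the experiments (the loss values below are toy numbers, not network outputs):

```python
# Sketch of the overall objective: L = L_cls + coeff * L_cor,
# optimized end-to-end; coeff = 0.25 as in the described experiments.

def total_loss(classification_loss, correlation_loss, coeff=0.25):
    """Overall training objective combining the two loss terms."""
    return classification_loss + coeff * correlation_loss

# e.g. a step where L_cls = 2.0 and L_cor = 0.8 yields 2.0 + 0.25 * 0.8 = 2.2
loss = total_loss(2.0, 0.8)
```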
3. The method according to claim 1 or 2, wherein the regional autocorrelation module calculates the correlation relationship of each pixel point relative to the surrounding neighbors, and adds the correlation relationship to the feature matrix as a supplement to realize feature enhancement.
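The per-pixel neighbourhood correlation of claim 3 can be illustrated with a toy sketch. The 4-neighbour scheme and the product-based "correlation" below are assumptions for illustration only, not the patent's exact formulation.

```python
# Illustrative sketch: for a single-channel 2D feature map, compute each
# position's correlation with its valid 4-neighbours and add it back to the
# feature matrix as a supplement (feature enhancement).

def regional_autocorrelation(feat):
    h, w = len(feat), len(feat[0])
    out = [row[:] for row in feat]
    for i in range(h):
        for j in range(w):
            neigh = [feat[i + di][j + dj]
                     for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1))
                     if 0 <= i + di < h and 0 <= j + dj < w]
            # toy "correlation": centre value times the mean neighbour value
            corr = feat[i][j] * sum(neigh) / len(neigh)
            out[i][j] = feat[i][j] + corr  # supplement the original feature
    return out

enhanced = regional_autocorrelation([[1.0, 2.0], [3.0, 4.0]])
```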
4. The method of claim 1 or 2, wherein the cross-sample correlation module obtains the cross-sample enhanced support set features by the following steps:

taking the features obtained by the regional autocorrelation module as the input of the cross-sample correlation module, and computing by similarity, in the cross-sample correlation module, the correlation matrix Cor of the input support set features F_s with respect to position:

Cor(s_i, s_j) = sim(F_{s_i}, F_{s_j})

where sim(·) denotes a similarity calculation function, and Cor(s_i, s_j), of size H × W × H × W, represents the correlation of the two inputs s_i, s_j at all pairs of spatial locations; the attention matrix A^{ij} of the i-th input relative to the j-th input is further obtained:

A^{ij}(x^i_{m,n}) = (1/HW) Σ_{x^j_{m',n'}} softmax( Cor(x^i_{m,n}, x^j_{m',n'}) / τ )

where m, n denote spatial position coordinates, τ is a temperature coefficient, x^j_{m,n} denotes the spatial location coordinates of a region block in input j, and x^i_{m,n} denotes the spatial location coordinates of a region block in input i; the attention matrices of the i-th image with respect to the other support set images are averaged to obtain the attention matrix A_i of the i-th support set image:

A_i = (1/(K−1)) Σ_{j≠i} A^{ij}

the attention matrix A_i is used as a weight and multiplied with the original support set features F_s to obtain the intra-class correlation-weighted support set feature matrix F̃_{s_i}:

F̃_{s_i} = A_i ⊙ F_{s_i}

and a weighted average over spatial positions then yields the cross-sample enhanced support set feature:

f_{s_i} = (1/HW) Σ_{(m,n)} F̃_{s_i}(m,n)
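The cross-sample attention over support samples can be sketched numerically on flattened single-channel feature maps. The dot-product similarity, the temperature value, and the tiny dimensions below are illustrative assumptions, not the patent's exact operators; only the structure (temperature-scaled softmax, averaging over the other same-class samples, weighting of the original features, spatial averaging) follows the description above.

```python
import math

def attention_over_i(feat_i, feat_j, tau=5.0):
    """Attention over positions of sample i, derived from its correlation
    with sample j: softmax over i's positions, averaged over j's positions."""
    att = [0.0] * len(feat_i)
    for u in feat_j:                              # each position of sample j
        cor = [v * u / tau for v in feat_i]       # toy dot-product correlation
        z = sum(math.exp(c) for c in cor)
        for m, c in enumerate(cor):
            att[m] += math.exp(c) / z             # softmax over i's positions
    return [a / len(feat_j) for a in att]         # average over j's positions

def cross_sample_enhance(supports, tau=5.0):
    """Enhance each support sample by the averaged attention A_i from the
    other K-1 same-class samples, then average over spatial positions."""
    out = []
    for i, fi in enumerate(supports):
        others = [attention_over_i(fi, fj, tau)
                  for j, fj in enumerate(supports) if j != i]
        att = [sum(col) / len(others) for col in zip(*others)]  # A_i
        weighted = [a * v for a, v in zip(att, fi)]             # A_i applied to F_s
        out.append(sum(weighted) / len(weighted))               # spatial average
    return out

enhanced = cross_sample_enhance([[1.0, 3.0], [1.0, 3.0], [2.0, 2.0]])
```

Positions whose features recur across the same-class samples receive more attention mass, which is the intended "common target region" effect.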
5. The method of claim 4, wherein the cross-sample correlation module computes the correlation between the support set and the query set by the following steps:

inputting the cross-sample enhanced support set features F̃_s and the feature matrix F_q of the query set into the cross-sample correlation module to obtain the correlation matrix between the query set and the support set:

Cor(q, s) = sim(F_q, F̃_s)

then obtaining an attention matrix A:

A(x^q_{m,n}) = (1/HW) Σ_{x^s_{m',n'}} softmax( Cor(x^q_{m,n}, x^s_{m',n'}) / τ )

this attention matrix is used as a weight and multiplied with the original query set features F_q to obtain the correlation-weighted query set feature matrix F̃_q:

F̃_q = A ⊙ F_q

a query set feature vector f_q is obtained after averaging and pooling operations:

f_q = (1/HW) Σ_{(m,n)} F̃_q(m,n)

and the cosine similarity between the query set feature vector f_q and the support set feature vector f_s is computed, from which the correlation loss function is calculated.
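The final similarity step can be sketched directly: cosine similarity between the query feature vector f_q and a support feature vector f_s. The vector values below are illustrative.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# parallel vectors have similarity 1; orthogonal vectors have similarity 0
sim = cosine_similarity([1.0, 2.0, 2.0], [2.0, 4.0, 4.0])
```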
6. The method of claim 2, wherein the classification loss function and the correlation loss function form an overall loss function, wherein the classification loss function provides a constraint for training of the feature extractor, wherein the correlation loss function provides a constraint for enhancement of the region of interest, and wherein the two loss functions are added to achieve end-to-end training.
7. A small sample image classification system based on a relevant region by adopting the method of any one of claims 1 to 6, which is characterized by comprising a feature extractor, a region autocorrelation module, a cross-sample correlation module and a classification module;
the feature extractor is used for extracting features from the input query set and the support set;
the regional autocorrelation module is used for enhancing regional correlation of a single sample to obtain autocorrelation enhanced support set characteristics and autocorrelation enhanced query set characteristics;
the cross-sample correlation module is used for calculating the correlation between the support sets and the query sets;
the classification module is used for calculating similarity of the support set features and the query set features output by the cross-sample correlation module, so that classification of the query set is realized.
8. An electronic apparatus, comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the method of any of claims 1 to 6.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a computer, implements the method of any one of claims 1 to 6.
CN202210342541.7A 2022-03-31 2022-03-31 Small sample image classification method and system based on relevant region Pending CN114882267A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210342541.7A CN114882267A (en) 2022-03-31 2022-03-31 Small sample image classification method and system based on relevant region

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210342541.7A CN114882267A (en) 2022-03-31 2022-03-31 Small sample image classification method and system based on relevant region

Publications (1)

Publication Number Publication Date
CN114882267A true CN114882267A (en) 2022-08-09

Family

ID=82668891

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210342541.7A Pending CN114882267A (en) 2022-03-31 2022-03-31 Small sample image classification method and system based on relevant region

Country Status (1)

Country Link
CN (1) CN114882267A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115858930A (en) * 2022-12-09 2023-03-28 贝壳找房(北京)科技有限公司 Code-based information query method, apparatus, medium, and computer program product
CN115858930B (en) * 2022-12-09 2024-02-20 贝壳找房(北京)科技有限公司 Code-based information query method, apparatus, medium, and computer program product
CN116071609A (en) * 2023-03-29 2023-05-05 中国科学技术大学 Small sample image classification method based on dynamic self-adaptive extraction of target features
CN116168257A (en) * 2023-04-23 2023-05-26 安徽大学 Small sample image classification method, device and storage medium based on sample generation

Similar Documents

Publication Publication Date Title
US10650042B2 (en) Image retrieval with deep local feature descriptors and attention-based keypoint descriptors
CN111259850B (en) Pedestrian re-identification method integrating random batch mask and multi-scale representation learning
Richard et al. Weakly supervised action learning with rnn based fine-to-coarse modeling
CN114882267A (en) Small sample image classification method and system based on relevant region
CN108960059A (en) A kind of video actions recognition methods and device
CN111680678B (en) Target area identification method, device, equipment and readable storage medium
KR102140805B1 (en) Neural network learning method and apparatus for object detection of satellite images
CN105243154A (en) Remote sensing image retrieval method and system based on significant point characteristics and spare self-encodings
KR20200010672A (en) Smart merchandise searching method and system using deep learning
CN112529005A (en) Target detection method based on semantic feature consistency supervision pyramid network
CN113361636A (en) Image classification method, system, medium and electronic device
CN118072252B (en) Pedestrian re-recognition model training method suitable for arbitrary multi-mode data combination
CN112150504A (en) Visual tracking method based on attention mechanism
CN114897136A (en) Multi-scale attention mechanism method and module and image processing method and device
CN111915618A (en) Example segmentation algorithm and computing device based on peak response enhancement
CN111985616B (en) Image feature extraction method, image retrieval method, device and equipment
CN118115932A (en) Image regressor training method, related method, device, equipment and medium
CN116935411A (en) Radical-level ancient character recognition method based on character decomposition and reconstruction
Dong et al. Scene-oriented hierarchical classification of blurry and noisy images
CN116630749A (en) Industrial equipment fault detection method, device, equipment and storage medium
CN116543021A (en) Siamese network video single-target tracking method based on feature fusion
CN110472088A (en) A kind of image search method based on sketch
CN114429648B (en) Pedestrian re-identification method and system based on contrast characteristics
CN111061774B (en) Search result accuracy judging method and device, electronic equipment and storage medium
CN117237984B (en) MT leg identification method, system, medium and equipment based on label consistency

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination