CN115272692A - Small sample image classification method and system based on feature pyramid and feature fusion - Google Patents

Small sample image classification method and system based on feature pyramid and feature fusion


Publication number
CN115272692A
CN115272692A (application CN202210733595.6A)
Authority
CN
China
Prior art keywords
feature
pyramid
module
relation
small sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210733595.6A
Other languages
Chinese (zh)
Inventor
王先知
许洁斌
艾浩然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202210733595.6A priority Critical patent/CN115272692A/en
Publication of CN115272692A publication Critical patent/CN115272692A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks


Abstract

The invention discloses a small sample image classification method based on a feature pyramid and feature fusion, which comprises the following steps: S1, constructing a feature pyramid relation network model comprising a feature extraction module, a relation module and a feature fusion module; S2, expanding the data set and dividing it into a training set, a verification set and a test set; S3, training the model, sampling a support set and a query set from the training set; S4, inputting support set images and query set images, extracting their features with the feature extraction module, outputting the feature vectors of the images, and fusing the feature vectors with the feature fusion module; S5, inputting the fused feature vectors into the relation module, which outputs similarity values between the support set and query set images, and processing all similarity values to obtain a final similarity value; S6, calculating the model loss, updating the model parameters, and iterating the training until the loss value stabilizes; S7, saving the trained model and using it for small sample image classification testing.

Description

Small sample image classification method and system based on feature pyramid and feature fusion
Technical Field
The invention relates to the field of small sample learning and meta-learning, in particular to a small sample image classification method and system based on a feature pyramid and feature fusion.
Background
Deep neural network models usually need a large number of labeled training samples to train well. In practice, labeling samples often consumes a great deal of manpower and material resources, and in some cases little usable sample data exists at all; training directly on a small number of samples then causes overfitting. Small sample learning arose to solve this problem.
A basic model for small sample learning is defined as p = C(f(x|θ)|w), where f denotes the feature extractor, C denotes the classifier, x denotes the input image to be recognized, θ denotes the parameters of the feature extractor f, w denotes the parameters of the classifier C, and p denotes the prediction output by the model. Because the number of samples in small sample learning is small, directly training the model parameters θ and w causes overfitting, and accuracy drops on the target task.
A training set of similar prior tasks with a large amount of available data is defined as D_base, and the small sample data set containing the target task is defined as D_novel. The model is first trained on D_base to learn good parameters θ and w. Starting from these initialization parameters, the model is then trained on D_novel to obtain new parameters θ_1 and w_1, which replace the original ones; the updated model p = C(f(x|θ_1)|w_1) can then accurately complete the image classification task.
Around the core problem of scarce samples, existing small sample learning strategies mainly fall into methods based on data enhancement, metric learning, models, and parameter optimization. Metric learning, also called similarity learning, aims to learn a similarity metric S(·,·) under which similar samples receive high similarity scores and dissimilar samples receive low ones. S can be a fixed, non-learned distance metric or a learnable neural network, and the similarity score it outputs can be used to classify query samples of the test set. However, existing metric-based small sample learning only considers the similarity between the final outputs of the model and ignores the intermediate layers of the network, so recognition accuracy is low, which affects the final classification accuracy of the model.
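To make the metric-learning idea concrete, the sketch below classifies a query feature vector with a fixed, non-learned similarity metric (cosine similarity). This is a hedged illustration of similarity learning in general, not the patent's relation module; all names and vectors are invented for the example.

```python
import numpy as np

def cosine_similarity(a, b):
    # S(a, b): a fixed (non-learned) similarity metric in [-1, 1]
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify_query(query_vec, support_vecs):
    # support_vecs: {class_label: feature vector}; pick the most similar class
    scores = {c: cosine_similarity(query_vec, v) for c, v in support_vecs.items()}
    return max(scores, key=scores.get), scores

support = {"cat": np.array([1.0, 0.0, 0.2]), "dog": np.array([0.0, 1.0, 0.1])}
query = np.array([0.9, 0.1, 0.2])
label, scores = classify_query(query, support)
print(label)  # "cat": the support class most similar to the query
```

A learnable relation module replaces the fixed metric S with a small network trained to output the similarity score directly.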
Disclosure of Invention
The invention aims to overcome the problem of low accuracy of small sample image classification in the prior art, and provides a small sample image classification method based on a feature pyramid and feature fusion.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
a small sample image classification method based on feature pyramid and feature fusion comprises the following steps:
s1, constructing a characteristic pyramid relation network model of a plurality of layers of neural networks, wherein each layer of neural network comprises a characteristic extraction module, a relation module and a characteristic fusion module;
s2, acquiring a data set, expanding the data set, and dividing the expanded data set into a training set, a verification set and a test set;
s3, training a characteristic pyramid relation network model by adopting a C-way K-shot mode, and sampling a support set and a query set from a training set respectively in each training;
s4, inputting a support set image and a query set image, extracting the features of the images by a feature extraction module, outputting the feature vectors of the images, and fusing the feature vectors of the support set image and the query set image by a feature fusion module;
s5, inputting the fused feature vectors into a relation module, outputting the similarity values of the support set image and the query set image by the relation module, and processing the similarity values output by all the relation modules to obtain a final similarity value;
s6, calculating the loss of the characteristic pyramid relation network model, updating parameters of the characteristic pyramid relation network model, and repeating iterative training until the loss error value tends to be stable;
and S7, storing the trained characteristic pyramid relation network model, and using the characteristic pyramid relation network model for small sample image classification testing.
Further, the feature fusion module comprises a feature fusion term:
C′(F_S, F_Q) = Concate(F_S, F_Q, Mul(F_S, F_Q))
where F_S denotes the feature vector of a support set image, F_Q denotes the feature vector of a query set image, Concate(·,·,·) denotes concatenation along the feature channel, and Mul(·,·) denotes element-wise multiplication of the feature maps at corresponding positions.
Further, in step S6, scoring the similarity of a pair of images is treated as a regression task, and the mean square error (MSE) function is used as the loss function of each layer of the network:
MSE(r, y_S, y_Q) = (r − 1(y_S == y_Q))²
where r denotes the similarity score output by a layer of the network, y_S denotes the label of the support set image, and y_Q denotes the label of the query set image.
Further, in step S6, the loss of the feature pyramid relation network model is calculated with the loss function:
Loss = Σ_{l=1}^{n} MSE(r_l, y_S, y_Q)
where r_l denotes the similarity score output by the l-th layer of the network, y_S denotes the label of the support set image, y_Q denotes the label of the query set image, MSE denotes the mean square error function, and n denotes the number of network layers.
Further, in step S2, the acquired data set is expanded by rotation at 90 degrees, 180 degrees, and 270 degrees.
Further, in the feature pyramid relation network model, the activation function of the last fully connected layer of the relation module is a Sigmoid function, and all other activation functions are ReLU functions.
A small sample image classification system based on feature pyramid and feature fusion comprises:
a feature extraction module, used for extracting features of the input images;
a feature fusion module, used for fusing the features of the input images;
and a relation module, used for judging the similarity between the support set image features and the query set image features.
Further, the feature extraction module comprises four convolution blocks and two 2×2 max-pooling layers, connected in the order: convolution block, max-pooling layer, convolution block, max-pooling layer, convolution block, convolution block.
Further, the relation module comprises two convolution blocks, two 2×2 max-pooling layers, a ReLU fully connected layer and a Sigmoid fully connected layer, connected in the order: convolution block, max-pooling layer, convolution block, max-pooling layer, ReLU fully connected layer, Sigmoid fully connected layer.
Further, each convolution block comprises a convolution layer, a Batch Norm layer and a ReLU activation layer; the convolution kernel size of the convolution layer is 3×3 and the number of output channels is 64.
Compared with the prior art, the invention improves the classification accuracy of small sample images by constructing a feature pyramid relation network (FPRN) model; because the FPRN model is lightweight, detection results can still be obtained quickly, with high accuracy.
Drawings
FIG. 1 is a schematic diagram of the feature pyramid relation network FPRN.
FIG. 2 is a schematic structural diagram of the feature extraction module.
FIG. 3 is a schematic structural diagram of the relation module.
FIG. 4 is a schematic structural diagram of the convolution block Conv Block.
Detailed Description
The method and system for classifying small sample images based on feature pyramid and feature fusion of the present invention will be further described with reference to the accompanying drawings and specific embodiments.
The invention discloses a small sample image classification method based on a feature pyramid and feature fusion, which comprises the following steps:
S1, constructing a feature pyramid relation network model with several layers of neural networks, wherein each layer of neural network comprises a feature extraction module, a relation module and a feature fusion module.
S2, acquiring a data set, expanding it, and dividing the expanded data set into a training set, a verification set and a test set.
S3, training the feature pyramid relation network model in C-way K-shot mode, sampling a support set and a query set from the training set in each training episode.
S4, inputting support set images and query set images, extracting their features with the feature extraction module, outputting the feature vectors of the images, and fusing the feature vectors of the support set and query set images with the feature fusion module.
S5, inputting the fused feature vectors into the relation module, which outputs similarity values between the support set and query set images, and processing the similarity values output by all relation modules to obtain a final similarity value.
S6, calculating the loss of the feature pyramid relation network model, updating its parameters, and iterating the training until the loss value stabilizes.
S7, saving the trained feature pyramid relation network model and using it for small sample image classification testing.
Referring to fig. 1, the invention also discloses a small sample image classification system based on feature pyramid and feature fusion, which comprises a feature extraction module, a feature fusion module and a relation module, wherein the feature extraction module is used for extracting features of input images, the feature fusion module is used for fusing the features of the input images, and the relation module is used for judging the similarity of the features of the input support set images and the features of the query set images.
Specifically, in a neural network, the deeper the layer, the larger its receptive field and the more it attends to the global features of the image; the shallower the layer, the smaller its receptive field and the more it attends to local features. For example, when classifying animals, a deep layer can distinguish species-specific characteristics, while a shallow layer extracts hair features, background texture features and the like, which can also help distinguish animal species. Based on this, the invention proposes the Feature Pyramid Relation Network (FPRN) model.
The feature pyramid relation network model has several layers of neural networks, each comprising a Feature Extraction Module (FEM), a feature fusion module, and a Relation Module (RM). The feature extraction module extracts features of the input images, the feature fusion module fuses them, and the relation module judges the similarity between support set image features and query set image features.
As shown in fig. 2, the feature extraction module comprises four convolution blocks and two 2×2 max-pooling layers, connected in the order: convolution block, max-pooling layer, convolution block, max-pooling layer, convolution block, convolution block. The support and query features output by each convolution block form a pair of feature maps used for feature fusion.
As shown in fig. 3, the relation module comprises two convolution blocks, two 2×2 max-pooling layers, a ReLU fully connected layer and a Sigmoid fully connected layer, connected in the order: convolution block, max-pooling layer, convolution block, max-pooling layer, ReLU fully connected layer, Sigmoid fully connected layer. The input of the relation module is the features fused by the feature fusion module, and its output is a similarity score used to judge the similarity between the support set image features and the query set image features.
As shown in FIG. 4, each convolution block consists of a convolution layer with a 3×3 kernel and 64 output channels, a Batch Norm layer, and a ReLU activation layer.
In the feature pyramid relational network model, all other activation functions are ReLU functions except that the activation function of the last full-connection layer of the relational module is a Sigmoid function. The last output of the relationship module uses the Sigmoid function because the present invention expects to output a similarity score between 0 and 1.
The feature fusion module comprises a feature fusion term:
C′(F_S, F_Q) = Concate(F_S, F_Q, Mul(F_S, F_Q))
where F_S denotes the feature map produced by a convolution block for a support set image, F_Q denotes the feature map produced by a convolution block for a query set image, Concate(·,·,·) denotes concatenation along the feature channel, and Mul(·,·) denotes element-wise multiplication of the feature maps at corresponding positions.
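The fusion term C′ above can be sketched directly in NumPy. This is a minimal illustration under assumed (C, H, W) feature-map shapes, not the patent's implementation:

```python
import numpy as np

def feature_fusion(f_s, f_q):
    """C'(F_S, F_Q) = Concate(F_S, F_Q, Mul(F_S, F_Q)).

    f_s, f_q: feature maps of shape (C, H, W) for a support image and a
    query image. Concatenation is along the channel axis; Mul is the
    element-wise product at corresponding positions.
    """
    return np.concatenate([f_s, f_q, f_s * f_q], axis=0)

f_s = np.ones((64, 5, 5))        # assumed support feature map
f_q = 2 * np.ones((64, 5, 5))    # assumed query feature map
fused = feature_fusion(f_s, f_q)
print(fused.shape)  # (192, 5, 5): the channel count triples
```

The fused tensor is what the relation module of the corresponding layer receives as input.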
The acquired data set is expanded by rotations of 90, 180 and 270 degrees.
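The rotation-based expansion can be sketched as follows. Whether rotated copies keep their original labels or form new classes is not specified in the text, so this hedged sketch only generates the rotated images:

```python
import numpy as np

def expand_with_rotations(images):
    """Return the original images plus their 90, 180 and 270 degree rotations.

    images: array of shape (N, H, W). The result stacks all four
    orientations, quadrupling the number of images.
    """
    rotations = [np.rot90(images, k=k, axes=(1, 2)) for k in range(4)]
    return np.concatenate(rotations, axis=0)

imgs = np.arange(2 * 4 * 4).reshape(2, 4, 4).astype(float)
expanded = expand_with_rotations(imgs)
print(expanded.shape)  # (8, 4, 4): 4x as many images
```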
For the C-way 1-shot problem, the support set feature map and the query set feature map extracted by the feature extraction module can be fused directly. For the C-way K-shot (K > 1) problem, the feature maps the feature extraction module extracts from the K support images of a class are added element-wise at corresponding positions, the result is fused with the query feature map, and the similarity score is then computed in the relation module.
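The element-wise aggregation of the K support feature maps described above can be sketched as follows (shapes assumed for illustration):

```python
import numpy as np

def aggregate_support(support_maps):
    # C-way K-shot with K > 1: the K support feature maps of one class
    # are added element-wise at corresponding positions before being
    # fused with the query feature map.
    return np.sum(support_maps, axis=0)

# 5-shot example: five assumed (64, 5, 5) support feature maps of one class
k_shot_maps = np.stack([np.full((64, 5, 5), i + 1.0) for i in range(5)])
class_map = aggregate_support(k_shot_maps)
print(class_map.shape)  # (64, 5, 5): one aggregated map per class
```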
In the training stage, C categories are randomly drawn from the training set, and K samples are drawn from each category as the support set of the feature pyramid relation network (FPRN) model; a batch of samples drawn from the remaining data of the C categories serves as the query set. The FPRN model is expected to learn from the C×K support samples the ability to distinguish the C classes, and each training episode samples different classes.
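Episode sampling in the C-way K-shot training stage can be sketched as follows; the dictionary layout and function name are assumptions for illustration:

```python
import random

def sample_episode(dataset, c_way, k_shot, query_per_class):
    """Sample one C-way K-shot training episode.

    dataset: {class_label: [samples]}. Returns (support, query) lists of
    (sample, class) pairs; the C classes differ between episodes.
    """
    classes = random.sample(sorted(dataset), c_way)
    support, query = [], []
    for c in classes:
        picked = random.sample(dataset[c], k_shot + query_per_class)
        support += [(x, c) for x in picked[:k_shot]]
        query += [(x, c) for x in picked[k_shot:]]
    return support, query

# toy data set: 10 classes of 20 samples each
data = {f"class{i}": list(range(20)) for i in range(10)}
s, q = sample_episode(data, c_way=5, k_shot=1, query_per_class=10)
print(len(s), len(q))  # 5 50
```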
The feature vectors obtained from the support set image and the query set image at different network depths are fused by the feature fusion term. In the proposed feature fusion term, Mul(F_S, F_Q) introduces interaction between F_S and F_Q, making the regions of interest more prominent and the similarity easier for the feature pyramid relation network to judge.
Each relation module outputs a similarity score between 0 and 1, and the scores output by the relation modules of all layers are combined by weighted averaging to obtain the final similarity score of the Feature Pyramid Relation Network (FPRN) model.
Scoring the similarity of a support set image and query set image pair is treated as a regression task, and the mean square error (MSE) function is used as the loss function of each layer of the network:
MSE(r, y_S, y_Q) = (r − 1(y_S == y_Q))²
where r denotes the similarity score output by a layer of the network, y_S denotes the label of the support set image, and y_Q denotes the label of the query set image. The indicator (y_S == y_Q) takes the value 1 when the labels are the same and 0 when they differ.
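A minimal sketch of this per-layer loss, with the indicator implemented as a label comparison (function name assumed):

```python
def mse_loss(r, y_s, y_q):
    # MSE(r, y_S, y_Q) = (r - 1(y_S == y_Q))^2, where the indicator
    # 1(y_S == y_Q) is 1 for same-class pairs and 0 otherwise.
    target = 1.0 if y_s == y_q else 0.0
    return (r - target) ** 2

print(mse_loss(0.9, "cat", "cat"))  # small loss: labels match, score near 1
print(mse_loss(0.9, "cat", "dog"))  # large loss: labels differ, score near 1
```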
In the Feature Pyramid Relation Network (FPRN) model, the relation module of each layer outputs a similarity score, so the total loss function of the FPRN model is:
Loss = Σ_{l=1}^{n} MSE(r_l, y_S, y_Q)
where r_l denotes the similarity score output by the l-th layer, y_S denotes the label of the support set image, y_Q denotes the label of the query set image, MSE denotes the mean square error function, and n denotes the number of layers.
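The total loss, summing the per-layer MSE terms over all n layers, can be sketched as follows (uniform summation assumed, per the formula above):

```python
def total_loss(layer_scores, y_s, y_q):
    # Loss = sum over layers l = 1..n of MSE(r_l, y_S, y_Q)
    target = 1.0 if y_s == y_q else 0.0
    return sum((r - target) ** 2 for r in layer_scores)

# assumed per-layer similarity scores r_l for a same-class pair
scores = [0.8, 0.9, 0.95]
loss = total_loss(scores, "cat", "cat")
print(loss)  # sums the squared errors of all three layers
```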
The loss of the feature pyramid relation network (FPRN) model is computed with this loss function, and the model parameters are updated by back-propagation. Training iterates until the loss value computed by the loss function stabilizes.
The trained feature pyramid relation network model is saved and used for small sample image classification testing. The proposed small sample image classification method based on the feature pyramid and feature fusion achieves good results on two public data sets.
The Omniglot dataset contains 1623 character classes from 50 different languages, each class containing 20 samples written by different people. During training, each 20-way 1-shot episode consists of 1 support set image and 10 query set images per category, and each 20-way 5-shot episode consists of 5 support set images and 5 query set images per category. During testing, the classification result of the feature pyramid relation network (FPRN) model is evaluated over 1000 episodes randomly sampled from the test set, sampling 1 test set image per episode in the 1-shot setting and 5 in the 5-shot setting.
The miniImagenet dataset consists of 60000 color images in 100 categories, 600 samples each, with 64 categories for training, 16 for validation and 20 for testing. On miniImagenet, the invention adopts the 5-way 1-shot and 5-way 5-shot settings. During training, each 5-way 1-shot episode consists of 1 support set image and 15 query set images per category, and each 5-way 5-shot episode consists of 5 support set images and 10 query set images per category. During testing, the classification result of the Feature Pyramid Relation Network (FPRN) model is evaluated over 600 randomly sampled episodes, sampling 15 test set images each time in both the 5-way 1-shot and 5-way 5-shot settings.
The invention compares the image classification results of the Feature Pyramid Relation Network (FPRN) model with other popular metric-learning-based small sample learning models: Siamese Networks, Prototype Networks, Matching Networks and Relation Networks. The comparison of the FPRN model against these baselines on the Omniglot dataset is shown in Table 1.
Table 1 Omniglot dataset experimental results
(table values are embedded as an image in the source and are not reproduced here)
The comparison of the Feature Pyramid Relation Network (FPRN) model against the Siamese Network, Prototype Network, Matching Network and Relation Network baselines on the miniImagenet dataset is shown in Table 2.
Table 2 miniImagenet dataset experimental results
(table values are embedded as an image in the source and are not reproduced here)
As Tables 1 and 2 show, the experimental data indicate that the proposed Feature Pyramid Relation Network (FPRN) model achieves the highest classification accuracy in every experiment. On the Omniglot dataset, the proposed FPRN model reaches 98.3% classification accuracy in the 20-way 1-shot setting and 99.2% in the 20-way 5-shot setting. On the miniImagenet dataset, it reaches 50.2% classification accuracy in the 5-way 1-shot setting and 66.7% in the 5-way 5-shot setting.
The invention also compares the detection speed of the Relation Network model and the Feature Pyramid Relation Network (FPRN) model in the 5-way 1-shot setting on the miniImagenet dataset. On the NVIDIA Quadro P2000 graphics card used in the experiment, the inference speed of the Relation Network is 17.1 fps and that of the FPRN model is 16.3 fps, i.e. the FPRN model is 4.7% slower. In this setting, the detection accuracy of the FPRN model is 50.2% and that of the Relation Network model is 47.3%: an absolute improvement of 2.9 percentage points, or a relative improvement of 6.1%. The FPRN model thus trades 4.7% of detection speed for a 6.1% relative accuracy gain.
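The reported accuracy and speed trade-off can be reproduced arithmetically (values taken from the text above):

```python
fprn_acc, rn_acc = 50.2, 47.3   # 5-way 1-shot accuracy (%)
fprn_fps, rn_fps = 16.3, 17.1   # inference speed (frames per second)

absolute_gain = fprn_acc - rn_acc                 # accuracy, percentage points
relative_gain = absolute_gain / rn_acc * 100      # accuracy, relative %
speed_cost = (rn_fps - fprn_fps) / rn_fps * 100   # slowdown, relative %

print(round(absolute_gain, 1), round(relative_gain, 1), round(speed_cost, 1))
# 2.9 6.1 4.7
```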
In conclusion, the invention improves the accuracy of small sample image classification by constructing the feature pyramid relation network (FPRN) model; because the FPRN model is lightweight, detection results can still be obtained quickly, with high accuracy.
The above description is intended to describe in detail the preferred embodiments of the present invention, but the embodiments are not intended to limit the scope of the claims of the present invention, and all equivalent changes and modifications made within the technical spirit of the present invention should fall within the scope of the claims of the present invention.

Claims (10)

1. A small sample image classification method based on feature pyramid and feature fusion is characterized by comprising the following steps:
s1, constructing a characteristic pyramid relation network model of a plurality of layers of neural networks, wherein each layer of neural network comprises a characteristic extraction module, a relation module and a characteristic fusion module;
s2, acquiring a data set, expanding the data set, and dividing the expanded data set into a training set, a verification set and a test set;
s3, training a characteristic pyramid relation network model by adopting a C-way K-shot mode, and sampling a support set and a query set from a training set respectively in each training;
s4, inputting a support set image and a query set image, extracting the features of the images by a feature extraction module, outputting the feature vectors of the images, and fusing the feature vectors of the support set image and the query set image by a feature fusion module;
s5, inputting the fused feature vectors into a relation module, outputting the similarity values of the support set image and the query set image by the relation module, and processing the similarity values output by all the relation modules to obtain a final similarity value;
s6, calculating the loss of the feature pyramid relation network model, updating the parameters of the feature pyramid relation network model, and repeating the iterative training until the loss error value stabilizes;
and S7, saving the trained feature pyramid relation network model and using it for small sample image classification testing.
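The C-way K-shot episodic sampling of step S3 can be sketched as follows. This is an illustrative helper, not part of the claim: the `pool` data layout, the function name `sample_episode`, and the query size `Q` are my own assumptions.

```python
# Sketch of C-way K-shot episode sampling: draw C classes from a labelled pool,
# then K support images and Q query images per class (assumed helper, not from the patent).
import random

def sample_episode(pool, C=5, K=1, Q=15, rng=random):
    """pool: dict mapping class label -> list of images."""
    classes = rng.sample(sorted(pool), C)
    support, query = [], []
    for label in classes:
        imgs = rng.sample(pool[label], K + Q)
        support += [(img, label) for img in imgs[:K]]
        query += [(img, label) for img in imgs[K:]]
    return support, query

# toy pool: 10 classes, 20 "images" (ints) each
pool = {c: list(range(20)) for c in range(10)}
support, query = sample_episode(pool, C=5, K=1, Q=15)
print(len(support), len(query))  # 5 75
```

Each episode thus yields C*K support samples and C*Q query samples, matching the 5-way 1-shot setting used in the experiments.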
2. The feature pyramid and feature fusion based small sample image classification method of claim 1, characterized in that the feature fusion module includes a feature fusion term, which is:
C′(F_S, F_Q) = Concate(F_S, F_Q, Mul(F_S, F_Q))
where F_S represents the feature vector of the support set image, F_Q represents the feature vector of the query set image, Concate(·,·) denotes concatenation along the feature channel, and Mul(·,·) denotes element-wise multiplication of the feature maps.
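The fusion term above can be sketched minimally in NumPy; axis 0 plays the role of the feature-channel dimension, and the shapes are illustrative assumptions.

```python
# Minimal sketch of C'(F_S, F_Q) = Concate(F_S, F_Q, Mul(F_S, F_Q)):
# element-wise product, then concatenation along the channel axis.
import numpy as np

def fuse(f_s, f_q):
    return np.concatenate([f_s, f_q, f_s * f_q], axis=0)

f_s = np.ones((64, 5, 5))          # support-set feature map (C, H, W)
f_q = np.full((64, 5, 5), 2.0)     # query-set feature map
fused = fuse(f_s, f_q)
print(fused.shape)                 # (192, 5, 5): channel count triples
```

Note that the fused tensor has three times the channel count of either input, which the downstream relation module must account for.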
3. The small sample image classification method based on feature pyramid and feature fusion as claimed in claim 1, characterized in that in step S6, predicting the similarity score of a pair of images is treated as a regression task, and the mean square error (MSE) function is used as the loss function of each layer of neural network, the MSE function being:
MSE(r, y_S, y_Q) = (r − 1(y_S == y_Q))^2
where r represents the similarity score output by each layer of neural network, y_S represents the label of the support set image, and y_Q represents the label of the query set image.
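The per-layer MSE term can be sketched directly from the formula; the function name is my own, and 1(·) is realized as a label-equality test.

```python
# Per-layer MSE loss: target is 1 when the support and query labels match, else 0.
def mse_loss(r, y_s, y_q):
    target = 1.0 if y_s == y_q else 0.0
    return (r - target) ** 2

print(mse_loss(0.8, 3, 3))  # ≈ 0.04: matching labels, score should approach 1
print(mse_loss(0.5, 0, 1))  # 0.25: mismatched labels, score should approach 0
```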
4. The small sample image classification method based on feature pyramid and feature fusion as claimed in claim 3, characterized in that in step S6, the loss of the feature pyramid relation network model is calculated by using a loss function, and the loss function is:
Loss = Σ_{l=1}^{n} MSE(r_l, y_S, y_Q)
where r_l represents the similarity score output by the l-th layer of neural network, y_S represents the label of the support set image, y_Q represents the label of the query set image, MSE represents the mean square error function, and n represents the number of neural network layers.
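The total loss of claim 4 sums the per-layer MSE terms over the n pyramid levels; a minimal sketch, assuming the per-layer scores are given as a list:

```python
# Total loss: sum of per-layer MSE terms over the n pyramid levels
# (the list `scores` holds r_1 .. r_n; helper name is my own).
def total_loss(scores, y_s, y_q):
    target = 1.0 if y_s == y_q else 0.0
    return sum((r - target) ** 2 for r in scores)

# three pyramid levels, matching labels: 0.01 + 0.04 + 0.09
print(total_loss([0.9, 0.8, 0.7], 2, 2))  # ≈ 0.14
```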
5. The method for classifying small sample images based on feature pyramid and feature fusion as claimed in claim 1, wherein in step S2, the acquired data set is expanded by rotation at 90 degrees, 180 degrees and 270 degrees.
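The rotation-based expansion of claim 5 quadruples the data set; a sketch with NumPy, where the helper name is my own:

```python
# Data set expansion by 90, 180 and 270 degree rotations (claim 5):
# each image contributes the original plus three rotated copies.
import numpy as np

def expand_with_rotations(images):
    out = []
    for img in images:
        out += [img, np.rot90(img, 1), np.rot90(img, 2), np.rot90(img, 3)]
    return out

imgs = [np.arange(16).reshape(4, 4)]
expanded = expand_with_rotations(imgs)
print(len(expanded))  # 4: the original plus three rotated copies
```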
6. The method for classifying small sample images based on feature pyramid and feature fusion as claimed in claim 1, wherein in the feature pyramid relation network model, the activation function of the last fully connected layer of the relation module is the Sigmoid function, and all other activation functions are the ReLU function.
7. A small sample image classification system based on feature pyramid and feature fusion is characterized by comprising:
the characteristic extraction module is used for extracting the characteristics of the input image;
the characteristic fusion module is used for fusing the characteristics of the input image;
and the relation module is used for judging the similarity of the input image characteristics of the support set and the image characteristics of the query set.
8. The small sample image classification system based on feature pyramid and feature fusion of claim 7, characterized in that the feature extraction module comprises four convolution blocks and two 2×2 max pooling layers, connected in the sequence: convolution block, max pooling layer, convolution block, max pooling layer, convolution block, convolution block.
9. The feature pyramid and feature fusion based small sample image classification system of claim 7, wherein the relation module comprises two convolution blocks, two 2×2 max pooling layers, a ReLU fully connected layer and a Sigmoid fully connected layer, connected in the sequence: convolution block, max pooling layer, convolution block, max pooling layer, ReLU fully connected layer, Sigmoid fully connected layer.
10. The feature pyramid and feature fusion based small sample image classification system of claim 8 or 9, where the convolution block comprises a convolution layer with a 3×3 convolution kernel and 64 output channels, a Batch Norm layer and a ReLU activation function layer.
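The convolution block of claim 10 and the feature extraction module of claim 8 can be sketched in PyTorch. This is an illustrative reconstruction, not the patent's implementation: the padding, the conv-pool ordering, and the 84×84 input size are my own assumptions (the latter matching the standard miniImagenet resolution).

```python
# Sketch of the claim-10 convolution block (3x3 conv, 64 channels, BatchNorm, ReLU)
# and the claim-8 feature extraction module (four blocks, two 2x2 max pools).
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch=64):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

feature_extractor = nn.Sequential(
    conv_block(3), nn.MaxPool2d(2),     # conv block, 2x2 max pool
    conv_block(64), nn.MaxPool2d(2),    # conv block, 2x2 max pool
    conv_block(64),                     # third conv block
    conv_block(64),                     # fourth conv block
)

x = torch.randn(1, 3, 84, 84)           # assumed miniImagenet-sized input
print(feature_extractor(x).shape)       # torch.Size([1, 64, 21, 21])
```

With only two pooling stages, the output keeps a 21×21 spatial extent, which leaves room for the later pyramid levels and the fusion step.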
CN202210733595.6A 2022-06-27 2022-06-27 Small sample image classification method and system based on feature pyramid and feature fusion Pending CN115272692A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210733595.6A CN115272692A (en) 2022-06-27 2022-06-27 Small sample image classification method and system based on feature pyramid and feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210733595.6A CN115272692A (en) 2022-06-27 2022-06-27 Small sample image classification method and system based on feature pyramid and feature fusion

Publications (1)

Publication Number Publication Date
CN115272692A true CN115272692A (en) 2022-11-01

Family

ID=83761081

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210733595.6A Pending CN115272692A (en) 2022-06-27 2022-06-27 Small sample image classification method and system based on feature pyramid and feature fusion

Country Status (1)

Country Link
CN (1) CN115272692A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115775340A (en) * 2023-02-13 2023-03-10 北京科技大学 Feature modulation-based self-adaptive small sample image classification method and device
CN116597167A (en) * 2023-06-06 2023-08-15 中国人民解放军92942部队 Permanent magnet synchronous motor small sample demagnetization fault diagnosis method, storage medium and system
CN116597167B (en) * 2023-06-06 2024-02-27 中国人民解放军92942部队 Permanent magnet synchronous motor small sample demagnetization fault diagnosis method, storage medium and system

Similar Documents

Publication Publication Date Title
CN108171209B (en) Face age estimation method for metric learning based on convolutional neural network
CN109241317B (en) Pedestrian Hash retrieval method based on measurement loss in deep learning network
CN108960409B (en) Method and device for generating annotation data and computer-readable storage medium
Unnikrishnan et al. Toward objective evaluation of image segmentation algorithms
WO2019015246A1 (en) Image feature acquisition
CN115272692A (en) Small sample image classification method and system based on feature pyramid and feature fusion
CN110717554B (en) Image recognition method, electronic device, and storage medium
CN109740679B (en) Target identification method based on convolutional neural network and naive Bayes
CN112200211B (en) Small sample fish identification method and system based on residual network and transfer learning
CN109063112B (en) Rapid image retrieval method, model and model construction method based on multitask learning deep semantic hash
CN111950528B (en) Graph recognition model training method and device
CN109919252B (en) Method for generating classifier by using few labeled images
CN108009560B (en) Commodity image similarity category judgment method and device
CN110738102A (en) face recognition method and system
CN111860656B (en) Classifier training method, device, equipment and storage medium
CN112784921A (en) Task attention guided small sample image complementary learning classification algorithm
Xu et al. Discriminative analysis for symmetric positive definite matrices on lie groups
CN111325237A (en) Image identification method based on attention interaction mechanism
CN112132145A (en) Image classification method and system based on model extended convolutional neural network
CN114299362A (en) Small sample image classification method based on k-means clustering
CN112232374A (en) Irrelevant label filtering method based on depth feature clustering and semantic measurement
Sunitha et al. Novel content based medical image retrieval based on BoVW classification method
CN111340067B (en) Redistribution method for multi-view classification
Cho et al. A space-time graph optimization approach based on maximum cliques for action detection
CN114299342B (en) Unknown mark classification method in multi-mark picture classification based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination