CN112861720A

CN112861720A - Remote sensing image small sample target detection method based on prototype convolutional neural network

Info

Publication number: CN112861720A
Application number: CN202110172985.6A
Authority: CN
Inventors: 程塨; 施佩珍; 闫博唯; 姚西文; 韩军伟; 郭雷
Original assignee: Northwestern Polytechnical University
Current assignee: Northwestern Polytechnical University
Priority date: 2021-02-08
Filing date: 2021-02-08
Publication date: 2021-05-28
Anticipated expiration: 2041-02-08
Also published as: CN112861720B

Abstract

The invention provides a small-sample target detection method for remote sensing images based on a prototype convolutional neural network. A target detection network is constructed that mainly comprises a feature extraction and class prototype acquisition module, a prototype-guided RPN module, a redirection feature map module and a detector module. The network model first undergoes base learning on a base-class data set containing a large number of labeled samples and is then fine-tuned on a balanced sub-data set; finally, multi-class target detection on small-sample remote sensing images is achieved through post-processing operations such as non-maximum suppression. Using only a small amount of labeled new-class data, the method can quickly and accurately detect targets of different categories in optical remote sensing images with complex backgrounds, with high detection accuracy and high detection speed.

Description

Remote sensing image small sample target detection method based on prototype convolutional neural network

Technical Field

The invention belongs to the technical field of remote sensing image processing, and particularly relates to a small-sample target detection method for remote sensing images based on a prototype convolutional neural network, which can be applied to remote sensing image target detection when only a few labeled samples are available.

Background

With the continuous development of satellite remote sensing technology, the acquisition of massive remote sensing images becomes easier. However, the labeling work of the remote sensing image requires a lot of manpower and financial resources. In addition, some categories of targets are rare, and there is a problem that data is difficult to acquire. Therefore, how to use a small amount of labeled samples to realize target detection of remote sensing images becomes one of the problems to be solved urgently at present.

Existing small-sample target detection methods can be broadly divided into three categories: methods based on meta-learning, methods based on metric learning, and methods based on fine-tuning. Meta-learning-based detection constructs a different task in each iteration so that the network generalizes well and can adapt quickly when it encounters new tasks with new categories, thereby achieving small-sample target detection. Metric-learning-based detection learns a metric space in which targets of the same category are as close as possible and targets of different categories are as far apart as possible; the distance is computed with a chosen metric, such as the Euclidean distance or cosine similarity. Fine-tuning-based detection trains an initial model in advance and then constructs a balanced data set to fine-tune the parameters of that model. However, targets in remote sensing images vary greatly in scale and are often densely arranged, and the images are affected by illumination, cloud cover, target shape and complex backgrounds, so remote sensing images differ considerably from natural scene images. These factors pose a significant challenge to small-sample target detection in optical remote sensing images, and existing small-sample target detection methods are difficult to apply directly to remote sensing images.

Disclosure of Invention

In order to overcome the defects of the prior art, the invention provides a small-sample target detection method for remote sensing images based on a prototype convolutional neural network. A target detection network is constructed that mainly comprises a feature extraction and class prototype acquisition module, a prototype-guided RPN module, a redirection feature map module and a detector module. The feature extraction and class prototype acquisition module extracts the features of the support set images and the query set images with a parameter-shared deep convolutional neural network, and performs a global average pooling operation on the feature maps of the support set images to obtain the class prototypes; the prototype-guided RPN module uses the obtained class prototypes to guide the generation of regions of interest by the region proposal network; the redirection feature map module uses a channel attention mechanism to re-model the channel information of the features of the foreground target candidate boxes; and the detector module applies a classification network and a regression network to the re-modeled features and outputs detection boxes containing the predicted category and position of each target. The network model first undergoes base learning on a base-class data set and is then fine-tuned on a balanced sub-data set that contains both base-class and new-class samples; finally, post-processing operations such as non-maximum suppression yield multi-class target detection on small-sample remote sensing images.
Using only a small amount of labeled new-class data, the method can quickly and accurately detect targets of different categories in optical remote sensing images with complex backgrounds, with high detection accuracy and high detection speed.

A remote sensing image small sample target detection method based on a prototype convolutional neural network is characterized by comprising the following steps:

step 1, preparing base-class training data: the target instance images of all categories in the base-class training image data set D_base form the support set images, wherein a target instance image is the image block inside the bounding box of a target instance in an image; whole images containing target instances of any category and number are randomly extracted from the base-class training image data set D_base as the query set images; all support set images and query set images are then preprocessed, and the preprocessed images serve as the training data; the preprocessing comprises normalizing all images, resizing the support set images to M × M, and resizing the query set images to M' × M'; the value of M' is 0.8 to 1.2 times the size of the base-class training images, and the value of M is 0.28 times M'; the base-class training image data set is the DIOR remote sensing image target detection data set;

step 2, constructing a target detection network: the target detection network mainly comprises a feature extraction and class prototype acquisition module, a prototype-guided RPN module, a redirection feature map module and a detector module;

the feature extraction and class prototype acquisition module is implemented as follows: first, the support set images and the query set images are input, and a feature extraction backbone network B extracts their features to obtain the corresponding feature maps; the feature maps of the support set images are denoted {F_s,1, F_s,2, ..., F_s,C} and the feature map of the query set image is denoted F_q, wherein the feature extraction backbone network B adopts the convolutional layers of the first four stages of the ResNet-101 network, F_s,i denotes the feature map of the support image of the i-th category, i = 1, 2, …, C, and C is the number of categories; then, a global average pooling operation is performed on the feature maps {F_s,1, F_s,2, ..., F_s,C} of the support set images to obtain the prototype vectors {p_1, p_2, ..., p_C} of the categories, wherein p_i denotes the prototype vector of the i-th category, i = 1, 2, …, C;

the prototype-guided RPN module is used for generating regions of interest that may contain targets, and is implemented as follows: the class prototype vectors {p_1, p_2, ..., p_C} obtained by the feature extraction and class prototype acquisition module are input into a three-layer fully connected network, which outputs a vector whose length equals that of all convolution kernels of the RPN classifier after flattening; this vector is reshaped to the same shape as the convolution kernels of the RPN classifier to form a new set of convolution kernel parameters, which serve as the parameters of an auxiliary classifier; the auxiliary classifier and the RPN classifier each score the anchors in the feature map of the query set image as foreground or background, and the two scores are added to give the foreground target score of each anchor; the label of each anchor is then determined according to the anchor assignment rule of the RPN, the labels comprising foreground, background and ignored samples; the foreground target scores are sorted from high to low, and the r highest-scoring anchors are adjusted by the RPN regressor to obtain r regions of interest;

the redirection feature map module extracts features for the regions of interest obtained by the prototype-guided RPN module using the RoI Align operation of the Mask R-CNN network, obtaining the feature maps {F_1, F_2, ..., F_r} of the regions of interest, wherein F_i denotes the feature map of the i-th region of interest, i = 1, 2, …, r, and r is the number of regions of interest; then, the class prototype vectors are multiplied channel by channel with the feature maps of the regions of interest to obtain the redirected feature maps;

the detector module uses the second-stage detector of the Faster R-CNN network to process the redirected feature maps output by the redirection feature map module, and outputs detection boxes containing the predicted category and position of each target; the classification loss of the detector module is a cross-entropy loss function, and the regression loss of the detector module is a SmoothL1 loss function;

step 3, training the target detection network: the preprocessed support set images and query set images obtained in step 1 are input into the target detection network constructed in step 2 for training, yielding the target detection network trained on the base-class data set;

step 4, preparing fine-tuning training data: first, 3N labeled sample images of each category are randomly extracted from the base-class training image data set D_base and combined with the new-class training image data set D_novel to form the fine-tuning training image data set D_few; the number of labeled samples contained in the new-class image data set is not more than 30, and N is the number of labeled samples of each category;

then, the processing of step 1 is performed with the fine-tuning training image data set D_few in place of the base-class training image data set D_base, to obtain the fine-tuning training data;

step 5, fine-tuning the target detection network: the support set images and query set images of the fine-tuning training data obtained in step 4 are input into the trained target detection network obtained in step 3, and the network is trained again to obtain the target detection network trained on the fine-tuning data set;

then, the labeled samples of each category in the data set D_few constructed in step 4 are input into the trained feature extraction backbone network B, their feature representation vectors are obtained through a global average pooling operation, and the mean of all feature representation vectors of a category is taken as the prototype vector of that category; in this way each category obtains one prototype vector, giving C_few prototype vectors in total, wherein C_few is the number of image categories contained in the data set D_few;

step 6, target detection: the preprocessed data to be detected is input, as the query set image, into the feature extraction backbone network B trained in step 5 to obtain the query set image features; the C_few class prototype vectors obtained in step 5 and the query set image features are input into the trained prototype-guided RPN module and then passed through the trained redirection feature map module and the trained detector module to obtain detection boxes containing the predicted category and position of each target; redundant detection boxes are filtered out by non-maximum suppression, and the remaining detection boxes form the final target detection result for the data to be detected.

The beneficial effects of the invention are as follows: because the feature extraction and class prototype acquisition module uses a shared backbone network, overfitting is effectively alleviated, memory is saved and computation is faster; because a prototype-guided RPN is used, the regions of interest obtained are of higher quality, which benefits the subsequent detector; because feature redirection is used, detection accuracy is further improved; and because the target detection network is trained first on the base-class training data and then on the fine-tuning training data, it can detect targets in remote sensing images with complex backgrounds when only a small number of labeled new-class samples are available, with high detection accuracy and good robustness.

Drawings

FIG. 1 is a flow chart of a method for detecting a small sample target of a remote sensing image based on a prototype convolutional neural network;

FIG. 2 is a schematic diagram of a two-stage training of the method of the present invention;

FIG. 3 is an exemplary graph of the test results of the method of the present invention.

Detailed Description

The invention is further described below with reference to the drawings and an embodiment; the invention includes, but is not limited to, the following embodiment.

The hardware environment of the implementation is a computer with an Intel(R) Core(TM) i3-8100 CPU and 8.0 GB of memory; the software environment is Ubuntu 16.04.5 LTS and PyCharm 2018. The experiment uses DIOR, a large-scale public remote sensing image data set with 23463 images in total, of which 5862 are assigned to the training set, 5863 to the validation set, and the remaining 11738 to the test set. The data set covers 20 common categories of remote sensing image targets. Each category contains approximately 1200 images; the image size is 800 × 800 pixels and the spatial resolution ranges from 0.5 meters per pixel to 30 meters per pixel. To verify the effectiveness of the proposed solution, baseball fields, basketball fields, bridges, chimneys, ships, airplanes, airports, highway toll stations, docks, track and field sites, dams, golf courses, oil storage tanks, tennis courts and vehicles are used as the base classes, and highway service areas, overpasses, stadiums, train stations and windmills are used as the new classes. The 11725 images of the training and validation sets are used for training; according to the division into base classes and new classes, the images containing new classes are removed from these 11725 images, and the remaining 8573 images form the base-class training data set D_base. Each new class has 10 labeled samples; these, together with 10 labeled samples randomly drawn from each base class, form the small-sample data set D_few. Finally, the remaining 11738 images are used as the test set.

As shown in fig. 1, the method for detecting a small sample target of a remote sensing image based on a prototype convolutional neural network of the present invention is implemented as follows:

1. Base class training data preparation

First, the support set images and the query set images are constructed from the base-class training image data set D_base. The query set images are obtained directly from D_base by random extraction, that is, whole images containing target instances of any category and number are randomly extracted as query set images. The support set images consist of the target instance images of all categories in D_base; a target instance image is the image block inside the bounding box of a target instance, and can be obtained by cropping the target instance out of the whole remote sensing image using the ground-truth annotation box of the image.
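The construction of the support set can be illustrated with a minimal sketch. It assumes an image is a list of pixel rows and a ground-truth box is given as (x_min, y_min, x_max, y_max) with exclusive upper bounds; the function names and the box convention are hypothetical and are not taken from the patent.

```python
def crop_instance(image, box):
    """Cut out the image block inside one ground-truth box.

    image: list of rows (each row a list of pixels).
    box:   (x_min, y_min, x_max, y_max), upper bounds exclusive (assumption).
    """
    x_min, y_min, x_max, y_max = box
    return [row[x_min:x_max] for row in image[y_min:y_max]]


def build_support_set(annotated_images):
    """Group cropped target instance images by category.

    annotated_images: iterable of (image, [(label, box), ...]) pairs.
    Returns a dict mapping each label to its list of cropped patches.
    """
    support = {}
    for image, annotations in annotated_images:
        for label, box in annotations:
            support.setdefault(label, []).append(crop_instance(image, box))
    return support
```

Query set images need no cropping in this sketch: whole images would simply be drawn at random from the same annotated collection.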

Then, the support set images and the query set images are each preprocessed as follows: (1) each image is normalized using the means R_mean, G_mean, B_mean and the standard deviations R_std, G_std, B_std of its three RGB channel components:

I'_p,c = (I_p,c - Mean_c) / Std_c

where I_p,c denotes the c-channel component of the image before normalization, I'_p,c denotes the c-channel component of the normalized image, Mean_c denotes the mean of channel c of the image, and Std_c denotes the standard deviation of channel c of the image.

(2) The support set images are resized to M × M and the query set images are resized to M' × M', where the value of M' is 0.8 to 1.2 times the size of the original images in D_base, and the value of M is 0.28 times M'.
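The two preprocessing steps can be sketched as follows. This is an illustration under stated assumptions: a channel is a flat list of pixel values, the population standard deviation is used, and sizes are rounded to the nearest integer; none of these details is specified in the text, and the function names are hypothetical.

```python
import math


def normalize_channel(values):
    """Normalize one channel with its own mean and standard deviation.

    Implements (I - Mean_c) / Std_c. Uses the population standard
    deviation (assumption) and expects a non-constant channel.
    """
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]


def target_sizes(base_size, scale):
    """Return (M', M) for a scale factor between 0.8 and 1.2.

    M' = scale * base_size and M = 0.28 * M'; rounding to whole
    pixels is an assumption of this sketch.
    """
    if not 0.8 <= scale <= 1.2:
        raise ValueError("scale must lie between 0.8 and 1.2")
    m_query = round(scale * base_size)
    m_support = round(0.28 * m_query)
    return m_query, m_support
```

For the 800 × 800 images of the embodiment and a scale of 1.0, `target_sizes(800, 1.0)` gives M' = 800 and M = 224.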

2. Building an object detection network

The target detection network mainly comprises a feature extraction and class prototype acquisition module, a prototype-guided RPN module, a redirection feature map module and a detector module.

(1) Feature extraction and category prototype acquisition module

The invention adopts the convolutional layers of the first four stages of the ResNet-101 network as the feature extraction backbone network B. The ResNet-101 network structure is described in K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770-778.

First, the support set images and the query set images are input, and the feature extraction backbone network B extracts their features to obtain the corresponding feature maps. The feature maps of the support set images are denoted {F_s,1, F_s,2, ..., F_s,C} and the feature map of the query set image is denoted F_q, where F_s,i denotes the feature map of the support image of the i-th category, i = 1, 2, …, C, and C is the number of categories contained in the support set images. Then, a global average pooling operation is performed on the feature maps {F_s,1, F_s,2, ..., F_s,C} of the support set images to obtain the prototype vectors {p_1, p_2, ..., p_C} of the categories, where p_i denotes the prototype vector of the i-th category, i = 1, 2, …, C.
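The pooling step that turns a support feature map F_s,i into a prototype vector p_i can be sketched as below. The sketch assumes a feature map is stored as a list of channels, each an H × W nested list; the backbone itself is not modeled, and the function names are hypothetical.

```python
def global_average_pool(feature_map):
    """Average every channel over its spatial positions.

    feature_map: list of channels, each a list of rows of numbers.
    Returns one value per channel, i.e. a vector of length = channels.
    """
    vector = []
    for channel in feature_map:
        total = sum(sum(row) for row in channel)
        count = sum(len(row) for row in channel)
        vector.append(total / count)
    return vector


def class_prototypes(support_feature_maps):
    """Map each category label to the pooled vector of its feature map."""
    return {label: global_average_pool(fmap)
            for label, fmap in support_feature_maps.items()}
```

Each prototype has as many entries as the feature map has channels, which is what later allows channel-by-channel multiplication with region-of-interest features.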

(2) Prototype-guided RPN module

The prototype-guided RPN module is used for generating regions of interest that may contain targets, and is implemented as follows. The class prototype vectors {p_1, p_2, ..., p_C} obtained by the feature extraction and class prototype acquisition module are input into a three-layer fully connected network, which outputs a vector whose length equals that of the flattened convolution kernels of the RPN classifier. This vector is reshaped to the same shape as the convolution kernels of the RPN classifier to form a new set of convolution kernel parameters, which serve as the parameters of an auxiliary classifier. The auxiliary classifier and the RPN classifier each score the anchors in the feature map of the query set image as foreground or background, and the two scores are added to give the foreground target score of each anchor. The label of each anchor is then determined according to the anchor assignment rule of the RPN; the labels comprise foreground, background and ignored samples. The foreground target scores are sorted from high to low, and the r highest-scoring anchors are adjusted by the RPN regressor to obtain r regions of interest; in this embodiment r is 256.

The RPN method is described in S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137-1149, 2017.
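The scoring logic of the prototype-guided RPN can be sketched in simplified form. The sketch assumes 1 × 1 classifier kernels and per-anchor feature vectors, and it takes the output of the three-layer fully connected network as a given flat list; the fully connected network, the anchor labeling and the box regression are omitted, and all names are hypothetical.

```python
def reshape_to_kernels(flat, out_channels, in_channels):
    """Regroup a flat parameter vector into out_channels kernels.

    Assumes 1 x 1 kernels, so each kernel is a list of in_channels weights.
    """
    if len(flat) != out_channels * in_channels:
        raise ValueError("vector length does not match the kernel shape")
    return [flat[k * in_channels:(k + 1) * in_channels]
            for k in range(out_channels)]


def conv1x1_score(kernel, anchor_feature):
    """Score one anchor position with one 1 x 1 kernel (dot product)."""
    return sum(w * x for w, x in zip(kernel, anchor_feature))


def top_r_anchors(rpn_scores, aux_scores, r):
    """Add the two foreground scores and return the r best anchor indices."""
    combined = [a + b for a, b in zip(rpn_scores, aux_scores)]
    order = sorted(range(len(combined)),
                   key=lambda i: combined[i], reverse=True)
    return order[:r], combined
```

In the embodiment r would be 256; the selected anchors would then be refined by the RPN regressor to give the regions of interest.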

(3) Redirection feature map module

The redirection feature map module first extracts features for the regions of interest obtained by the prototype-guided RPN module, using the RoI Align operation proposed by Kaiming He et al. in the Mask R-CNN network in 2017, to obtain the feature maps {F_1, F_2, ..., F_r} of the regions of interest, where F_i denotes the feature map of the i-th region of interest, i = 1, 2, …, r, and r is the number of regions of interest. Then, the class prototype vectors are multiplied channel by channel with the feature maps of the regions of interest to obtain the redirected feature maps {F_1,1, F_1,2, ..., F_1,C, F_2,1, F_2,2, ..., F_2,C, ..., F_r,1, F_r,2, ..., F_r,C}, where F_i,j denotes the redirected feature map obtained by multiplying the feature map of the i-th region of interest channel by channel with the prototype vector of category j.
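Channel-by-channel multiplication can be illustrated with the following sketch, which assumes that a region-of-interest feature map is a list of channels (each an H × W nested list) and that a prototype has one weight per channel. RoI Align itself is not modeled, and the function names are hypothetical.

```python
def redirect(roi_feature, prototype):
    """Scale every channel of one RoI feature map by the matching
    prototype entry (channel-by-channel multiplication)."""
    if len(roi_feature) != len(prototype):
        raise ValueError("channel count and prototype length differ")
    return [[[value * weight for value in row] for row in channel]
            for channel, weight in zip(roi_feature, prototype)]


def redirect_all(roi_features, prototypes):
    """Build F_i,j for every region i (1-based) and every category j.

    roi_features: list of r RoI feature maps.
    prototypes:   dict mapping category label to prototype vector.
    Returns a dict keyed by (i, label) with r * C entries.
    """
    return {(i, label): redirect(feature, prototype)
            for i, feature in enumerate(roi_features, start=1)
            for label, prototype in prototypes.items()}
```

With r regions of interest and C categories, `redirect_all` yields r × C redirected maps, matching the set {F_1,1, ..., F_r,C} described above.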

(4) Detector module

The detector module uses the second-stage detector of the Faster R-CNN network to process the redirected feature maps output by the redirection feature map module, and outputs detection boxes containing the predicted category and position of each target. The classification loss of the detector module is a cross-entropy loss function, and the regression loss of the detector module is a SmoothL1 loss function.

Faster R-CNN is described in S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137-1149, 2017.
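The two detector losses named above can be written out in a few lines. This is a sketch of the standard definitions, assuming a transition point beta = 1.0 for SmoothL1 and a sum over box coordinates; the patent does not state these details, and the function names are hypothetical.

```python
import math


def cross_entropy(logits, target_index):
    """Cross-entropy of softmax(logits) against one target class.

    Computed as log-sum-exp(logits) - logits[target] for stability.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


def smooth_l1(predicted, target, beta=1.0):
    """SmoothL1 loss summed over box coordinates.

    Quadratic for |d| < beta, linear otherwise; beta = 1.0 and the
    summation over coordinates are assumptions of this sketch.
    """
    total = 0.0
    for p, t in zip(predicted, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total
```

A training step would typically combine the two terms into one detector loss; how they are weighted is not specified in the text.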

3. Training target detection network

The preprocessed support set images and query set images obtained in step 1 are input into the target detection network constructed in step 2 for training, yielding the target detection network trained on the base-class data set.

4. Fine-tuning training data preparation

A data set D_few containing both labeled base-class samples and labeled new-class samples is constructed as follows: 3N labeled sample images of each category are randomly extracted from the base-class training image data set D_base and combined with the new-class training image data set D_novel to form the fine-tuning training image data set D_few. The new-class image data set contains few labeled samples, not more than 30; N is the number of labeled samples of each category, and in this embodiment N is 10.
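Assembling a balanced fine-tuning set can be sketched as below. The number of samples drawn per base class is left as a parameter, since it is given here in terms of N while the embodiment draws 10 per base class; the fixed random seed and the function names are assumptions of the sketch.

```python
import random


def sample_per_class(samples_by_class, k, seed=0):
    """Randomly draw k labeled samples from every category.

    A fixed seed is used only to make the sketch reproducible.
    """
    rng = random.Random(seed)
    picked = {}
    for label, samples in samples_by_class.items():
        if len(samples) < k:
            raise ValueError(f"category {label!r} has fewer than {k} samples")
        picked[label] = rng.sample(samples, k)
    return picked


def build_few_shot_set(base_by_class, novel_by_class, base_k):
    """Combine sampled base-class data with all new-class data."""
    few = sample_per_class(base_by_class, base_k)
    few.update(novel_by_class)
    return few
```

The resulting dictionary plays the role of D_few and would then go through the same support set and query set preparation as the base-class data.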

Then, the processing of step 1 is performed with the fine-tuning training image data set D_few in place of the base-class training image data set D_base, to obtain the fine-tuning training data.

5. Fine tuning of target detection networks

The fine-tuning training data obtained in step 4 likewise comprises preprocessed support set images and query set images. These are input into the target detection network trained on the base-class data set in step 3, and the network is trained again to fine-tune it, yielding the target detection network trained on the fine-tuning data set. The two-stage training is shown in FIG. 2.

Then, the labeled samples of each category in the data set D_few are input into the retrained feature extraction backbone network B, their feature representation vectors are obtained through a global average pooling operation, and the mean of all feature representation vectors of a category is taken as the prototype vector of that category. In this way each category obtains one prototype vector, giving C_few prototype vectors in total, where C_few is the number of image categories contained in the data set D_few; in this embodiment C_few is 20.
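Averaging the pooled feature vectors of one category into a single prototype can be sketched as follows, assuming every sample has already been reduced to a feature vector of equal length; the function names are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(components) / count for components in zip(*vectors)]


def prototypes_from_features(features_by_class):
    """Map each category to the mean of its samples' feature vectors."""
    return {label: mean_vector(vectors)
            for label, vectors in features_by_class.items()}
```

With the 20 categories of the embodiment, `prototypes_from_features` would return 20 prototype vectors, one per category.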

6. Target detection

The data to be detected, preprocessed as in step 1, is taken as the query set image and fed into the retrained backbone network B. Combined with the 20 class prototype vectors obtained in step 5, it is passed through the retrained prototype-guided RPN module, redirection feature map module and detector module to obtain detection boxes predicting the category and position of the targets in the query set image. Redundant detection boxes are then filtered out by non-maximum suppression (NMS) to give the final detection result for each image; the score threshold is set to 0.3 and the NMS overlap threshold is set to 0.5.

The NMS method is described in A. Neubeck and L. Van Gool, "Efficient Non-Maximum Suppression," 18th International Conference on Pattern Recognition, 2006, pp. 850-855.
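The post-processing with the thresholds of the embodiment can be sketched as greedy non-maximum suppression. The sketch assumes boxes in (x_min, y_min, x_max, y_max) form, that detections of one category are processed together, and that a box is dropped when its overlap with a kept box exceeds the threshold; it is not the optimized algorithm of the cited paper, and the function names are hypothetical.

```python
def box_area(box):
    """Area of an (x_min, y_min, x_max, y_max) box."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(a, b):
    """Intersection over union of two boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def nms(detections, score_threshold=0.3, iou_threshold=0.5):
    """Greedy NMS over (box, score) pairs of a single category.

    Detections scoring below score_threshold are discarded first;
    the rest are visited from highest to lowest score.
    """
    candidates = sorted((d for d in detections if d[1] >= score_threshold),
                        key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in candidates:
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept
```

For multi-class detection the function would be called once per category, so that boxes of different categories do not suppress each other.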

FIG. 3 shows example detection results obtained by the invention. To verify the effectiveness of the method, the detection results are evaluated by mean average precision (mAP), which takes values between 0 and 1; the larger the value, the better the detection. The calculation of mAP is described in M. Everingham, S. M. A. Eslami, L. Van Gool, et al., "The Pascal Visual Object Classes Challenge: A Retrospective," International Journal of Computer Vision, 2015, pp. 98-136.
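The mAP metric can be illustrated with a short sketch. It assumes that each detection of a category has already been marked as a true or false positive, and it uses all-point interpolation of the precision-recall curve; the text does not state which interpolation variant was used, so this choice and the function names are assumptions.

```python
def average_precision(scored_hits, num_ground_truth):
    """Area under the interpolated precision-recall curve of one category.

    scored_hits: list of (score, is_true_positive) pairs.
    num_ground_truth: number of annotated objects (must be positive).
    """
    true_pos = false_pos = 0
    recalls, precisions = [], []
    for _, hit in sorted(scored_hits, key=lambda s: s[0], reverse=True):
        if hit:
            true_pos += 1
        else:
            false_pos += 1
        recalls.append(true_pos / num_ground_truth)
        precisions.append(true_pos / (true_pos + false_pos))
    # Make precision non-increasing as recall grows (interpolation).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, previous_recall = 0.0, 0.0
    for recall, precision in zip(recalls, precisions):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap


def mean_average_precision(ap_by_class):
    """Mean of the per-category average precisions."""
    return sum(ap_by_class.values()) / len(ap_by_class)
```

Base-class and new-class mAP would be obtained by averaging over the respective subsets of categories.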

The detection results of the method are compared with those of the small-sample target detection method Meta R-CNN in Table 1. The method of the invention achieves a higher mAP on both the base classes and the new classes, that is, a higher detection accuracy.

TABLE 1

Method                      mAP of base classes    mAP of new classes
Meta R-CNN                  52.3%                  17.0%
Method of the invention     52.6%                  18.6%

Claims

1. A remote sensing image small sample target detection method based on a prototype convolutional neural network is characterized by comprising the following steps:

step 1, preparing base-class training data: the target instance images of all categories in the base-class training image data set D_base form the support set images, wherein a target instance image is the image block inside the bounding box of a target instance in an image; whole images containing target instances of any category and number are randomly extracted from the base-class training image data set D_base as the query set images; all support set images and query set images are then preprocessed, and the preprocessed images serve as the training data; the preprocessing comprises normalizing all images, resizing the support set images to M × M, and resizing the query set images to M' × M'; the value of M' is 0.8 to 1.2 times the size of the base-class training images, and the value of M is 0.28 times M'; the base-class training image data set is the DIOR remote sensing image target detection data set;

the prototype-guided RPN module is used for generating regions of interest that may contain targets, and is implemented as follows: the class prototype vectors {p_1, p_2, ..., p_C} obtained by the feature extraction and class prototype acquisition module are input into a three-layer fully connected network, which outputs a vector whose length equals that of all convolution kernels of the RPN classifier after flattening; this vector is reshaped to the same shape as the convolution kernels of the RPN classifier to form a new set of convolution kernel parameters, which serve as the parameters of an auxiliary classifier; the auxiliary classifier and the RPN classifier each score the anchors in the feature map of the query set image as foreground or background, and the two scores are added to give the foreground target score of each anchor; the label of each anchor is then determined according to the anchor assignment rule of the RPN, the labels comprising foreground, background and ignored samples; the foreground target scores are sorted from high to low, and the r highest-scoring anchors are adjusted by the RPN regressor to obtain r regions of interest;

step 4, preparing fine-tuning training data: first, 3N labeled sample images of each category are randomly extracted from the base-class training image data set D_base and combined with the new-class training image data set D_novel to form the fine-tuning training image data set D_few; the number of labeled samples contained in the new-class image data set is not more than 30, and N is the number of labeled samples of each category;

then, the labeled samples of each category in the data set D_few constructed in step 4 are respectively input into the trained feature extraction backbone network B, their feature representation vectors are obtained through a global average pooling operation, and the mean of all feature representation vectors of a category is taken as the prototype vector of that category; in this way each category obtains one prototype vector, giving C_few prototype vectors in total, wherein C_few is the number of image categories contained in the data set D_few;