CN112861720B - Remote sensing image small sample target detection method based on prototype convolutional neural network - Google Patents

Remote sensing image small sample target detection method based on prototype convolutional neural network

Info

Publication number: CN112861720B
Application number: CN202110172985.6A
Authority: CN (China)
Prior art keywords: image, prototype, category, target detection, images
Other versions: CN112861720A
Other languages: Chinese (zh)
Legal status: Active (granted)
Inventors: 程塨, 施佩珍, 闫博唯, 姚西文, 韩军伟, 郭雷
Current Assignee: Northwestern Polytechnical University
Original Assignee: Northwestern Polytechnical University
Application filed by Northwestern Polytechnical University; priority to CN202110172985.6A (filing date 2021-02-08)
Publication of CN112861720A: 2021-05-28
Application granted; publication of CN112861720B: 2024-05-14


Classifications

    • G06V20/13 Satellite images (Scenes; Terrestrial scenes)
    • G06F18/24 Classification techniques (Pattern recognition; Analysing)
    • G06N3/04 Neural networks: architecture, e.g. interconnection topology
    • G06N3/08 Neural networks: learning methods
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI] (Image preprocessing)
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT] (Descriptors for shape, contour or point-related features)
    • G06V2201/07 Target detection (Indexing scheme relating to image or video recognition or understanding)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Remote Sensing (AREA)
  • Astronomy & Astrophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a remote sensing image small sample target detection method based on a prototype convolutional neural network. The method constructs a target detection network mainly comprising a feature extraction and category prototype acquisition module, a prototype-guided RPN module, a redirection feature map module and a detector module; the network first undergoes base learning on a base-class dataset containing a large number of annotated samples, is then fine-tuned on a balanced sub-dataset, and finally post-processing operations such as non-maximum suppression are applied to achieve multi-class target detection in small-sample remote sensing images. Using only a small amount of newly annotated data, the method can rapidly and accurately detect targets of different categories in optical remote sensing images with complex backgrounds, achieving high detection precision and fast detection speed.

Description

Remote sensing image small sample target detection method based on prototype convolutional neural network
Technical Field
The invention belongs to the technical field of remote sensing image processing, and particularly relates to a remote sensing image small sample target detection method based on a prototype convolutional neural network, which is applicable to remote sensing image target detection in small sample settings where only a few annotated samples are available.
Background
With the continuous development of satellite remote sensing technology, acquiring massive remote sensing imagery has become increasingly easy. However, annotating remote sensing images requires considerable manpower and financial resources, and some target categories are rare, so their data are difficult to collect. How to achieve target detection in remote sensing images with only a small number of annotated samples is therefore one of the pressing problems to be solved.
Existing small sample target detection methods can be grouped into three categories: meta-learning-based, metric-learning-based, and fine-tuning-based target detection. Meta-learning-based target detection constructs different tasks at each iteration so that the network gains strong generalization ability and can quickly adapt to new tasks with new categories, thereby accomplishing small sample detection. Metric-learning-based target detection learns a metric space in which objects of the same category are as close as possible and objects of different categories are as far apart as possible, where the "distance" is computed with measures such as the common Euclidean distance or cosine similarity. Fine-tuning-based target detection first trains an initial model and then builds a balanced dataset to fine-tune the model parameters, thereby achieving small sample detection. However, besides characteristics such as large scale variation and dense arrangement, remote sensing images are also affected by illumination, cloud cover, target appearance and complex background environments, so they differ substantially from natural scene images. These issues pose a great challenge to small sample target detection in optical remote sensing images, and existing small sample target detection methods are difficult to apply directly to remote sensing images.
Disclosure of Invention
In order to overcome the shortcomings of the prior art, the invention provides a remote sensing image small sample target detection method based on a prototype convolutional neural network. The method constructs a target detection network mainly comprising a feature extraction and category prototype acquisition module, a prototype-guided RPN module, a redirection feature map module and a detector module. The feature extraction and category prototype acquisition module extracts features from the support set images and the query set images with a parameter-shared deep convolutional neural network and applies global average pooling to the support set feature maps to obtain the category prototypes; the prototype-guided RPN module uses the obtained category prototypes to guide the region proposal network in generating regions of interest; the redirection feature map module uses a channel attention mechanism to reshape the channel information of the foreground candidate box features; and the detector module classifies and regresses the reshaped features and outputs detection boxes containing the predicted target category and position. The network is first base-trained on a base-class dataset, then fine-tuned on a balanced sub-dataset containing both base-class and new-class samples, and finally post-processing operations such as non-maximum suppression are applied to achieve multi-class target detection in small-sample remote sensing images. Using only a small amount of newly annotated data, the method can rapidly and accurately detect targets of different categories in optical remote sensing images with complex backgrounds, achieving high detection precision and fast detection speed.
A remote sensing image small sample target detection method based on a prototype convolutional neural network is characterized by comprising the following steps:
Step 1, base training data preparation: form the support set images from the target instance images of all categories in the base-class training image dataset D_base, where a target instance image is the image patch inside a target instance bounding box in an image; randomly extract whole images containing target instances of any category and number from the base-class training image dataset D_base as query set images; then preprocess all support set images and query set images, and use the preprocessed images as training data. The preprocessing consists of normalizing all images, resizing the support set images to M×M, and resizing the query set images to M'×M', where M' is 0.8 to 1.2 times the size of the base training images and M is 0.28 times M'; the base training image dataset is the remote sensing image target detection DIOR dataset;
Step 2, target detection network construction: the target detection network mainly comprises a feature extraction and category prototype acquisition module, a prototype-guided RPN module, a redirection feature map module and a detector module;
The feature extraction and category prototype acquisition module operates as follows: first, the support set images and the query set images are input, and features are extracted from each with a feature extraction backbone network B to obtain the corresponding feature maps; the feature maps of the support set images are denoted {F_s,1, F_s,2, ..., F_s,C}, where the feature extraction backbone network B adopts the convolutional layers of the first four stages of a ResNet-101 network, F_s,i denotes the feature map of the support images of the i-th category, i = 1, 2, ..., C, and C is the number of categories; then, global average pooling is applied to the support set feature maps {F_s,1, F_s,2, ..., F_s,C} to obtain the prototype vector of each category {p_1, p_2, ..., p_C}, where p_i denotes the prototype vector of the i-th category, i = 1, 2, ..., C;
The prototype-guided RPN module generates regions of interest that may contain targets, as follows: the category prototype vectors {p_1, p_2, ..., p_C} obtained by the feature extraction and category prototype acquisition module are fed into a three-layer fully connected network, which outputs a vector of the same length as the flattened convolution kernels of the RPN classifier; this vector is reshaped to the same shape as the RPN classifier kernels to form a new set of convolution kernel parameters, which serve as the parameters of an auxiliary classifier; the auxiliary classifier and the RPN classifier each assign foreground-background scores to the anchors on the query image feature map, and the two scores are added to give each anchor's foreground target score; then, the label of each anchor (foreground, background, or ignored sample) is determined according to the anchor assignment rule of the RPN; the foreground target scores are sorted from high to low, and the r highest-scoring anchors are adjusted by the RPN regressor to obtain r regions of interest;
The redirection feature map module extracts features from the regions of interest produced by the prototype-guided RPN module using the RoI Align operation of the Mask R-CNN network, obtaining region-of-interest feature maps {F_1, F_2, ..., F_r}, where F_i denotes the feature map of the i-th region of interest, i = 1, 2, ..., r, and r is the number of regions of interest; then, the category prototype vectors are multiplied channel by channel with the region-of-interest feature maps to obtain the redirected feature maps;
The detector module detects the redirected feature maps output by the redirection feature map module with the second-stage detector of the Faster R-CNN network and outputs detection boxes containing the predicted target category and position; the classification loss of the detector module is the cross-entropy loss, and its regression loss is the Smooth L1 loss;
Step 3, target detection network training: input the preprocessed support set images and query set images obtained in step 1 into the target detection network constructed in step 2 for training, obtaining a target detection network trained on the base-class dataset;
Step 4, fine-tuning training data preparation: first, randomly extract 3N annotated sample images of each category from the base-class training image dataset D_base and combine them with the new-class training image dataset D_novel to form the fine-tuning training image dataset D_few; the new-class image dataset contains no more than 30 annotated samples, and N is the number of annotated samples per category;
Then, perform the processing of step 1 with the fine-tuning training image dataset D_few in place of the base-class training image dataset D_base, yielding the fine-tuning training data;
Step 5, target detection network fine-tuning: input the support set images and query set images of the fine-tuning training data obtained in step 4 into the trained target detection network obtained in step 3 and train the network again, obtaining the target detection network trained on the fine-tuning dataset;
Then, input the annotated samples of each category in the dataset D_few constructed in step 4 into the trained feature extraction backbone network B, obtain feature representative vectors by global average pooling, and take the mean of all feature representative vectors of a category as that category's prototype vector; in this way, each category obtains one prototype vector, giving C_few prototype vectors, where C_few is the number of image categories contained in the dataset D_few;
Step 6, target detection: input the preprocessed data to be detected as query set images into the feature extraction backbone network B trained in step 5 to obtain query image features; input the C_few category prototype vectors obtained in step 5 and the query image features into the trained prototype-guided RPN module, and obtain detection boxes containing the predicted target category and position through the trained redirection feature map module and detector module; then filter out redundant detection boxes by non-maximum suppression, and the remaining detection boxes are the final target detection result for the data to be detected.
The beneficial effects of the invention are as follows: the feature extraction and category prototype acquisition module with a shared backbone network effectively alleviates overfitting, saves memory, and increases computation speed; using the prototypes to guide the RPN yields higher-quality regions of interest, which benefits the subsequent detector; the feature redirection processing further improves detection precision; and because the target detection network is trained successively on the base training data and the fine-tuning training data, it can accomplish target detection in complex-background remote sensing images with only a small number of new annotated samples, with high detection precision and good robustness.
Drawings
FIG. 1 is a flowchart of the remote sensing image small sample target detection method based on a prototype convolutional neural network;
FIG. 2 is a two-stage training schematic of the method of the present invention;
FIG. 3 is an exemplary diagram of the detection results of the method of the present invention.
Detailed Description
The invention is further described below with reference to the figures and embodiments; the invention includes but is not limited to the following embodiments.
The hardware environment for implementation is a computer with an Intel(R) Core(TM) i3-8100 CPU and 8.0 GB of memory; the software environment is Ubuntu 16.04.5 LTS and PyCharm 2018. The experiments use the large-scale public remote sensing image database DIOR, which contains 23463 images in total: 5862 images form the training set, 5863 images form the validation set, and the remaining 11738 images form the test set. It covers 20 common remote sensing target categories, each containing approximately 1200 images, with an image size of 800 x 800 pixels and spatial resolutions ranging from 0.5 m/pixel to 30 m/pixel. To verify the effectiveness of the proposed solution, baseball fields, basketball courts, bridges, chimneys, ships, airplanes, airports, highway toll stations, wharves, track and field fields, dams, golf courses, oil storage tanks, tennis courts and vehicles are taken as base classes, and highway service areas, overpasses, stadiums, train stations and windmills are taken as new classes. The 11725 images of the training and validation sets are used for training; according to the base/new class split, images containing new classes are removed from these 11725 images, and the remaining 8573 images form the base-class training dataset D_base. The new classes have 10 annotated samples per class, and these are combined with 10 annotated samples randomly extracted from each base class to construct the small sample dataset D_few. Finally, the remaining 11738 images are used as the test set.
As shown in fig. 1, the specific implementation process of the remote sensing image small sample target detection method based on the prototype convolutional neural network is as follows:
1. Base class training data preparation
First, the support set images and the query set images are constructed from the base-class training image dataset D_base. The query set images are obtained directly by random extraction from D_base, i.e., whole images containing target instances of any category and number are randomly extracted as query set images. The support set images are composed of the target instance images of all categories in D_base, where a target instance image is the image patch inside a target instance bounding box; these instances are obtained by cropping them out of the whole remote sensing image using its ground-truth bounding boxes.
Then, the support set images and the query set images are preprocessed as follows: (1) each image is normalized channel by channel using the means R_mean, G_mean, B_mean and standard deviations R_std, G_std, B_std of its three RGB channel components:
I'_p,c = (I_p,c - Mean_c) / Std_c
where I_p,c denotes the c-channel component of the image before normalization, I'_p,c denotes the c-channel component after normalization, Mean_c denotes the mean of channel c of the image, and Std_c denotes the standard deviation of channel c of the image.
(2) The support set images are resized to M×M and the query set images are resized to M'×M', where M' is 0.8 to 1.2 times the original image size in D_base and M is 0.28 times M'.
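As an illustration of the preprocessing above, the following Python sketch normalizes each image with its own per-channel RGB statistics and resizes the support and query images; the use of OpenCV, the function names, and the choice of M' equal to the base image size are assumptions made only for this example, not part of the patented method.

```python
import numpy as np
import cv2

def normalize(img):
    """Per-channel normalization with the image's own RGB mean and std,
    i.e. I'_p,c = (I_p,c - Mean_c) / Std_c."""
    img = img.astype(np.float32)
    mean = img.reshape(-1, 3).mean(axis=0)       # R_mean, G_mean, B_mean
    std = img.reshape(-1, 3).std(axis=0) + 1e-6  # R_std, G_std, B_std (epsilon avoids division by zero)
    return (img - mean) / std

def preprocess(support_imgs, query_imgs, base_size=800):
    """Support images are resized to M x M, query images to M' x M',
    with M' in 0.8-1.2x the base image size and M = 0.28 * M'."""
    m_prime = base_size                 # one admissible choice of M' (assumption)
    m = int(0.28 * m_prime)             # e.g. 224 when M' = 800
    support = [cv2.resize(normalize(s), (m, m)) for s in support_imgs]
    query = [cv2.resize(normalize(q), (m_prime, m_prime)) for q in query_imgs]
    return support, query
```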
2. Building a target detection network
The target detection network mainly comprises a feature extraction and category prototype acquisition module, a prototype-guided RPN module, a redirection feature map module and a detector module.
(1) Feature extraction and category prototype acquisition module
The present invention adopts the convolutional layers of the first four stages of a ResNet-101 network as the feature extraction backbone network B. The ResNet-101 architecture is described in K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770-778.
First, the support set images and the query set images are input, and features are extracted from each with the feature extraction backbone network B to obtain the corresponding feature maps; the feature maps of the support set images are denoted {F_s,1, F_s,2, ..., F_s,C}, where F_s,i denotes the feature map of the support images of the i-th category, i = 1, 2, ..., C, and C is the number of categories contained in the support set images. Then, global average pooling is applied to the support set feature maps {F_s,1, F_s,2, ..., F_s,C} to obtain the prototype vector of each category {p_1, p_2, ..., p_C}, where p_i denotes the prototype vector of the i-th category, i = 1, 2, ..., C.
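A minimal PyTorch-style sketch of this step is given below; the name `backbone`, the per-category batching of support instances, and the averaging over several instances of one category are illustrative assumptions, not fixed by the patent.

```python
import torch
import torch.nn.functional as F

def class_prototypes(backbone, support_images_by_class):
    """support_images_by_class: list of C tensors, each of shape (N_i, 3, M, M)
    holding the support instance images of one category.
    Returns a (C, D) tensor of category prototype vectors {p_1, ..., p_C}."""
    prototypes = []
    for imgs in support_images_by_class:
        feats = backbone(imgs)                        # feature maps F_s,i of shape (N_i, D, h, w)
        pooled = F.adaptive_avg_pool2d(feats, 1)      # global average pooling -> (N_i, D, 1, 1)
        prototypes.append(pooled.flatten(1).mean(0))  # average over the category's instances -> (D,)
    return torch.stack(prototypes)                    # (C, D)
```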
(2) Prototype-guided RPN module
The prototype-guided RPN module generates regions of interest that may contain targets, as follows: the category prototype vectors {p_1, p_2, ..., p_C} obtained by the feature extraction and category prototype acquisition module are fed into a three-layer fully connected network, which outputs a vector of the same length as the flattened convolution kernels of the RPN classifier; this vector is reshaped to the same shape as the RPN classifier kernels to form a new set of convolution kernel parameters, which serve as the parameters of an auxiliary classifier. The auxiliary classifier and the RPN classifier each assign foreground-background scores to the anchors on the query image feature map, and the two scores are added to give each anchor's foreground target score. Then, the label of each anchor (foreground, background, or ignored sample) is determined according to the anchor assignment rule of the RPN. The foreground target scores are sorted from high to low, and the r highest-scoring anchors are adjusted by the RPN regressor to obtain r regions of interest; in this embodiment, r is 256. A code sketch of this prototype-guided scoring is given after the reference below.
The RPN is described in S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 39, no. 6, pp. 1137-1149, 2017.
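The following sketch illustrates how a category prototype can be turned into an auxiliary RPN classifier whose score is added to the standard RPN objectness score, as described above; the hidden widths of the three-layer fully connected network, the per-prototype loop, and the omission of a bias term in the auxiliary convolution are assumptions made only for this example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrototypeGuidedRPNScore(nn.Module):
    def __init__(self, proto_dim, rpn_cls: nn.Conv2d):
        super().__init__()
        self.rpn_cls = rpn_cls                      # standard 1x1 RPN classification conv
        kernel_numel = rpn_cls.weight.numel()       # length of the flattened RPN kernels
        # Three fully connected layers mapping a prototype to the flattened kernel length.
        self.mlp = nn.Sequential(
            nn.Linear(proto_dim, 1024), nn.ReLU(inplace=True),
            nn.Linear(1024, 1024), nn.ReLU(inplace=True),
            nn.Linear(1024, kernel_numel),
        )

    def forward(self, query_feat, prototypes):
        """query_feat: (1, D, H, W) RPN feature map of the query image;
        prototypes: (C, proto_dim). Returns one summed objectness map per prototype."""
        scores = []
        for p in prototypes:
            aux_w = self.mlp(p).view_as(self.rpn_cls.weight)   # reshape to the RPN kernel shape
            aux_score = F.conv2d(query_feat, aux_w)            # auxiliary classifier score
            std_score = self.rpn_cls(query_feat)               # standard RPN classifier score
            scores.append(aux_score + std_score)               # foreground score = sum of the two
        return scores
```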
(3) Redirection feature map module
First, the redirection feature map module extracts features from the regions of interest produced by the prototype-guided RPN module using the RoI Align operation proposed in the Mask R-CNN network (Kaiming He et al., 2017), obtaining region-of-interest feature maps {F_1, F_2, ..., F_r}, where F_i denotes the feature map of the i-th region of interest, i = 1, 2, ..., r, and r is the number of regions of interest. Then, the category prototype vectors are multiplied channel by channel with the region-of-interest feature maps to obtain the redirected feature maps {F_1,1, F_1,2, ..., F_1,C, F_2,1, F_2,2, ..., F_2,C, ..., F_r,1, F_r,2, ..., F_r,C}, where F_i,j denotes the redirected feature map obtained by multiplying the feature map of the i-th region of interest channel by channel with the prototype vector of category j.
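The channel-by-channel multiplication can be written compactly with broadcasting, as in the sketch below; the tensor names and the assumption that the prototype dimension equals the RoI feature channel count are illustrative.

```python
import torch

def redirect_features(roi_feats, prototypes):
    """roi_feats: (r, D, s, s) RoI Align outputs {F_1, ..., F_r};
    prototypes: (C, D) category prototypes {p_1, ..., p_C}.
    Returns the redirected maps F_i,j with shape (r, C, D, s, s)."""
    r, d, s, _ = roi_feats.shape
    c = prototypes.shape[0]
    roi = roi_feats.unsqueeze(1)            # (r, 1, D, s, s)
    proto = prototypes.view(1, c, d, 1, 1)  # (1, C, D, 1, 1), broadcast over spatial positions
    return roi * proto                      # channel-wise reweighting of every RoI by every prototype
```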
(4) Detector module
The detector module detects the redirected feature maps output by the redirection feature map module with the second-stage detector of the Faster R-CNN network and outputs detection boxes containing the predicted target category and position; the classification loss of the detector module is the cross-entropy loss, and its regression loss is the Smooth L1 loss. A sketch of the combined loss is given after the reference below.
The Faster R-CNN is described in S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 39, no. 6, pp. 1137-1149, 2017.
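For reference, the two losses named above can be combined as in this sketch; the equal weighting of the two terms and the tensor shapes are assumptions made only for illustration.

```python
import torch.nn.functional as F

def detector_loss(cls_logits, cls_targets, box_preds, box_targets):
    """cls_logits: (n, C+1) class scores with background as class 0; cls_targets: (n,) labels;
    box_preds / box_targets: (n_pos, 4) regression offsets for positive RoIs."""
    loss_cls = F.cross_entropy(cls_logits, cls_targets)   # classification: cross-entropy loss
    loss_reg = F.smooth_l1_loss(box_preds, box_targets)   # regression: Smooth L1 loss
    return loss_cls + loss_reg
```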
3. Training target detection network
The preprocessed support set images and query set images obtained in step 1 are input into the target detection network constructed in step 2 for training, yielding the target detection network trained on the base-class dataset.
4. Fine tuning training data preparation
A dataset D_few containing both base-class and new-class annotated samples is constructed as follows: 3N annotated sample images of each category are randomly extracted from the base-class training image dataset D_base and combined with the new-class training image dataset D_novel to form the fine-tuning training image dataset D_few; the new-class image dataset contains only a small number of annotated samples (no more than 30), and N is the number of annotated samples per category; in this embodiment, N = 10.
Then, the processing of step 1 is performed with the fine-tuning training image dataset D_few in place of the base-class training image dataset D_base, yielding the fine-tuning training data.
5. Fine tuning of a target detection network
The fine-tuning training data obtained in step 4 likewise comprise preprocessed support set images and query set images. These are input into the target detection network trained on the base-class dataset obtained in step 3, and the network is trained again for fine-tuning, yielding the target detection network trained on the fine-tuning dataset, as shown in FIG. 2.
Then, the annotated samples of each category in the dataset D_few are input into the retrained feature extraction backbone network B, feature representative vectors are obtained by global average pooling, and the mean of all feature representative vectors of a category is taken as that category's prototype vector. In this way, each category obtains one prototype vector, giving C_few prototype vectors, where C_few is the number of image categories contained in the dataset D_few; in this embodiment, C_few is 20.
6. Target detection
The data to be detected, preprocessed according to the method of step 1, are fed as query set images into the retrained backbone network B; combined with the 20 category prototype vectors obtained in step 5, they pass through the retrained prototype-guided RPN module, redirection feature map module and detector module to obtain detection boxes predicting the category and position of the targets in the query images. Non-maximum suppression is then used to filter out redundant detection boxes and produce the final detection result for each image, with the score threshold set to 0.3 and the NMS overlap (IoU) threshold set to 0.5.
The NMS method is described in A. Neubeck and L. Van Gool, "Efficient Non-Maximum Suppression," 18th International Conference on Pattern Recognition, 2006, pp. 850-855.
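A post-processing sketch with the thresholds of this embodiment (score threshold 0.3, NMS IoU threshold 0.5) is shown below; using torchvision's NMS operator and applying it per category are assumptions standing in for the referenced NMS method.

```python
import torch
from torchvision.ops import nms

def postprocess(boxes, scores, labels, score_thr=0.3, iou_thr=0.5):
    """boxes: (n, 4) in (x1, y1, x2, y2); scores: (n,); labels: (n,) predicted categories."""
    keep = scores > score_thr                              # drop low-confidence detections
    boxes, scores, labels = boxes[keep], scores[keep], labels[keep]
    kept = []
    for c in labels.unique():                              # class-wise non-maximum suppression
        idx = (labels == c).nonzero(as_tuple=True)[0]
        kept.append(idx[nms(boxes[idx], scores[idx], iou_thr)])
    kept = torch.cat(kept) if kept else torch.empty(0, dtype=torch.long)
    return boxes[kept], scores[kept], labels[kept]
```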
FIG. 3 shows an example of the detection results obtained by the invention. To verify the effectiveness of the method, the mAP value is used to measure the detection results; it lies between 0 and 1, and a larger value indicates better detection. The calculation of mAP is described in M. Everingham, S. M. A. Eslami, L. Van Gool, et al., "The PASCAL Visual Object Classes Challenge: A Retrospective," International Journal of Computer Vision, 2015, pp. 98-136.
The detection results of the proposed method are compared with those of the small sample target detection method Meta R-CNN in Table 1; the proposed method achieves higher mAP values on both the base classes and the new classes, i.e., higher detection precision.
TABLE 1
Method                     Base class mAP    New class mAP
Meta R-CNN                 52.3%             17.0%
Method of the invention    52.6%             18.6%

Claims (1)

1. A remote sensing image small sample target detection method based on a prototype convolutional neural network is characterized by comprising the following steps:
Step 1, base training data preparation: forming the support set images from the target instance images of all categories in the base-class training image dataset D_base, wherein a target instance image is the image patch inside a target instance bounding box in an image; randomly extracting whole images containing target instances of any category and number from the base-class training image dataset D_base as query set images; then preprocessing all support set images and query set images, the preprocessed images being used as training data; the preprocessing comprising normalizing all images, resizing the support set images to M×M, and resizing the query set images to M'×M', wherein M' is 0.8 to 1.2 times the size of the base training images and M is 0.28 times M'; the base training image dataset being the remote sensing image target detection DIOR dataset;
Step 2, target detection network construction: the target detection network mainly comprises a feature extraction and category prototype acquisition module, a prototype-guided RPN module, a redirection feature map module and a detector module;
the feature extraction and category prototype acquisition module operates as follows: first, the support set images and the query set images are input, and features are extracted from each with a feature extraction backbone network B to obtain the corresponding feature maps, the feature maps of the support set images being denoted {F_s,1, F_s,2, ..., F_s,C}, wherein the feature extraction backbone network B adopts the convolutional layers of the first four stages of a ResNet-101 network, F_s,i denotes the feature map of the support images of the i-th category, i = 1, 2, ..., C, and C is the number of categories; then, global average pooling is applied to the support set feature maps {F_s,1, F_s,2, ..., F_s,C} to obtain the prototype vector of each category {p_1, p_2, ..., p_C}, wherein p_i denotes the prototype vector of the i-th category, i = 1, 2, ..., C;
the prototype-guided RPN module generates regions of interest that may contain targets, as follows: the category prototype vectors {p_1, p_2, ..., p_C} obtained by the feature extraction and category prototype acquisition module are fed into a three-layer fully connected network, which outputs a vector of the same length as the flattened convolution kernels of the RPN classifier; this vector is reshaped to the same shape as the RPN classifier kernels to form a new set of convolution kernel parameters, which serve as the parameters of an auxiliary classifier; the auxiliary classifier and the RPN classifier each assign foreground-background scores to the anchors on the query image feature map, and the two scores are added to give each anchor's foreground target score; then, the label of each anchor, namely foreground, background, or ignored sample, is determined according to the anchor assignment rule of the RPN; the foreground target scores are sorted from high to low, and the r highest-scoring anchors are adjusted by the RPN regressor to obtain r regions of interest;
the redirection feature map module extracts features from the regions of interest produced by the prototype-guided RPN module using the RoI Align operation of the Mask R-CNN network, obtaining region-of-interest feature maps {F_1, F_2, ..., F_r}, wherein F_i denotes the feature map of the i-th region of interest, i = 1, 2, ..., r, and r is the number of regions of interest; then, the category prototype vectors are multiplied channel by channel with the region-of-interest feature maps to obtain the redirected feature maps;
the detector module detects the redirected feature maps output by the redirection feature map module with the second-stage detector of the Faster R-CNN network and outputs detection boxes containing the predicted target category and position, wherein the classification loss of the detector module is the cross-entropy loss and its regression loss is the Smooth L1 loss;
Step 3, target detection network training: inputting the preprocessed support set images and query set images obtained in step 1 into the target detection network constructed in step 2 for training, obtaining a target detection network trained on the base-class dataset;
Step 4, fine-tuning training data preparation: first, randomly extracting 3N annotated sample images of each category from the base-class training image dataset D_base and combining them with the new-class training image dataset D_novel to form the fine-tuning training image dataset D_few, wherein the new-class image dataset contains no more than 30 annotated samples and N is the number of annotated samples per category;
then, performing the processing of step 1 with the fine-tuning training image dataset D_few in place of the base-class training image dataset D_base, yielding the fine-tuning training data;
Step 5, target detection network fine-tuning: inputting the support set images and query set images of the fine-tuning training data obtained in step 4 into the trained target detection network obtained in step 3 and training the network again, obtaining the target detection network trained on the fine-tuning dataset;
then, inputting the annotated samples of each category in the dataset D_few constructed in step 4 into the trained feature extraction backbone network B, obtaining feature representative vectors by global average pooling, and taking the mean of all feature representative vectors of a category as that category's prototype vector; in this way, each category obtains one prototype vector, giving C_few prototype vectors, wherein C_few is the number of image categories contained in the dataset D_few;
Step 6, target detection: inputting the preprocessed data to be detected as query set images into the feature extraction backbone network B trained in step 5 to obtain query image features; inputting the C_few category prototype vectors obtained in step 5 and the query image features into the trained prototype-guided RPN module, and obtaining detection boxes containing the predicted target category and position through the trained redirection feature map module and detector module; then filtering out redundant detection boxes by non-maximum suppression, the remaining detection boxes being the final target detection result for the data to be detected.
CN202110172985.6A 2021-02-08 2021-02-08 Remote sensing image small sample target detection method based on prototype convolutional neural network Active CN112861720B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110172985.6A CN112861720B (en) 2021-02-08 2021-02-08 Remote sensing image small sample target detection method based on prototype convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110172985.6A CN112861720B (en) 2021-02-08 2021-02-08 Remote sensing image small sample target detection method based on prototype convolutional neural network

Publications (2)

Publication Number Publication Date
CN112861720A (en) 2021-05-28
CN112861720B (en) 2024-05-14

Family

ID=75989205

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110172985.6A Active CN112861720B (en) 2021-02-08 2021-02-08 Remote sensing image small sample target detection method based on prototype convolutional neural network

Country Status (1)

Country Link
CN (1) CN112861720B (en)

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113283513B (en) * 2021-05-31 2022-12-13 西安电子科技大学 Small sample target detection method and system based on target interchange and metric learning
CN113378936B (en) * 2021-06-11 2024-03-08 长沙军民先进技术研究有限公司 Faster RCNN-based few-sample target detection method
CN113408546B (en) * 2021-06-21 2023-03-07 武汉工程大学 Single-sample target detection method based on mutual global context attention mechanism
CN113780272A (en) * 2021-07-02 2021-12-10 北京建筑大学 SAR image ship detection method and device, electronic equipment and storage medium
CN113743455A (en) * 2021-07-23 2021-12-03 北京迈格威科技有限公司 Target retrieval method, device, electronic equipment and storage medium
CN113642574B (en) * 2021-07-30 2022-11-29 中国人民解放军军事科学院国防科技创新研究院 Small sample target detection method based on feature weighting and network fine tuning
CN113705570B (en) * 2021-08-31 2023-12-08 长沙理工大学 Deep learning-based few-sample target detection method
CN114124437B (en) * 2021-09-28 2022-09-23 西安电子科技大学 Encrypted flow identification method based on prototype convolutional network
CN113822368B (en) * 2021-09-29 2023-06-20 成都信息工程大学 Anchor-free incremental target detection method
CN114169442B (en) * 2021-12-08 2022-12-09 中国电子科技集团公司第五十四研究所 Remote sensing image small sample scene classification method based on double prototype network
CN114399644A (en) * 2021-12-15 2022-04-26 北京邮电大学 Target detection method and device based on small sample
CN114219804B (en) * 2022-02-22 2022-05-24 汉斯夫(杭州)医学科技有限公司 Small sample tooth detection method based on prototype segmentation network and storage medium
CN114743045B (en) * 2022-03-31 2023-09-26 电子科技大学 Small sample target detection method based on double-branch area suggestion network
CN115049870A (en) * 2022-05-07 2022-09-13 电子科技大学 Target detection method based on small sample
CN115049944B (en) * 2022-06-02 2024-05-28 北京航空航天大学 Small sample remote sensing image target detection method based on multitasking optimization
CN114861842B (en) * 2022-07-08 2022-10-28 中国科学院自动化研究所 Few-sample target detection method and device and electronic equipment
CN115100532B (en) * 2022-08-02 2023-04-07 北京卫星信息工程研究所 Small sample remote sensing image target detection method and system
CN115115898B (en) * 2022-08-31 2022-11-15 南京航空航天大学 Small sample target detection method based on unsupervised feature reconstruction
CN115984621B (en) * 2023-01-09 2023-07-11 宁波拾烨智能科技有限公司 Small sample remote sensing image classification method based on restrictive prototype comparison network
CN116310894B (en) * 2023-02-22 2024-04-16 中交第二公路勘察设计研究院有限公司 Unmanned aerial vehicle remote sensing-based intelligent recognition method for small-sample and small-target Tibetan antelope
CN116129226B (en) * 2023-04-10 2023-07-25 之江实验室 Method and device for detecting few-sample targets based on multi-prototype mixing module
CN117409250B (en) * 2023-10-27 2024-04-30 北京信息科技大学 Small sample target detection method, device and medium


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109063594A (en) * 2018-07-13 2018-12-21 吉林大学 Remote sensing images fast target detection method based on YOLOv2
CN109919108A (en) * 2019-03-11 2019-06-21 西安电子科技大学 Remote sensing images fast target detection method based on depth Hash auxiliary network
CN110189304A (en) * 2019-05-07 2019-08-30 南京理工大学 Remote sensing image target on-line quick detection method based on artificial intelligence
CN111160249A (en) * 2019-12-30 2020-05-15 西北工业大学深圳研究院 Multi-class target detection method of optical remote sensing image based on cross-scale feature fusion
CN111797676A (en) * 2020-04-30 2020-10-20 南京理工大学 High-resolution remote sensing image target on-orbit lightweight rapid detection method
CN111666836A (en) * 2020-05-22 2020-09-15 北京工业大学 High-resolution remote sensing image target detection method of M-F-Y type lightweight convolutional neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
基于特征融合与分类器在线学习的目标跟踪算法;胡秀华;郭雷;李晖晖;;控制与决策(第09期);全文 *

Also Published As

Publication number Publication date
CN112861720A (en) 2021-05-28

Similar Documents

Publication Publication Date Title
CN112861720B (en) Remote sensing image small sample target detection method based on prototype convolutional neural network
CN110363122B (en) Cross-domain target detection method based on multi-layer feature alignment
CN106778595B (en) Method for detecting abnormal behaviors in crowd based on Gaussian mixture model
CN108596055B (en) Airport target detection method of high-resolution remote sensing image under complex background
CN111709416B (en) License plate positioning method, device, system and storage medium
CN111160249A (en) Multi-class target detection method of optical remote sensing image based on cross-scale feature fusion
CN110163213B (en) Remote sensing image segmentation method based on disparity map and multi-scale depth network model
CN111353395A (en) Face changing video detection method based on long-term and short-term memory network
CN111738055B (en) Multi-category text detection system and bill form detection method based on same
CN110082821B (en) Label-frame-free microseism signal detection method and device
CN111460980B (en) Multi-scale detection method for small-target pedestrian based on multi-semantic feature fusion
CN104036284A (en) Adaboost algorithm based multi-scale pedestrian detection method
CN112016605A (en) Target detection method based on corner alignment and boundary matching of bounding box
CN109584206B (en) Method for synthesizing training sample of neural network in part surface flaw detection
CN114495010A (en) Cross-modal pedestrian re-identification method and system based on multi-feature learning
CN114360030A (en) Face recognition method based on convolutional neural network
CN111401113A (en) Pedestrian re-identification method based on human body posture estimation
CN110826411A (en) Vehicle target rapid identification method based on unmanned aerial vehicle image
CN114332473A (en) Object detection method, object detection device, computer equipment, storage medium and program product
CN116091946A (en) Yolov 5-based unmanned aerial vehicle aerial image target detection method
CN114266805A (en) Twin region suggestion network model for unmanned aerial vehicle target tracking
CN112446305A (en) Pedestrian re-identification method based on classification weight equidistant distribution loss model
CN116993760A (en) Gesture segmentation method, system, device and medium based on graph convolution and attention mechanism
Xia et al. Abnormal event detection method in surveillance video based on temporal CNN and sparse optical flow
Zeng et al. Masanet: Multi-angle self-attention network for semantic segmentation of remote sensing images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant