CN111046756A

CN111046756A - Convolutional neural network detection method for high-resolution remote sensing image target scale features

Info

Publication number: CN111046756A
Application number: CN201911179838.0A
Authority: CN
Inventors: 王密; 董志鹏; 杨芳; 刘思远
Original assignee: Wuhan University WHU
Current assignee: Wuhan University WHU
Priority date: 2019-11-27
Filing date: 2019-11-27
Publication date: 2020-04-21

Abstract

The present invention provides a convolutional neural network detection method for target scale features of high-resolution remote sensing images, comprising: establishing a large-scale high-resolution remote sensing image multi-target detection data set covering different radiances and different scales; statistically analyzing the scale range of each target class separately, and of all targets together, to obtain the optimal distribution range of each target scale in high-resolution remote sensing images; obtaining the optimal scale size of the target detection proposal frame for high-resolution remote sensing images from the area covered by that distribution range; and, according to the optimal scale size of the target proposal frame, setting up a convolutional neural network architecture suitable for target detection in high-resolution remote sensing images, thereby achieving target detection in images. The method obtains high-precision target detection results on high-resolution remote sensing images and is simple, reliable, precise and easy to implement.

Description

Convolutional neural network detection method for high-resolution remote sensing image target scale features

Technical Field

The invention belongs to the field of remote sensing image processing and information extraction, and particularly relates to a target detection method that uses optimal-scale suggestion boxes for targets in high-resolution remote sensing images.

Background

With the development of earth observation technology, the volume of high-resolution remote sensing imagery being acquired keeps increasing, and such imagery is widely used in urban planning, disaster monitoring, agricultural management, military reconnaissance and other fields. Under big-data conditions, automatic and intelligent target detection in high-resolution remote sensing images is important for realizing the application value of these images. Scholars at home and abroad have carried out a great deal of research on this. Many methods rely on hand-crafted image features, such as histogram of oriented gradients (HOG), local binary patterns (LBP), scale-invariant feature transform (SIFT) and Gabor features, which are then fed as feature vectors into a conventional classifier such as a support vector machine (SVM), AdaBoost or a decision tree, and these methods achieve good results on specific target detection tasks. However, because the imaging conditions of remote sensing satellites are complex and variable, traditional target detection algorithms adapt poorly to remote sensing images acquired under different conditions, and their robustness and generality are poor.

In recent years, the convolutional neural network (CNN) has become the most popular deep learning model. Because target features do not need to be designed by hand, a CNN can automatically extract and learn effective features from large amounts of data and labels; in addition, given sufficient training data, the model generalizes well and remains robust under complex and variable conditions. The convolutional neural network model has therefore been widely applied to image target detection. Commonly used CNN target detection architectures currently include Faster Region-based CNN (Faster R-CNN), You Only Look Once (YOLO) and the Single Shot MultiBox Detector (SSD). These architectures are all designed for the target scales of natural images, and all achieve good detection results on natural images. High-resolution remote sensing satellites generally image the earth's surface from a near-earth orbit, and the imaging process is affected by illumination, meteorological conditions and the like; the resulting images have complex content, a small target scale range, and large radiometric differences between images acquired at different times. Compared with natural images, high-resolution remote sensing images have more complex backgrounds, smaller target areas and larger scale variation among targets of the same class. Existing detection frameworks such as Faster R-CNN, YOLO and SSD therefore cannot effectively couple with the target scale characteristics of high-resolution remote sensing images, and high-precision detection results are difficult to obtain.

A research team at Wuhan University previously submitted the paper 'Scale feature convolutional neural network identification method of a remote sensing image target', which proposes a convolutional neural network detection and identification method based on target scale features, aimed at the poor robustness and generality of hand-crafted features in traditional image target detection and identification. However, that paper performs only a rough statistical analysis over all target scales in the data set and lacks a sound technical means of obtaining the optimal suggestion box scale, which affects the detection result.

To address these problems, the invention provides an improved, novel convolutional neural network detection method for the target scale characteristics of high-resolution remote sensing images.

Disclosure of Invention

The invention provides a novel convolutional neural network detection method based on remote sensing image target scale characteristics, aiming at the problem of how to obtain a high-resolution remote sensing image target detection optimal scale suggestion frame.

The technical scheme provided by the invention is a convolutional neural network detection method for high-resolution remote sensing image target scale characteristics, which comprises the following steps:

Step 1, establishing a large-scale high-resolution remote sensing image multi-target detection data set containing different radiances and different scales, respectively carrying out statistical analysis on the scale range of each target in the data set, and carrying out statistical analysis on all targets together to obtain the optimal distribution range of each target scale in the high-resolution remote sensing image;

Step 2, obtaining the optimal scale of the target detection suggestion frame of the high-resolution remote sensing image according to the area covered by the target scale distribution range in the high-resolution remote sensing image;

Step 3, setting a convolutional neural network architecture suitable for high-resolution remote sensing image target detection according to the optimal scale of the target suggestion frame, and realizing target detection in the image.

Furthermore, in step 3, the convolutional neural network architecture comprises a target region suggestion network RPN and a target classification and accurate positioning network CNN.

The target region suggestion network RPN generates multiple target candidate regions at each position of the feature map and passes the target candidate region information to the CNN.

The target classification and accurate positioning network CNN uses five convolutional layers to extract the image target feature maps, and combines the target candidate region information with the last feature map to obtain the feature vectors of the target candidate regions; the feature vectors are then passed to the region-of-interest pooling layer to obtain target candidate region feature vectors of a specified size; finally, the feature vectors of the specified size are passed to the fully connected layers for training and testing of target classification and region coordinate regression.

Moreover, the size of the region-of-interest pooling layer is 7 × 7, and each of the two fully connected layers after the region-of-interest pooling layer contains 4096 neurons; the fully connected layer corresponding to the target classification layer then contains n neurons, and the target coordinate regression layer contains 4n neurons, where n is the number of target classes in the image and 4n represents the x, y, w and h coordinate coefficients of the target region for each of the n target classes; x is the horizontal coordinate of the center point, y is the vertical coordinate of the center point, w is the width of the target, and h is the height of the target.

The invention provides a convolutional neural network detection method for the scale characteristics of a high-resolution remote sensing image target, which can be used for well coupling the scale characteristics of the high-resolution remote sensing image target and obtaining a high-precision remote sensing image target detection result.

Drawings

Fig. 1 shows the statistical results of target scales in high-resolution remote sensing images according to an embodiment of the present invention, where Fig. 1(a) is the scale distribution range of airplane targets, Fig. 1(b) is the scale distribution range of storage tank targets, Fig. 1(c) is the scale distribution range of ship targets, and Fig. 1(d) is the scale distribution range of all targets (airplanes, storage tanks and ships) in the high-resolution remote sensing images.

Fig. 2 is a diagram of an optimal dimension of a high-resolution remote sensing image target detection suggestion box according to an embodiment of the invention.

Fig. 3 is a convolutional neural network architecture for detecting a target in a high-resolution remote sensing image according to an embodiment of the present invention.

Fig. 4 is a result of detecting a target in a high-resolution remote sensing image according to an embodiment of the present invention.

Detailed Description

The following detailed description of the embodiments of the present invention is provided with reference to the accompanying drawings.

The embodiment of the invention provides a convolutional neural network detection method for high-resolution remote sensing image target scale features, which comprises the following steps:

(1) Optimal scale of the remote sensing image target detection suggestion frame: a large-scale high-resolution remote sensing image target detection data set is established, and the target scales in the data set are counted to obtain the scale range of remote sensing image targets; the optimal scale of the high-resolution remote sensing image target detection suggestion frame is then obtained from the distribution range of target scales in the high-resolution remote sensing images.

High-resolution remote sensing satellites generally image the earth's surface top-down from a near-earth orbit, and the imaging process is easily affected by illumination, weather and the like, so the resulting high-resolution remote sensing scenes are complex, and the targets in the images are small and densely distributed in groups. The setting of the target candidate region extraction scales in a convolutional neural network target detection architecture has a crucial influence on the precision of the architecture. In order to fully count the scale range of typical target regions of interest, a remote sensing image target detection data set, WHU-RSONE, covering airplanes, storage tanks and ships is established. The data set comprises 5977 high-resolution remote sensing images, with image sizes ranging from 600 × 600 pixels to 1372 × 1024 pixels. 2460 remote sensing images contain 51866 targets, of which 15703 are airplane (plane) targets, 24692 are storage tank (tank) targets and 11471 are ship (ship) targets. The WHU-RSONE data set contains target image data with different radiances and different scales. The target scales in the data set are counted, and the statistics are shown in Fig. 1.

The optimal scales of the high-resolution remote sensing image target detection suggestion box obtained in this embodiment are shown in Fig. 2. In Fig. 2, target regions of interest are extracted on the last feature map of the network, and the 256-dimensional feature vector corresponding to each target region of interest is input into a binary classification layer and a coordinate regression layer, which learn whether the region of interest is a target and learn the coordinates of the region of interest, respectively. The left side of Fig. 2 shows the scales of the target detection suggestion box in the existing network, and the right side shows the scales of the target detection suggestion box in the network of the present invention.

In Fig. 1(a), only 6.58% of the airplane targets fall within the area covered by the nine target regions of interest in the left rectangular box of Fig. 2, whereas 96.68% of the airplane targets fall within the area covered by the twelve target regions of interest in the right rectangular box of Fig. 2.

In Fig. 1(b), only 0.65% of the storage tank targets fall within the area covered by the nine target regions of interest in the left rectangular box of Fig. 2, whereas 97.5% of the storage tank targets fall within the area covered by the twelve target regions of interest in the right rectangular box of Fig. 2.

In Fig. 1(c), only 24.24% of the ship targets fall within the area covered by the nine target regions of interest in the left rectangular box of Fig. 2, whereas 90% of the ship targets fall within the area covered by the twelve target regions of interest in the right rectangular box of Fig. 2.

In Fig. 1(d), only 7.62% of the targets in the WHU-RSONE training set fall within the area covered by the nine target regions of interest in the left rectangular box of Fig. 2, whereas 95.61% of the targets fall within the area covered by the twelve target regions of interest in the right rectangular box of Fig. 2.

The statistics show that target regions of interest generated with four scales (16, 32, 64 and 128) and three ratios (1:2, 1:1 and 2:1) effectively couple with the scale range of typical targets in remote sensing images. Accordingly, in the convolutional neural network architecture of the embodiment of the present invention, the region suggestion network (RPN) uses four scales (16, 32, 64 and 128) and three ratios (1:2, 1:1 and 2:1) to generate the target regions of interest during training and testing of the architecture.
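The twelve region-of-interest shapes and a coverage percentage of the kind reported for Fig. 1 can be sketched as follows. This is a minimal illustration and not the statistical procedure of the embodiment: it assumes each scale is a side length in pixels, that a ratio is read as width:height with the area held at scale squared, and that a target counts as "covered" when its width-height shape reaches a hypothetical IoU threshold with at least one shape. The function names are likewise illustrative.

```python
import math

SCALES = (16, 32, 64, 128)            # four scales stated in the text
RATIOS = ((1, 2), (1, 1), (2, 1))     # three ratios; read as width:height (assumption)


def anchor_shapes(scales=SCALES, ratios=RATIOS):
    """Return (width, height) pairs, one per scale/ratio combination.

    Assumption: each shape keeps an area of scale**2.
    """
    shapes = []
    for s in scales:
        for rw, rh in ratios:
            k = math.sqrt(rw / rh)
            shapes.append((s * k, s / k))
    return shapes


def shape_iou(a, b):
    """IoU of two boxes that share a centre, each given as (width, height)."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union


def coverage(targets, shapes, threshold=0.5):
    """Fraction of target (width, height) shapes matched by some anchor shape.

    The IoU criterion and its threshold are hypothetical stand-ins for the
    'area covered' notion used in the text.
    """
    hits = sum(
        1 for t in targets
        if max(shape_iou(t, s) for s in shapes) >= threshold
    )
    return hits / len(targets)
```

Calling `coverage(target_shapes, anchor_shapes())` once per class and once for all classes together mirrors the per-target and joint statistics described above.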

In the paper 'Scale feature convolutional neural network identification method of a remote sensing image target', statistical analysis is only roughly performed on all target scales in the data set. In this patent, a larger data set is established, containing twice as many targets as the data set in the paper; each target scale in the data set is analyzed separately, and all targets in the data set are analyzed together, to obtain the optimal suggestion frame scale of each target in the image. The approach in this patent is more rigorous and reasonable than that of the paper, and the optimal suggestion frame scale for target detection in the image can be obtained.

(2) The remote sensing image target detection convolutional neural network architecture: a convolutional neural network architecture suitable for target detection in high-resolution remote sensing images is set according to the optimal scale of the target suggestion frame, thereby realizing target detection in the image.

The invention draws on the design of the Faster R-CNN architecture. The convolutional neural network architecture comprises two modules: the target region suggestion network RPN (which generates multi-scale, rotation-invariant target candidate regions) and the target classification and accurate positioning network CNN (which classifies the target candidate regions and reduces their positioning error). Compared with the paper 'Scale feature convolutional neural network identification method of a remote sensing image target', the convolutional neural network structure in this patent is clearer and more reasonable, and the parameter details of the structure are optimized. A schematic diagram of the convolutional neural network architecture is shown in Fig. 3. In the target detection network architecture of the present invention, five convolutional layers are used to extract the image target feature map, and the detailed parameters of each layer are as follows.

First layer: a convolution layer with a 7 × 7 convolution template, a batch normalization layer, an activation layer with the ReLU activation function and a max pooling layer; outputs 96 feature maps.

Second layer: a convolution layer with a 5 × 5 convolution template, a batch normalization layer, an activation layer with the ReLU activation function and a max pooling layer; outputs 256 feature maps.

Third layer: a convolution layer with a convolution template size of 26 multiplied by 256 and an activation layer with the ReLU activation function; outputs 384 feature maps.

Fourth layer: input of 13 × 13 × 384; a convolution layer with a 3 × 3 convolution template and an activation layer with the ReLU activation function; outputs 384 feature maps.

Fifth layer: a convolution layer with a 3 × 3 convolution template and an activation layer with the ReLU activation function; outputs 256 feature maps.
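The five-layer description can be captured as a small configuration table, which makes the channel flow easy to check. This is only a sketch: the `ConvStage` type and `channel_flow` helper are hypothetical names, the three-channel input is an assumption, strides and padding are omitted because the text does not state them, and the third layer's kernel size is left as `None` because the text does not give a usable value for it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConvStage:
    name: str
    kernel: Optional[int]   # square kernel side; None where the text gives no usable value
    batch_norm: bool
    max_pool: bool
    out_channels: int       # number of output feature maps


BACKBONE = (
    ConvStage("conv1", 7, True, True, 96),
    ConvStage("conv2", 5, True, True, 256),
    ConvStage("conv3", None, False, False, 384),  # kernel size not recoverable from the text
    ConvStage("conv4", 3, False, False, 384),
    ConvStage("conv5", 3, False, False, 256),
)


def channel_flow(stages, in_channels=3):
    """Return (input_channels, output_channels) for each stage.

    in_channels=3 assumes a three-band input image (assumption).
    """
    pairs = []
    channels = in_channels
    for stage in stages:
        pairs.append((channels, stage.out_channels))
        channels = stage.out_channels
    return pairs
```

The last stage ends at 256 channels, which agrees with the 256-dimensional feature vectors taken from the last feature map.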

1)RPN

The convolutional neural network architecture of the invention uses the RPN to obtain target candidate regions on the last feature map of the network architecture, and the generated target candidate regions are used for training and testing the target detection of the whole architecture. Because high-resolution remote sensing satellites generally image the ground top-down from a near-earth orbit (400 km to 600 km), and targets in the resulting remote sensing images are small, vary widely in scale within the same class and have uncertain orientation, the RPN uses four scales (16, 32, 64 and 128) and three ratios (1:2, 1:1 and 2:1) to generate 12 target candidate regions at each position of the feature map, which are used for target detection training and testing of the convolutional neural network architecture.
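Generating 12 candidate regions at every feature-map position can be sketched as below. The feature-map stride is not given in the text, so it is a caller-supplied parameter here; centring each region on the middle of its feature-map cell and reading the ratios as width:height with constant area are also assumptions, and the function name is illustrative.

```python
import math

SCALES = (16, 32, 64, 128)            # from the text
RATIOS = ((1, 2), (1, 1), (2, 1))     # from the text; read as width:height (assumption)


def anchors_for_feature_map(feat_h, feat_w, stride):
    """Return (x1, y1, x2, y2) candidate boxes in image pixels.

    Twelve boxes are produced per feature-map position. `stride` is the
    image-pixel spacing between neighbouring positions (not stated in the
    text, so the caller must supply it).
    """
    boxes = []
    for row in range(feat_h):
        for col in range(feat_w):
            cx = (col + 0.5) * stride   # cell-centre convention (assumption)
            cy = (row + 0.5) * stride
            for s in SCALES:
                for rw, rh in RATIOS:
                    k = math.sqrt(rw / rh)
                    w, h = s * k, s / k
                    boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes
```

For a feature map of H × W positions this yields 12 × H × W candidate regions before any scoring or suppression.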

2) Object classification and accurate positioning CNN

The RPN passes the obtained target candidate region information to the target classification and accurate positioning CNN, which combines the candidate region information with the last feature map in the architecture to obtain the feature vectors of the target candidate regions. The feature vectors of the target candidate regions are then passed to a region-of-interest pooling layer (RoI pooling) to obtain target candidate region feature vectors of a specified size. Finally, the feature vectors of the specified size are passed to the fully connected layers for training and testing of target classification and region coordinate regression. The size of the region-of-interest pooling layer is 7 × 7, and each of the two fully connected layers after the region-of-interest pooling layer contains 4096 neurons. These are followed by the fully connected layer corresponding to the target classification layer and by the target coordinate regression layer: the former contains n neurons and the latter contains 4n neurons, where n is the number of target classes in the image and 4n represents the (x, y, w, h) coordinate coefficients of the target region for each of the n target classes. Here x is the horizontal coordinate of the center point, y is the vertical coordinate of the center point, w is the width of the target, and h is the height of the target.
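The fixed-size pooling step and the layer widths of the head can be illustrated with a short pure-Python sketch. It pools a single-channel feature map given as nested lists; the bin-splitting rule (floor for the start, ceiling for the end) is one common RoI max pooling convention and is an assumption here, as are the function names and the `fc1`/`fc2` labels.

```python
def roi_max_pool(feature, roi, out_size=7):
    """Max-pool a region of a 2D feature map to out_size x out_size.

    feature: list of rows (single channel).
    roi: (row0, col0, row1, col1) in feature-map cells, end-exclusive.
    """
    r0, c0, r1, c1 = roi
    h, w = r1 - r0, c1 - c0
    if h < 1 or w < 1:
        raise ValueError("region of interest must cover at least one cell")
    pooled = []
    for i in range(out_size):
        rs = r0 + (i * h) // out_size                 # floor of bin start
        re = r0 + -(-((i + 1) * h) // out_size)       # ceiling of bin end
        row = []
        for j in range(out_size):
            cs = c0 + (j * w) // out_size
            ce = c0 + -(-((j + 1) * w) // out_size)
            row.append(max(feature[r][c]
                           for r in range(rs, re)
                           for c in range(cs, ce)))
        pooled.append(row)
    return pooled


def head_sizes(n_classes, channels=256, pool=7, hidden=4096):
    """Layer widths of the classification and regression head.

    n_classes is n in the text; whether n includes a background class is
    not stated, so it is left to the caller.
    """
    return {
        "flattened": channels * pool * pool,  # assumes all 256 maps are pooled
        "fc1": hidden,
        "fc2": hidden,
        "classification": n_classes,
        "regression": 4 * n_classes,          # (x, y, w, h) per class
    }
```

With the three WHU-RSONE classes, `head_sizes(3)` gives a 3-neuron classification layer and a 12-neuron coordinate regression layer.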

3) Architecture training and testing

In the training phase, the parameters in the convolutional neural network architecture of the present invention are initialized with network parameters pre-trained on ImageNet, and the parameters are trained with an end-to-end training method. The RPN training loss and the training loss of the target classification and accurate positioning CNN are added together, and the summed loss is back-propagated using stochastic gradient descent (SGD) to update the parameters in the network. The training process of the RPN and of the target classification and accurate positioning CNN is described in detail below:

① The positive and negative samples in each RPN training mini-batch come from a single high-resolution remote sensing image. In one image, 256 target candidate regions are randomly sampled to calculate the RPN training loss, with the ratio of positive to negative samples among the 256 target candidate regions kept close to 1:1; if there are fewer than 128 positive samples, the shortfall is filled with negative samples.

② The target classification and accurate positioning CNN is trained using the target candidate regions generated by the RPN. The target candidate regions generated by the RPN overlap heavily and are highly redundant. To eliminate these redundant candidates, the target candidate regions are suppressed with the non-maximum suppression (NMS) algorithm based on the probability that each candidate region is a target, with the IoU threshold of the NMS algorithm set to 0.7. After suppression, the top 2000 candidate regions ranked by target probability are selected as one mini-batch for training the target classification and accurate positioning CNN, and the training loss of the target classification and accurate positioning CNN is calculated using these 2000 target candidate regions.
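The sampling rule in item ① can be sketched as follows. The function takes already-labelled positive and negative candidate indices, because the text does not say how candidates are labelled; the function name and the optional `rng` parameter are illustrative.

```python
import random


def sample_rpn_minibatch(positive_ids, negative_ids, batch_size=256, rng=None):
    """Pick up to batch_size candidates from one image, aiming for a 1:1 mix.

    At most batch_size // 2 positives are drawn; if fewer are available,
    the remaining slots are filled with negatives.
    """
    rng = rng or random.Random()
    positives = list(positive_ids)
    negatives = list(negative_ids)
    n_pos = min(len(positives), batch_size // 2)
    n_neg = min(len(negatives), batch_size - n_pos)
    return rng.sample(positives, n_pos), rng.sample(negatives, n_neg)
```

For an image with only 30 positive candidates, the mini-batch would hold 30 positives and 226 negatives.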

In testing, if the IoU between a detected target region and a ground-truth target region in the image is greater than or equal to 0.5, the detected region is considered a correct detection result; otherwise it is considered an incorrect detection result. The number of training iterations of the architecture is set to 75000: the learning rate for the first 50000 iterations is 0.001, and the learning rate for the last 25000 iterations is 0.0001. The network training momentum is 0.9 and the weight decay factor is 0.0005.
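The step learning-rate schedule and the correctness criterion stated above can be written down directly. This is a minimal sketch: the zero-based iteration index and the function names are assumptions, and the momentum and decay constants are simply recorded as stated.

```python
TOTAL_ITERATIONS = 75000
MOMENTUM = 0.9
WEIGHT_DECAY = 0.0005


def learning_rate(iteration, total=TOTAL_ITERATIONS, drop_at=50000):
    """Learning rate for a zero-based iteration index (indexing is an assumption)."""
    if not 0 <= iteration < total:
        raise ValueError("iteration outside the training schedule")
    return 0.001 if iteration < drop_at else 0.0001


def is_correct_detection(iou_with_ground_truth, threshold=0.5):
    """A detection counts as correct when its IoU with a ground-truth region is >= 0.5."""
    return iou_with_ground_truth >= threshold
```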

In the test stage of the architecture, a high-resolution remote sensing image is input into the convolutional neural network architecture, and the RPN generates 6000 target candidate regions on the image. Non-maximum suppression is applied to the 6000 target candidate regions with the NMS algorithm, based on the probability that each candidate region is a target, with the IoU threshold set to 0.7. After non-maximum suppression, the top 300 target candidates ranked by target probability confidence are selected and passed to the target classification and accurate positioning CNN for target classification and accurate coordinate positioning, realizing target detection in the image. The detection results of the invention on high-resolution remote sensing image targets are shown in Fig. 4.
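The suppression and ranking step can be sketched with a plain greedy non-maximum suppression. This is a generic textbook form of NMS, not code from the embodiment; boxes are assumed to be (x1, y1, x2, y2) tuples and the function names are illustrative.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms_top_k(boxes, scores, iou_threshold=0.7, top_k=300):
    """Greedy NMS: return indices of kept boxes, highest score first.

    A box is dropped when its IoU with an already kept, higher-scoring box
    exceeds iou_threshold; at most top_k boxes are kept.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
            if len(keep) == top_k:
                break
    return keep
```

With `top_k=300` this matches the test stage described above; the training stage uses the same 0.7 threshold with `top_k=2000`.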

In specific implementation, the automatic operation of the method can be realized by adopting a computer software technology.

The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.

Claims

1. A convolutional neural network detection method for target scale features of high-resolution remote sensing images, comprising the following steps:

Step 1, establishing a large-scale high-resolution remote sensing image multi-target detection data set containing different radiances and different scales, statistically analyzing the scale range of each target in the data set separately, and statistically analyzing all targets together, to obtain the optimal distribution range of each target scale in the high-resolution remote sensing image;

Step 2, according to the area covered by the distribution range of each target scale in the high-resolution remote sensing image, obtain the optimal scale size of the target detection proposal frame of the high-resolution remote sensing image;

Step 3: According to the optimal size of the target proposal frame, a convolutional neural network architecture suitable for target detection in high-resolution remote sensing images is set to realize target detection in images.

2. The convolutional neural network detection method for target scale features of high-resolution remote sensing images according to claim 1, characterized in that: in step 3, the convolutional neural network architecture comprises a target region suggestion network RPN and a target classification and precise localization network CNN,

The target region suggestion network RPN generates multiple target candidate regions at each position of the feature map, and transmits the target candidate region information to CNN;

The target classification and precise localization network CNN uses five convolutional layers to extract the image target feature map, and combines the target candidate region information with the last feature map to obtain the feature vector of the target candidate region; the feature vector of the target candidate region is then passed to the region-of-interest pooling layer to obtain a target candidate region feature vector of the specified size; finally, the feature vector of the specified size is passed to the fully connected layers for training and testing of target classification and region coordinate regression.

3. The convolutional neural network detection method for target scale features of high-resolution remote sensing images according to claim 2, wherein the size of the region-of-interest pooling layer is 7×7, and each of the two fully connected layers after the region-of-interest pooling layer contains 4096 neurons; the fully connected layer corresponding to the target classification layer then contains n neurons, and the target coordinate regression layer contains 4n neurons, where n is the number of target classes in the image, and 4n represents the x, y, w, h coordinate coefficients of the target region for each of the n target classes; where x is the horizontal coordinate of the center point, y is the vertical coordinate of the center point, w is the target width, and h is the target height.