CN109948415A

CN109948415A - Remote sensing image object detection method based on filtering background and scale prediction

Info

Publication number: CN109948415A
Application number: CN201811649055.XA
Authority: CN
Inventors: 李海昌; 高婧; 王思雨; 张金芳; 胡晓惠
Original assignee: Institute of Software of CAS
Current assignee: Institute of Software of CAS
Priority date: 2018-12-30
Filing date: 2018-12-30
Publication date: 2019-06-28

Abstract

The invention discloses a kind of remote sensing image object detection method based on filtering background and scale prediction, include the following steps: step 1: the labeled data of remote sensing images is clustered, filtering background network is trained using the label data after cluster, and extracts the region there may be object from remote sensing images using the filtering background network；Step 2: cut picture, based on filtering background network extract described in there may be the region of object, cut out and need the region detected as target detection；Step 3: using picture and labeled data the training multiscale target detection network cut out, and detecting network detection object on cut-out remote sensing images using the multiscale target；Step 4: merging testing result and the result of target detection is merged on big remote sensing images according to the coordinate in the region of cutting, to obtain the object detection results of big figure.By adopting the above scheme, improving target detection accuracy, recall rate and mAP The present invention reduces the calculation amount of target detection.

Description

Remote sensing image object detection method based on filtering background and scale prediction

Technical field

The invention belongs to remote sensing image processing and analysis technical fields, and in particular to a kind of high-resolution remote sensing image Object detection method.

Background technique

Remote sensing images are as a type more complicated in image, with some features for being different from general life image Such as image is big, abundant information etc..This makes remote sensing images as a kind of data with abundant information, is difficult to be utilized.

Target detection is one of the most widely used computer vision application in remote Sensing Image Analysis.Target detection is appointed Business be positioning and classification image in multiple objects, main application field include face recognition, quantity statistics, automobile auxiliary drive Sail and satellite image analysis etc..Good development had been obtained in the target detection of remote sensing images in recent years.Long et al. exists Accurate Object Localization in Remote Sensing Images Based on Convolutional It proposes a kind of object accurate positioning method including two steps in Neural Networks: non-maxima suppression and being based on The scoring method of prediction block；Object in remote sensing images cannot be extracted well in order to solve the feature of existing object detection method Labeled data, which costs dearly, the data that manually mark are unreliable in space and structure feature and Remote Sensing Target detection asks Topic, Han et al. is in Object Detection in Optical Remote Sensing Images Based on Weakly It is proposed in Supervised Learning and High-Level Feature Learning, Weakly supervised study and higher-dimension Feature learning combines the target detection for carrying out remote sensing images；Chen et al. is in Learning Oriented Region- based Convolutional Neural Networks for Building Detection in Satellite In Remote Sensing Images, use the angle sorter network of six classes horizontal to detected for R-CNN Rectangle frame carries out the adjustment of angle, so that last detection block more suits building, to solve the horizontal square that detected Shape frame cannot show the problem of direction of building in remote sensing images.

However Remote Sensing Target detection encounters some difficulties.Although the studies above is given we provide some enlightenments, But these methods not can be well solved the problem that image is too big, object is too small in Remote Sensing Target detection still.For big When the remote sensing images of size carry out target detection, if being sent directly into neural network, since the occupancy of each memory is all relatively more, It can make training and the process tested all especially slowly, or even can not train.And if large-sized remote sensing images are zoomed to ratio Lesser size, and the target in remote sensing images can be made too small, or even disappear, it is unfavorable for neural network and extracts feature.Using The problem of average cut can alleviate the too big problem of picture to a certain extent, but average cutting is brought is to need to scheme whole Piece all does target detection, and averagely cutting picture is easy geometry culling into different pictures, and detects incomplete object It is all extremely difficult for any object detection method, therefore can biggish shadow using the method for averagely cutting picture Ring the accuracy of entire target detection.

Summary of the invention

Existing Remote Sensing Target detection method there are aiming at the problem that, the present invention proposes a kind of based on filtering background and ruler The remote sensing image object detection method for spending prediction carries out detection target for the high-resolution remote sensing image of large format, Realize the automatic detection to object in remote sensing images and Accurate classification.This method is based on filtering background and scale prediction, by right Faster R-CNN method improves, the accuracy and recall rate of Lai Tigao target detection.

To achieve the above object, technical solution provided by the invention are as follows:

A kind of high-resolution optical remote sensing image object detection method is detected for the remote sensing images for large format Target realizes the automatic detection to object in remote sensing images and Accurate classification, and described method includes following steps:

Step 1: the labeled data of remote sensing images being clustered, the label data training filtering background net after cluster is utilized Network, and the region there may be object is extracted using the filtering background network from remote sensing images；

Step 2: cut picture, based on filtering background network extract described in there may be the region of object, cut out The region detected is needed as target detection；

Step 3: using picture and labeled data the training multiscale target detection network cut out, and using this multiple dimensioned Target detection network detection object on cut-out remote sensing images；

Step 4: merging testing result, according to the coordinate in the region of cutting, the result of target detection is merged into big distant Feel on image, to obtain the object detection results of big figure.

Further, the step 1 specifically includes following sub-step:

Step 11: labeled data being clustered, to generate the label data for training filtering background network；

Step 12: size and horizontal ratio to the region after cluster are adjusted；

Step 13: training filtering background network；

Step 14: using filtering background neural network forecast, there may be the regions of object.

Further, in the step 2, the overlapping region that filtering background neural network forecast goes out is closed before cutting And.

Further, it is merged according to the IOU value between each candidate region and other remaining candidate regions.

Further, in the step 2, before cutting, to the candidate region size and aspect ratio obtained after merging It is adjusted.

Further, the scale prediction network and RPN, classification and position adjusting type modules in Faster R-CNN are shared The characteristic pattern generated from shared convolutional layer is input to the volume with predetermined quantity channel by convolutional layer, the scale prediction network Lamination, to obtain the characteristic pattern of respective numbers.

Further, the loss function of the scale prediction network is cross entropy loss function.

Further, the scaling vector exported according to scale prediction network, is found multiple using one-dimensional non-maxima suppression The scale of maximum likelihood.

Further, multiscale target detection is carried out using following steps: (1) extracting the convolution feature of picture, generate altogether Enjoy convolution feature；(2) scale is generated in scale prediction module using shared convolution feature；(3) according to the scale predicted and altogether It enjoys convolution feature and generates candidate region on the RPN in Faster R-CNN；(4) classification and the position in Faster R-CNN are used It sets adjustment module and classification and position adjustment is carried out to candidate region.

Further, the step 4 includes following sub-step:

Step 41: being existed according to the cutting picture of the prediction result of multiscale target detection network and the output of filtering background network Position in big figure, adjusts the coordinate of testing result；

Step 42: non-maxima suppression being passed through to the testing result being merged on the small figure on big figure, removes duplicate inspection Survey frame.

The present invention finds object by the way that the original remote sensing images of high-resolution large format are sent in filtering background network The region that may assemble, and these regions are cut out to be sent in multiscale target detector and detect, finally inspection It surveys result and merges the testing result for exporting big figure, so that multiscale target detector can be as handling general target Detection task equally handles the target detection of remote sensing images.Simultaneously as the aggregation zone of object is for remote sensing images It is relatively small, so the target detection of object aggregation zone can be only carried out after filtering background network processes, thus Reduce the calculation amount of target detection.Further, since using scale prediction, so that the time that the RPN in Faster R-CNN is generated Favored area more suits with the scale of real-world object, to improve target detection accuracy, recall rate and mAP.

Detailed description of the invention

Fig. 1 is high-resolution optical remote sensing image object detection method work flow diagram according to the present invention；

Fig. 2 is the training of filtering background network and prediction work flow chart in the present invention；

Fig. 3 is the process of the prediction result cutting picture in the present invention based on filtering background network；

Fig. 4 is the scale prediction network structure in the present invention；

Fig. 5 is to find maximum two scales of possibility according to the output of scale prediction network in the present invention；

Fig. 6 is that multiscale target detects work flow diagram in the present invention；

Fig. 7 is to merge testing result module flow diagram in the present invention.

Specific embodiment

With reference to the accompanying drawings of the specification, the specific technical solution of the present invention is described in detail.Reality described herein Apply example only for the purpose of illustrating and explaining the present invention and is not intended to limit the present invention.

As shown in Figure 1, the high-resolution optical remote sensing image mesh proposed by the present invention based on filtering background and scale prediction Mark detection method, comprising the following steps:

It is clustered by the labeled data to original remote sensing images, obtains the labeled data about object aggregation zone. Using data one filtering background network based on RPN of training after cluster, using filtering background network from large format remote sensing The region there may be object is extracted in image, to reduce the range of target detection.

The present invention is instructed using the label data of improved hierarchy clustering method cluster target using the label data after cluster Practice filtering background network.

The step 1 is specifically realized by following sub-step:

Collected original optical remote sensing image data is labeled according to classification, when it is implemented, labeled data is answered Situation as much as possible is covered, to guarantee the completeness of training set.

Labeled data is clustered, using the labeled data training filtering background network after cluster, then uses background Screen predicts the region of object aggregation；

Preferred embodiment according to the present invention, clustering method use hierarchical clustering, and cluster is later also to labeled data Size and aspect ratio have carried out certain adjustment.

The purpose clustered to the labeled data of original remote sensing images is adjoining object cluster to the same mark Remember in frame, so that the size of new indicia framing is bigger.The method for merging the indicia framing of object is to merge two every time Apart from the smallest indicia framing, the distance until only remaining next indicia framing or two indicia framings is greater than preset value.

Cluster between two indicia framings is as follows:

Dis tan ce=Area^increase+λdis tance^center (1)

In formula, Area^increseTwo indicia framings of expression merge after increased area, be defined as follows shown in:

Area^increse=Area (b)-[Area (b₁)+Area(b₂)-Area(bb)] (2)

In formula, b₁,b₂It is respectively two and needs combined indicia framing, b is merging and indicia framing later, and bb is b₁And b₂ The intersection of two indicia framings；

distance^centerIndicate the distance between the central point of two indicia framings.

Therefore, the distance between two indicia framings distance is increased area after being merged by two indicia framings Area^increaseThe distance between the central point of two indicia framings distance^centerWeighting (λ) (being weighted using parameter lambda) is formed , it can guarantee that the indicia framing of cluster is closer in this way, and increased white space is less after merging.

Step 12: the adjustment of size and ratio is carried out to the region after cluster；

There may be the sizes of some indicia framings and aspect ratio to be not suitable in indicia framing after cluster, therefore newly-generated Indicia framing cannot be directly used to train filtering background network.Detection object is ratio in the picture of inappropriate size and aspect ratio More difficult.

If the picture that filtering background network pruning goes out is too big, the meaning cut detects opposite with regard to little Lesser object is easy missing inspection.So when merging indicia framing, if the size for encountering indicia framing is greater than given threshold value Then stop merging the indicia framing.Since the object having individually occurs, so still remaining some area very littles after cluster Indicia framing.If these indicia framings are cut out to come, when these especially small cutting picture pieces feeding target detection networks When, since picture can all be scaled greatly fixed size, such as (600 × 1000), the target in the picture will torsional deformation, Very big puzzlement is caused to target detection.Therefore, the present invention is extended the indicia framing for being less than threshold value.

If the aspect ratio pictures that filtering background network pruning goes out are too big, object detector detection will be highly detrimental to Object.Due to above, it is big to extend those aspect ratios by short side when generating new indicia framing by the present invention It is greater than 5 indicia framing in the indicia framing of predetermined value, such as aspect ratio.

The present invention is adjusted by size and ratio, it is ensured that target phase one of the obtained new label data with cutting picture It causes, better effect can be obtained using such label data training filtering background network.

Step 13: training filtering background network；

Ready training data is divided into training set, verifying collection, test set.Use training set training filtering background net Network, and filtering background network is examined with verifying collection in the training process.When filtering background network obtains enough on verifying collection When small loss, illustrate that training has restrained, then deconditioning, and use test set testing background screen.

Since filtering background network is for cutting picture, filtering background network omission factor wants sufficiently low, therefore The present invention carrys out opinion scale prediction network using recall rate.The calculation method of recall rate is as follows:

In formula, tp is that positive sample is identified as positive sample number, and fn is that positive sample is identified as negative sample number, and tp+fn is indicated Be all positive samples number.

Step 14: using filtering background neural network forecast, there may be the regions of object

Test picture is input in filtering background network, filtering background network can export that some there may be the areas of object Domain, these regions will be as the bases for cutting picture.

Filtering background neural network forecast has gone out the region in large format remote sensing images there may be object, these regions have very much Overlapping and the aspect ratio and size in region do not comply with requirement of the target detection network to picture size, so background mistake What strainer network predicted cannot be directly used to cut picture there may be the region of object.

As shown in figure 3, the present invention is based on filtering background network find out there may be the region of object, to these regions into Row merges, and size region and ratio are adjusted to be suitble to the range of target detection, then cut out these region conducts Target detection needs the region detected.

The step 2 is specifically realized by following sub-step:

Step 21: the region that filtering background neural network forecast goes out is merged；

Filtering background neural network forecast go out there may be the regions of object there are many overlappings, if directly cutting these areas Domain is sent in object detector and detects, and the calculation amount of target detection will be made very big, and needs to carry out the same area weight Reinspection is surveyed, therefore the present invention first merges the region that filtering background neural network forecast goes out.Merging can close overlapping region And at a region, to reduce the overlapping between region.

Specific merging method is to calculate each candidate region with the IOU value between other remaining candidate regions, and every time All merge maximum two candidate regions IOU, be less than predetermined value until only remaining next candidate region or maximum IOU value, Such as 0.5.

The method for calculating the IOU value between two candidate regions is as follows:

Wherein, p₁,p₂That indicate is two candidate regions, p₁∩p₂That indicate is p₁,p₂The area of intersection, p₁∪p₂It indicates Be p₁,p₂Combined area.

Step 22: adjusting the size and ratio of candidate region

In the candidate region obtained after being merged using the above method, some area sizes, ratio are not suitable for, and if cut out The picture cut is too small, and after being sent into object detector, by scaling, the object in figure can be deformed, to be unfavorable for Detection object.Therefore, the present invention is adjusted the size and aspect ratio of candidate region so that the size of the picture cut out and Aspect ratio is suitble to target detection network to extract feature.

It is less than the region of predetermined value to clipping region, its short side can be extended by the present invention, and long side is contracted in proportion It puts.For example, if clipping region is less than (300 × 300), then its short side is extended to 300, long side bi-directional scaling.

If the picture cut out is too big, equally it is unfavorable for object detector and extracts feature, the effect for cutting picture will Become smaller.Therefore, the present invention is when merging candidate region, when candidate region area is greater than predetermined value, will stop merging.Example Such as it is greater than the candidate region of (1000 × 1000) for area, stops merging.

If the aspect ratio of the picture cut out will also be unfavorable for greatly very much extracting feature, therefore work as the aspect ratio of candidate region When greater than predetermined value, the present invention reduces the aspect ratio of the candidate region by extension short side.Such as the aspect ratio when candidate region When greater than 3, the aspect ratio of the candidate region is become 1 by extension short side.

Multiscale target detection network in the present invention refers to increases a scale on the basis of Faster R-CNN Prediction module.The scale for predicting is used to generate the scale of candidate region as the anchor of the RPN in Faster R-CNN, so that RPN can generate the candidate region for being more bonded object scale, then using the classification and position adjustment in Faster R-CNN Module does classification to these candidate regions and position adjusts.

The step 3 is specifically realized by following sub-step:

Step 31: drawingdimension predicts network structure；

The structure of scale prediction network is made of convolutional layer, pond layer and normalization layer.The present invention uses scale prediction Neural network forecast inputs the possibility size of object present in picture.

As shown in figure 4, the scale prediction network in the present invention is adjusted with RPN, classification and the position in Faster R-CNN Module shares convolutional layer, so that a small amount of calculating only be needed to can be obtained by the possibility scale of object in picture.As shown in figure 3, Scale prediction network is input to the characteristic pattern generated from shared convolutional layer the convolutional layer in one 40 channel, to obtain 40 spies Sign figure.Then, this 40 characteristic patterns generate the vector of 40 dimensions by a global maximum pond layer.

In the present invention, the scale used in scale prediction network is an integer multiplied by a coefficient, such as the coefficient Selection 16.Therefore, if the scale predicted is 3, the scale of the candidate region of actual generation is 16*3.

If defining the scale x of object according to the size of the indicia framing b of the aircraft for acquiring remote sensing images:

In formula, w, h are the width and height of indicia framing b respectively.

Scale collection S={ p is defined according to the scale x of object₀,p₁,p₂,…,p₃₉, the definition of the subscript i in scale collection S is such as Shown in lower:

I=round (x/16) -1 (6)

Wherein, round () is the function to round up.

The method for generating the scale collection of truthful data is as follows: scale i* existing for one, if p_i*=1, then exist Scale in the neighborhood of this scale is defined using following gauss hybrid models:

In formula, r is the number of scale present in N (i), N (i^*) what is indicated is with i^*Centered on, size be 20 neighborhood； μ_jIt is the mean value of Gaussian function, μ_jValue be necessary being in N (i) j-th of scale i^*；σ is the variance of Gaussian function, is For the sake of simplicity,

The loss function of scale prediction network is defined as follows:

In formula, x is the vector of maximum pond layer output in scale prediction module,It is the output normalizing maximum pond layer It is melted into the size distribution of the prediction exported after probability, p is the true size distribution being calculated according to label data, i.e., originally Invention uses cross entropy loss function.

Step 32: according to the output of scale prediction network, finding two scales being most likely to occur in the picture；

Because picture is cut-out from original remote sensing images, the scale for the object for including in each picture Small number.Data set is counted it is known that the scale for including in every picture is generally 2 or 3.Pass through experiment It was found that whole promoted that scale is set as 3 accuracy for target detection is not obvious, so the present invention sets scale It is set to two, it is therefore desirable to two scales most possibly occurred are found according to the output of scale prediction network.

As shown in figure 5, the scaling vector of the output of scale prediction network represent each scale there are a possibility that point Cloth.According to scaling vector, the present invention finds two scales being most likely to occur in the picture by the following method:

(1) method for subtracting mean value is used to make size distribution more smooth；

(2) scale of two maximum likelihoods is found using an one-dimensional non-maxima suppression (1D NMS).

Step 33: multiscale target detection

As shown in fig. 6, multiscale target detection comprises the following steps:

(1) the convolution feature for extracting picture generates shared convolution feature；

The present invention extracts convolution feature using VGG16 network structure, contains convolutional layer, pond layer, active coating.These volumes The parameter of lamination passes through pre-training on Image Net data set, and for remote sensing image data collection of the invention to some volumes The parameter of lamination is finely tuned, can extract well in image than it is more significant and with target object relevant feature.

(2) scale is generated in scale prediction module using shared convolution feature

Present invention uses a convolutional layers and a global maximum pond layer, predict picture based on shared convolution feature In the scale that most possibly occurs.

(3) candidate region is generated on the RPN in Faster R-CNN according to the scale and shared convolution feature that predict；

RPN is for generating a module of candidate region in Faster R-CNN, is to generate 3 fixed sizes, 3 originally A ratio, altogether 9 kinds of candidate regions.The present invention is after it wherein joined two scales of prediction, it is only necessary to generate 2 big Small, 3 ratios, altogether 6 kinds of candidate regions, and the size of the candidate region generated are related with the size of object in picture.

(4) classification and position adjustment are carried out to candidate region using the classification of Faster R-CNN and position adjusting type modules；

The step is consistent with conventional implementation method, so being no longer described in detail.

The object detection results of coordinate position and the picture cut based on the picture cut out in big figure, by right After the position adjustment of prediction block, so that the coordinate position of prediction block adjusted is the position in big figure, in addition to merging The testing result of big figure afterwards has carried out non-maxima suppression, gets rid of duplicate prediction block, to obtain final detection knot Fruit.

Since multiscale target detection is detected on the picture after cutting, it is therefore desirable to according to the region of cutting Coordinate is merged into the result of target detection on big remote sensing images.Meanwhile the object detection results of the big figure after merging can There can be some overlappings, it is therefore desirable to remove the label of overlapping.The present invention is overlapped using the method removal of non-maxima suppression Label, to obtain the object detection results of big figure, facilitates displaying.

The step 4 is specifically realized by following sub-step:

Step 41: adjusting the coordinate of testing result；

The cutting picture exported according to the prediction result of multiscale target detection module and filtering background network is in big figure Position, the prediction block of the target detection on small figure by the coordinate value in the upper left corner plus small figure belonging to the prediction block, So that prediction block expression is position of the object on big figure.

Step 42: removing duplicate detection block；

There may be some overlappings when due to cutting picture, so that the same object appears in multiple small figures, in this way Certain objects after allowing for being merged into big figure above can be repeatedly marked.Therefore, the testing result on small figure is merged It to after scheming above greatly, needs by non-maxima suppression, to go out to remove duplicate prediction block, obtains better testing result.

Non-elaborated part of the present invention belongs to techniques well known.

The present invention can use different data sets, training filtering background network and multiscale target inspection according to actual needs Device is surveyed, to reach preferable object detection results.Target detection process through the invention can be more accurate detection it is high Object in resolution Optical remote sensing images, to provide help for analysis remote sensing images.

The above, part specific embodiment only of the present invention, but scope of protection of the present invention is not limited thereto, appoints What those skilled in the art is in the technical scope disclosed by the present invention, it will be appreciated that the change or replacement expected should all be covered Within protection scope of the present invention.Therefore, the scope of protection of the invention shall be subject to the scope of protection specified in the patent claim.

Claims

1. a kind of high-resolution optical remote sensing image object detection method carries out detection mesh for the remote sensing images for large format Mark realizes the automatic detection to object in remote sensing images and Accurate classification, and described method includes following steps:

Step 1: the labeled data of remote sensing images is clustered, trains filtering background network using the label data after cluster, And region there may be object is extracted from remote sensing images using the filtering background network；

Step 2: cut picture, based on filtering background network extract described in there may be the region of object, cut out conduct Target detection needs the region detected；

Step 3: using picture and labeled data the training multiscale target detection network cut out, and using the multiscale target Detect network detection object on cut-out remote sensing images；

Step 4: merging testing result according to the coordinate in the region of cutting and the result of target detection is merged into big remote sensing figure As upper, to obtain the object detection results of big figure.

2. high-resolution optical remote sensing image object detection method according to claim 1, it is characterised in that: the step 1 specifically includes following sub-step:

Step 12: the size and aspect ratio in the region after cluster are adjusted；

Step 13: training filtering background network；

3. high-resolution optical remote sensing image object detection method according to claim 1, it is characterised in that: the step In 2, the overlapping region that filtering background neural network forecast goes out is merged before cutting.

4. high-resolution optical remote sensing image object detection method according to claim 3, it is characterised in that: according to each IOU value between candidate region and other remaining candidate regions merges.

5. high-resolution optical remote sensing image object detection method according to claim 3, it is characterised in that: the step In 2, before cutting, the candidate region size and aspect ratio that obtain after merging are adjusted.

6. high-resolution optical remote sensing image object detection method according to claim 1, it is characterised in that: the scale Predict that RPN, classification and position adjusting type modules in network and Faster R-CNN share convolutional layer, the scale prediction network from The characteristic pattern that shared convolutional layer generates is input to a convolutional layer with predetermined quantity channel, to obtain the spy of respective numbers Sign figure.

7. high-resolution optical remote sensing image object detection method according to claim 1, it is characterised in that: the scale The loss function for predicting network is cross entropy loss function.

8. high-resolution optical remote sensing image object detection method according to claim 1, it is characterised in that: according to scale The scaling vector for predicting network output, the scale of multiple maximum likelihoods is found using one-dimensional non-maxima suppression.

9. high-resolution optical remote sensing image object detection method according to claim 1, it is characterised in that: using as follows Step carries out multiscale target detection: (1) extracting the convolution feature of picture, generate shared convolution feature；(2) using shared convolution Feature generates scale in scale prediction module；(3) according to the scale and shared convolution feature predicted Faster R-CNN's Candidate region is generated on RPN；(4) using the classification of Faster R-CNN and position adjusting type modules to candidate region carry out classification and Position adjustment.

10. high-resolution optical remote sensing image object detection method according to claim 1, it is characterised in that: the step Rapid 4 include following sub-step:

Step 41: being schemed greatly according to the cutting picture of the prediction result of multiscale target detection network and the output of filtering background network In position, adjust the coordinate of testing result；

Step 42: non-maxima suppression being passed through to the testing result being merged on the small figure on big figure, removes duplicate detection block.