CN111950488B - Improved Faster-RCNN remote sensing image target detection method - Google Patents

Improved Faster-RCNN remote sensing image target detection method

Info

Publication number
CN111950488B
CN111950488B (application CN202010833754.0A)
Authority
CN
China
Prior art keywords
network
remote sensing
rcnn
sensing image
faster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010833754.0A
Other languages
Chinese (zh)
Other versions
CN111950488A (en)
Inventor
郭艳艳 (Guo Yanyan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanxi University
Original Assignee
Shanxi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanxi University
Priority to CN202010833754.0A
Publication of CN111950488A
Application granted
Publication of CN111950488B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/10 Terrestrial scenes
    • G06V 20/13 Satellite images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/32 Normalisation of the pattern dimensions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/08 Detecting or categorising vehicles

Abstract

The invention relates to the field of remote sensing image target detection, and in particular to an improved Faster-RCNN remote sensing image target detection method. The method comprises the following steps: (1) dividing a remote sensing image data set into a training set and a test set; (2) sequentially applying size transformation, normalization and data enhancement to the remote sensing images in the training set; (3) building an improved Faster-RCNN remote sensing image target detection network; (4) training the improved Faster-RCNN remote sensing image target detection network; (5) testing the improved Faster-RCNN remote sensing image target detection network. The method improves the average precision of target detection in remote sensing images, in particular for small targets, and reduces the probability of false detections and missed detections of small targets.

Description

Improved Faster-RCNN remote sensing image target detection method
Technical Field
The invention relates to the field of remote sensing image target detection, in particular to an improved Faster-RCNN remote sensing image target detection method.
Background
Object detection is one of the basic problems in computer vision, with wide application in many fields. Target detection in remote sensing images has broad application prospects in military applications, urban planning, environmental management and other areas. Unlike target detection in natural images, targets in remote sensing images are much smaller, their sizes and orientations are diverse (e.g., playgrounds, cars, bridges), and the visual appearance of target instances varies with occlusion, shadows, lighting, resolution and viewpoint. Detecting targets in remote sensing images is therefore considerably more difficult than in natural images.
In recent years, some research has introduced deep convolutional neural networks into target detection. These networks automatically learn feature representations from data that are robust and highly expressive, and they have greatly improved detection speed and precision. The two most classic families of deep convolutional target detection algorithms are those based on candidate region extraction and those based on regression. Algorithms based on candidate region extraction first extract candidate regions from a given image and then classify and regress each extracted candidate region; they hold a certain advantage in detection accuracy. Regression-based algorithms use a single, end-to-end convolutional neural network that recasts target detection as a regression problem and directly predicts the category and position of the target; they hold a certain advantage in detection speed.
Although current target detection algorithms perform well on natural images, target detection in remote sensing images still needs improvement. In particular, the detection of small targets in remote sensing images remains unsatisfactory, and false detections and missed detections occur easily.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides an improved Faster-RCNN remote sensing image target detection method that improves the target detection accuracy on remote sensing images, reduces the probability of false detections and missed detections, and has better generalization capability.
In order to achieve the purpose, the invention adopts the following technical scheme:
An improved Faster-RCNN remote sensing image target detection method comprises the following steps:
(1.1) dividing a remote sensing image data set into a training set and a testing set;
(1.2) carrying out size transformation, normalization processing and data enhancement on the remote sensing images in the training set in sequence:
a. the size transformation sets each remote sensing image in the training set to 800 × 960 pixels;
b. the normalization maps each pixel value of the images in the training set into the range 0-1;
c. the data enhancement rotates each normalized remote sensing image in the training set by 90, 180 and 270 degrees and applies a mirror operation;
(1.3) constructing an improved Faster-RCNN remote sensing image target detection network: the network consists of a Faster-RCNN sub-network and a positioning refinement sub-network;
(1.4) training the improved Faster-RCNN remote sensing image target detection network: first randomly initialize the node parameters of the built improved Faster-RCNN remote sensing image target detection network, then input the remote sensing images of the training set into the built network and update the node parameters of the improved Faster-RCNN remote sensing image target detection network model by stochastic gradient descent until an optimal solution is found;
(1.5) testing the improved Faster-RCNN remote sensing image target detection network: detect the remote sensing images of the test set with the trained improved Faster-RCNN remote sensing image target detection network and analyze the detection effect.
Further, building the improved Faster-RCNN remote sensing image target detection network in step (1.3) comprises the following specific steps:
(2.1) The remote sensing image is input into the VGG16 network in the Faster-RCNN sub-network, which extracts the texture, color and scale features of the targets in the remote sensing image; after the VGG16 network, a feature map y1 of size 50 × 60 × 256 is obtained;
(2.2) The obtained feature map y1 is fed into three parallel branches: the RPN (Region Proposal Network) and the RoI pooling layer in the Faster-RCNN sub-network, and the RoI pooling layer in the positioning refinement sub-network;
(2.3) In the RPN network of the Faster-RCNN sub-network, a 3 × 3 sliding window performs a standard convolution operation with stride 1 on y1. At each sliding position, 12 anchor boxes of different scales are generated centered on the center point of the sliding window, with sizes 16 × 16, 16 × 32, 32 × 16, 32 × 32, 32 × 64, 64 × 32, 64 × 64, 64 × 128, 128 × 64, 128 × 128, 128 × 256 and 256 × 128. After the anchor boxes are generated, the output passes through a ReLU activation function in the RPN network and is divided into two branches. One branch is the classification loss branch: it first applies a point-by-point convolution with 18 output channels and then classifies the 12 anchor boxes of different scales with the Softmax classifier in the RPN network; each anchor box outputs two probability values that distinguish target from background, so 24 probability values are output per sliding step. The other branch is the boundary regression loss branch: after a point-by-point convolution with 36 output channels, the bounding box regression loss layer in the RPN network calculates the boundary regression offsets of the anchor boxes; each anchor box outputs 4 relative position coordinates, namely the center coordinates (xa, ya) and the width and height (wa, ha) of the anchor box, so the 12 anchor boxes output 48 relative position coordinates per sliding step. Finally, the proposal layer in the RPN network integrates the outputs of the classification loss branch and the boundary regression loss branch to obtain a feature map y2 of anchor boxes with relative position coordinates;
(2.4) After the RPN network in the Faster-RCNN sub-network, the feature map y2 and the feature map y1 obtained from the VGG16 network are input into the RoI pooling layer in the Faster-RCNN sub-network, which outputs the feature maps of non-uniform size as a feature map y3 of size 25 × 30 × 256; after a fully connected layer with ReLU activation in the Faster-RCNN sub-network, a regression result y4 is obtained through the regression loss layer in the Faster-RCNN sub-network;
(2.5) The regression result y4 obtained from the regression loss layer in the Faster-RCNN sub-network and the feature map y1 obtained from the VGG16 network are input into the RoI pooling layer in the positioning refinement sub-network, which outputs a feature map of size 6 × 7 × 256. After a fully connected layer with ReLU activation in the positioning refinement sub-network, the output is divided into two paths: one path outputs the position information y5 of the targets in the remote sensing image through the regression loss layer in the positioning refinement sub-network; the other path outputs the classification result y6 of the targets in the remote sensing image through the Softmax classifier in the positioning refinement sub-network.
All regression loss layers (Regressor) in the invention use a robust loss function to calculate the boundary regression offsets of the anchor boxes.
Compared with the prior art, the invention has the following advantages: on the basis of the existing Faster-RCNN network, the number of anchor boxes in the RPN (Region Proposal Network) of Faster-RCNN is increased to 12, and a positioning refinement sub-network with a RoI pooling layer is added to further detect the output of the Faster-RCNN network. This improves the average precision of target detection in remote sensing images, in particular the detection accuracy for small targets such as cars and airplanes.
Drawings
FIG. 1 is a schematic diagram of the improved fast-RCNN remote sensing image target detection network structure of the present invention;
FIG. 2 is a schematic diagram of the RPN network of the present invention;
FIG. 3 is a visual comparison of the improved Faster-RCNN remote sensing image object detection network of the present invention and the prior art method.
Detailed Description
The remote sensing image target detection data set used by the invention comes from the NWPU VHR-10 data set created by Dr. Gong Cheng et al. of Northwestern Polytechnical University. The data set has 10 classes: airplane, ship, storage tank, baseball diamond, tennis court, basketball court, ground track field, harbor, bridge and vehicle. It contains 800 high-resolution remote sensing images, of which the negative sample set comprises 150 images that do not belong to any category. The sizes of the targets to be detected vary widely: the largest target is about 418 × 418 and the smallest is 33 × 33.
Referring to fig. 1, 2 and 3, the improved method for detecting the target of the fast-RCNN remote sensing image disclosed by the invention comprises the following steps:
(1.1) The remote sensing image data set is divided into a training set and a test set: 80% of the images are used for network training and 20% for network testing, keeping the distribution of the different classes of samples in the training and test sets as consistent as possible;
(1.2) carrying out size transformation, normalization processing and data enhancement on the remote sensing images in the training set in sequence:
a. the size transformation sets each remote sensing image in the training set to 800 × 960 pixels;
b. the normalization maps each pixel value of the remote sensing images in the training set into the range 0-1;
c. the data enhancement rotates each normalized remote sensing image in the training set by 90, 180 and 270 degrees and applies a mirror operation, which ensures the robustness of the improved Faster-RCNN remote sensing image target detection network, as shown in the sketch below;
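A minimal preprocessing sketch of steps a-c, assuming OpenCV-style uint8 image arrays; the function names and the use of cv2/numpy are illustrative assumptions, not part of the patent:

```python
import numpy as np
import cv2

def preprocess(image: np.ndarray) -> np.ndarray:
    # a. size transformation: fix each training image to 800 x 960 pixels
    image = cv2.resize(image, (960, 800))       # cv2.resize takes (width, height)
    # b. normalization: map each pixel value into the range 0-1
    return image.astype(np.float32) / 255.0

def augment(image: np.ndarray) -> list:
    # c. data enhancement: rotations by 90/180/270 degrees plus a mirror image
    rotations = [np.rot90(image, k) for k in (1, 2, 3)]
    return [image, *rotations, np.fliplr(image)]
```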
(1.3) Constructing the improved Faster-RCNN remote sensing image target detection network: the network consists of a Faster-RCNN sub-network and a positioning refinement sub-network. The Faster-RCNN sub-network performs a preliminary target detection on the remote sensing image, and the positioning refinement sub-network further detects the output of the Faster-RCNN network, which alleviates inaccurate localization, missed detections and false detections of targets;
The Faster-RCNN sub-network consists of a VGG16 network, an RPN network, a RoI pooling layer, a fully connected layer (FC) with ReLU activation and a regression loss layer (Regressor); the positioning refinement sub-network consists of a RoI pooling layer (RoI Pooling), a fully connected layer with ReLU activation, a Softmax classifier and a regression loss layer (Regressor).
The RPN network in the Faster-RCNN sub-network comprises a 3 × 3 standard convolution layer (Conv2d), a ReLU activation function, two point-by-point convolution layers (Pwise), a Softmax classifier, a bounding box regression loss layer (Bbox Regressor) and a proposal layer (Proposal).
The method for building the improved fast-RCNN remote sensing image target detection network comprises the following specific steps:
(2.1) The remote sensing image is input into the VGG16 network in the Faster-RCNN sub-network. The VGG16 network comprises 13 convolutional layers (Conv2d), each followed by a ReLU activation function, and 4 pooling layers (Pooling); the input feature map is activated by a ReLU function after every convolutional layer, and a max pooling operation follows the 2nd, 4th, 7th and 10th convolutional layers. Every convolutional layer uses a 3 × 3 standard convolution with padding 1 and stride 1; every pooling layer uses max pooling with a 2 × 2 pooling kernel and stride 2. After the VGG16 network extracts the texture, color and scale features of the targets in the remote sensing image, a feature map y1 of size 50 × 60 × 256 is obtained. The network configuration of VGG16 is shown in Table 1:
Table 1 VGG16 network configuration table
(Table 1 appears in the original publication only as an image and is not reproduced here.)
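Since Table 1 survives only as an image, the following PyTorch sketch reconstructs the backbone from the prose of step (2.1): 13 3 × 3 convolutions (padding 1, stride 1), each followed by ReLU, with 2 × 2 stride-2 max pooling after convolutions 2, 4, 7 and 10. The per-layer channel widths are an assumption; only the final 256 channels and the 800 × 960 → 50 × 60 spatial reduction are given in the text:

```python
import torch
import torch.nn as nn

def make_backbone() -> nn.Sequential:
    # assumed channel plan; 'M' marks the max-pooling layers after conv 2, 4, 7, 10
    cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
           256, 256, 256, 'M', 256, 256, 256]
    layers, in_ch = [], 3
    for v in cfg:
        if v == 'M':
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        else:
            layers += [nn.Conv2d(in_ch, v, kernel_size=3, stride=1, padding=1),
                       nn.ReLU(inplace=True)]
            in_ch = v
    return nn.Sequential(*layers)

y1 = make_backbone()(torch.zeros(1, 3, 800, 960))
print(y1.shape)  # torch.Size([1, 256, 50, 60])
```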
(2.2) The obtained feature map y1 is fed into three parallel branches: the RPN (Region Proposal Network) and the RoI pooling layer in the Faster-RCNN sub-network, and the RoI pooling layer in the positioning refinement sub-network;
(2.3) In the RPN network of the Faster-RCNN sub-network, a 3 × 3 sliding window performs a standard convolution operation with stride 1 on y1, and at each sliding position 12 anchor boxes of different scales are generated centered on the center point of the sliding window, with sizes 16 × 16, 16 × 32, 32 × 16, 32 × 32, 32 × 64, 64 × 32, 64 × 64, 64 × 128, 128 × 64, 128 × 128, 128 × 256 and 256 × 128. After the anchor boxes are generated, the output passes through a ReLU activation function in the RPN network and is divided into two branches. One branch is the classification loss branch: after a point-by-point convolution with 18 output channels, the 12 anchor boxes of different scales are classified by the Softmax classifier in the RPN network; each anchor box outputs two probability values distinguishing target from background, so 24 probability values are output per sliding step. The other branch is the boundary regression loss branch: after a point-by-point convolution with 36 output channels, the bounding box regression loss layer (Bbox_Regressor) in the RPN network calculates the boundary regression offsets of the anchor boxes; each anchor box outputs 4 relative position coordinates, namely the center coordinates (xa, ya) and the width and height (wa, ha) of the anchor box, so the 12 anchor boxes output 48 relative position coordinates per sliding step. Finally, the proposal layer in the RPN network integrates the outputs of the classification loss branch and the boundary regression loss branch to obtain a feature map y2 of anchor boxes with relative position coordinate values; the proposal layer (Proposal) applies a non-maximum suppression algorithm (NMS) to screen the anchor boxes preliminarily and removes anchor boxes that exceed the image boundary. A sketch of the anchor generation at one position is given below;
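A minimal sketch of the anchor generation described in step (2.3), assuming the 12 (width, height) pairs above and a feature-map stride of 16 (consistent with the four poolings that reduce 800 × 960 to 50 × 60); the names are illustrative:

```python
import numpy as np

# (width, height) pairs of the 12 anchor scales
ANCHOR_SIZES = [(16, 16), (16, 32), (32, 16), (32, 32), (32, 64), (64, 32),
                (64, 64), (64, 128), (128, 64), (128, 128), (128, 256), (256, 128)]

def anchors_at(cx: float, cy: float) -> np.ndarray:
    """Return the 12 anchors as (x1, y1, x2, y2) boxes centered on (cx, cy)."""
    return np.array([[cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]
                     for w, h in ANCHOR_SIZES])

# feature-map cell (i, j) maps back to image center (16 * j + 8, 16 * i + 8)
print(anchors_at(8.0, 8.0).shape)  # (12, 4)
```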
(2.4) After the RPN network in the Faster-RCNN sub-network, the feature map y2 and the feature map y1 obtained from the VGG16 network are input into the RoI pooling layer in the Faster-RCNN sub-network, which outputs the feature maps of non-uniform size as a feature map y3 of size 25 × 30 × 256; after a fully connected layer (FC) with ReLU activation in the Faster-RCNN sub-network, a regression result y4 is obtained through the regression loss layer (Regressor) in the Faster-RCNN sub-network;
(2.5) The regression result y4 obtained from the regression loss layer in the Faster-RCNN sub-network and the feature map y1 obtained from the VGG16 network are input into the RoI pooling layer in the positioning refinement sub-network, which outputs a feature map of size 6 × 7 × 256. After a fully connected layer (FC) with ReLU activation in the positioning refinement sub-network, the output is divided into two paths: one path outputs the position information y5 of the targets in the remote sensing image through the regression loss layer (Regressor) in the positioning refinement sub-network; the other path outputs the classification result y6 of the targets in the remote sensing image through the Softmax classifier in the positioning refinement sub-network.
(1.4) Training the improved Faster-RCNN remote sensing image target detection network: first randomly initialize the node parameters of the built improved Faster-RCNN remote sensing image target detection network, then input the remote sensing images of the training set into the network and update the node parameters of the model by stochastic gradient descent, following the descent direction in each iteration, until an optimal solution is found and the iteration stops.
The hardware conditions and parameter configurations for training the network are given in S401 and S402:
S401. The method uses a computer with an Intel Core i7-9700 CPU, an Nvidia GeForce GTX 1060 6GB graphics card and 16 GB of memory, and the algorithm framework is built with PyTorch.
S402. The parameters of the network are updated with a stochastic gradient descent algorithm; the pre-trained model is a ResNet50 network. A dynamic learning rate makes the network converge quickly to the optimum: the initial learning rate is set to 0.001 and is multiplied by 0.1 every 4000 iterations, for 100,000 iterations in total; the threshold of non-maximum suppression (NMS) is set to 0.7. A sketch of this schedule follows.
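A minimal PyTorch sketch of the S402 schedule; the momentum value and the stand-in model and loss are assumptions not stated in the patent:

```python
import torch

model = torch.nn.Linear(10, 4)              # stand-in for the detection network
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=4000, gamma=0.1)

for iteration in range(100_000):
    optimizer.zero_grad()
    loss = model(torch.randn(2, 10)).sum()  # placeholder loss
    loss.backward()
    optimizer.step()
    scheduler.step()                        # lr x 0.1 every 4000 iterations
```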
(1.5) Testing the improved Faster-RCNN remote sensing image target detection network: detect the remote sensing images of the test set with the trained improved Faster-RCNN remote sensing image target detection network and analyze the detection effect; the mean average precision (mAP) is selected as the evaluation index for measuring the remote sensing image target detection effect.
The present embodiment is described in further detail below:
Standard convolution (Conv2d): the calculation formula of the standard convolution is shown in equation (1):
Conv2d(W, b, x) = W · x + b (1)
where W is the weight of the convolution kernel, x is the input feature map, b is the bias term, M is the number of input channels, V and U are respectively the width and height of the convolution kernel, and N is the number of output channels.
Point-by-point convolution (Pwise): the point-by-point convolution kernel Wp has size 1 × 1 × N, where N is the number of output channels; if the input image has size h × d × M, the output feature map has size h × d × N. The point-by-point convolution is calculated as shown in equation (2):
Pwise(Wp, x) = Wp · x (2)
ReLU activation function: the mathematical formula is ReLU(x) = max(0, x), where max() takes the larger of 0 and x;
pooling layer (Pooling): each pooling layer adopts maximum pooling, the size of a pooling core is 2 multiplied by 2, and the step length is 2;
RoI pooling layer (RoI Pooling): the operation of the RoI pooling layer is divided into three steps. First, the region of interest is mapped to the corresponding position of the feature map according to the input feature map; second, the mapped region is divided into parts of equal size (their number equals the output dimension); third, a max pooling operation is applied to each part. Through these three steps, feature maps of different sizes are output as feature maps of fixed size, and the size of the output feature map is independent of the size of the RoI and the size of the input feature map. A usage sketch follows.
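A usage sketch of the fixed-size RoI pooling step with torchvision's roi_pool; the 25 × 30 output matches the RoI pooling layer of the Faster-RCNN sub-network, and spatial_scale = 1/16 (an assumption consistent with the four poolings) maps image-space boxes onto the 50 × 60 feature map:

```python
import torch
from torchvision.ops import roi_pool

y1 = torch.randn(1, 256, 50, 60)                      # backbone feature map
rois = torch.tensor([[0, 64.0, 64.0, 320.0, 320.0]])  # (batch_idx, x1, y1, x2, y2)
y3 = roi_pool(y1, rois, output_size=(25, 30), spatial_scale=1.0 / 16)
print(y3.shape)  # torch.Size([1, 256, 25, 30])
```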
Full connection layer (FC): each neuron of the full connection layer is completely connected with the neuron of the previous layer;
Mean average precision (mAP): the calculation formula is shown in equation (3):
mAP = (1/(k+1)) ∑i=0..k ( pii / ∑j=0..k pij ) (3)
Assume a total of k + 1 classes (including a null or background class); pij is the number of samples that originally belong to class i but are predicted as class j, called false positives; pii is the number of correctly classified samples;
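A small sketch of equation (3) as reconstructed above (the original formula survives only as an image, so the form is inferred from the surrounding definitions), computing the mean per-class accuracy from a confusion matrix p where p[i][j] counts samples of true class i predicted as class j:

```python
import numpy as np

def mean_accuracy(p: np.ndarray) -> float:
    # average of p_ii / sum_j p_ij over all k+1 classes
    return float(np.mean(np.diag(p) / p.sum(axis=1)))

p = np.array([[8, 2], [1, 9]])   # toy 2-class confusion matrix
print(mean_accuracy(p))          # 0.85
```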
Softmax classifier: the Softmax classifier is generally used for multi-class problems; the network is trained by minimizing the Softmax loss function. Assume a data set of size J, {(x(1), y(1)), …, (x(m), y(m)), …, (x(J), y(J))}; every sample in the data set has a correct classification label, i.e. a label value in {y(1), …, y(k)}, where k is the number of classes. For the m-th sample, which corresponds to a category j, there is a probability, also called a score value; the score value of the m-th sample is shown in equation (4):
h(x(m)) = exp(θj · x(m)) / ∑l=0..k−1 exp(θl · x(m)) (4)
where θ = (θ0, θ1, …, θk−1) are the parameters to be optimized, y(m) denotes the label of the m-th sample, x(m) denotes the m-th sample, and h(x(m)) denotes the score value of the m-th sample; the denominator ∑l=0..k−1 exp(θl · x(m)) normalizes the probability distribution so that the probabilities sum to 1.
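A small numpy sketch of equation (4); theta is a (k, d) parameter matrix, and subtracting the maximum logit is a standard numerical-stability step not written in the equation:

```python
import numpy as np

def softmax_scores(theta: np.ndarray, x: np.ndarray) -> np.ndarray:
    # one score per class: exp(theta_j . x) normalized over all classes
    logits = theta @ x
    logits = logits - logits.max()   # stability shift; does not change the result
    exp = np.exp(logits)
    return exp / exp.sum()

theta = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])  # k = 3 classes, d = 2
x = np.array([0.5, 1.5])
print(softmax_scores(theta, x))      # three probabilities summing to 1
```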
NMS: the non-maximum suppression algorithm (Non-Maximum Suppression) removes non-maxima. Its steps are as follows:
Assume the object to be recognized is surrounded by F candidate boxes, and the classifier computes a score sn for the n-th candidate box, 1 ≤ n ≤ F. (1) Create a set H containing the F candidate boxes and create an empty set T. (2) Sort all candidate boxes in H by their classifier score values and move the box t with the highest score into the set T. (3) Traverse the candidate boxes in H and compute the intersection-over-union of each with box t; if it is higher than a certain threshold, the box is considered to overlap box t and is deleted from H. (4) Return to step (2) and continue iterating until H is empty. The boxes in the set T are the ones we need; a direct sketch follows.
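A direct sketch of the four steps above; scores holds the classifier score values sn, boxes holds (x1, y1, x2, y2) candidates, and the 0.7 default matches the NMS threshold of S402:

```python
import numpy as np

def iou(a, b):
    # intersection-over-union of two (x1, y1, x2, y2) boxes
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def nms(boxes, scores, threshold=0.7):
    H = list(np.argsort(-scores))   # steps 1-2: candidates sorted by score
    T = []                          # step 1: empty output set
    while H:
        t = H.pop(0)                # step 2: move the highest-scoring box into T
        T.append(t)
        # step 3: delete every remaining box overlapping box t above the threshold
        H = [n for n in H if iou(boxes[n], boxes[t]) <= threshold]
    return T                        # step 4: iterate until H is empty
```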
Regression loss layer (Regressor): the regression loss function Lreg is shown in equation (5):
Lreg(tn, vn) = ∑c∈{x,y,w,h} smoothL1(tc − vc) (5)
where smoothL1 is the robust loss function shown in equation (6):
smoothL1(x) = 0.5x² if |x| < 1; |x| − 0.5 otherwise (6)
where vn = (vx, vy, vw, vh) is the coordinate vector of the ground-truth box and tn = (tx, ty, tw, th) is the coordinate vector of the prediction box. The four coordinate calculation formulas are as follows:
tx = (x − xa)/wa, ty = (y − ya)/ha
tw = log(w/wa), th = log(h/ha)
vx = (x* − xa)/wa, vy = (y* − ya)/ha
vw = log(w*/wa), vh = log(h*/ha)
where x and y are the center coordinates of the prediction box, w and h are respectively its width and height, xa and ya are the center coordinates of the anchor box generated by the RPN network, wa and ha are respectively the width and height of the anchor box, x* and y* are the center coordinates of the ground-truth box, and w* and h* are respectively its width and height. A sketch of equations (5)-(6) follows.
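A numpy sketch of equations (5)-(6) and the coordinate parameterization above; the helper names and the toy boxes are illustrative:

```python
import numpy as np

def smooth_l1(x: np.ndarray) -> np.ndarray:
    # equation (6): 0.5 x^2 if |x| < 1, |x| - 0.5 otherwise
    return np.where(np.abs(x) < 1, 0.5 * x ** 2, np.abs(x) - 0.5)

def encode(box, anchor):
    # offsets of a center-size box (x, y, w, h) relative to an anchor
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return np.array([(x - xa) / wa, (y - ya) / ha,
                     np.log(w / wa), np.log(h / ha)])

anchor = (96.0, 96.0, 64.0, 64.0)
t = encode((100.0, 100.0, 64.0, 64.0), anchor)  # prediction offsets t_n
v = encode((104.0, 100.0, 64.0, 64.0), anchor)  # ground-truth offsets v_n
print(smooth_l1(t - v).sum())                   # equation (5)
```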
Bounding box regression loss layer (Bbox Regressor): the loss function of the bounding box regression loss layer in the RPN network is defined as:
L({pn}, {tn}) = (1/Ncls) ∑n Lcls(pn, pn*) + λ · (1/Nreg) ∑n pn* · Lreg(tn, vn)
The loss function is divided into two parts: Lcls is the classification loss function, whose output is denoted by pn, and Lreg is the regression loss function given by equation (5), whose output is denoted by tn. n is the anchor box index; pn is the probability that the n-th anchor box contains a target; pn* is 1 if the n-th anchor box contains a target and 0 otherwise; Nreg is the number of anchor boxes containing targets in the RPN network and Ncls is the total number of anchor boxes; λ is a weight.
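A numpy sketch of the two-part loss as reconstructed above; binary cross-entropy stands in for Lcls (the patent does not spell out its form), and all array names are illustrative:

```python
import numpy as np

def smooth_l1(x):
    return np.where(np.abs(x) < 1, 0.5 * x ** 2, np.abs(x) - 0.5)

def rpn_loss(p, p_star, t, v, N_cls, N_reg, lam=1.0):
    # classification part: binary cross-entropy over the anchor scores p
    eps = 1e-7
    L_cls = -(p_star * np.log(p + eps) + (1 - p_star) * np.log(1 - p + eps))
    # regression part: equation (5), applied only to anchors whose label p* is 1
    L_reg = smooth_l1(t - v).sum(axis=1)
    return L_cls.sum() / N_cls + lam * (p_star * L_reg).sum() / N_reg

p = np.array([0.9, 0.2])        # predicted objectness of two anchors
p_star = np.array([1.0, 0.0])   # ground-truth anchor labels
t = np.zeros((2, 4)); v = np.zeros((2, 4))
print(rpn_loss(p, p_star, t, v, N_cls=2, N_reg=1))
```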
The advantages and disadvantages of the present invention were further analyzed by comparing the detection results of the method of the present invention with those of Faster-RCNN (see Table 2).
TABLE 2 Average precision of the present invention and the existing method
(Table 2 appears in the original publication only as an image and is not reproduced here.)
As can be seen from Table 2, the mean average precision (mAP) of the method of the present invention is improved. For the vehicle, a small target whose background is complex and which is easily occluded by shadow, the accuracy of the existing method is low, but the network of the present invention improves the average precision of the vehicle class by about 7%, which indicates that the positioning refinement sub-network effectively improves the detection of small targets. Meanwhile, for a larger target such as the bridge, the average precision is not high for either the present method or the prior art: in this data set bridges and roads are connected in long strips with similar color and texture features, so bridges are easily identified as roads during detection, which keeps the average precision of the bridge class low. For the storage tank, a densely distributed target in the images, the detection accuracy of both the present invention and the existing network is low, so the network still needs to be improved to detect targets with high distribution density.
Meanwhile, on the same data set, the average precision of the improved network was tested with different numbers of anchor boxes, as shown in Table 3.
TABLE 3 Average precision for different numbers of anchor boxes
Number of anchor boxes    mAP (%)
3                         78.2
6                         80.6
9                         81.5
12                        83.1
15                        82.6
As can be seen from Table 3, when the NWPU VHR-10 data set is trained with different numbers of anchor boxes, the average precision increases steadily from 3 to 12 anchor boxes and decreases slightly beyond 12. This shows that increasing the number of anchor boxes improves the average precision within a certain range; beyond that range, simply adding anchor boxes cannot improve detection accuracy, but adds extra computation and increases the overfitting risk of the network, thereby increasing its complexity.
As shown in FIG. 3, visualizations of the improved Faster-RCNN remote sensing image target detection method are compared with those of the prior art. The visualization is arranged in two columns and three rows; from left to right, the first column shows the results of the present invention and the second column shows the results of the prior art. In the first row of images, the existing method displaces the prediction box of the bridge considerably and positions the ship with a small deviation. The second row contains 5 vehicle targets; for these small targets the existing method misses one vehicle, falsely detects another, and positions the remaining three with slight deviations. The third row contains 5 airplanes, and the figure shows that the existing method misses some of the airplanes.
The above embodiments are described in detail to further illustrate the present invention and should not be construed as limiting its scope; skilled engineers can make insubstantial modifications and adaptations of the present invention based on the above disclosure.

Claims (1)

1. An improved Faster-RCNN remote sensing image target detection method, characterized by comprising the following steps:
(1.1) dividing a remote sensing image data set into a training set and a testing set;
(1.2) carrying out size transformation, normalization processing and data enhancement on the remote sensing images in the training set in sequence:
a. the size transformation sets each remote sensing image in the training set to 800 × 960 pixels;
b. the normalization maps each pixel value of the images in the training set into the range 0-1;
c. the data enhancement rotates each normalized remote sensing image in the training set by 90, 180 and 270 degrees and applies a mirror operation;
(1.3) constructing an improved Faster-RCNN remote sensing image target detection network: the network consists of a Faster-RCNN sub-network and a positioning refinement sub-network;
(1.4) training the improved Faster-RCNN remote sensing image target detection network: first randomly initializing the node parameters of the built improved Faster-RCNN remote sensing image target detection network, then inputting the remote sensing images of the training set into the built network and updating the node parameters of the improved Faster-RCNN remote sensing image target detection network model by stochastic gradient descent until an optimal solution is found;
(1.5) testing the improved Faster-RCNN remote sensing image target detection network: detecting the remote sensing images of the test set with the trained improved Faster-RCNN remote sensing image target detection network and analyzing the detection effect;
the step (1.3) of establishing the improved Faster-RCNN remote sensing image target detection network comprises the following specific steps:
(2.1) inputting the remote sensing image into the VGG16 network in the Faster-RCNN sub-network, extracting the texture, color and scale features of the targets in the remote sensing image, and obtaining after the VGG16 network a feature map y1 of size 50 × 60 × 256;
(2.2) feeding the obtained feature map y1 into three parallel branches: the RPN (Region Proposal Network) and the RoI pooling layer in the Faster-RCNN sub-network, and the RoI pooling layer in the positioning refinement sub-network;
(2.3) in the RPN network of the Faster-RCNN sub-network, performing with a 3 × 3 sliding window a standard convolution operation with stride 1 on y1, and generating at each sliding position 12 anchor boxes of different scales centered on the center point of the sliding window, with sizes 16 × 16, 16 × 32, 32 × 16, 32 × 32, 32 × 64, 64 × 32, 64 × 64, 64 × 128, 128 × 64, 128 × 128, 128 × 256 and 256 × 128; after the anchor boxes are generated, the output passes through a ReLU activation function in the RPN network and is divided into two branches: one branch is the classification loss branch, which first applies a point-by-point convolution with 18 output channels and then classifies the 12 anchor boxes of different scales with the Softmax classifier in the RPN network, each anchor box outputting two probability values that distinguish target from background, so that 24 probability values are output per sliding step; the other branch is the boundary regression loss branch, which after a point-by-point convolution with 36 output channels calculates the boundary regression offsets of the anchor boxes through the bounding box regression loss layer in the RPN network, each anchor box outputting 4 relative position coordinates, namely the center coordinates (xa, ya) and the width and height (wa, ha) of the anchor box, so that the 12 anchor boxes output 48 relative position coordinates per sliding step; finally, integrating the outputs of the classification loss branch and the boundary regression loss branch through the proposal layer in the RPN network to obtain a feature map y2 of anchor boxes with relative position coordinates;
(2.4) after the RPN network in the Faster-RCNN sub-network, inputting the feature map y2 and the feature map y1 obtained from the VGG16 network into the RoI pooling layer in the Faster-RCNN sub-network, outputting the feature maps of non-uniform size as a feature map y3 of size 25 × 30 × 256, and obtaining, after a fully connected layer with ReLU activation in the Faster-RCNN sub-network, a regression result y4 through the regression loss layer in the Faster-RCNN sub-network;
(2.5) inputting the regression result y4 obtained from the regression loss layer in the Faster-RCNN sub-network and the feature map y1 obtained from the VGG16 network into the RoI pooling layer in the positioning refinement sub-network, outputting a feature map of size 6 × 7 × 256, and after a fully connected layer with ReLU activation in the positioning refinement sub-network, dividing the output into two paths: one path outputs the position information y5 of the targets in the remote sensing image through the regression loss layer in the positioning refinement sub-network; the other path outputs the classification result y6 of the targets in the remote sensing image through the Softmax classifier in the positioning refinement sub-network.
CN202010833754.0A 2020-08-18 2020-08-18 Improved Faster-RCNN remote sensing image target detection method Active CN111950488B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010833754.0A CN111950488B (en) 2020-08-18 2020-08-18 Improved Faster-RCNN remote sensing image target detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010833754.0A CN111950488B (en) 2020-08-18 2020-08-18 Improved Faster-RCNN remote sensing image target detection method

Publications (2)

Publication Number Publication Date
CN111950488A CN111950488A (en) 2020-11-17
CN111950488B true CN111950488B (en) 2022-07-19

Family

ID=73342137

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010833754.0A Active CN111950488B (en) 2020-08-18 2020-08-18 Improved Faster-RCNN remote sensing image target detection method

Country Status (1)

Country Link
CN (1) CN111950488B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112464769A (en) * 2020-11-18 2021-03-09 西北工业大学 High-resolution remote sensing image target detection method based on consistent multi-stage detection
CN112686139B (en) * 2020-12-29 2024-02-09 西安电子科技大学 Remote sensing image target detection method based on cross-stage local multiscale dense connection
CN113158789B (en) * 2021-03-15 2023-08-25 华南理工大学 Target detection method, system, device and medium for remote sensing image
CN113392803A (en) * 2021-06-30 2021-09-14 广东电网有限责任公司 Method and device for identifying suspended foreign matters of power transmission line, terminal and storage medium
DE102022108158A1 (en) 2022-04-05 2023-10-05 Ford Global Technologies, Llc Method and device for determining a component orientation

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10839220B2 (en) * 2018-10-15 2020-11-17 Kepler Vision Technologies B.V. Method for categorizing a scene comprising a sub-scene with machine learning

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107972662A (en) * 2017-10-16 2018-05-01 华南理工大学 To anti-collision warning method before a kind of vehicle based on deep learning
CN108830280A (en) * 2018-05-14 2018-11-16 华南理工大学 A kind of small target detecting method based on region nomination
WO2020020472A1 (en) * 2018-07-24 2020-01-30 Fundación Centro Tecnoloxico De Telecomunicacións De Galicia A computer-implemented method and system for detecting small objects on an image using convolutional neural networks
CN110084195A (en) * 2019-04-26 2019-08-02 西安电子科技大学 Remote Sensing Target detection method based on convolutional neural networks
CN110210463A (en) * 2019-07-03 2019-09-06 中国人民解放军海军航空大学 Radar target image detecting method based on Precise ROI-Faster R-CNN
CN110853015A (en) * 2019-11-12 2020-02-28 中国计量大学 Aluminum profile defect detection method based on improved Faster-RCNN
CN110929618A (en) * 2019-11-15 2020-03-27 国网江西省电力有限公司电力科学研究院 Potential safety hazard detection and evaluation method for power distribution network crossing type building
CN111259905A (en) * 2020-01-17 2020-06-09 山西大学 Feature fusion remote sensing image semantic segmentation method based on downsampling

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
An Optimized Faster R-CNN Method Based on DRNet and RoI Align for Building Detection in Remote Sensing Images; Tong Bai et al.; Remote Sensing; 2020-02-26; vol. 12, no. 5; pp. 1-16 *
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks; Shaoqing Ren et al.; arXiv:1506.01497v3; 2016-01-06; pp. 1-14 *
Semantic segmentation of remote sensing images with down-sampling-based feature fusion (基于下采样的特征融合遥感图像语义分割); Li Shuai et al.; Journal of Test and Measurement Technology (《测试技术学报》); 2020-06-16; vol. 34, no. 4; pp. 331-337 *
Research on remote sensing image classification technology based on deep learning (基于深度学习的遥感图像分类技术研究); Li Shuai; China Masters' Theses Full-text Database, Engineering Science and Technology II (《中国优秀硕士学位论文全文数据库_工程科技Ⅱ辑》); 2021-01-15; C028-204 *

Also Published As

Publication number Publication date
CN111950488A (en) 2020-11-17

Similar Documents

Publication Publication Date Title
CN111950488B (en) Improved Faster-RCNN remote sensing image target detection method
CN111091105B (en) Remote sensing image target detection method based on new frame regression loss function
CN112418117B (en) Small target detection method based on unmanned aerial vehicle image
CN108830188B (en) Vehicle detection method based on deep learning
CN109614985B (en) Target detection method based on densely connected feature pyramid network
CN108830285B (en) Target detection method for reinforcement learning based on fast-RCNN
CN106096561B (en) Infrared pedestrian detection method based on image block deep learning features
CN109919241B (en) Hyperspectral unknown class target detection method based on probability model and deep learning
CN112101278A (en) Hotel point cloud classification method based on k nearest neighbor feature extraction and deep learning
CN112132042A (en) SAR image target detection method based on anti-domain adaptation
CN109934216B (en) Image processing method, device and computer readable storage medium
CN109165658B (en) Strong negative sample underwater target detection method based on fast-RCNN
CN111274926B (en) Image data screening method, device, computer equipment and storage medium
CN109801305B (en) SAR image change detection method based on deep capsule network
CN108171119B (en) SAR image change detection method based on residual error network
CN111833353B (en) Hyperspectral target detection method based on image segmentation
CN113177456A (en) Remote sensing target detection method based on single-stage full convolution network and multi-feature fusion
CN114694178A (en) Method and system for monitoring safety helmet in power operation based on fast-RCNN algorithm
CN113221956A (en) Target identification method and device based on improved multi-scale depth model
CN110097067B (en) Weak supervision fine-grained image classification method based on layer-feed feature transformation
CN113128518B (en) Sift mismatch detection method based on twin convolution network and feature mixing
CN108257148B (en) Target suggestion window generation method of specific object and application of target suggestion window generation method in target tracking
CN114219936A (en) Object detection method, electronic device, storage medium, and computer program product
CN113780145A (en) Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium
CN113609895A (en) Road traffic information acquisition method based on improved Yolov3

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant