CN111950488B - Improved Faster-RCNN remote sensing image target detection method - Google Patents
- Publication number: CN111950488B
- Application number: CN202010833754.0A
- Authority
- CN
- China
- Prior art keywords
- network
- remote sensing
- rcnn
- sensing image
- faster
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06V20/13—Satellite images
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06N3/045—Combinations of networks
- G06V10/32—Normalisation of the pattern dimensions
- G06V2201/07—Target detection
- G06V2201/08—Detecting or categorising vehicles
Abstract
The invention relates to the field of remote sensing image target detection, in particular to an improved Faster-RCNN remote sensing image target detection method. The method comprises the following steps: (1) dividing a remote sensing image data set into a training set and a testing set; (2) carrying out size transformation, standardization and normalization processing and data enhancement on the remote sensing images in the training set in sequence; (3) building an improved Faster-RCNN remote sensing image target detection network; (4) training the improved Faster-RCNN remote sensing image target detection network; (5) testing the improved Faster-RCNN remote sensing image target detection network. The method improves the average accuracy of remote sensing image target detection, in particular the average accuracy of small target detection, and reduces the probability of false detection and missed detection of small targets.
Description
Technical Field
The invention relates to the field of remote sensing image target detection, in particular to an improved Faster-RCNN remote sensing image target detection method.
Background
Object detection is one of the basic problems in computer vision recognition tasks and has wide application in many fields. Target detection in remote sensing images has broad application prospects in military applications, urban planning, environmental management and other areas. Unlike target detection on natural images, targets in remote sensing images are much smaller than those in natural images, their sizes and orientations are diverse (e.g., playground, car, bridge), and the visual appearance of target instances varies due to occlusion, shadows, lighting, resolution, and viewpoint variations. Therefore, detecting objects in remote sensing images is much more difficult than in natural images.
In recent years, some research has introduced deep convolutional neural networks into target detection; these networks automatically learn feature representations from data with good robustness and strong expressive power, and target detection methods based on them have greatly improved in both speed and precision. Target detection algorithms based on candidate region extraction and those based on regression are the most classic algorithms among current deep convolutional neural network detectors. Algorithms based on candidate region extraction first extract candidate regions from a given image and then classify and regress each extracted candidate region, which gives them an advantage in detection accuracy. Regression-based algorithms use a single, end-to-end convolutional neural network that recasts target detection as a regression problem to directly predict the category and position of the target, which gives them an advantage in detection speed.
Although current target detection algorithms perform well on natural images, target detection on remote sensing images still needs improvement; in particular, the detection of small targets in remote sensing images remains unsatisfactory, and target false detections and missed detections easily occur.
Disclosure of Invention
In view of the above defects in the prior art, the invention aims to provide an improved Faster-RCNN remote sensing image target detection method that improves the target detection accuracy on remote sensing images, reduces the probability of false and missed detections during target detection, and has better generalization capability.
In order to achieve the purpose, the invention adopts the following technical scheme:
an improved method for detecting a target of a Faster-RCNN remote sensing image comprises the following steps:
(1.1) dividing a remote sensing image data set into a training set and a testing set;
(1.2) carrying out size transformation, normalization processing and data enhancement on the remote sensing images in the training set in sequence:
a. the size transformation is to set the size of the remote sensing image in the training set to 800 pixels multiplied by 960 pixels;
b. the normalization processing is to map each pixel value of the images in the training set to a range of 0-1;
c. the data enhancement is to rotate the remote sensing image normalized in the training set by 90 degrees, 180 degrees and 270 degrees and perform mirror image operation;
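Steps b and c can be sketched with NumPy (step a's resize is left to an image library such as OpenCV and is not shown; the 800 × 960 array below merely stands in for a resized remote sensing image):

```python
import numpy as np

def normalize(img):
    # Step b: map 8-bit pixel values into the range [0, 1].
    return img.astype(np.float32) / 255.0

def augment(img):
    # Step c: rotations by 90/180/270 degrees plus a mirror operation,
    # yielding 5 images (original + 4 variants) per input image.
    return [img,
            np.rot90(img, 1), np.rot90(img, 2), np.rot90(img, 3),
            np.fliplr(img)]

# Toy 800x960 RGB array standing in for a resized remote sensing image.
img = np.random.randint(0, 256, (800, 960, 3), dtype=np.uint8)
norm = normalize(img)
variants = augment(norm)
print(len(variants))  # → 5
```

Each training image therefore contributes several geometrically transformed copies, which is what gives the network its rotation and mirror robustness.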
(1.3) constructing an improved Faster-RCNN remote sensing image target detection network: the network consists of a Faster-RCNN sub-network and a positioning refinement sub-network;
(1.4) training the improved Faster-RCNN remote sensing image target detection network: randomly configure the node parameters of the built improved Faster-RCNN remote sensing image target detection network in advance, input the remote sensing images in the training set into the network, and update the node parameters of the improved Faster-RCNN remote sensing image target detection network model by stochastic gradient descent until an optimal solution is found;
(1.5) testing an improved Faster-RCNN remote sensing image target detection network: and detecting the remote sensing images in the test set by using the trained improved Faster-RCNN remote sensing image target detection network, and analyzing the detection effect.
Further, the step (1.3) of building the improved Faster-RCNN remote sensing image target detection network specifically comprises the following steps:
(2.1) inputting the remote sensing image into the VGG16 network in the Faster-RCNN sub-network, extracting texture, color and scale features of the target in the remote sensing image, and obtaining a feature map y1 of size 50 × 60 × 256 after the VGG16 network;
(2.2) the obtained feature map y1 is fed to three parallel branches: the RPN (Region Proposal Network) and the RoI pooling layer in the Faster-RCNN sub-network, and the RoI pooling layer in the positioning refinement sub-network;
(2.3) in the RPN network in the Faster-RCNN sub-network, a sliding window of size 3 × 3 is used to perform a standard convolution operation with step size 1 on y1; each time the window slides, 12 anchor boxes of different scales are generated centered on the center point of the sliding window, with sizes 16 × 16, 16 × 32, 32 × 16, 32 × 32, 32 × 64, 64 × 32, 64 × 64, 64 × 128, 128 × 64, 128 × 128, 128 × 256 and 256 × 128. After the anchor boxes are generated, the output passes through a Relu activation function in the RPN network and splits into two branches. One branch is the classification loss branch: it first performs a point-by-point convolution with 18 output channels, then classifies the 12 anchor boxes of different scales through a Softmax classifier in the RPN network; each anchor box outputs two probability values distinguishing target from background, so 24 probability values are output per slide. The other branch is the boundary regression loss branch: after a point-by-point convolution with 36 output channels, the bounding box regression offsets of the anchor boxes are computed through a bounding box regression loss layer in the RPN network. Each anchor box outputs 4 relative position coordinates, namely the center coordinates (xa, ya) and the width and height (wa, ha) of the anchor box, so the 12 anchor boxes output 48 relative position coordinates per slide. Finally, the outputs of the classification loss branch and the boundary regression loss branch are integrated through a proposal layer in the RPN network to obtain a feature map y2 of anchor boxes with relative position coordinates;
(2.4) after the RPN network in the Faster-RCNN sub-network, the feature map y2 and the feature map y1 obtained from the VGG16 network are input into the RoI pooling layer in the Faster-RCNN sub-network, which outputs the feature maps of non-uniform size as a feature map y3 of size 25 × 30 × 256; after a fully connected layer with Relu activation function in the Faster-RCNN sub-network, a regression result y4 is obtained through a regression loss layer in the Faster-RCNN sub-network;
(2.5) the regression result y4 obtained from the regression loss layer in the Faster-RCNN sub-network and the feature map y1 obtained from the VGG16 network are input into the RoI pooling layer in the positioning refinement sub-network, which outputs a feature map of size 6 × 7 × 256; after a fully connected layer with Relu activation function in the positioning refinement sub-network, the output splits into two paths: one path outputs the position information y5 of the target in the remote sensing image through a regression loss layer in the positioning refinement sub-network; the other path outputs the classification result y6 of the target in the remote sensing image through a Softmax classifier in the positioning refinement sub-network.
All regression loss layers (Regressor) in the invention use a robust loss function to calculate the bounding box regression offsets of the anchor boxes.
Compared with the prior art, the invention has the following advantages: on the basis of the existing Faster-RCNN network, the number of anchor boxes of the RPN (Region Proposal Network) in the Faster-RCNN is increased to 12, and a positioning refinement sub-network with a RoI pooling layer is added to further detect the output of the Faster-RCNN network, so that the average accuracy of remote sensing image target detection is improved, in particular the detection accuracy of small targets such as automobiles and airplanes in remote sensing images.
Drawings
FIG. 1 is a schematic diagram of the improved fast-RCNN remote sensing image target detection network structure of the present invention;
FIG. 2 is a schematic diagram of the RPN network of the present invention;
FIG. 3 is a visual comparison of the improved Faster-RCNN remote sensing image object detection network of the present invention and the prior art method.
Detailed Description
The remote sensing image target detection data set used by the invention comes from the NWPU VHR-10 data set produced by Dr. Gong Cheng et al. of Northwestern Polytechnical University. The data set contains 10 categories: airplane, ship, storage tank, baseball diamond, tennis court, basketball court, ground track field, harbor, bridge and vehicle. It comprises 800 high resolution remote sensing images, of which the negative sample set contains 150 images that do not belong to any category. The sizes of the targets to be detected vary widely, with the largest target being about 418 × 418 and the smallest 33 × 33.
Referring to fig. 1, 2 and 3, the improved method for detecting the target of the fast-RCNN remote sensing image disclosed by the invention comprises the following steps:
(1.1) dividing the remote sensing image data set into a training set and a testing set, with 80% used for network training and 20% for network testing, keeping the distribution of the different sample categories in the training and testing sets as consistent as possible;
(1.2) carrying out size transformation, normalization processing and data enhancement on the remote sensing images in the training set in sequence:
a. the size transformation is to set the size of the remote sensing image in the training set to 800 pixels multiplied by 960 pixels;
b. the normalization processing is to map each pixel value of the remote sensing image in the training set to a range of 0-1;
c. the data enhancement is to rotate the normalized remote sensing images in the training set by 90, 180 and 270 degrees and apply a mirror operation, so as to improve the robustness of the improved Faster-RCNN remote sensing image target detection network;
(1.3) constructing an improved Faster-RCNN remote sensing image target detection network: the network consists of a Faster-RCNN sub-network and a positioning refinement sub-network; the Faster-RCNN sub-network performs a preliminary target detection on the remote sensing image, and the positioning refinement sub-network further processes the output of the Faster-RCNN network, alleviating inaccurate localization, missed detections and false detections;
The fast-RCNN sub-network consists of a VGG16 network, an RPN network, a RoI pooling layer, a full connection layer (FC) with a Relu activation function and a regression loss layer (Regressor); the positioning refinement subnetwork consists of a RoI Pooling layer (RoI Pooling), a full connection layer with Relu activation function, a Softmax classifier and a regression loss layer (Regressor).
The RPN network in the Faster-RCNN sub-network comprises a standard convolution layer (Conv2d) of size 3 × 3, a Relu activation function, two point-by-point convolution layers (Pwise), a Softmax classifier, a bounding box regression loss layer (Bbox Regressor) and a proposal layer (Proposal).
The specific steps for building the improved Faster-RCNN remote sensing image target detection network are as follows:
(2.1) inputting the remote sensing image into the VGG16 network in the Faster-RCNN sub-network. The VGG16 network comprises 13 convolutional layers (Conv2d), each followed by a Relu activation function, and 4 pooling layers: the input feature map is activated by a Relu function after each convolutional layer, and a max-pooling operation is performed after the 2nd, 4th, 7th and 10th convolutional layers. Each convolutional layer uses a standard 3 × 3 convolution with padding 1 and step size 1; each pooling layer uses max pooling with a 2 × 2 pooling kernel and step size 2. The feature map y1 obtained after extracting the texture, color and scale features of the target in the remote sensing image through the VGG16 network has size 50 × 60 × 256. The VGG16 network configuration is shown in Table 1:
Table 1 VGG16 network configuration table
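The 50 × 60 spatial size of y1 follows directly from the layer hyperparameters just listed; a minimal, framework-free sketch of the shape arithmetic:

```python
def conv_out(size, kernel=3, pad=1, stride=1):
    # Spatial size after a standard convolution: (size + 2*pad - kernel)//stride + 1.
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    # Spatial size after max pooling.
    return (size - kernel) // stride + 1

# 800 x 960 input (step a); the 13 convs keep the size (3x3, pad 1, stride 1),
# and the pools after convs 2, 4, 7, 10 each halve it, i.e. /16 overall.
h, w = 800, 960
for layer in range(1, 14):
    h, w = conv_out(h), conv_out(w)
    if layer in (2, 4, 7, 10):
        h, w = pool_out(h), pool_out(w)
print(h, w)  # → 50 60
```

Four halvings of 800 × 960 give exactly the 50 × 60 map the text claims.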
(2.2) the obtained feature map y1 is fed to three parallel branches: the RPN (Region Proposal Network) and the RoI pooling layer in the Faster-RCNN sub-network, and the RoI pooling layer in the positioning refinement sub-network;
(2.3) in the RPN network in the Faster-RCNN sub-network, a sliding window of size 3 × 3 is used to perform a standard convolution operation with step size 1 on y1; each time the window slides, 12 anchor boxes of different scales are generated centered on the center point of the sliding window, with sizes 16 × 16, 16 × 32, 32 × 16, 32 × 32, 32 × 64, 64 × 32, 64 × 64, 64 × 128, 128 × 64, 128 × 128, 128 × 256 and 256 × 128. After the anchor boxes are generated, the output passes through a Relu activation function in the RPN network and splits into two branches. One branch is the classification loss branch: it first performs a point-by-point convolution with 18 output channels, then classifies the 12 anchor boxes of different scales through a Softmax classifier in the RPN network; each anchor box outputs two probability values distinguishing target from background, so 24 probability values are output per slide. The other branch is the boundary regression loss branch: after a point-by-point convolution with 36 output channels, the bounding box regression offsets of the anchor boxes are computed through a bounding box regression loss layer (Bbox_Regressor) in the RPN network. Each anchor box outputs 4 relative position coordinates, namely the center coordinates (xa, ya) and the width and height (wa, ha) of the anchor box, so the 12 anchor boxes output 48 relative position coordinates per slide. Finally, the outputs of the classification loss branch and the boundary regression loss branch are integrated through a proposal layer in the RPN network to obtain a feature map y2 of anchor boxes with relative position coordinate values; the proposal layer (Proposal) uses a non-maximum suppression algorithm (NMS) to perform a preliminary screening of the anchor boxes and removes anchor boxes that extend beyond the image boundary;
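A sketch of the anchor generation at one sliding-window position; the 12 sizes below are a best-effort reading of the scale list in the text (64 × 64 and 128 × 128 are assumed to complete the set of 12):

```python
import numpy as np

# 12 anchor sizes (w, h) used by the modified RPN, as read from the text.
ANCHOR_SIZES = [(16, 16), (16, 32), (32, 16), (32, 32), (32, 64), (64, 32),
                (64, 64), (64, 128), (128, 64), (128, 128), (128, 256), (256, 128)]

def anchors_at(cx, cy):
    # Boxes in (x1, y1, x2, y2) form, all centered on (cx, cy),
    # i.e. on the center point of the current sliding-window position.
    boxes = []
    for w, h in ANCHOR_SIZES:
        boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return np.array(boxes)

a = anchors_at(400, 480)
print(a.shape)  # → (12, 4)
```

Per slide this yields 12 boxes, matching the 24 class probabilities (2 per box) and 48 regression coordinates (4 per box) described above.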
(2.4) after the RPN network in the Faster-RCNN sub-network, the feature map y2 and the feature map y1 obtained from the VGG16 network are input into the RoI pooling layer in the Faster-RCNN sub-network, which outputs the feature maps of non-uniform size as a feature map y3 of size 25 × 30 × 256; after a fully connected layer (FC) with Relu activation function in the Faster-RCNN sub-network, a regression result y4 is obtained through a regression loss layer (Regressor) in the Faster-RCNN sub-network;
(2.5) the regression result y4 obtained from the regression loss layer in the Faster-RCNN sub-network and the feature map y1 obtained from the VGG16 network are input into the RoI pooling layer in the positioning refinement sub-network, which outputs a feature map of size 6 × 7 × 256; after a fully connected layer (FC) with Relu activation function in the positioning refinement sub-network, the output splits into two paths: one path outputs the position information y5 of the target in the remote sensing image through a regression loss layer (Regressor) in the positioning refinement sub-network; the other path outputs the classification result y6 of the target in the remote sensing image through a Softmax classifier in the positioning refinement sub-network.
(1.4) training the improved Faster-RCNN remote sensing image target detection network: first, randomly configure the node parameters of the built improved Faster-RCNN remote sensing image target detection network, then input the remote sensing images in the training set into the network and update the node parameters of the improved Faster-RCNN remote sensing image target detection network model by stochastic gradient descent, following the descent direction in each iteration, stopping once an optimal solution is found.
The hardware conditions and parameter configurations for training the network are shown as S401 and S402:
S401, the method uses a computer with an Intel Core i7-9700 CPU, an Nvidia GeForce GTX 1060 6GB graphics card and 16 GB of memory, and builds the algorithm framework using PyTorch.
S402, the parameters in the network are updated with a stochastic gradient descent algorithm; the pre-training model is a resnet50 network; a dynamic learning rate is used so that the network quickly converges to the optimum: the initial learning rate is set to 0.001 and is multiplied by 0.1 every 4000 iterations, for 100,000 iterations in total; the threshold of non-maximum suppression (NMS) is set to 0.7.
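The schedule in S402 is a simple step decay (in PyTorch it would roughly correspond to `torch.optim.SGD` plus `torch.optim.lr_scheduler.StepLR(optimizer, step_size=4000, gamma=0.1)` stepped per iteration, though that mapping is an assumption); a framework-free sketch:

```python
def learning_rate(iteration, base_lr=1e-3, gamma=0.1, step=4000):
    # Step-decay schedule from S402: start at 0.001 and multiply the
    # learning rate by 0.1 every 4000 iterations.
    return base_lr * gamma ** (iteration // step)

# Rates at the start, after the first decay, and after the second decay.
print(learning_rate(0), learning_rate(4000), learning_rate(8000))
```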
(1.5) testing the improved Faster-RCNN remote sensing image target detection network: detect the remote sensing images in the test set with the trained improved Faster-RCNN remote sensing image target detection network, analyze the detection effect, and use the mean average precision (mAP) as the evaluation index of remote sensing image target detection performance.
The present embodiment is described in further detail below:
Standard convolution (Conv2d): the calculation formula of the standard convolution is shown in formula (1):

Conv2d(W, b, x) = W · x + b,  i.e.  Conv2d(W, b, x)_n = Σ_{m=1}^{M} Σ_{v=1}^{V} Σ_{u=1}^{U} W_{n,m,v,u} x_{m,v,u} + b_n   (1)

where W is the weight of the convolution kernel, x is the input feature map, b is the bias term, M is the number of input channels, V and U are the width and height of the convolution kernel respectively, and N is the number of output channels.
Point-by-point convolution (Pwise): the point-by-point convolution kernel Wp has size 1 × 1 × N, where N is the number of output channels; if the input image size is h × d × M, the size of the output feature map is h × d × N. The point-by-point convolution is calculated as shown in formula (2):

Pwise(Wp, x) = Wp · x   (2)
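Formula (2) amounts to applying the same N × M weight matrix at every spatial position; a minimal NumPy sketch with toy shapes:

```python
import numpy as np

def pwise(Wp, x):
    # 1x1 convolution: each spatial position is mapped from M input
    # channels to N output channels by the same N x M weight matrix.
    # x: (h, d, M), Wp: (N, M)  ->  output (h, d, N).
    return np.einsum('nm,hdm->hdn', Wp, x)

x = np.random.rand(5, 6, 8)   # toy feature map, M = 8 channels
Wp = np.random.rand(3, 8)     # N = 3 output channels
y = pwise(Wp, x)
print(y.shape)  # → (5, 6, 3)
```

This is exactly how the RPN's classification and regression branches change the channel count (to 18 and 36) without touching the spatial size.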
Relu activation function: the mathematical formula is Relu(x) = max(0, x), where max() returns the larger of 0 and x;
pooling layer (Pooling): each pooling layer adopts maximum pooling, the size of a pooling core is 2 multiplied by 2, and the step length is 2;
RoI pooling layer (RoI Pooling): the specific operation of the RoI pooling layer is divided into three steps. First, the region of interest is mapped to the corresponding position on the input feature map; second, the mapped region is divided into parts of equal size (their number equal to the output dimension); third, a max pooling operation is performed on each part. Through these three steps, feature maps of different sizes can be output as feature maps of a fixed size, and the size of the output feature map is independent of both the RoI size and the input feature map size.
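A minimal NumPy sketch of the three steps (the mapping of step one is assumed already done, so the RoI is given directly in feature-map coordinates); the 6 × 7 output grid mirrors the 6 × 7 × 256 map of the positioning refinement sub-network:

```python
import numpy as np

def roi_pool(feat, roi, out_h, out_w):
    # feat: (H, W, C); roi = (x1, y1, x2, y2) on the feature map.
    # Step 2: split the region into an out_h x out_w grid of bins;
    # step 3: max-pool each bin, so any RoI size gives a fixed output.
    x1, y1, x2, y2 = roi
    ys = np.linspace(y1, y2, out_h + 1).astype(int)
    xs = np.linspace(x1, x2, out_w + 1).astype(int)
    out = np.zeros((out_h, out_w, feat.shape[2]), dtype=feat.dtype)
    for i in range(out_h):
        for j in range(out_w):
            bin_ = feat[ys[i]:max(ys[i + 1], ys[i] + 1),
                        xs[j]:max(xs[j + 1], xs[j] + 1)]
            out[i, j] = bin_.max(axis=(0, 1))
    return out

feat = np.random.rand(50, 60, 4)           # toy y1-like map, 4 channels
pooled = roi_pool(feat, (10, 5, 40, 35), 6, 7)
print(pooled.shape)  # → (6, 7, 4)
```

Production implementations (e.g. `torchvision.ops.roi_pool`) also handle sub-pixel RoIs and a spatial scale factor, omitted here for clarity.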
Full connection layer (FC): each neuron of the full connection layer is completely connected with the neuron of the previous layer;
Mean average precision (mAP): the calculation formula is shown in formula (3):

mAP = (1/(k+1)) Σ_{i=0}^{k} [ p_ii / Σ_{j=0}^{k} p_ij ]   (3)

where there are k+1 classes in total (including an empty or background class), p_ij is the number of samples originally in class i but predicted as class j, called false positives, and p_ii is the number of correctly classified samples;
Softmax classifier: the Softmax classifier is generally used for multi-class problems, minimizing the Softmax loss function by training the network. Suppose a data set of size J, {(x^(1), y^(1)), …, (x^(m), y^(m)), …, (x^(J), y^(J))}; each sample in the data set has a correct class label taking one of the values {y^(1), …, y^(k)}, where k is the number of classes. For the m-th sample, belonging to some class j, there is a probability, also called the score value, shown in formula (4):

h(x^(m)) = exp(θ_j^T x^(m)) / Σ_{l=0}^{k-1} exp(θ_l^T x^(m))   (4)

where θ = (θ_0, θ_1, …, θ_{k-1}) are the parameters to be optimized, y^(m) is the label of the m-th sample, x^(m) is the m-th sample, and h(x^(m)) is its score value; the denominator Σ_{l=0}^{k-1} exp(θ_l^T x^(m)) normalizes the probability distribution so that the probabilities sum to 1.
NMS: the Non-Maximum Suppression algorithm removes non-maxima. Its steps are as follows:
Assume the object to be recognized is surrounded by F candidate boxes, and the classifier gives the n-th candidate box a score s_n, 1 ≤ n ≤ F. (1) Create a set H containing the F candidate boxes and an empty set T. (2) Sort all candidate boxes in H by classifier score and move the highest-scoring box t into T. (3) Traverse the candidate boxes remaining in H, computing the intersection-over-union of each with box t; if it is higher than a certain threshold, the box is considered to overlap box t and is deleted from H. (4) Return to step (2) and iterate until H is empty. The boxes in T are the result.
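The four steps above can be sketched in NumPy (the 0.5 threshold below is purely for the toy example; the patent's training uses 0.7):

```python
import numpy as np

def iou(a, b):
    # Intersection-over-union of two (x1, y1, x2, y2) boxes.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def nms(boxes, scores, thresh=0.7):
    # Steps (1)-(4): repeatedly keep the highest-scoring box and drop
    # every remaining box whose IoU with it exceeds the threshold.
    order = list(np.argsort(scores)[::-1])   # indices, best score first
    keep = []
    while order:
        t = order.pop(0)
        keep.append(int(t))
        order = [n for n in order if iou(boxes[t], boxes[n]) <= thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores, thresh=0.5))  # → [0, 2]
```

The second box overlaps the first with IoU ≈ 0.68, so it is suppressed at threshold 0.5 but survives at the patent's 0.7.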
Regression loss layer (Regressor): the regression loss function L_reg is shown in formula (5):

L_reg(t_n, v_n) = Σ_{c ∈ {x,y,w,h}} smooth_L1(t_c − v_c)   (5)

where smooth_L1 is the robust loss function shown in formula (6):

smooth_L1(x) = 0.5 x², if |x| < 1;  |x| − 0.5, otherwise   (6)

Here v_n = (v_x, v_y, v_w, v_h) is the coordinate vector of the ground-truth box and t_n = (t_x, t_y, t_w, t_h) is the coordinate vector of the prediction box. The four coordinates are calculated as follows:

t_x = (x − x_a)/w_a, t_y = (y − y_a)/h_a, t_w = log(w/w_a), t_h = log(h/h_a)
v_x = (x* − x_a)/w_a, v_y = (y* − y_a)/h_a, v_w = log(w*/w_a), v_h = log(h*/h_a)

where x and y are the center coordinates of the prediction box, w and h are its width and height, x_a and y_a are the center coordinates of an anchor box generated by the RPN network, w_a and h_a are the width and height of that anchor box, x* and y* are the center coordinates of the ground-truth box, and w* and h* are its width and height.
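A NumPy sketch of the coordinate encoding and the smooth-L1 loss; the box values are toy numbers:

```python
import numpy as np

def encode(box, anchor):
    # (x, y, w, h) of a box relative to an anchor: translations scaled
    # by the anchor size, width/height as log ratios.
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return np.array([(x - xa) / wa, (y - ya) / ha,
                     np.log(w / wa), np.log(h / ha)])

def smooth_l1(x):
    # Robust loss: quadratic near zero, linear in the tails.
    x = np.abs(x)
    return np.where(x < 1, 0.5 * x ** 2, x - 0.5)

anchor = (100, 50, 64, 32)
t = encode((105, 52, 64, 32), anchor)   # prediction vs anchor
v = encode((104, 50, 64, 32), anchor)   # ground truth vs anchor
loss = smooth_l1(t - v).sum()
print(round(float(loss), 6))  # → 0.002075
```

Scaling the center offsets by the anchor size makes the loss invariant to the anchor's absolute scale, which matters when anchors range from 16 × 16 to 256 × 128.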
Bounding box regression loss layer (Bbox Regressor): the loss function of the bounding box regression loss layer in the RPN network is defined as:

L({p_n}, {t_n}) = (1/N_cls) Σ_n L_cls(p_n, p_n*) + λ (1/N_reg) Σ_n p_n* L_reg(t_n, v_n)

The loss function has two parts: L_cls is the classification loss function, whose output is denoted p_n, and L_reg is the regression loss function of formula (5), whose output is denoted t_n. Here n is the anchor box index; p_n is the probability that the n-th anchor box contains a target; p_n* is 1 if the n-th anchor box contains a target and 0 otherwise; N_reg is the number of anchor boxes containing targets in the RPN network; N_cls is the total number of anchor boxes; λ is a weight.
The advantages and disadvantages of the present invention were further analyzed by comparing the detection results of the method of the present invention with those of Faster-RCNN (see Table 2).
TABLE 2 average accuracy of the present invention and prior methods
As can be seen from Table 2, the mean average precision (mAP) of the method of the present invention is improved. For the vehicle, a small target, the accuracy of the existing method is low because the background is complex and vehicles are easily occluded by shadow, but the network of the present invention improves the average accuracy on vehicles by about 7%, which indicates that the positioning refinement sub-network effectively improves small-target detection. Meanwhile, for a larger target such as the bridge, the average accuracy is not high for either the present method or the prior art: in this data set bridges and roads are connected in long strips with similar color and texture features, so bridges are easily identified as roads during detection, resulting in a low average accuracy for the bridge class. For the storage tank, a densely distributed target in the images, the detection accuracy of both the present invention and existing networks is low, so the network needs further improvement to detect targets with high distribution density.
Meanwhile, under the data set, the average accuracy of the improved network is tested by setting different anchor point frames, as shown in table 3.
TABLE 3 average accuracy for different anchor frame numbers
Number of anchor boxes | mAP(%) |
3 | 78.2 |
6 | 80.6 |
9 | 81.5 |
12 | 83.1 |
15 | 82.6 |
As can be seen from Table 3, when the NWPU VHR-10 data set is trained with different numbers of anchor boxes, the average accuracy steadily increases from 3 to 12 anchor boxes, while it slightly decreases beyond 12. This shows that increasing the number of anchor boxes improves the average accuracy within a certain range; beyond that range, simply adding more anchor boxes cannot improve detection accuracy, and instead adds extra computation and increases the network's overfitting risk, thereby increasing the complexity of the network.
As shown in FIG. 3, the detection results of the improved Faster-RCNN remote sensing image target detection method are compared visually with those of the prior art. The visualization is arranged in two columns and three rows: from left to right, the first column shows the results of the invention and the second column those of the prior art. From the first row of images it can be seen that the existing method produces a large displacement in the prediction box of the bridge and a small deviation in the localization of the ship. The second row contains 5 vehicle targets; with the existing method, one small vehicle is missed, one is falsely detected, and the localization of the other three deviates slightly. The third row contains 5 airplanes, and the figure shows that the existing method misses some of them.
The above embodiments are described in detail to further illustrate the present invention and should not be construed as limiting its scope; a skilled engineer may make insubstantial modifications and variations of the invention based on the above disclosure.
Claims (1)
1. An improved Faster-RCNN remote sensing image target detection method, characterized by comprising the following steps:
(1.1) dividing a remote sensing image data set into a training set and a testing set;
(1.2) carrying out size transformation, normalization processing and data enhancement on the remote sensing images in the training set in sequence:
a. the size transformation sets the remote sensing images in the training set to 800 × 960 pixels;
b. the normalization processing maps each pixel value of the training-set images into the range 0-1;
c. the data enhancement rotates each normalized training-set image by 90°, 180° and 270° and applies a mirror operation;
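Purely as an illustration (not part of the claim), the three preprocessing operations of step (1.2) could be sketched in NumPy as below; nearest-neighbour resizing is used for brevity, and the function names are ours, not the patent's:

```python
import numpy as np

def preprocess(image):
    """Resize to 800 x 960 pixels (nearest-neighbour for brevity), then map pixels to [0, 1]."""
    h, w = image.shape[:2]
    rows = (np.arange(800) * h / 800).astype(int)   # target height: 800 pixels
    cols = (np.arange(960) * w / 960).astype(int)   # target width: 960 pixels
    resized = image[rows][:, cols]
    return resized.astype(np.float32) / 255.0        # normalization to the 0-1 range

def augment(image):
    """Rotations by 90/180/270 degrees plus a mirror of each view (original kept too)."""
    views = [np.rot90(image, k) for k in range(4)]   # k = 0 keeps the original
    views += [np.fliplr(v) for v in views]           # mirror operation
    return views
```

With a 600 × 700 input, `preprocess` returns an 800 × 960 × 3 array of values in [0, 1], and `augment` yields eight views per image.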
(1.3) constructing an improved fast-RCNN remote sensing image target detection network: the network consists of a Faster-RCNN sub-network and a positioning and refining sub-network;
(1.4) training the improved Faster-RCNN remote sensing image target detection network: randomly initializing the node parameters of the constructed improved Faster-RCNN remote sensing image target detection network, inputting the training-set remote sensing images into it, and updating the node parameters of the model by stochastic gradient descent until an optimal solution is found;
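The stochastic gradient descent of step (1.4) is the standard update rule theta ← theta − lr · grad. A toy NumPy sketch, illustrative only: it minimizes a stand-in quadratic loss rather than the detection loss, and starts from randomly configured parameters as the step describes:

```python
import numpy as np

def sgd_step(params, grads, lr=0.001):
    """One stochastic-gradient-descent update: theta <- theta - lr * grad."""
    return [p - lr * g for p, g in zip(params, grads)]

# Toy example: minimize f(w) = ||w - t||^2 from randomly initialized weights,
# mimicking the random pre-configuration of node parameters before training.
rng = np.random.default_rng(0)
w = rng.normal(size=3)            # randomly configured node parameters
t = np.array([1.0, 2.0, 3.0])     # stand-in optimum of the toy loss
for _ in range(2000):
    grad = 2 * (w - t)            # gradient of the squared error
    [w] = sgd_step([w], [grad], lr=0.01)
```

After 2000 steps the residual shrinks by a factor of 0.98 per step, so `w` converges to `t`.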
(1.5) testing an improved Faster-RCNN remote sensing image target detection network: detecting the remote sensing images in the test set by using a trained improved Faster-RCNN remote sensing image target detection network, and analyzing the detection effect;
the step (1.3) of establishing the improved Faster-RCNN remote sensing image target detection network comprises the following specific steps:
(2.1) inputting the remote sensing image into the VGG16 network in the Faster-RCNN sub-network to extract texture, color and scale features of targets in the remote sensing image; after passing through the VGG16 network, a feature map y1 of size 50 × 60 × 256 is obtained;
(2.2) feeding the obtained feature map y1 into three parallel branches: the RPN (Region Proposal Network) and the RoI pooling layer in the Faster-RCNN sub-network, and the RoI pooling layer in the positioning and refining sub-network;
(2.3) in the RPN of the Faster-RCNN sub-network, performing a standard convolution on y1 with a 3 × 3 sliding window and a stride of 1, and generating 12 anchor boxes of different scales centered on the center point of the sliding window: 16 × 16, 16 × 32, 32 × 16, 32 × 32, 32 × 64, 64 × 32, 64 × 64, 64 × 128, 128 × 64, 128 × 128, 128 × 256 and 256 × 128. The generated anchor boxes are output through a ReLU activation function in the RPN and split into two branches. One is the classification loss branch: it first applies a pointwise convolution with 18 output channels, and then a Softmax classifier in the RPN classifies the 12 anchor boxes of different scales, each anchor box outputting two probability values distinguishing target from background, so 24 probability values are output per sliding step. The other is the boundary regression loss branch: after a pointwise convolution with 36 output channels, the bounding-box regression loss layer in the RPN computes the boundary regression offsets of the anchor boxes; each anchor box outputs 4 relative position coordinates, namely the anchor-box center coordinates (xa, ya) and the anchor-box width and height (wa, ha), so the 12 anchor boxes output 48 relative position coordinates per sliding step. Finally, a proposal layer in the RPN combines the outputs of the classification loss branch and the boundary regression loss branch to obtain a feature map y2 of anchor boxes with relative position coordinates;
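The 12-anchor generation of step (2.3) can be sketched as follows; this assumes the anchors follow the usual pattern of four base sizes (16, 32, 64, 128), each with the shapes s × s, s × 2s and 2s × s, which matches the scales enumerated above. An illustrative sketch, not the patented implementation:

```python
import numpy as np

def make_anchors(cx, cy, sizes=(16, 32, 64, 128)):
    """12 anchors per sliding position: for each base size s, the boxes
    s x s, s x 2s and 2s x s, centered on (cx, cy).

    Returns (x0, y0, x1, y1) corner coordinates; with two class
    probabilities per anchor this yields the 24 probability values and
    48 relative position coordinates per sliding step described above."""
    anchors = []
    for s in sizes:
        for (w, h) in ((s, s), (s, 2 * s), (2 * s, s)):
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return np.array(anchors)

boxes = make_anchors(8.0, 8.0)   # anchors at one sliding-window center
```

Each call produces a 12 × 4 array; the first box is the 16 × 16 anchor around the center.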
(2.4) inputting the feature map y2 produced by the RPN in the Faster-RCNN sub-network, together with the feature map y1 obtained from the VGG16 network, into the RoI pooling layer in the Faster-RCNN sub-network, which converts feature maps of non-uniform size into a feature map y3 of size 25 × 30 × 256; after a fully connected layer with a ReLU activation function in the Faster-RCNN sub-network, a regression result y4 is obtained through a regression loss layer in the Faster-RCNN sub-network;
(2.5) inputting the regression result y4 from the regression loss layer in the Faster-RCNN sub-network, together with the feature map y1 obtained from the VGG16 network, into the RoI pooling layer in the positioning and refining sub-network, which outputs a feature map of size 6 × 7 × 256; after a fully connected layer with a ReLU activation function in the positioning and refining sub-network, the output is split into two paths: one path outputs the position information y5 of targets in the remote sensing image through a regression loss layer in the positioning and refining sub-network, and the other path outputs the classification result y6 of targets in the remote sensing image through a Softmax classifier in the positioning and refining sub-network.
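A minimal NumPy sketch of the RoI max pooling to the fixed 6 × 7 grid used by the positioning and refining sub-network in step (2.5), together with the Softmax of its classification branch. Illustrative only: it assumes the RoI region is at least 6 × 7 pixels and omits the fully connected layers:

```python
import numpy as np

def roi_pool(feature, roi, out_h=6, out_w=7):
    """Max-pool the RoI region of an (H, W, C) feature map onto a fixed
    out_h x out_w grid, mirroring the 6 x 7 x 256 map fed onward."""
    x0, y0, x1, y1 = roi
    region = feature[y0:y1, x0:x1]
    h_edges = np.linspace(0, region.shape[0], out_h + 1).astype(int)
    w_edges = np.linspace(0, region.shape[1], out_w + 1).astype(int)
    out = np.zeros((out_h, out_w, feature.shape[2]), dtype=feature.dtype)
    for i in range(out_h):
        for j in range(out_w):
            cell = region[h_edges[i]:h_edges[i + 1], w_edges[j]:w_edges[j + 1]]
            out[i, j] = cell.max(axis=(0, 1))   # max within each pooling bin
    return out

def softmax(z):
    """Classification branch: turn class scores into probabilities."""
    e = np.exp(z - z.max())
    return e / e.sum()
```

Whatever the RoI's size, `roi_pool` returns a fixed 6 × 7 × C map, which is what lets variable-size proposals feed fixed-size fully connected layers.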
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010833754.0A CN111950488B (en) | 2020-08-18 | 2020-08-18 | Improved Faster-RCNN remote sensing image target detection method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111950488A CN111950488A (en) | 2020-11-17 |
CN111950488B true CN111950488B (en) | 2022-07-19 |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107972662A (en) * | 2017-10-16 | 2018-05-01 | 华南理工大学 | To anti-collision warning method before a kind of vehicle based on deep learning |
CN108830280A (en) * | 2018-05-14 | 2018-11-16 | 华南理工大学 | A kind of small target detecting method based on region nomination |
CN110084195A (en) * | 2019-04-26 | 2019-08-02 | 西安电子科技大学 | Remote Sensing Target detection method based on convolutional neural networks |
CN110210463A (en) * | 2019-07-03 | 2019-09-06 | 中国人民解放军海军航空大学 | Radar target image detecting method based on Precise ROI-Faster R-CNN |
WO2020020472A1 (en) * | 2018-07-24 | 2020-01-30 | Fundación Centro Tecnoloxico De Telecomunicacións De Galicia | A computer-implemented method and system for detecting small objects on an image using convolutional neural networks |
CN110853015A (en) * | 2019-11-12 | 2020-02-28 | 中国计量大学 | Aluminum profile defect detection method based on improved Faster-RCNN |
CN110929618A (en) * | 2019-11-15 | 2020-03-27 | 国网江西省电力有限公司电力科学研究院 | Potential safety hazard detection and evaluation method for power distribution network crossing type building |
CN111259905A (en) * | 2020-01-17 | 2020-06-09 | 山西大学 | Feature fusion remote sensing image semantic segmentation method based on downsampling |
Non-Patent Citations (4)
Title |
---|
An Optimized Faster R-CNN Method Based on DRNet and RoI Align for Building Detection in Remote Sensing Images;Tong Bai等;《Remote Sensing》;20200226;第12卷(第5期);1-16 * |
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks;Shaoqing Ren等;《arXiv:1506.01497v3》;20160106;1-14 * |
Semantic Segmentation of Remote Sensing Images with Feature Fusion Based on Downsampling; Li Shuai et al.; Journal of Test and Measurement Technology; 20200616; Vol. 34 (No. 4); pp. 331-337 *
Research on Remote Sensing Image Classification Technology Based on Deep Learning; Li Shuai; China Master's Theses Full-text Database, Engineering Science and Technology II; 20210115; C028-204 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||