CN113850761A - Remote sensing image target detection method based on multi-angle detection frame - Google Patents

Remote sensing image target detection method based on multi-angle detection frame

Info

Publication number
CN113850761A
CN113850761A
Authority
CN
China
Prior art keywords
angle
size
remote sensing
frame
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111007113.0A
Other languages
Chinese (zh)
Other versions
CN113850761B (en)
Inventor
王素玉
许凯焱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN202111007113.0A priority Critical patent/CN113850761B/en
Publication of CN113850761A publication Critical patent/CN113850761A/en
Application granted granted Critical
Publication of CN113850761B publication Critical patent/CN113850761B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06F 18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio
    • G06N 3/045 Neural networks; combinations of networks
    • G06N 3/08 Neural networks; learning methods
    • G06T 7/11 Region-based segmentation
    • G06T 2207/10032 Satellite or aerial image; remote sensing
    • G06T 2207/20016 Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; pyramid transform


Abstract

The invention discloses a remote sensing image target detection method based on a multi-angle detection frame. A tilt angle module is designed on the basis of the positive frame predicted by Faster R-CNN, and is mainly divided into two stages: in the first stage, a fully connected layer and a decoder perform a preliminary angle-offset rotation; in the second stage, rotated RoI Align extracts rotation-invariant features, and the angle offset is corrected again to obtain a detection frame with an accurate angle. In addition, to address the large size and slow training of remote sensing images, the regression loss function of the inclination detection module is redesigned so that it converges faster and reaches higher accuracy. Experimental results show that the accuracy of the invention is improved by 4.4% compared with the improved Faster R-CNN, proving that the invention has a good detection effect.

Description

Remote sensing image target detection method based on multi-angle detection frame
Technical Field
The invention belongs to the field of target detection in computer vision, and relates to a method for detecting the target class and marking the target position in a picture using a convolutional neural network. Compared with current remote sensing target detection methods such as ROI-Transformer, SCRDet and R3Det, it achieves higher accuracy, and it can detect the inclination direction according to the characteristics of remote sensing images.
Background
In recent years, the aerospace industry in China has developed rapidly and remote sensing satellite technology has continuously improved; satellites acquire a large number of images every day for various purposes. Satellites with visible-light cameras are the most common, and visible-light remote sensing images are the most intuitive, making the targets in them easy to distinguish. However, traditional detection algorithms require manual feature extraction, their recognition performance cannot meet daily needs, and they are highly sensitive to external factors. With the continuous development of deep learning, convolutional neural networks extract features automatically, greatly reducing the cost of manual work and significantly improving accuracy. Nevertheless, even the most advanced detectors still cannot fully meet current practical needs; insufficient accuracy remains an urgent problem in this field.
At present, remote sensing target detection methods based on convolutional neural networks have made great progress: single-stage methods such as R3Det, PIOU and DRN, and dual-stage methods such as R2CNN, RRPN, ROI-Transformer and SCRDet. Although these methods have clear advantages over traditional methods and achieve quite high accuracy on the mainstream data sets DOTA and HRSC2016, their accuracy is still insufficient and there remains considerable room for improvement.
Disclosure of Invention
Aiming at the problem of insufficient accuracy in existing algorithms, the invention designs a target detection algorithm based on multi-angle remote sensing images, which improves on those algorithms to different degrees.
The invention adopts the following technical scheme: a target detection algorithm based on multi-angle remote sensing images. The specific detection process is as follows: first, the pictures are preprocessed and data-enhanced, then fed into the convolutional neural network provided by the invention; features are extracted through a backbone network and sent to an RPN network to generate a specific number of proposals; the proposals then undergo RoI Align to output a feature map with a fixed size of 7 × 7, which enters a fully connected layer to output a positive frame; the positive frame then undergoes angle-offset regression through a fully connected layer and a decoder, and finally the angle is corrected through rotated RoI Align to obtain the final detection result.
(1) Data preprocessing: the invention uses the DOTA data set. To facilitate training and prediction, the width and height of images input to the network are limited to 1024 × 1024. If the original size is larger than 1024 × 1024, the picture is divided into several 1024 × 1024 pictures with a sliding window of step length 512; if the original size is smaller than 1024 × 1024, it is padded with a black background. In this way the data size can be preprocessed without losing boundary information.
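The tiling step described above (1024 × 1024 windows with stride 512, black-background padding for undersized images) can be sketched as follows. This is an illustrative reconstruction, not code from the patent; the function name, the edge-clamping of the last window, and the return format are our own assumptions.

```python
import numpy as np

def tile_image(img, tile=1024, stride=512):
    """Split an image into tile-by-tile patches with a sliding window.
    Images smaller than the tile in either dimension are padded with a
    black (zero) background, as described in the preprocessing step."""
    h, w = img.shape[:2]
    ph, pw = max(h, tile), max(w, tile)
    padded = np.zeros((ph, pw) + img.shape[2:], dtype=img.dtype)
    padded[:h, :w] = img

    def starts(total):
        s = list(range(0, total - tile + 1, stride))
        if s[-1] != total - tile:   # clamp a final window to the edge
            s.append(total - tile)
        return s

    return [((x0, y0), padded[y0:y0 + tile, x0:x0 + tile])
            for y0 in starts(ph) for x0 in starts(pw)]
```

For example, an 800 × 2000 image is padded to 1024 rows and yields three overlapping 1024 × 1024 patches along the width.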
(2) Data enhancement: aiming at the many small target objects in remote sensing images, a data enhancement strategy is designed. During training, each iteration computes the ratio of the regression loss of frames with area smaller than 32 × 32 to the total regression loss of the whole picture. If the ratio is smaller than 0.4, the small-target loss of this iteration is considered to contribute insufficiently to the total regression loss; the next iteration then randomly selects four pictures from the training set, shortens the length and width of each to 1/2, splices the four pictures into a new picture, modifies the coordinates of the corresponding ground truth, and sends the picture to network training.
(3) Model setup and training: the network model mainly comprises a backbone convolutional neural network, a feature pyramid, an RPN fully convolutional network, an ROI classifier and an inclination angle regression network. The backbone is a residual network with a depth of 152 layers, divided into 5 parts whose convolutions are grouped in parallel. Features extracted through 4 downsampling operations of the backbone enter the feature pyramid, where they are fused through 3 upsampling operations and 1 max pooling; the output feature layers have 256 channels. They are then sent to the RPN fully convolutional network to generate proposals, after which RoI Align maps the features to 7 × 7. Classification and regression through the fully connected layer produce a positive frame, which is finally sent to the inclination angle regression network: an initial angle-offset regression is performed through a fully connected layer and a decoder, and then rotated RoI Align extracts rotation-invariant features to obtain the final detection result.
During training, a pre-trained ResNet-152 model is used. The classification losses of the RPN classifier and the ROI classifier use cross entropy, and the regression losses use SmoothL1; in the tilt angle network, the classification loss is still cross entropy, while the regression loss function is redesigned. The optimizer is SGD with momentum, with an initial learning rate of 0.00125, for a total of 15 iterations.
(4) Model prediction: after training is complete, the model is stored and the trained parameters are loaded; test data of any size can then be input, and the class and position of the objects in the data are obtained end to end. This stage only loads the trained model parameters; data enhancement is not used in this step.
The evaluation index is mean average precision (mAP). Evaluated on the DOTA test set, the method obtains a competitive result: compared with currently common single-stage and dual-stage algorithms it achieves higher accuracy and recognition performance, and it can recognize data of any size end to end.
Drawings
Fig. 1 is a schematic overall flow chart of the method of the present invention.
Fig. 2 is a schematic diagram of a convolutional neural network structure according to the present invention.
Fig. 3 is a schematic diagram of a data enhancement result according to the present invention.
Detailed Description
The following detailed description of embodiments of the invention is provided in conjunction with the accompanying drawings:
a remote sensing image target detection method based on a multi-angle detection frame is disclosed. As shown in fig. 1, the detection process is: preprocess the image and apply data enhancement (training stage only); divide the image into 1024 × 1024 patches; send them to the backbone network for downsampling to extract features; send the features to the feature pyramid for upsampling, fusing features of different levels in the process; send them to the RPN fully convolutional network to generate proposals; perform RoI Align to map the feature map to 7 × 7 and classify and regress; finally, send the result to the inclination angle regression network for final classification and regression to obtain the detection result.
The specific algorithm is referred to as follows:
(1) data preprocessing: preprocessing is performed on the DOTA data set, the best-known data set in current remote sensing target detection. Considering the characteristics of remote sensing images, many pictures have very high resolution, so their length and width must be limited for convenient training and prediction and to save computing resources; input pictures are uniformly limited to 1024 × 1024. Inside the network, features pass through the 152-layer residual network and the feature pyramid structure, with 4 downsamplings, 3 upsamplings and 1 max-pooling fusion; the 1024 × 1024 size guarantees that the feature map size is evenly divisible by 32 at every stage.
(2) Data enhancement: since there are many small targets in the data set, poor detection performance on small targets is a very important issue. Research shows that one reason for poor small-target detection is that the regression loss of small target objects contributes insufficiently to the total regression loss, so increasing the loss contribution of small targets is one way to solve the problem. A data enhancement method is designed: first, the regression loss of the inclination detection module for objects smaller than 32 × 32 in the current iteration is calculated, denoted L_s, where s denotes an object of size smaller than 32 × 32; then the total regression loss of the inclination detection module in this iteration is calculated, denoted L_reg; finally, the loss contribution rate a of small target objects is calculated by the following formula:
a = L_s / L_reg   formula (1)
If the calculated a is less than 0.4, the contribution of small-target loss to the total regression loss in this iteration is considered insufficient and needs to be strengthened. The next iteration then randomly selects four pictures from the training set, reduces the w and h of each picture to w/2 and h/2 (changing the area to 1/4 of the original), and splices them in a 2 × 2 arrangement into a new picture whose size is still 1024 × 1024; the splicing diagram is shown in fig. 3. Training then continues with this picture, increasing the proportion of small targets, improving small-target detection performance and thereby improving the overall detection performance.
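A minimal sketch of the 2 × 2 splicing described above, assuming four 1024 × 1024 inputs and (cx, cy, w, h) ground-truth boxes. Nearest-neighbour subsampling (`img[::2, ::2]`) stands in for the unspecified resizing method, and all names are illustrative, not from the patent.

```python
import numpy as np

def mosaic4(imgs, boxes_list, out=1024):
    """Shrink four images to half size, tile them in a 2x2 grid, and
    remap the (cx, cy, w, h) ground-truth boxes onto the new canvas."""
    assert len(imgs) == len(boxes_list) == 4
    canvas = np.zeros((out, out) + imgs[0].shape[2:], dtype=imgs[0].dtype)
    half = out // 2
    offsets = [(0, 0), (0, half), (half, 0), (half, half)]  # (row, col)
    new_boxes = []
    for img, boxes, (oy, ox) in zip(imgs, boxes_list, offsets):
        small = img[::2, ::2]                 # halve width and height
        canvas[oy:oy + half, ox:ox + half] = small[:half, :half]
        for cx, cy, w, h in boxes:            # scale by 1/2, then shift
            new_boxes.append((cx / 2 + ox, cy / 2 + oy, w / 2, h / 2))
    return canvas, new_boxes
```

Each source box keeps its class label; only the coordinates are halved and shifted into the corresponding quadrant.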
(3) Model setup and training
As shown in fig. 2, the network model mainly includes a backbone convolutional neural network, a feature pyramid structure, an RPN fully convolutional network, an ROI classifier and an inclination angle regression network. After data preprocessing, a picture is input to the backbone convolutional neural network and the feature pyramid structure for feature extraction. The feature maps output by the feature pyramid pass through a 3 × 3 convolution to increase the receptive field of the network, then a 1 × 1 convolution for dimension reduction, finally generating proposal candidate frames. The ROI classifier then applies bilinear interpolation to obtain mapped feature maps, followed by a fully connected layer whose regression and classification outputs give a positive candidate frame. Owing to the characteristics of the data set, however, this frame must be sent to the inclination angle regression network for the final regression offset and classification prediction. Compared with a common two-stage network, the whole process does not increase the number of anchors, yet an accurate regression frame is obtained.
To obtain the best detection performance, a 152-layer residual network is used throughout the model, and the convolution kernels are grouped in parallel to reduce parameters. The backbone is divided into five layers C1, C2, C3, C4 and C5, through which the picture is downsampled 4 times. Layer C1 uses a 7 × 7 convolution layer and a ReLU activation, followed by one max pooling. To reduce the parameter count while improving detection performance, layers C2-C5 all use grouped convolution: C2 uses 1 × 1, 3 × 3 and 1 × 1 convolution groups, whose convolution dimension is divided into 32 groups; C3 uses 8 convolution groups, C4 uses 36 convolution groups, and C5 uses 3 convolution groups; the four output feature maps have 256, 512, 1024 and 2048 dimensions. The feature maps output by layers C2-C5 are then convolved with 1 × 1 kernels to compress the feature dimension to 256; the 2048-dimensional feature map output by C5 is upsampled 3 times, and each 256-dimensional output is added to the 256-dimensional features from C2-C5 to obtain layers P2'-P5'. A 256-dimensional 3 × 3 convolution then gives the final layers P2-P5, and a 1 × 1 max pooling with stride 2 on P5 gives layer P6 for subsequent calculation.
Then, the output P2-P6 layers extract features through a 256-dimensional 3 × 3 convolution layer, so that features are extracted at different levels and fused, better capturing diverse features and further improving the detection performance of the whole network. Then k anchors are obtained and split into two 1 × 1 convolution branches for classification and regression loss, and 128 positive and 128 negative proposals are computed for the subsequent ROI classifier. The classification loss L_cls follows formula (2), and the regression loss L_reg follows formula (3); the overall RPN-stage loss is calculated as shown in formula (4).
L_cls(p_i, p_i*) = -[p_i* log(p_i) + (1 - p_i*) log(1 - p_i)]   formula (2)
wherein p_i represents the probability that the i-th anchor predicts the true label, and p_i* is 1 when the current sample is positive and 0 when the current sample is negative;
L_reg(t_i, t_i*) = smooth_L1(t_i - t_i*)   formula (3)
L_RPN = (1/N_cls) Σ_i L_cls(p_i, p_i*) + γ (1/N_reg) Σ_i p_i* L_reg(t_i, t_i*)   formula (4)
wherein t_i* represents the offset of this anchor with respect to the ground truth, t_i represents the predicted offset, and
smooth_L1(x) = 0.5 x² if |x| < 1, and |x| - 0.5 otherwise,
wherein γ is set to 1 in the present invention.
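The SmoothL1 regression loss named above has the standard piecewise form (quadratic near zero, linear in the tails); a small sketch of the standard definition, applied element-wise to the offset residual:

```python
import numpy as np

def smooth_l1(x):
    """SmoothL1: 0.5*x^2 where |x| < 1, |x| - 0.5 elsewhere.  Applied
    element-wise to the regression residual t_i - t_i*."""
    ax = np.abs(x)
    return np.where(ax < 1, 0.5 * ax ** 2, ax - 0.5)
```

The quadratic region gives small gradients for small errors, while the linear tails keep large outliers from dominating the loss.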
The positive and negative samples enter the ROI classifier: RoI Align maps the proposals extracted by the RPN to a 7 × 7 feature map, a fully connected layer then produces a 1024-dimensional feature, and a classification loss L_clsr and a regression loss L_regr are calculated again, in the same way as the RPN losses. The total loss function is calculated as in formula (5). The regression frame obtained here is a positive frame, but the characteristics of remote sensing images require an angled, inclined frame, so the calculated positive frame is passed to the inclination detection module.
L(p, u, t^u, v) = L_clsr(p, u) + λ[u ≥ 1] L_regr(t^u, v)   formula (5)
wherein p is the softmax probability distribution predicted by the classifier, u is the true label value of the corresponding target, t^u = (t^u_x, t^u_y, t^u_w, t^u_h) are the regression parameters predicted for class u, and v = (v_x, v_y, v_w, v_h) are the regression parameters of the bounding box of the real target.
The inclination detection module is the focus here. Taking the positive frame output by the modified Faster R-CNN module as input, it first extracts angle-offset features through RoI Align and a fully connected layer of size 5, then sends them to the decoder, which outputs the preliminary feature RROI. RoI Align is applied again to extract deep features of the RROI, which are sent to a fully connected layer of size 2048 for classification and regression loss calculation; the classification loss L_clsx follows formula (2), while the regression loss L_regx adopts a new calculation method, shown in formula (6). The final classification and regression results are then obtained.
L_regx is calculated by formula (6):
[formula (6): redesigned regression loss; rendered only as equation images in the source]
In order to ensure that the loss is continuously differentiable, a·ln(b + β) = μ when x equals 1, with parameters a = 0.5, β = 1, μ = 1.5.
the inclination detection module is mainly divided into two parts, the first part is an angle rotation module, the module mainly rotates a horizontal anchor frame into an inclined anchor frame, the coordinates of the obtained positive frame are assumed to be (x, y, w, h), wherein x, y represent the coordinates of the central point of the positive anchor frame, w, h represent the width and height of the positive anchor frame, and under the most ideal condition, the positive anchor frame is an external rectangle of the inclined frame, the position and angle deviation is carried out by a middle full connection layer and a decoder in a network, and the calculation of the deviation amount compared with the ground route is shown in a formula (7).
t_x = (x* - x_r)/w_r,  t_y = (y* - y_r)/h_r
t_w = ln(w*/w_r),  t_h = ln(h*/h_r)
t_θ = θ* - θ_r   formula (7)
wherein (x_r, y_r, w_r, h_r, θ_r) represents the coordinates of the offset frame calculated by the angle rotation module, and (x*, y*, w*, h*, θ*) represents the coordinates of the ground-truth box.
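Formula (7) is rendered only as images in the source; a common choice for such offsets, and the assumption made in this sketch, is the standard box-delta encoding extended with an angle term. The function name and parameterization below are illustrative:

```python
import math

def rotation_offsets(pred, gt):
    """Offsets of a rotated ground-truth box gt = (x*, y*, w*, h*, th*)
    relative to the predicted frame pred = (xr, yr, wr, hr, thr),
    following the standard delta encoding with an extra angle term."""
    xr, yr, wr, hr, thr = pred
    xs, ys, ws, hs, ths = gt
    return ((xs - xr) / wr, (ys - yr) / hr,
            math.log(ws / wr), math.log(hs / hr), ths - thr)
```

Normalizing the centre shift by the frame size and taking log-ratios of width and height keeps the targets scale-invariant.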
The second part is the angle correction module, which extracts deep-level features from the features after the first part's offset, so that although the angle is rotated the features are unchanged; the extracted deep features allow the angle to be corrected again, making the regressed rotating frame more robust and better fitted to the angle of the target object. The specific flow is: the inclined frame parameters (x_r, y_r, w_r, h_r, θ_r) calculated by the first part and a feature map D of size (H × W × C) are input; rotated RoI Align divides features and parameters into K × K grids (bins) and computes a feature map y of size K × K × C. For a grid with index (0 ≤ i, j < K) and output dimension c (0 ≤ c < C), the feature map y is calculated as shown in formula (8).
y_c(i, j) = Σ_{(x,y)∈bin(i,j)} D_{i,j,c}(T_θ(x, y)) / n_ij   formula (8)
wherein D_{i,j,c} represents the feature map of size K × K × C, n_ij represents the number of samples in the grid, and bin(i, j) represents the set of grid coordinates, calculated as shown in formula (9); T_θ represents the transformation of each real grid coordinate (x, y) into the coordinate (x', y') on the feature map, shown in formula (10).
bin(i, j) = { (x, y) : i·w_r/K ≤ x < (i+1)·w_r/K, j·h_r/K ≤ y < (j+1)·h_r/K }   formula (9)
(x', y') = ( x_r + (x - w_r/2)·cos θ_r - (y - h_r/2)·sin θ_r,  y_r + (x - w_r/2)·sin θ_r + (y - h_r/2)·cos θ_r )   formula (10)
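Formula (10)'s transform T_θ maps bin coordinates onto the feature map; the source renders it only as an image, so the sketch below assumes the usual rotated-RoI form, a plain rotation of the bin coordinate about the frame centre:

```python
import math

def t_theta(x, y, box):
    """Map a bin coordinate (x, y), given in the frame's local
    coordinate system (0..wr, 0..hr), onto the feature map by rotating
    about the frame centre (xr, yr) through the angle thr."""
    xr, yr, wr, hr, thr = box
    dx, dy = x - wr / 2, y - hr / 2
    return (xr + dx * math.cos(thr) - dy * math.sin(thr),
            yr + dx * math.sin(thr) + dy * math.cos(thr))
```

Sampling the feature map at these rotated positions is what makes the pooled K × K grid rotation-invariant.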
The loss function of the whole network is shown in formula (11): the total loss, combining the RPN-stage loss, the fully-connected-layer-stage loss and the inclination-module loss, is trained jointly.
L_all = L(p, u, t^u, v) + L_clsx + L_regx   formula (11)
During training, a 1080 Ti graphics card is used for computation, with a pre-trained ResNet-152 model and an SGD optimizer with momentum; the initial learning rate is 0.00125, for a total of 15 iterations.
(4) Model prediction and evaluation
The trained model is stored and its parameters loaded; the class and position of objects in a remote sensing image of any size can then be predicted directly, end to end. During prediction, images larger than 1024 × 1024 are still segmented with a sliding window of step length 512 and sent into the model; if the same object appears in several patches, only the instance with the highest confidence is kept when drawing the anchor frame, and the others are discarded. Images smaller than 1024 × 1024 are padded with a black background to 1024 × 1024. Model evaluation loads the model parameters, predicts the local test-set pictures, generates the classes and anchor-frame position coordinates, and then performs online evaluation through the DOTA website. The evaluation index is mean average precision (mAP). The prediction performance of the algorithm is evaluated on the DOTA data set; compared with the improved Faster R-CNN algorithm, the improved algorithm is greatly improved. The experimental results are shown in Table 1.
TABLE 1 comparison of predicted Performance of the method proposed by the present invention
[Table 1 appears in the source only as an image: mAP on DOTA for the proposed method versus the improved Faster R-CNN, SCRDet and R3Det.]
As shown in Table 1, on the DOTA data set the improved algorithm is 4.42% better than the improved Faster R-CNN algorithm, a strong result, and it also outperforms the currently popular algorithms SCRDet and R3Det. The experimental results prove that the algorithm is effective and can more accurately recognize objects in remote sensing images.
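The prediction stage's rule of keeping only the highest-confidence instance when the same object appears in several overlapping tiles amounts to a greedy suppression over the stitched detections. A sketch using axis-aligned IoU for brevity (the method's frames are rotated, and the 0.5 threshold is our assumption, not stated in the patent):

```python
def merge_tile_duplicates(dets, iou_thr=0.5):
    """dets: list of ((x1, y1, x2, y2), score) in global image
    coordinates.  Greedily keeps the highest-scoring detection and
    drops any lower-scoring box overlapping it above iou_thr."""
    def iou(a, b):
        ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
        iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = ix * iy
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0
    kept = []
    for box, score in sorted(dets, key=lambda d: -d[1]):
        if all(iou(box, kb) < iou_thr for kb, _ in kept):
            kept.append((box, score))
    return kept
```

Tile coordinates must first be shifted back into the full-image frame so that duplicates from adjacent tiles actually overlap.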

Claims (6)

1. A remote sensing image target detection method based on a multi-angle detection frame, characterized in that the method comprises three parts: data preprocessing and data enhancement; positive frame generation; and angle rotation and angle correction:
firstly, the input data is preprocessed so that the size of images input to the network conforms to the preset size; then a positive frame without an angle is classified and output through the backbone network, the feature pyramid structure, the RPN structure and the ROI;
then an inclination angle module is entered, and a first angle rotation is performed using a 1 × 1 convolution, a fully connected layer and a decoder;
in order to obtain a more accurate angle, the angle must be corrected: a rotated RoI Align, a 1 × 1 convolution and a fully connected layer correct the angle of the first rotation; in the training stage, the loss function of position regression is redesigned to facilitate training, and whether the next iteration uses the data enhancement strategy on the input data is determined according to the loss contribution rate of small targets.
2. The method for detecting the target of the remote sensing image based on the multi-angle detection frame as claimed in claim 1, wherein:
the data preprocessing judges whether the picture size is smaller than 1024 × 1024: if so, the picture is padded with a black background to 1024 × 1024; if larger, it is divided into n pictures of size 1024 × 1024 using a sliding window with a step size of 512 pixels, so that targets on the boundaries of the divided images are still detected completely.
3. The method for detecting the target of the remote sensing image based on the multi-angle detection frame as claimed in claim 1, wherein:
the data enhancement strategy is based on the regression loss L of the size of the ground route box less than 32 multiplied by 32 in one iterationsAccounts for the total regression loss LregIf the ratio is less than 0.4, the next iteration of inputting the network image will randomly select four pictures from the training set, the length-width ratio of each picture is reduced to 1/2, and then the ratio is determined according to the length-width ratio of each picture
Figure FDA0003237606330000011
The modes are combined, if the ratio is more than or equal to 0.4, the original training set picture is normally input, and a loss ratio calculation formula is shown as follows;
a=Ls/Lreg
4. the method for detecting the target of the remote sensing image based on the multi-angle detection frame as claimed in claim 1, wherein:
the first angular rotation firstly uses a 10-channel 1 × 1 convolution to reduce dimension, and then uses a full-link layer and a decoder to perform the first angular rotation, and the offset calculation method compared with the ground route is as follows:
Figure FDA0003237606330000021
Figure FDA0003237606330000022
Figure FDA0003237606330000023
Figure FDA0003237606330000024
wherein (x)r,yr,wr,hr,θr) The coordinates of the frame after the offset calculated in the first stage are shown, (x)*,y*,w*,h*,θ*) Coordinates of the box representing the ground truth.
5. The method for detecting the target of the remote sensing image based on the multi-angle detection frame as claimed in claim 1, wherein:
the second angle correction uses a rotated roi align to extract the deep features of the features after the first part of the offset features, so that the features are not changed although the angle is rotated, the extracted deep features can correct the angle again, and a regressed rotating frame can be more robust and fit with a target object; the specific procedure of angle correction is to calculate the first part of the calculated tilt frame parameters (x)r,yr,wr,hr,θr) And inputting a feature diagram D with the size of (H multiplied by W multiplied by C), dividing features and parameters into a feature diagram y with the size of K multiplied by C through a rotated roi align, then using a 1 multiplied by 1 convolution with 10 channels to reduce dimensions, and finally using a full connection layer to carry out final classification and regression; for an index of (i ≧ 0, j)<K) The calculation mode of the feature graph y with the grid output dimension C (C is more than or equal to 0 and less than or equal to C) is as follows:
y_c(i, j) = Σ_{(x,y) ∈ bin(i,j)} D_{i,j,c}(T_θ(x, y)) / n_{ij}
wherein D_{i,j,c} denotes the feature map of size K × K × C, n_{ij} denotes the number of sample points in grid cell (i, j), and bin(i, j) denotes the set of real coordinates of the grid cell with index (i, j), computed as follows; T_θ denotes the transformation of each real grid coordinate (x, y) into a coordinate (x', y') on the feature map:
bin(i, j) = { (x, y) | i·w_r/K ≤ x < (i+1)·w_r/K, j·h_r/K ≤ y < (j+1)·h_r/K }
x' = (x − w_r/2)·cos θ_r − (y − h_r/2)·sin θ_r + x_r
y' = (x − w_r/2)·sin θ_r + (y − h_r/2)·cos θ_r + y_r
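For illustration, the coordinate mapping T_θ of the rotated RoI Align in claim 5 can be sketched as below; the helper name `rroi_bin_centre` and the exact bin parameterisation (bin centres taken in the box's local axis-aligned frame, rotated by θ_r, then translated to the box centre) are assumptions based on the usual rotated RoI Align formulation, since the patent's equation images are not reproduced here.

```python
import math

def rroi_bin_centre(box, K, i, j):
    """Map the centre of bin (i, j) of a K x K grid over the rotated box
    (x_r, y_r, w_r, h_r, theta_r) to feature-map coordinates."""
    xr, yr, wr, hr, t = box
    # bin centre in the box's local frame, origin at the box centre
    lx = (i + 0.5) * wr / K - wr / 2.0
    ly = (j + 0.5) * hr / K - hr / 2.0
    # rotate by theta_r, then translate to the box centre
    x = lx * math.cos(t) - ly * math.sin(t) + xr
    y = lx * math.sin(t) + ly * math.cos(t) + yr
    return x, y

# e.g. an axis-aligned box (theta = 0) centred at (10, 20), size 4 x 8,
# K = 2: bin (0, 0) has its centre at (9.0, 18.0)
```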
6. The method for detecting the target of the remote sensing image based on the multi-angle detection frame as claimed in claim 1, wherein:
in order to make the loss function converge faster in the training stage of the angle correction, the regression loss function of the tilt-angle module is redesigned: the gradient value in the range |x| < 1 is increased, which shortens the training time and improves the detection performance of the model; the loss function is calculated as follows:
L(x) = (a/b)·(b|x| + β)·ln(b|x| + β) − a|x|, if |x| < 1
L(x) = μ|x| + C, otherwise
with gradient
∂L/∂x = a·ln(b|x| + β), if |x| < 1
∂L/∂x = μ, otherwise
In order to keep the gradient continuous at |x| = 1, a·ln(b + β) = μ is required; the parameters are set to a = 0.5, β = 1, μ = 1.5, which gives
b = e^(μ/a) − β = e³ − 1 ≈ 19.085
and C is chosen so that the two pieces of L(x) agree at |x| = 1, i.e. C = μ/b − a.
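A runnable sketch of a redesigned regression loss of the kind claim 6 describes; because the patent's equation images are not reproduced here, the piecewise form below is an assumption modelled on the balanced-L1 family that satisfies the stated continuity condition a·ln(b + β) = μ with a = 0.5, β = 1, μ = 1.5.

```python
import math

A, BETA, MU = 0.5, 1.0, 1.5
B = math.exp(MU / A) - BETA   # from the continuity condition a*ln(b + beta) = mu
C = MU / B - A                # makes the two pieces meet at |x| = 1

def tilt_loss(x):
    """Piecewise regression loss with a boosted gradient for |x| < 1."""
    ax = abs(x)
    if ax < 1.0:
        return (A / B) * (B * ax + BETA) * math.log(B * ax + BETA) - A * ax
    return MU * ax + C

def tilt_grad(x):
    """Gradient of tilt_loss: a*ln(b|x| + beta) inside |x| < 1, mu outside."""
    ax = abs(x)
    g = A * math.log(B * ax + BETA) if ax < 1.0 else MU
    return math.copysign(g, x) if x else 0.0
```

Near x = 0 the log branch gives a much larger gradient than smooth L1's linear ramp, which is the claimed source of faster convergence; the constants keep both the loss and its gradient continuous at |x| = 1.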
CN202111007113.0A 2021-08-30 2021-08-30 Remote sensing image target detection method based on multi-angle detection frame Active CN113850761B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111007113.0A CN113850761B (en) 2021-08-30 2021-08-30 Remote sensing image target detection method based on multi-angle detection frame


Publications (2)

Publication Number Publication Date
CN113850761A true CN113850761A (en) 2021-12-28
CN113850761B CN113850761B (en) 2024-06-14

Family

ID=78976487

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111007113.0A Active CN113850761B (en) 2021-08-30 2021-08-30 Remote sensing image target detection method based on multi-angle detection frame

Country Status (1)

Country Link
CN (1) CN113850761B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111091105A (en) * 2019-12-23 2020-05-01 郑州轻工业大学 Remote sensing image target detection method based on new frame regression loss function
CN111723748A (en) * 2020-06-22 2020-09-29 电子科技大学 Infrared remote sensing image ship detection method
CN111950488A (en) * 2020-08-18 2020-11-17 山西大学 Improved fast-RCNN remote sensing image target detection method
CN112560614A (en) * 2020-12-04 2021-03-26 中国电子科技集团公司第十五研究所 Remote sensing image target detection method and system based on candidate frame feature correction

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114419520A (en) * 2022-03-28 2022-04-29 南京智谱科技有限公司 Training method, device, equipment and storage medium of video-level target detection model
CN116363435A (en) * 2023-04-03 2023-06-30 盐城工学院 Remote sensing image target detection system and method based on deep learning
CN116363435B (en) * 2023-04-03 2023-10-27 盐城工学院 Remote sensing image target detection system and method based on deep learning

Also Published As

Publication number Publication date
CN113850761B (en) 2024-06-14

Similar Documents

Publication Publication Date Title
CN111325203B (en) American license plate recognition method and system based on image correction
CN113065558B (en) Lightweight small target detection method combined with attention mechanism
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
TWI762860B (en) Method, device, and apparatus for target detection and training target detection network, storage medium
WO2022002150A1 (en) Method and device for constructing visual point cloud map
CN109902677B (en) Vehicle detection method based on deep learning
CN111931684B (en) Weak and small target detection method based on video satellite data identification features
CN113012212B (en) Depth information fusion-based indoor scene three-dimensional point cloud reconstruction method and system
CN109886066B (en) Rapid target detection method based on multi-scale and multi-layer feature fusion
CN111652321A (en) Offshore ship detection method based on improved YOLOV3 algorithm
CN113591795B (en) Lightweight face detection method and system based on mixed attention characteristic pyramid structure
CN112150493A (en) Semantic guidance-based screen area detection method in natural scene
CN113850761B (en) Remote sensing image target detection method based on multi-angle detection frame
CN113850324B (en) Multispectral target detection method based on Yolov4
CN111680705A (en) MB-SSD method and MB-SSD feature extraction network suitable for target detection
CN113159215A (en) Small target detection and identification method based on fast Rcnn
CN114998566A (en) Interpretable multi-scale infrared small and weak target detection network design method
CN112364881B (en) Advanced sampling consistency image matching method
CN116342536A (en) Aluminum strip surface defect detection method, system and equipment based on lightweight model
CN117496158A (en) Semi-supervised scene fusion improved MBI contrast learning and semantic segmentation method
CN116778187A (en) Salient target detection method based on light field refocusing data enhancement
CN116363610A (en) Improved YOLOv 5-based aerial vehicle rotating target detection method
CN116740587A (en) Unmanned aerial vehicle aerial photographing target credible identification method based on double uncertainty perception of data and model
CN110910497A (en) Method and system for realizing augmented reality map
CN113971764B (en) Remote sensing image small target detection method based on improvement YOLOv3

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant