CN113850761B - Remote sensing image target detection method based on multi-angle detection frame - Google Patents

Remote sensing image target detection method based on multi-angle detection frame

Info

Publication number
CN113850761B
CN113850761B (application CN202111007113.0A)
Authority
CN
China
Prior art keywords
angle
size
frame
loss
remote sensing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111007113.0A
Other languages
Chinese (zh)
Other versions
CN113850761A (en)
Inventor
王素玉
许凯焱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN202111007113.0A priority Critical patent/CN113850761B/en
Publication of CN113850761A publication Critical patent/CN113850761A/en
Application granted granted Critical
Publication of CN113850761B publication Critical patent/CN113850761B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10032 Satellite or aerial image; Remote sensing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20016 Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform


Abstract

The invention discloses a remote sensing image target detection method based on a multi-angle detection frame. On the basis of the horizontal (positive) frames predicted by faster-rcnn, an inclination angle module is designed, comprising two main stages: the first stage performs a preliminary angle-offset rotation through a fully connected layer and a decoder; the second stage uses rotated roi align to extract rotation-invariant features and corrects the angle offset again to obtain a detection frame with an accurate angle. In addition, the regression loss function of the inclination detection module is redesigned to address the large size of remote sensing images and slow training, so that the loss function converges faster and accuracy is higher. Experimental results show that, compared with the improved faster-rcnn, accuracy is improved by 4.4%, and the invention achieves a good detection effect.

Description

Remote sensing image target detection method based on multi-angle detection frame
Technical Field
The invention belongs to the field of target detection in computer vision and discloses a method that uses a convolutional neural network to detect the category of targets and mark their positions in a picture. Compared with existing remote sensing target detection methods such as ROI-Transformer, SCRDet and R3Det, it achieves higher accuracy, and it can detect the inclined orientation of targets according to the characteristics of remote sensing images.
Background
In recent years, China's aerospace industry has developed rapidly and remote sensing satellite technology has advanced continuously; satellites acquire a large number of images every day for various purposes. Satellites carrying visible-light cameras are the most common, and visible-light remote sensing images are the most intuitive, making targets in them easy to distinguish. Traditional detection algorithms, however, require manually extracted features, their recognition performance cannot meet everyday requirements, and they are strongly affected by external factors. With the continuous development of deep learning, the automatic feature extraction of convolutional neural networks greatly reduces the cost of manual work and markedly improves accuracy. Nevertheless, even the most advanced current detectors still cannot fully meet practical needs; insufficient accuracy remains a problem in this field and needs to be solved.
Currently, remote sensing target detection methods based on convolutional neural networks have developed considerably, for example the single-stage methods R3Det, PIOU and DRN, and the two-stage methods R2CNN, RRPN, ROI-Transformer and SCRDet. Although these methods have clear advantages over traditional ones and achieve relatively high accuracy on the mainstream DOTA and HRSC2016 datasets, their accuracy is still insufficient and there remains considerable room for improvement.
Disclosure of Invention
Aiming at this problem of insufficient accuracy, the invention designs a target detection algorithm based on multi-angle remote sensing images that improves on the above algorithms to different degrees.
The invention adopts the following technical scheme: a target detection algorithm based on multi-angle remote sensing images. The detection flow is as follows: first, the picture is preprocessed and data enhancement is applied; the picture is then fed into the convolutional neural network proposed by the invention, where a backbone network extracts its features; the features are sent to an RPN network to generate a fixed number of proposals; roi align is applied to the proposals to output feature maps of fixed size 7x7; these feature maps pass through fully connected layers to output positive (horizontal) frames; the positive frames then undergo angle-offset regression through a fully connected layer and a decoder; finally, rotated roi align performs angle correction to obtain the final detection result.
(1) Data preprocessing: the invention uses the DOTA dataset. To facilitate training and prediction, the width and height of images input to the network are limited: the input size is 1024x1024 during both training and prediction. If the original size is larger than 1024x1024, the image is divided into several 1024x1024 images by a sliding window with step 512; if the original size is smaller than 1024x1024, a black background is used for padding. In this way the data size can be preprocessed without losing boundary information.
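The tiling-and-padding preprocessing above can be sketched as follows. This is a minimal illustration, not the patent's actual code; the helper name `tile_image` and the return layout are assumptions.

```python
import numpy as np

TILE, STRIDE = 1024, 512  # window size and sliding-window step from the text

def tile_image(img):
    """Split an HxWxC image into 1024x1024 tiles with stride 512,
    padding with a black background so boundary information is kept."""
    h, w = img.shape[:2]
    # Pad up so every window position yields a full 1024x1024 tile.
    ph = max(TILE, STRIDE * int(np.ceil(max(h - TILE, 0) / STRIDE)) + TILE)
    pw = max(TILE, STRIDE * int(np.ceil(max(w - TILE, 0) / STRIDE)) + TILE)
    padded = np.zeros((ph, pw) + img.shape[2:], dtype=img.dtype)
    padded[:h, :w] = img
    tiles = []
    for y in range(0, ph - TILE + 1, STRIDE):
        for x in range(0, pw - TILE + 1, STRIDE):
            tiles.append(((x, y), padded[y:y + TILE, x:x + TILE]))
    return tiles
```

A 500x600 picture becomes one padded 1024x1024 tile, while larger pictures yield overlapping tiles whose offsets `(x, y)` let predictions be mapped back to the original image.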
(2) Data enhancement: aiming at the large number of small target objects in remote sensing images, a data enhancement strategy is designed. During training, each iteration computes the ratio of the regression loss of frames with area smaller than 32x32 to the total regression loss of the whole image. If this ratio is smaller than 0.4, the contribution of small-target losses to the total regression loss in that iteration is considered insufficient; in the next iteration, four images are randomly selected from the training set, the length and width of each are reduced to 1/2 of the original, the four images are stitched into one new image, the corresponding ground-truth coordinates are modified, and the new image is fed into network training.
(3) Model setting and training: the network model mainly comprises a backbone convolutional neural network, a feature pyramid, an RPN fully convolutional network, an ROI classifier and an inclination angle regression network. The backbone is a 152-layer residual network divided into 5 parts, with the convolutions in each part grouped in parallel. Features extracted by 4 downsampling operations of the backbone enter the feature pyramid for 3 upsampling operations and 1 max pooling, followed by fusion; the output feature layers have 256 channels. They are then sent to the RPN fully convolutional network to generate proposals, after which roi align maps the features to 7x7; classification and regression through fully connected layers yield positive frames; finally, the positive frames are sent to the inclination angle regression network, where a preliminary angle-offset regression is performed through a fully connected layer and a decoder, and rotated roi align then extracts rotation-invariant features to obtain the final detection result.
In the training process a pre-trained ResNet model is used. The classification losses of the RPN and the ROI classifier use the cross-entropy loss function and the regression losses use the Smooth L1 loss function; in the inclination angle network the classification loss still uses cross entropy, while the regression loss function is redesigned. The optimizer is SGD with momentum, the initial learning rate is 0.00125, and training runs for a total of 15 iterations.
(4) Model prediction: after training, the model is saved; the trained parameters are loaded, test data of any size is input, and the categories and positions of objects in the data are obtained end to end. This stage only loads trained model parameters; data enhancement is not used in this step.
The evaluation index is mean average precision (mAP). Evaluated on the DOTA test set, the proposed method obtains competitive results: compared with commonly used single-stage and two-stage algorithms, it has higher accuracy and recognition performance and can recognize data of any size end to end.
Drawings
Fig. 1 is a schematic overall flow chart of the method according to the present invention.
Fig. 2 is a schematic diagram of a convolutional neural network according to the present invention.
Fig. 3 is a schematic diagram of a data enhancement result according to the present invention.
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings, which illustrate in detail:
A remote sensing image target detection method based on a multi-angle detection frame. As shown in fig. 1, the detection process is: preprocess the image and apply data enhancement (training stage only); divide the image into 1024x1024 tiles; send them into the backbone network for downsampling and feature extraction; send the features into the feature pyramid for upsampling, merging features of different levels in the process; send them into the RPN fully convolutional network to generate proposals; apply roi align to obtain 7x7 feature maps and perform classification and regression; finally, send the results into the inclination angle regression network for final classification and regression to obtain the detection result.
Specific algorithms are referenced below:
(1) Data preprocessing: the DOTA dataset, currently the best-known dataset for remote sensing target detection, is preprocessed. Considering the characteristics of remote sensing images, many pictures have very high resolution; to facilitate training and prediction and to save computing resources, the length and width of input pictures are uniformly limited to 1024x1024. Inside the network, the 152-layer residual network and the feature pyramid structure perform 4 downsamplings, 3 upsamplings and 1 max-pooling fusion of features, and the 1024x1024 size guarantees divisibility by 32 at every stage.
(2) Data enhancement: the poor detection performance caused by the large number of small objects in the dataset is an important problem. Research shows that one reason for poor small-target detection is that the regression loss of small target objects does not contribute enough to the total regression loss, so increasing the loss contribution of small targets is one way to solve the problem. A data enhancement method is therefore designed: first, compute the regression loss of the inclination detection module for objects smaller than 32x32 in the current iteration, denoted L_s (s denotes objects smaller than 32x32); then compute the total regression loss of the inclination detection module in the current iteration, denoted L_reg; finally, compute the loss contribution rate a of small target objects by the following formula:
a = L_s / L_reg    formula (1)
If the computed a is smaller than 0.4, the contribution of small-target losses to the total regression loss in that iteration is considered insufficient and needs to be enhanced. In the next iteration, four pictures are randomly selected from the training set; the w and h of each picture are reduced to w/2 and h/2, so the area becomes 1/4 of the original; the pictures are stitched in a 2x2 arrangement into one picture whose size is still 1024x1024 (the stitching is shown in fig. 3), and this picture is used in the next iteration. This increases the proportion of small targets, improving small-target detection performance and thereby the overall detection performance.
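The trigger condition and the 2x2 stitching described above can be sketched as follows. This is an illustrative sketch only; the function names are assumptions, and the required rescaling of ground-truth coordinates is omitted.

```python
import numpy as np

def small_target_ratio(losses, areas, thresh_area=32 * 32):
    """a = (regression loss of boxes with area < 32x32) / total regression loss."""
    l_s = sum(l for l, ar in zip(losses, areas) if ar < thresh_area)
    l_reg = sum(losses)
    return l_s / l_reg if l_reg > 0 else 0.0

def mosaic(imgs):
    """Stitch four 1024x1024 images, each downscaled to half width and
    height, into one 1024x1024 picture arranged on a 2x2 grid."""
    assert len(imgs) == 4
    halves = [im[::2, ::2] for im in imgs]  # crude 1/2 downscale by striding
    top = np.concatenate(halves[:2], axis=1)
    bottom = np.concatenate(halves[2:], axis=1)
    return np.concatenate([top, bottom], axis=0)
```

In training, `if small_target_ratio(...) < 0.4:` the next iteration would feed `mosaic(...)` of four randomly chosen training pictures; ground-truth boxes must be halved and offset to match their quadrant.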
(3) Model setup and training
As shown in fig. 2, the network model mainly comprises a backbone convolutional neural network, a feature pyramid structure, an RPN fully convolutional network, an ROI classifier and an inclination angle regression network. After data preprocessing, a picture is input into the backbone network and feature pyramid structure to extract features. The feature maps output by the feature pyramid pass through a 3x3 convolution to enlarge the network's receptive field, then through a 1x1 convolution for dimensionality reduction, and finally proposal candidate frames are generated. The ROI classifier obtains mapped feature maps by bilinear interpolation and is followed by fully connected layers that output regression and classification results, yielding positive candidate frames. Because of the characteristics of the dataset, these frames must then be sent to the inclination angle regression network for final regression offset and classification prediction. Compared with an ordinary two-stage network, the whole process does not increase the number of anchors, yet accurate regression frames can be obtained.
To obtain the best detection performance, a 152-layer residual network is used throughout the model, with grouped convolution kernels connected in parallel to reduce the number of parameters. The backbone is divided into five stages C1 to C5, which downsample the picture 4 times. The C1 stage uses a 7x7 convolution layer and a ReLU activation function, followed by one max pool. To reduce parameters and improve detection performance, stages C2 to C5 all use grouped convolution: C2 uses 1x1, 3x3, 1x1 bottleneck convolutions in which the 3x3 convolution is divided into 32 groups; C3 uses 8 such bottleneck blocks, C4 uses 36 and C5 uses 3. The four stages output feature maps of 256, 512, 1024 and 2048 dimensions respectively. The C2-C5 outputs then all pass through a 1x1 convolution that compresses the feature dimension to 256; starting from the 2048-dimensional C5 output, 3 upsamplings are performed, and the resulting 256-dimensional maps are added to the corresponding 256-dimensional C2-C5 outputs to obtain the P2'-P5' layers, which pass through 256-dimensional 3x3 convolutions to give the final P2-P5 layers. The P6 layer is obtained by applying a 1x1, stride-2 max pool to P5. These feature layers are used for subsequent computation.
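The feature-pyramid wiring just described can be checked at the level of tensor shapes. The sketch below stubs out the convolutions with simple NumPy ops purely to track shapes; the function name and stride choices (4, 8, 16, 32 for C2-C5) are standard FPN assumptions, not the patent's code.

```python
import numpy as np

def fpn_shapes(hw=1024):
    """Return the (channels, height, width) of P2-P6 for an hw x hw input,
    following the top-down pathway: lateral 1x1 convs to 256 channels,
    2x upsampling additions, and a stride-2 pool of P5 for P6."""
    c_sizes = [hw // s for s in (4, 8, 16, 32)]  # spatial sizes of C2..C5
    p, prev = {}, None
    for level, size in zip((5, 4, 3, 2), reversed(c_sizes)):
        lateral = np.zeros((256, size, size))            # 1x1 conv stub
        if prev is not None:
            prev = prev.repeat(2, axis=1).repeat(2, axis=2)  # 2x upsample
            lateral = lateral + prev                     # top-down addition
        p[level] = lateral                               # 3x3 conv omitted
        prev = p[level]
    p[6] = p[5][:, ::2, ::2]                             # stride-2 pool stub
    return {k: v.shape for k, v in p.items()}
```

For a 1024x1024 input this yields P2 at 256x256, down to P5 at 32x32 and P6 at 16x16, all with 256 channels, matching the 4-downsampling / 3-upsampling description.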
The output P2-P6 layers extract features through a 256-dimensional 3x3 convolution layer, so that features are extracted from different levels and fused; different features are thus better extracted and the detection performance of the whole network is further improved. k anchors are obtained and split into two 1x1 convolution branches for the classification and regression losses, and 128 positive and 128 negative proposals are computed for the later roi classifier, where the classification loss L_cls follows formula (2) and the regression loss L_reg follows formula (3). The total RPN-stage loss is computed as in formula (4).
L_cls(p_i, p_i*) = -[p_i* log p_i + (1 - p_i*) log(1 - p_i)]    formula (2)
where p_i represents the probability that the i-th anchor predicts the true label, and p_i* is 1 when the anchor is positive and 0 when it is negative.
L_reg(t_i, t_i*) = smooth_L1(t_i - t_i*), with smooth_L1(x) = 0.5 x^2 if |x| < 1 and |x| - 0.5 otherwise    formula (3)
where t_i* represents the offset of this anchor relative to ground truth and t_i represents the predicted offset.
L_RPN = (1 / N_cls) Σ_i L_cls(p_i, p_i*) + γ (1 / N_reg) Σ_i p_i* L_reg(t_i, t_i*)    formula (4)
where in the present invention γ is 1.
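The two RPN losses named above, binary cross-entropy for classification and Smooth L1 for regression, are standard and can be written minimally in NumPy as a sketch (function names are illustrative):

```python
import numpy as np

def cross_entropy(p, p_star):
    """Binary cross-entropy for one anchor: p is the predicted probability,
    p_star is 1 for a positive anchor and 0 for a negative one."""
    return -(p_star * np.log(p) + (1 - p_star) * np.log(1 - p))

def smooth_l1(x):
    """Smooth L1 applied elementwise to the offset error t - t*:
    quadratic near zero, linear beyond |x| = 1."""
    x = np.abs(x)
    return np.where(x < 1, 0.5 * x ** 2, x - 0.5)
```

The total RPN loss then averages the classification term over all sampled anchors and the regression term over positive anchors only, weighted by the balance factor γ (γ = 1 here).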
The positive and negative samples enter the roi classifier: roi align generates 7x7 feature maps from the proposals extracted by the RPN, which then pass through a fully connected layer of size 1024, and classification loss L_clsr and regression loss L_regr are computed again in the same way as the RPN losses. The total loss function is then computed as in formula (5):
L(p, u, t^u, v) = L_clsr(p, u) + γ [u ≥ 1] L_regr(t^u, v)    formula (5)
where p is the softmax probability distribution predicted by the classifier, u is the true label of the corresponding target, t^u is the predicted regression parameter for class u, and v = (v_x, v_y, v_w, v_h) is the regression parameter of the bounding box of the real object. The regression frame obtained at this point is a positive (horizontal) frame; according to the characteristics of remote sensing images, however, an angled inclined frame is needed, so the computed positive frame is passed to the inclination detection module.
The inclination detection module is the key point here. The positive frame output by the improved faster-rcnn module is taken as input: first, angle-offset features are extracted through roi align and a fully connected layer of size 5; the result is sent to a decoder that outputs a preliminary rotated region of interest (RROI); then deep features of the RROI are extracted through roi align again and sent to a fully connected layer of size 2048 for classification and regression loss computation. The classification loss L_clsx is consistent with formula (2), while the regression loss L_regx adopts a new calculation, shown in formula (6), and the final classification and regression results are obtained.
where, to ensure that the function remains continuously differentiable, a ln(b + β) = μ when x = 1, with parameters a = 0.5, β = 1, μ = 1.5.
The inclination detection module is mainly divided into two parts. The first part is the angle rotation module, which rotates a horizontal anchor frame into an inclined anchor frame. Suppose the coordinates of the positive frame obtained from the previous module are (x, y, w, h), where x and y are the center-point coordinates of the positive anchor frame and w and h are its width and height; in the ideal case, the positive anchor frame is the circumscribed rectangle of the inclined frame. The position and angle offsets are produced by the intermediate fully connected layer and decoder in the network, and the offsets relative to ground truth are calculated as in formula (7),
where (x_r, y_r, w_r, h_r, θ_r) represents the coordinates of the frame after the offset calculated by the angle rotation module and (x*, y*, w*, h*, θ*) represents the coordinates of the ground-truth frame.
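Formula (7) itself is not reproduced in the extracted text. As a hedged sketch, the standard rotated-box offset encoding used by RoI-Transformer-style detectors matches the symbols defined above; the patent's exact formula may differ, and the function names here are assumptions:

```python
import numpy as np

def encode_offsets(positive, gt):
    """Offsets of a ground-truth rotated box (x*, y*, w*, h*, theta*)
    relative to a box (x, y, w, h, theta): center deltas normalized by
    the box size, log-scale size deltas, and the angle difference."""
    x, y, w, h, t = positive
    xs, ys, ws, hs, ts = gt
    return np.array([(xs - x) / w, (ys - y) / h,
                     np.log(ws / w), np.log(hs / h), ts - t])

def decode_offsets(positive, d):
    """Inverse of encode_offsets: apply predicted deltas to a box."""
    x, y, w, h, t = positive
    dx, dy, dw, dh, dt = d
    return np.array([x + dx * w, y + dy * h,
                     w * np.exp(dw), h * np.exp(dh), t + dt])
```

Encoding then decoding recovers the ground-truth box exactly, which is the property the angle rotation module's regression targets rely on.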
The second part is the angle correction module. Although extracting deep-level features of the features after the first part's offset does not itself change the angle, it can be regarded as rotating them, and the angle of the extracted deep-level features can be corrected again, making the regressed rotated frame more robust and better fitted to the angle of the target object. The specific flow is: given the inclined-frame parameters (x_r, y_r, w_r, h_r, θ_r) calculated in the first part and a feature map D of input size H x W x C, rotated roi align divides D into K x K grids (bins) and computes a feature map y of size K x K x C. The output of the grid with index (i, j) (0 ≤ i, j < K) in dimension c (0 ≤ c < C) is computed as in formula (8):
y_c(i, j) = Σ_{(x,y) ∈ bin(i,j)} D_{i,j,c}(T_θ(x, y)) / n_ij    formula (8)
where D_{i,j,c} denotes the feature map of size K x K x C, n_ij is the number of samples in the grid, bin(i, j) is the set of coordinates of the grid, whose calculation is shown in formula (9), and T_θ transforms the true coordinates (x, y) of each grid into coordinates (x', y') on the feature map, as shown in formula (10).
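The images for formulas (9) and (10) are not reproduced in the extracted text, but the role of T_θ, mapping a sample point expressed in the tilted box's local frame to feature-map coordinates, is the standard rotation used in rotated roi align. A hedged sketch (the function name is an assumption):

```python
import numpy as np

def t_theta(x, y, box):
    """Map a sample point (x, y), given relative to the tilted box's
    center and axes, to feature-map coordinates (x', y') by rotating
    through theta and translating to the box center."""
    xr, yr, wr, hr, theta = box
    c, s = np.cos(theta), np.sin(theta)
    xp = xr + x * c - y * s
    yp = yr + x * s + y * c
    return xp, yp
```

With θ = 0 the transform is a pure translation to the box center; with θ = π/2 a point one unit along the box's x-axis lands one unit "down" in image coordinates, which is exactly what lets the same K x K grid sample rotation-invariant features from an arbitrarily tilted region.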
The loss function of the whole network is shown in formula (11): the loss of the RPN stage, the loss of the fully connected stage and the loss of the inclination module are combined, and the resulting total loss is used for joint training.
L_all = L(p, u, t^u, v) + L_clsx + L_regx    formula (11)
In the training process, a 1080 Ti graphics card is used for computation, a pre-trained ResNet model is adopted, the optimizer is SGD with momentum, the initial learning rate is 0.00125, and training runs for a total of 15 iterations.
(4) Model prediction and assessment
The trained model is saved and its parameters loaded; the category and position of objects in a remote sensing image of any size can then be predicted directly end to end. During prediction, images larger than 1024x1024 are still divided into several 1024x1024 images by a sliding window with step 512 and sent into the model for prediction; if the same object appears in several tiles, only the detection with the highest confidence is kept when anchor frames are drawn, and the others are discarded. Images smaller than 1024x1024 are padded with a black background to obtain 1024x1024 pictures. Evaluation consists of loading the model parameters, predicting the local test-set pictures, generating the categories and anchor-frame position coordinates for the test set, and then evaluating online through the DOTA website. The evaluation index is mean average precision (mAP). The predictive performance of the algorithm was evaluated on the DOTA dataset; the algorithm achieves a larger gain than the improved faster-rcnn algorithm, with experimental results shown in table 1.
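The "keep only the highest-confidence duplicate" step across stitched tiles can be sketched as a greedy suppression pass. This is illustrative only: axis-aligned IoU stands in for the rotated-box overlap the method would actually use, and the names are assumptions.

```python
def merge_duplicates(dets, iou_thresh=0.5):
    """Keep only the highest-confidence detection among overlapping
    detections of the same object across tiles.
    Each det is (x1, y1, x2, y2, score) in full-image coordinates."""
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        return inter / (area(a) + area(b) - inter)
    kept = []
    for d in sorted(dets, key=lambda d: -d[4]):  # highest confidence first
        if all(iou(d, k) < iou_thresh for k in kept):
            kept.append(d)
    return kept
```

Because tiles overlap by 512 pixels, the same object can be detected twice; sorting by score and suppressing overlaps implements the rule that only the most confident frame is drawn.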
TABLE 1 comparison of the predicted Performance of the methods of the invention
As shown in Table 1, on the DOTA dataset the improved algorithm gains 4.42% over the improved faster-rcnn and obtains a better result, and it also achieves better prediction results than the currently popular SCRDet and R3Det algorithms. The experimental results prove that the algorithm is effective and can recognize objects in remote sensing images more accurately.

Claims (3)

1. A remote sensing image target detection method based on a multi-angle detection frame, characterized in that the method comprises the following steps:
Firstly, preprocessing input data to ensure that the image size of an input network accords with a preset size, and then outputting a positive frame without angles through a main network, a characteristic pyramid structure, an RPN structure and an ROI classification;
Then entering an inclination angle module, and performing a first angle rotation by using a 1x1 convolution, a fully connected layer and a decoder;
in order to obtain a more accurate angle, the angle needs to be corrected: rotated roi align, a 1x1 convolution and a fully connected layer are used to correct the angle of the first rotation; in the training stage, the loss function of position regression is redesigned to facilitate training; whether the next iteration uses the data enhancement strategy to process the input data is determined according to the loss contribution rate of small targets;
The first angle rotation first uses a 10-channel 1x1 convolution for dimensionality reduction, and then uses a fully connected layer and a decoder; compared with ground truth, the offset is calculated as follows,
where (x_r, y_r, w_r, h_r, θ_r) represents the coordinates of the frame after the offset calculated in the first stage and (x*, y*, w*, h*, θ*) represents the coordinates of the ground-truth frame;
The second angle correction extracts deep features of the features after the first part's offset by using rotated roi align. The specific process of the angle correction is: according to the inclined-frame parameters (x_r, y_r, w_r, h_r, θ_r) calculated in the first part and the feature map D of input size H x W x C, rotated roi align divides the features into a K x K x C feature map y; then a 10-channel 1x1 convolution is used for dimensionality reduction, and finally a fully connected layer performs the final classification and regression. The output of the grid with index (i, j) (0 ≤ i, j < K) in dimension c (0 ≤ c < C) of the feature map y is calculated as follows:
y_c(i, j) = Σ_{(x,y) ∈ bin(i,j)} D_{i,j,c}(T_θ(x, y)) / n_ij
where D_{i,j,c} denotes the feature map of size K x K x C, n_ij is the number of samples in the grid, bin(i, j) is the set of true coordinate values of the grid with coordinate index (i, j), and T_θ transforms the true coordinates (x, y) of each grid into coordinates (x', y') on the feature map, as follows:
In order to make the loss function converge faster in the angle-correction training stage, the regression loss function of the inclination angle module is redesigned: the gradient value over the range x < 1 of the gradient function is increased, which shortens training time while improving the detection performance of the model. The loss function is calculated as follows,
where, to ensure that the function remains continuously differentiable, a ln(b + β) = μ when x = 1, with parameters a = 0.5, β = 1, μ = 1.5.
2. The method for detecting the target of the remote sensing image based on the multi-angle detection frame according to claim 1, wherein the method comprises the following steps:
The data preprocessing first judges whether the picture size is smaller than 1024x1024: if smaller, a black background is used to pad it to 1024x1024; if larger, a sliding window with a step of 512 pixels divides it into n pictures of size 1024x1024, so that objects lying on image boundaries can still be completely detected.
3. The method for detecting the target of the remote sensing image based on the multi-angle detection frame according to claim 1, wherein the method comprises the following steps:
The data enhancement strategy is to randomly select four pictures from the training set according to the fact that the ground truth boxes in one iteration are smaller than the size of the ratio of the regression loss L s with the size of 32 multiplied by 32 to the total regression loss L reg, if the ratio is smaller than 0.4, the aspect ratio of each picture is reduced to 1/2, and then according to the following steps If the ratio is more than or equal to 0.4, the original training set picture is normally input, and the loss ratio calculation formula is shown as follows;
a = L_s / L_reg
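The gating rule of claim 3 can be sketched as follows; only the ratio a = L_s / L_reg, the 0.4 threshold, the choice of four pictures, and the 1/2 downscaling come from the claim, while the nearest-neighbour subsampling and all names are illustrative assumptions.

```python
import random
import numpy as np

THRESHOLD = 0.4  # stated in the claim

def small_object_ratio(loss_small: float, loss_total: float) -> float:
    """a = L_s / L_reg: share of regression loss contributed by
    ground-truth boxes smaller than 32x32 in the current iteration."""
    return loss_small / loss_total

def augment_step(a: float, dataset: list, rng=random) -> list:
    """If small objects contribute less than 40% of the regression loss,
    pick four random pictures and halve their width and height
    (nearest-neighbour subsampling, purely illustrative); otherwise
    return the original training pictures unchanged."""
    if a >= THRESHOLD:
        return dataset
    picks = rng.sample(dataset, 4)
    return [img[::2, ::2] for img in picks]
```

Shrinking pictures this way makes already-small objects even smaller, so the strategy only triggers when the small-object loss share is low, i.e. when the batch under-represents small targets.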
CN202111007113.0A 2021-08-30 2021-08-30 Remote sensing image target detection method based on multi-angle detection frame Active CN113850761B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111007113.0A CN113850761B (en) 2021-08-30 2021-08-30 Remote sensing image target detection method based on multi-angle detection frame


Publications (2)

Publication Number Publication Date
CN113850761A CN113850761A (en) 2021-12-28
CN113850761B true CN113850761B (en) 2024-06-14

Family

ID=78976487

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111007113.0A Active CN113850761B (en) 2021-08-30 2021-08-30 Remote sensing image target detection method based on multi-angle detection frame

Country Status (1)

Country Link
CN (1) CN113850761B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114419520B (en) * 2022-03-28 2022-07-05 南京智谱科技有限公司 Training method, device, equipment and storage medium of video-level target detection model
CN116363435B (en) * 2023-04-03 2023-10-27 盐城工学院 Remote sensing image target detection system and method based on deep learning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111723748A (en) * 2020-06-22 2020-09-29 电子科技大学 Infrared remote sensing image ship detection method
CN111950488A (en) * 2020-08-18 2020-11-17 山西大学 Improved fast-RCNN remote sensing image target detection method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111091105B (en) * 2019-12-23 2020-10-20 郑州轻工业大学 Remote sensing image target detection method based on new frame regression loss function
CN112560614A (en) * 2020-12-04 2021-03-26 中国电子科技集团公司第十五研究所 Remote sensing image target detection method and system based on candidate frame feature correction


Also Published As

Publication number Publication date
CN113850761A (en) 2021-12-28

Similar Documents

Publication Publication Date Title
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN111325203B (en) American license plate recognition method and system based on image correction
WO2022002150A1 (en) Method and device for constructing visual point cloud map
Chen et al. Multi-scale spatial and channel-wise attention for improving object detection in remote sensing imagery
US20220067335A1 (en) Method for dim and small object detection based on discriminant feature of video satellite data
CN111652321B (en) Marine ship detection method based on improved YOLOV3 algorithm
CN113012212B (en) Depth information fusion-based indoor scene three-dimensional point cloud reconstruction method and system
CN113850761B (en) Remote sensing image target detection method based on multi-angle detection frame
US11176425B2 (en) Joint detection and description systems and methods
Zhu et al. Diverse sample generation with multi-branch conditional generative adversarial network for remote sensing objects detection
CN111738055A (en) Multi-class text detection system and bill form detection method based on same
CN112733942A (en) Variable-scale target detection method based on multi-stage feature adaptive fusion
CN114332942A (en) Night infrared pedestrian detection method and system based on improved YOLOv3
CN113850324A (en) Multispectral target detection method based on Yolov4
US20220335572A1 (en) Semantically accurate super-resolution generative adversarial networks
CN116740758A (en) Bird image recognition method and system for preventing misjudgment
CN113971764B (en) Remote sensing image small target detection method based on improvement YOLOv3
Liu et al. SLPR: A deep learning based Chinese ship license plate recognition framework
CN117496158A (en) Semi-supervised scene fusion improved MBI contrast learning and semantic segmentation method
Xu et al. Compressed YOLOv5 for oriented object detection with integrated network slimming and knowledge distillation
CN111046861B (en) Method for identifying infrared image, method for constructing identification model and application
CN115035429A (en) Aerial photography target detection method based on composite backbone network and multiple measuring heads
Peng et al. Deep learning-based autonomous real-time digital meter reading recognition method for natural scenes
Wang et al. Speed sign recognition in complex scenarios based on deep cascade networks
Dong et al. An Intelligent Detection Method for Optical Remote Sensing Images Based on Improved YOLOv7.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant