CN110287927A - Remote sensing image target detection method based on depth multi-scale and context learning - Google Patents

Remote sensing image target detection method based on depth multi-scale and context learning

Info

Publication number
CN110287927A
Authority
CN
China
Prior art keywords
feature
feature map
scale
enhanced
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910583811.1A
Other languages
Chinese (zh)
Other versions
CN110287927B (en)
Inventor
张向荣
唐旭
王少娜
陈璞花
古晶
马文萍
马晶晶
侯彪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN201910583811.1A priority Critical patent/CN110287927B/en
Publication of CN110287927A publication Critical patent/CN110287927A/en
Application granted granted Critical
Publication of CN110287927B publication Critical patent/CN110287927B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/10 Terrestrial scenes
    • G06V 20/13 Satellite images

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a remote sensing image target detection method based on depth multi-scale and context learning, which mainly solves the problems of the prior art that the feature fusion mode is coarse and the use of contextual feature information is not considered, resulting in low detection accuracy. The implementation steps are as follows: obtaining training samples and test samples from a remote sensing image target detection data set; constructing a multi-scale and context feature enhanced RetinaNet detection model, and setting the overall loss function of the target classification task and the target position regression task; inputting the training samples into the constructed detection model for training to obtain a trained detection model; and inputting the test samples into the trained detection model to predict and output the target category, target confidence and target position. The invention improves the expressive ability of features and the mean average precision of remote sensing image target detection, and can be used to obtain targets of interest and their positions in a remote sensing image.

Description

Remote sensing image target detection method based on depth multi-scale and context learning
Technical Field
The invention belongs to the technical field of remote sensing images, and particularly relates to a target detection method for remote sensing images, which can be used to obtain targets of interest and their positions in a remote sensing image.
Background
Remote sensing image target detection is one of the important research topics in the field of remote sensing and is widely applied in land planning, disaster monitoring, military reconnaissance and other fields. The purpose of remote sensing image target detection is to judge whether a target of interest exists in a remote sensing image and to determine its position.
Traditional remote sensing image target detection methods include template matching-based, knowledge-based and object-based methods, which rely to a great extent on extensive feature engineering to detect targets in remote sensing images. However, these methods adapt poorly to the complicated and changeable background environments of remote sensing images and to the obvious differences in target scale. In recent years, deep learning-based methods have been widely adopted for remote sensing image target detection. Deep convolutional neural networks require no manually designed features: they automatically extract features from remote sensing image data, and their performance exceeds that of traditional algorithms. The RetinaNet (Focal Loss for Dense Object Detection) model has the advantages of requiring no candidate region generation, high detection speed and high precision. However, the RetinaNet model still has limitations. Its network architecture is a feature pyramid network, which adds and fuses the feature map of the current layer with the adjacent higher-level feature map to obtain a feature map for detecting targets. This feature fusion mode is coarse and neglects both the more effective use of the high-level feature maps and the use of context information, which restricts improvements in remote sensing image target detection precision.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a remote sensing image target detection method based on depth multi-scale and context learning so as to improve the target detection precision in a remote sensing image.
The technical scheme of the invention is as follows: a multi-scale feature enhancement module and a context feature enhancement module are introduced into the RetinaNet detection model to construct a multi-scale and context feature enhanced RetinaNet detection model, fully considering a more effective feature map fusion mode and the problem of how to utilize global context feature information. First, feature maps of several levels are obtained from the backbone network and the feature pyramid network of the RetinaNet detection model. Then a multi-scale feature enhancement module is introduced, which guides the semantic information of each relatively high-level feature map to the adjacent low-level feature map and thereby enriches the semantic information of each relatively low-level feature map. Next, a context feature enhancement module is applied to the fused multi-scale enhanced pyramid feature maps to obtain the global context features of the remote sensing image scene. Finally, the enhanced pyramid feature maps are used in the detection model, and target category determination and target position localization are achieved through multi-task learning. The concrete implementation steps comprise:
1. A remote sensing image target detection method based on depth multi-scale and context learning, characterized by comprising the following steps:
(1) taking 75% of the remote sensing image target detection data set as a training sample, and taking the remaining 25% as a test sample;
(2) constructing a multiscale and context feature enhanced RetinaNet detection model:
(2a) obtaining 3 convolution feature maps C3, C4 and C5 from a backbone network ResNet-101 of a RetinaNet detection model;
(2b) obtaining 4 pyramid feature maps P3, P4, P5 and P6 from a feature pyramid network of a RetinaNet detection model;
(2c) constructing a multi-scale feature enhancement module consisting of 7 feature maps;
(2d) taking the 3 convolution feature maps C3, C4, C5 and the fourth pyramid feature map P6 as the input of the multi-scale feature enhancement module to obtain 3 pyramid feature maps F3, F4 and F5 after fusion multi-scale enhancement;
(2e) constructing a context feature enhancement module consisting of 5 feature maps;
(2f) taking the 3 fused multi-scale enhanced pyramid feature maps F3, F4 and F5 as the input of the context feature enhancement module to obtain 3 fused multi-scale context feature enhanced pyramid feature maps G3, G4 and G5;
(3) setting the overall loss function L of the target classification and target position regression tasks in the multi-scale and context feature enhanced RetinaNet detection model:
(3a) setting the existing Focal Loss function as the loss function of the target classification task in the multi-scale and context feature enhanced RetinaNet detection model, denoted Lcls;
(3b) setting the existing Smooth L1 Loss function as the loss function of the target position regression task in the multi-scale and context feature enhanced RetinaNet detection model, denoted Lreg;
(3c) using the loss function Lcls of the target classification task and the loss function Lreg of the target position regression task, setting the overall loss function L of the multi-scale and context feature enhanced RetinaNet detection model as follows:
L = L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ · (1/N_reg) Σ_i p_i* · L_reg(t_i, t_i*),
wherein L({p_i}, {t_i}) represents the overall loss function of the target classification task and the target position regression task, N_cls represents the total number of positive-sample anchor boxes in the target classification task, p_i represents the probability that the i-th anchor box is the predicted target, p_i* represents the probability that the i-th anchor box is a true target, L_cls(p_i, p_i*) represents the loss function of the target classification task in the multi-scale and context feature enhanced RetinaNet detection model, λ represents the balance weight parameter between the target classification task and the target position regression task, N_reg represents the total number of positive-sample anchor boxes in the target position regression task, t_i* represents the offset of the i-th anchor box relative to the true target bounding box, t_i represents the offset of the i-th anchor box relative to the predicted target bounding box, L_reg(t_i, t_i*) represents the loss function of the target position regression task, i represents the index of an anchor box with values ranging from 1 to M, and M is the total number of anchor boxes;
(4) training a multiscale and context feature enhanced RetinaNet detection model constructed in the step (2):
(4a) setting the learning rate to 0.00001, using the Adam optimizer, setting the number of training steps per round to 2000 and the number of training rounds to 100, and using the classification model parameters obtained by pre-training the backbone network ResNet-101 on the ImageNet data set as the initialization parameters of the multi-scale and context feature enhanced RetinaNet detection model;
(4b) inputting the training samples obtained in the step (1) into a multiscale and context feature enhanced RetinaNet detection model, optimizing the overall loss function L in the step (3c) by using an optimizer Adam, updating weight parameters, and obtaining the multiscale and context feature enhanced RetinaNet detection model containing the weight parameters when the number of training rounds reaches 100;
(5) inputting the test samples into the multi-scale and context feature enhanced RetinaNet detection model containing the trained weight parameters, and predicting and outputting the target bounding box positions, target categories and confidence scores of the targets in the test samples.
Compared with the prior art, the invention has the following advantages:
First, compared with the prior art, the invention introduces a multi-scale feature enhancement module, which makes efficient use of the semantic information of the high-level feature maps and guides the fusion of the high-level and low-level feature maps, so that the low-level feature maps acquire rich semantic information while keeping their resolution unchanged; this enhances the expression of the low-level feature maps and improves the classification confidence of targets.
Second, the invention considers the utilization of global context feature information and introduces a context feature enhancement module, which effectively exploits the complexity of remote sensing image scenes, establishes relations between the current position and other positions at the feature level, and obtains the global context features of the remote sensing image scene, thereby improving target detection precision.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is a simulation result image of baseball field detection using the present invention and the baseline method;
FIG. 3 is a simulation result image of bridge detection using the present invention and the baseline method;
FIG. 4 is a simulation result image of airplane detection using the present invention and the baseline method.
Detailed Description
The embodiments and effects of the present invention will be described in further detail below with reference to the accompanying drawings.
Referring to fig. 1, the implementation steps of this embodiment are as follows:
Step 1: obtaining training samples and test samples.
The open remote sensing image target detection data set NWPU VHR-10-v2 is obtained; it comprises 1172 remote sensing images of size 400 × 400 pixels together with the corresponding labeled target categories and target positions. In this embodiment, 75% of the data set is used as training samples and the remaining 25% as test samples, i.e., 879 of the images serve as training samples and the remaining 293 as test samples, as sketched below.
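Below is a minimal Python sketch of this 75%/25% split. The integer image identifiers, the split_dataset helper and the fixed shuffle seed are illustrative assumptions; the patent only fixes the 879/293 division of the 1172 images.

```python
import random

def split_dataset(image_ids, train_ratio=0.75, seed=0):
    """Shuffle image identifiers and split them into train and test subsets."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * train_ratio)  # 1172 * 0.75 = 879
    return ids[:n_train], ids[n_train:]

train_ids, test_ids = split_dataset(range(1172))
assert len(train_ids) == 879 and len(test_ids) == 293
```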
Step 2: constructing the multi-scale and context feature enhanced RetinaNet detection model.
2.1) obtaining 3 convolution feature maps C3, C4 and C5 from the backbone network of the RetinaNet detection model:
The backbone network of the RetinaNet detection model can be ResNet-50, ResNet-101 or ResNet-152; this embodiment uses ResNet-101, i.e., the 3 convolution feature maps C3, C4 and C5 are obtained from the backbone network ResNet-101 of the RetinaNet detection model, as sketched below;
2.2) obtaining 4 pyramid feature maps P3, P4, P5 and P6 from the feature pyramid network of the RetinaNet detection model;
2.3) constructing a multi-scale feature enhancement module consisting of 7 feature maps (a code sketch follows these sub-steps):
2.3.1) constructing 2 feature maps, wherein the first is a high-level feature map T1 and the second is a low-level feature map T2;
2.3.2) taking 2 branch operations in parallel on the first high-level feature map T1:
sequentially passing the first branch through a global average pooling layer, a dimension conversion layer, a first 1 × 1 convolutional layer with step size 1 and a first up-sampling layer to obtain a low-level feature map T3 containing global context information;
the second branch passes through a second 1 × 1 convolutional layer with the step size of 1 and a second up-sampling layer in sequence to obtain an up-sampled low-level characteristic diagram T4;
2.3.3) inputting the second low-level feature map T2 into the 3 × 3 convolutional layer with the step length of 1, and outputting to obtain a low-level feature map T5 after channel conversion;
2.3.4) inputting the low-level feature map T3 containing the global context information and the low-level feature map T5 after channel transformation into a fusion multiplication layer to obtain a fusion multiplied low-level feature map T6;
2.3.5) inputting the fused multiplied low-level feature map T6 and the up-sampled low-level feature map T4 into a fused addition layer to obtain a multi-scale enhanced feature map T7;
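The following Keras sketch assembles the seven feature maps T1-T7 as described in steps 2.3.1)-2.3.5). The 256-channel width, the fixed spatial sizes and the ×2 size ratio between the low-level and high-level maps are illustrative assumptions, not values fixed by the patent text.

```python
from tensorflow import keras
from tensorflow.keras import layers

def multi_scale_feature_enhancement(low_size=32, channels=256):
    high_size = low_size // 2
    t1 = keras.Input((high_size, high_size, channels))  # high-level feature map T1
    t2 = keras.Input((low_size, low_size, channels))    # low-level feature map T2

    # First branch on T1: global pooling, reshape, 1x1 conv, upsample -> T3
    g = layers.GlobalAveragePooling2D()(t1)             # global average pooling layer
    g = layers.Reshape((1, 1, channels))(g)             # dimension conversion layer
    g = layers.Conv2D(channels, 1, strides=1)(g)        # first 1x1 convolution, stride 1
    t3 = layers.UpSampling2D(size=low_size)(g)          # first up-sampling layer -> T3

    # Second branch on T1: 1x1 conv, upsample -> T4
    u = layers.Conv2D(channels, 1, strides=1)(t1)       # second 1x1 convolution, stride 1
    t4 = layers.UpSampling2D(size=2)(u)                 # second up-sampling layer -> T4

    # Channel transformation of T2 -> T5, then fusion
    t5 = layers.Conv2D(channels, 3, strides=1, padding="same")(t2)  # 3x3 convolution -> T5
    t6 = layers.Multiply()([t3, t5])                    # fusion multiplication layer -> T6
    t7 = layers.Add()([t6, t4])                         # fusion addition layer -> T7
    return keras.Model([t1, t2], t7)
```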
2.4) taking the 3 convolution feature maps C3, C4 and C5 and the fourth pyramid feature map P6 as the input of the multi-scale feature enhancement module to obtain the 3 fused multi-scale enhanced pyramid feature maps F3, F4 and F5 (wired as in the sketch after these sub-steps):
2.4.1) inputting the second convolution feature map C4 as the high-level feature map T1 of the multi-scale feature enhancement module and the first convolution feature map C3 as the low-level feature map T2, and outputting the multi-scale enhanced first feature map E3;
2.4.2) adding and fusing the multi-scale enhanced first feature map E3 and the first pyramid feature map P3 to obtain a fused multi-scale enhanced first pyramid feature map F3;
2.4.3) inputting the third convolution feature map C5 as the high-level feature map T1 of the multi-scale feature enhancement module and the second convolution feature map C4 as the low-level feature map T2, and outputting the multi-scale enhanced second feature map E4;
2.4.4) adding and fusing the multi-scale enhanced second feature map E4 and the second pyramid feature map P4 to obtain a fused multi-scale enhanced second pyramid feature map F4;
2.4.5) inputting the fourth pyramid feature map P6 as the high-level feature map T1 of the multi-scale feature enhancement module and the third convolution feature map C5 as the low-level feature map T2, and outputting the multi-scale enhanced third feature map E5;
2.4.6) adding and fusing the multi-scale enhanced third feature map E5 and the third pyramid feature map P5 to obtain a fused multi-scale enhanced third pyramid feature map F5;
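A sketch of this wiring, with one module instance per pyramid level. The tensors c3, c4, c5 and p3-p6 are assumed Keras tensors with 256 channels, and the spatial sizes 64/32/16/8 are illustrative (real sizes depend on the input resolution and may require cropping to align).

```python
e3 = multi_scale_feature_enhancement(low_size=64)([c4, c3])  # 2.4.1) C4 guides C3 -> E3
f3 = layers.Add()([e3, p3])                                  # 2.4.2) E3 + P3 -> F3
e4 = multi_scale_feature_enhancement(low_size=32)([c5, c4])  # 2.4.3) C5 guides C4 -> E4
f4 = layers.Add()([e4, p4])                                  # 2.4.4) E4 + P4 -> F4
e5 = multi_scale_feature_enhancement(low_size=16)([p6, c5])  # 2.4.5) P6 guides C5 -> E5
f5 = layers.Add()([e5, p5])                                  # 2.4.6) E5 + P5 -> F5
```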
2.5) constructing a context feature enhancement module consisting of 5 feature maps (a code sketch follows these sub-steps):
2.5.1) constructing a fused multi-scale enhanced pyramid feature map S1, and sequentially passing it through a first 1 × 1 convolution layer with step size 1 and a softmax layer to obtain an activated pyramid feature map S2;
2.5.2) inputting the activated pyramid feature map S2 and the fused multi-scale enhanced pyramid feature map S1 into a first fusion multiplication layer to obtain a pyramid feature map S3 after fusion multiplication;
2.5.3) sequentially passing the fused and multiplied pyramid feature map S3 through a second 1 × 1 convolution layer with step size 1, a rectified linear unit layer and a third 1 × 1 convolution layer with step size 1 to obtain a modified and fused pyramid feature map S4;
2.5.4) inputting the modified fused pyramid feature map S4 and the fused multi-scale enhanced pyramid feature map S1 into a second fusion multiplication layer to obtain a fused context feature enhanced pyramid feature map S5;
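A minimal Keras sketch of the five feature maps S1-S5 follows, continuing the sketches above. The softmax axis (channel-wise here, Keras's default) and the 256-channel width are assumptions; the patent text names only the layer types.

```python
def context_feature_enhancement(channels=256):
    s1 = keras.Input((None, None, channels))       # fused multi-scale enhanced map S1
    a = layers.Conv2D(channels, 1, strides=1)(s1)  # first 1x1 convolution, stride 1
    s2 = layers.Softmax()(a)                       # softmax layer -> activated map S2
    s3 = layers.Multiply()([s2, s1])               # first fusion multiplication -> S3
    b = layers.Conv2D(channels, 1, strides=1)(s3)  # second 1x1 convolution, stride 1
    b = layers.ReLU()(b)                           # rectified linear unit layer
    s4 = layers.Conv2D(channels, 1, strides=1)(b)  # third 1x1 convolution -> S4
    s5 = layers.Multiply()([s4, s1])               # second fusion multiplication -> S5
    return keras.Model(s1, s5)

# Applying it per level gives G3, G4 and G5 (steps 2.6.1)-2.6.3) below):
g3, g4, g5 = (context_feature_enhancement()(f) for f in (f3, f4, f5))
```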
2.6) using the 3 pyramid feature maps F3, F4 and F5 after the fusion multi-scale enhancement as the input of the context feature enhancement module to obtain 3 pyramid feature maps G3, G4 and G5 after the fusion multi-scale context feature enhancement:
2.6.1) inputting the fused multi-scale enhanced first pyramid feature map F3 as a feature map S1 of a context feature enhancement module to obtain a fused context feature enhanced first pyramid feature map G3;
2.6.2) inputting the fused multi-scale enhanced second pyramid feature map F4 as a feature map S1 of a context feature enhancement module to obtain a fused context feature enhanced second pyramid feature map G4;
2.6.3) inputting the fused multi-scale enhanced third pyramid feature map F5 as the feature map S1 of the context feature enhancement module to obtain a fused context feature enhanced third pyramid feature map G5.
Step 3: setting the overall loss function L of the target classification and target position regression tasks in the constructed multi-scale and context feature enhanced RetinaNet detection model.
3.1) setting the existing Focal Loss function as the loss function of the target classification task in the multi-scale and context feature enhanced RetinaNet detection model, denoted Lcls and expressed as:
Lcls = FL(p_i),
wherein FL(p_i) = −α(1−p_i)^γ · log(p_i) represents the focal loss function, α represents the balancing parameter between positive and negative samples, γ represents the focusing parameter, p_i represents the probability that the i-th anchor box is the predicted target, i represents the index of an anchor box with values ranging from 1 to M, and M is the total number of anchor boxes;
in this example, α is set to 0.25 and γ is set to 2.0; a code sketch of this loss follows;
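A sketch of this focal loss with the values set above. The epsilon clip is a numerical-stability detail that is not part of the patent text.

```python
import tensorflow as tf

def focal_loss(p, alpha=0.25, gamma=2.0, eps=1e-7):
    """Per-anchor focal loss: FL(p_i) = -alpha * (1 - p_i)^gamma * log(p_i)."""
    p = tf.clip_by_value(p, eps, 1.0)
    return -alpha * tf.pow(1.0 - p, gamma) * tf.math.log(p)
```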
3.2) setting the existing Smooth L1 Loss function as the loss function of the target position regression task in the multi-scale and context feature enhanced RetinaNet detection model, denoted Lreg and expressed as:
Lreg = SmoothL1(x),
wherein SmoothL1(x) represents the smooth L1 loss function, and x = t_i − t_i* is the difference between the offset t_i of the i-th anchor box relative to the predicted target bounding box and the offset t_i* of the i-th anchor box relative to the true target bounding box;
3.3) using the loss function Lcls of the target classification task and the loss function Lreg of the target position regression task, setting the overall loss function L of the multi-scale and context feature enhanced RetinaNet detection model as follows:
L = L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ · (1/N_reg) Σ_i p_i* · L_reg(t_i, t_i*),
wherein L({p_i}, {t_i}) represents the overall loss function of the target classification task and the target position regression task, N_cls represents the total number of positive-sample anchor boxes in the target classification task, p_i represents the probability that the i-th anchor box is the predicted target, p_i* represents the probability that the i-th anchor box is a true target, L_cls(p_i, p_i*) represents the loss function of the target classification task in the multi-scale and context feature enhanced RetinaNet detection model, λ represents the balance weight parameter between the target classification task and the target position regression task, N_reg represents the total number of positive-sample anchor boxes in the target position regression task, t_i* represents the offset of the i-th anchor box relative to the true target bounding box, t_i represents the offset of the i-th anchor box relative to the predicted target bounding box, and L_reg(t_i, t_i*) represents the loss function of the target position regression task in the multi-scale and context feature enhanced RetinaNet detection model;
in this embodiment, λ is 1.
Step 4: training the multi-scale and context feature enhanced RetinaNet detection model constructed in step 2.
4.1) setting training parameters:
in this embodiment, the learning rate is set to 0.00001, Adam is used by the optimizer, the number of training steps is set to 2000, the number of training rounds is set to 100, and classification model parameters obtained by using backbone network ResNet-101 pre-training are used on the ImageNet data set as initialization parameters of a multiscale and context feature enhanced retannet detection model;
4.2) inputting the training samples from step 1 into the multi-scale and context feature enhanced RetinaNet detection model, optimizing the overall loss function L of step 3 with the Adam optimizer and updating the weight parameters; when the number of training rounds reaches 100, the multi-scale and context feature enhanced RetinaNet detection model containing the trained weight parameters is obtained; a sketch of this training configuration follows.
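A sketch of this training configuration. The names detector (the model built in step 2, assumed to have classification and regression outputs) and train_generator (yielding images with anchor-level targets) are placeholders for the full pipeline, and the per-head lambdas are simplified per-batch means standing in for overall_loss above.

```python
detector.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-5),       # learning rate 0.00001
    loss=[lambda y, p: tf.reduce_mean(focal_loss(p)),          # classification head
          lambda y, t: tf.reduce_mean(smooth_l1(t - y))],      # regression head
)
detector.fit(train_generator, steps_per_epoch=2000, epochs=100)  # 2000 steps x 100 rounds
```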
Step 5: inputting the test samples from step 1 into the multi-scale and context feature enhanced RetinaNet detection model containing the trained weight parameters, and predicting and outputting the target bounding box positions, target categories and confidence scores of the targets in the test sample images.
The effect of the invention can be further illustrated by the following simulation experiment:
simulation conditions and contents
The simulation trains and tests the multi-scale and context feature enhanced RetinaNet detection model on the public NWPU VHR-10-v2 data set, which is widely used to evaluate remote sensing image target detection algorithms; the baseline method is the original RetinaNet detection model.
The NWPU VHR-10-v2 data set includes 10 target classes: airplanes, ships, oil storage tanks, baseball fields, basketball courts, tennis courts, playgrounds, ports, vehicles and bridges.
The simulation uses a Xeon(R) CPU E5-2630 v4 @ 2.20GHz × 40 processor with 64.00GB of memory and an 8GB GeForce GTX 1080 GPU; the simulation platform is the Ubuntu 16.04 operating system, and the implementation uses the Keras deep learning framework with the Python language.
Second, simulation content
Simulation 1: detection of a baseball field using the present invention and the baseline method; the results are shown in FIG. 2. As can be seen from FIG. 2, the classification confidence score of the baseball field under the baseline method is 0.929, as shown in FIG. 2(a), while that of the present invention reaches 1.000, as shown in FIG. 2(b). Compared with the baseline method, the invention clearly improves the classification performance on baseball fields.
Simulation 2: detection of bridges using the present invention and the baseline method; the results are shown in FIG. 3. The classification confidence scores of the 2 bridges under the baseline method are 0.660 and 0.850, as shown in FIG. 3(a), while those of the present invention reach 0.974 and 0.927, as shown in FIG. 3(b). Compared with the baseline method, the invention obviously improves the classification confidence scores of bridges, mainly because bridges depend strongly on the context information of the scene and the introduced context feature enhancement module enhances the expression of context features.
Simulation 3: detection of 5 airplanes using the present invention and the baseline method; the results are shown in FIG. 4. As can be seen from FIG. 4, the classification confidence scores of the 5 airplanes are all 1.000 under the baseline method, as shown in FIG. 4(a), and are likewise all 1.000 under the present invention, as shown in FIG. 4(b), indicating that both methods classify airplanes well.
Third, comparison and analysis of simulation results
To verify the effectiveness of the invention, 3 existing methods were set up for comparison, of which: existing method 1 is the RetinaNet detection model; existing method 2 is a rotation-insensitive and context-enhanced remote sensing image target detection model; and existing method 3 is a remote sensing image target detection model with multi-model decision fusion.
Mean average precision is used as the evaluation index for detection over all target classes, and average precision is used as the evaluation index for single-class detection. Targets on the NWPU VHR-10-v2 test data are detected with the present invention and the 3 existing methods, and the numerical results of the evaluation indices are compared in Table 1.
TABLE 1 Comparison of the evaluation index values obtained by the present invention and the 3 existing methods
In Table 1, the mean average precision of multi-class detection and the average precision of each class are given as decimals, and bold marks the highest average precision for that target class among the four methods.
Comparing the evaluation index values of the present invention and the 3 existing methods in Table 1 leads to the following 3 conclusions:
1) the mean average precision of existing method 1 is 0.9150 while that of the present invention is 0.9551, an improvement of 0.0401 over existing method 1;
2) the present invention achieves higher average precision than existing method 1 on 6 target classes; for bridges and basketball courts in particular, the average precision is obviously improved, mainly because these targets depend strongly on context information and the introduced context feature enhancement module enhances the expression of context features; the average precision of ship detection is also improved, mainly because ships vary greatly in scale and the introduced multi-scale feature enhancement module enhances the expression of the multi-scale features of targets;
3) existing method 2 and existing method 3 are both two-stage target detection models, while the present invention is a single-stage target detection model; in general the mean average precision of two-stage detection models is higher than that of single-stage models, yet the comparison of the evaluation index values shows that the mean average precision of the present invention is higher than that of both existing method 2 and existing method 3.
In summary, the invention introduces a multi-scale feature enhancement module on the basis of the existing RetinaNet detection model, guiding the semantic information of the high-level feature maps to the low-level feature maps and enriching the semantic information of the low-level feature maps; it further introduces a context feature enhancement module; and finally the RetinaNet detection model equipped with the multi-scale and context feature enhancement modules is applied to target detection and outputs the detection results, improving the precision of remote sensing image target detection.
The foregoing description is only an example of the present invention and is not intended to limit the invention; it will be apparent to those skilled in the art that various changes and modifications in form and detail may be made without departing from the spirit and scope of the invention.

Claims (7)

1. A remote sensing image target detection method based on depth multi-scale and context learning is characterized by comprising the following steps:
(1) taking 75% of the remote sensing image target detection data set as a training sample, and taking the remaining 25% as a test sample;
(2) constructing a multiscale and context feature enhanced RetinaNet detection model:
(2a) obtaining 3 convolution feature maps C3, C4 and C5 from a backbone network ResNet-101 of a RetinaNet detection model;
(2b) obtaining 4 pyramid feature maps P3, P4, P5 and P6 from a feature pyramid network of a RetinaNet detection model;
(2c) constructing a multi-scale feature enhancement module consisting of 7 feature maps;
(2d) taking the 3 convolution feature maps C3, C4, C5 and the fourth pyramid feature map P6 as the input of the multi-scale feature enhancement module to obtain 3 pyramid feature maps F3, F4 and F5 after fusion multi-scale enhancement;
(2e) constructing a context feature enhancement module consisting of 5 feature maps;
(2f) taking the 3 fused multi-scale enhanced pyramid feature maps F3, F4 and F5 as the input of the context feature enhancement module to obtain 3 fused multi-scale context feature enhanced pyramid feature maps G3, G4 and G5;
(3) setting the overall loss function L of the target classification and target position regression tasks in the multi-scale and context feature enhanced RetinaNet detection model:
(3a) setting the existing Focal Loss function as the loss function of the target classification task in the multi-scale and context feature enhanced RetinaNet detection model, denoted Lcls;
(3b) setting the existing Smooth L1 Loss function as the loss function of the target position regression task in the multi-scale and context feature enhanced RetinaNet detection model, denoted Lreg;
(3c) using the loss function Lcls of the target classification task and the loss function Lreg of the target position regression task, setting the overall loss function L of the multi-scale and context feature enhanced RetinaNet detection model as follows:
L = L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ · (1/N_reg) Σ_i p_i* · L_reg(t_i, t_i*),
wherein L({p_i}, {t_i}) represents the overall loss function of the target classification task and the target position regression task, N_cls represents the total number of positive-sample anchor boxes in the target classification task, p_i represents the probability that the i-th anchor box is the predicted target, p_i* represents the probability that the i-th anchor box is a true target, L_cls(p_i, p_i*) represents the loss function of the target classification task in the multi-scale and context feature enhanced RetinaNet detection model, λ represents the balance weight parameter between the target classification task and the target position regression task, N_reg represents the total number of positive-sample anchor boxes in the target position regression task, t_i* represents the offset of the i-th anchor box relative to the true target bounding box, t_i represents the offset of the i-th anchor box relative to the predicted target bounding box, L_reg(t_i, t_i*) represents the loss function of the target position regression task, i represents the index of an anchor box with values ranging from 1 to M, and M is the total number of anchor boxes;
(4) training a multiscale and context feature enhanced RetinaNet detection model constructed in the step (2):
(4a) setting the learning rate to 0.00001, using the Adam optimizer, setting the number of training steps per round to 2000 and the number of training rounds to 100, and using the classification model parameters obtained by pre-training the backbone network ResNet-101 on the ImageNet data set as the initialization parameters of the multi-scale and context feature enhanced RetinaNet detection model;
(4b) inputting the training samples obtained in the step (1) into a multiscale and context feature enhanced RetinaNet detection model, optimizing the overall loss function L in the step (3c) by using an optimizer Adam, updating weight parameters, and obtaining the multiscale and context feature enhanced RetinaNet detection model containing the weight parameters when the number of training rounds reaches 100;
(5) inputting the test samples into the multi-scale and context feature enhanced RetinaNet detection model containing the trained weight parameters, and predicting and outputting the target bounding box positions, target categories and confidence scores of the targets in the test samples.
2. The method of claim 1, wherein (2c) constructs a multi-scale feature enhancement module consisting of 7 feature maps, which is implemented as follows:
(2c1) constructing 2 feature maps, wherein the first is a high-level feature map T1 and the second is a low-level feature map T2;
(2c2) taking 2 parallel branch operations on the first high-level feature map T1:
sequentially passing the first branch through a global average pooling layer, a dimension conversion layer, a first 1 × 1 convolutional layer with the step length of 1 and a first up-sampling layer to obtain a low-level feature map T3 containing global context information;
the second branch passes through a second 1 × 1 convolutional layer with the step size of 1 and a second up-sampling layer in sequence to obtain an up-sampled low-level characteristic diagram T4;
(2c3) inputting the second low-level feature map T2 into the 3 × 3 convolutional layer with step length of 1, and outputting to obtain a low-level feature map T5 after channel conversion;
(2c4) inputting the low-level feature map T3 containing global context information and the channel-transformed low-level feature map T5 into a fusion multiplication layer to obtain a fusion-multiplied low-level feature map T6,
(2c5) and inputting the fused and multiplied low-level feature map T6 and the up-sampled low-level feature map T4 into a fused addition layer to obtain a multi-scale enhanced feature map T7.
3. The method of claim 1 or 2, wherein (2d) 3 convolution feature maps C3, C4, C5 and a fourth pyramid feature map P6 are used as input to the multi-scale feature enhancement module to obtain 3 fused multi-scale enhanced pyramid feature maps F3, F4 and F5, which are implemented as follows:
(2d1) inputting the second convolution feature map C4 as the high-level feature map T1 of the multi-scale feature enhancement module and the first convolution feature map C3 as the low-level feature map T2, and outputting the multi-scale enhanced first feature map E3;
(2d2) adding and fusing the multi-scale enhanced first feature map E3 and the first pyramid feature map P3 to obtain a fused multi-scale enhanced first pyramid feature map F3;
(2d3) inputting the third convolution feature map C5 as the high-level feature map T1 of the multi-scale feature enhancement module and the second convolution feature map C4 as the low-level feature map T2, and outputting the multi-scale enhanced second feature map E4;
(2d4) adding and fusing the multi-scale enhanced second feature map E4 and the second pyramid feature map P4 to obtain a fused multi-scale enhanced second pyramid feature map F4;
(2d5) inputting the fourth pyramid feature map P6 as the high-level feature map T1 of the multi-scale feature enhancement module and the third convolution feature map C5 as the low-level feature map T2, and outputting the multi-scale enhanced third feature map E5;
(2d6) and adding and fusing the multi-scale enhanced third feature map E5 and the third pyramid feature map P5 to obtain a fused multi-scale enhanced third pyramid feature map F5.
4. The method of claim 1, wherein (2e) constructs a context feature enhancement module consisting of 5 feature maps, which is implemented as follows:
(2e1) constructing a fused multi-scale enhanced pyramid feature map S1, and sequentially passing it through a first 1 × 1 convolution layer with step size 1 and a softmax layer to obtain an activated pyramid feature map S2;
(2e2) inputting the activated pyramid feature map S2 and the fused multi-scale enhanced pyramid feature map S1 into a first fusion multiplication layer to obtain a pyramid feature map S3 after fusion multiplication;
(2e3) sequentially passing the fused and multiplied pyramid feature map S3 through a second 1 × 1 convolution layer with step size 1, a rectified linear unit layer and a third 1 × 1 convolution layer with step size 1 to obtain a modified and fused pyramid feature map S4;
(2e4) inputting the modified fused pyramid feature map S4 and the fused multi-scale enhanced pyramid feature map S1 into a second fusion multiplication layer to obtain a fused context feature enhanced pyramid feature map S5.
5. The method of claim 1, wherein 3 fused multi-scale enhanced pyramid feature maps F3, F4, and F5 are used as input of the context feature enhancement module in (2F), resulting in 3 fused multi-scale context feature enhanced pyramid feature maps G3, G4, and G5, which are implemented as follows:
(2f1) inputting the fused multi-scale enhanced first pyramid feature map F3 as a feature map S1 of a context feature enhancement module to obtain a fused context feature enhanced first pyramid feature map G3;
(2f2) inputting the second pyramid feature map F4 subjected to fusion multi-scale enhancement as a feature map S1 of a context feature enhancement module to obtain a second pyramid feature map G4 subjected to fusion context feature enhancement;
(2f3) and inputting the third pyramid feature map F5 subjected to fusion multi-scale enhancement as a feature map S1 of a context feature enhancement module to obtain a third pyramid feature map G5 subjected to fusion context feature enhancement.
6. The method according to claim 1, wherein in (3a) the existing Focal Loss function is set as the loss function Lcls of the target classification task in the multi-scale and context feature enhanced RetinaNet detection model, expressed as follows:
Lcls = FL(p_i),
wherein FL(p_i) = −α(1−p_i)^γ · log(p_i) represents the focal loss function, α represents the balancing parameter between positive and negative samples, γ represents the focusing parameter, and p_i represents the probability that the i-th anchor box is the predicted target.
7. The method according to claim 1, wherein in (3b) the existing Smooth L1 Loss function is set as the loss function Lreg of the target position regression task in the multi-scale and context feature enhanced RetinaNet detection model, expressed as follows:
Lreg = SmoothL1(x),
wherein SmoothL1(x) represents the smooth L1 loss function, and x = t_i − t_i* is the difference between the offset t_i of the i-th anchor box relative to the predicted target bounding box and the offset t_i* of the i-th anchor box relative to the true target bounding box.
CN201910583811.1A 2019-07-01 2019-07-01 Remote sensing image target detection method based on depth multi-scale and context learning Active CN110287927B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910583811.1A CN110287927B (en) 2019-07-01 2019-07-01 Remote sensing image target detection method based on depth multi-scale and context learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910583811.1A CN110287927B (en) 2019-07-01 2019-07-01 Remote sensing image target detection method based on depth multi-scale and context learning

Publications (2)

Publication Number Publication Date
CN110287927A true CN110287927A (en) 2019-09-27
CN110287927B CN110287927B (en) 2021-07-27

Family

ID=68021357

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910583811.1A Active CN110287927B (en) 2019-07-01 2019-07-01 Remote sensing image target detection method based on depth multi-scale and context learning

Country Status (1)

Country Link
CN (1) CN110287927B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160078273A1 (en) * 2014-07-25 2016-03-17 Digitalglobe, Inc. Global-scale damage detection using satellite imagery
US20190050667A1 (en) * 2017-03-10 2019-02-14 TuSimple System and method for occluding contour detection
CN108304873A (en) * 2018-01-30 2018-07-20 深圳市国脉畅行科技股份有限公司 Object detection method based on high-resolution optical satellite remote-sensing image and its system
CN109117876A (en) * 2018-07-26 2019-01-01 成都快眼科技有限公司 A kind of dense small target deteection model building method, model and detection method
CN109344821A (en) * 2018-08-30 2019-02-15 西安电子科技大学 Small target detecting method based on Fusion Features and deep learning
CN109583340A (en) * 2018-11-15 2019-04-05 中山大学 A kind of video object detection method based on deep learning

Cited By (40)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110782484A (en) * 2019-10-25 2020-02-11 上海浦东临港智慧城市发展中心 Unmanned aerial vehicle video personnel identification and tracking method
CN110991359A (en) * 2019-12-06 2020-04-10 重庆市地理信息和遥感应用中心(重庆市测绘产品质量检验测试中心) Satellite image target detection method based on multi-scale depth convolution neural network
CN111160249A (en) * 2019-12-30 2020-05-15 西北工业大学深圳研究院 Multi-class target detection method of optical remote sensing image based on cross-scale feature fusion
CN111414931A (en) * 2019-12-31 2020-07-14 杭州电子科技大学 Multi-branch multi-scale small target detection method based on image depth
CN111414931B (en) * 2019-12-31 2023-04-25 杭州电子科技大学 Multi-branch multi-scale small target detection method based on image depth
CN111242071A (en) * 2020-01-17 2020-06-05 陕西师范大学 Attention remote sensing image target detection method based on anchor frame
CN111242071B (en) * 2020-01-17 2023-04-07 陕西师范大学 Attention remote sensing image target detection method based on anchor frame
CN111310611B (en) * 2020-01-22 2023-06-06 上海交通大学 Method for detecting cell view map and storage medium
CN111310611A (en) * 2020-01-22 2020-06-19 上海交通大学 Method for detecting cell visual field map and storage medium
CN111274981A (en) * 2020-02-03 2020-06-12 中国人民解放军国防科技大学 Target detection network construction method and device and target detection method
CN111325116A (en) * 2020-02-05 2020-06-23 武汉大学 Remote sensing image target detection method capable of evolving based on offline training-online learning depth
CN111414880B (en) * 2020-03-26 2022-10-14 电子科技大学 Method for detecting target of active component in microscopic image based on improved RetinaNet
CN111414880A (en) * 2020-03-26 2020-07-14 电子科技大学 Method for detecting target of active component in microscopic image based on improved RetinaNet
CN111553303B (en) * 2020-05-07 2024-03-29 武汉大势智慧科技有限公司 Remote sensing orthographic image dense building extraction method based on convolutional neural network
CN111553303A (en) * 2020-05-07 2020-08-18 武汉大势智慧科技有限公司 Remote sensing ortho image dense building extraction method based on convolutional neural network
CN111833321B (en) * 2020-07-07 2023-10-20 杭州电子科技大学 Intracranial hemorrhage detection model with window adjusting optimization enhancement and construction method thereof
CN111833321A (en) * 2020-07-07 2020-10-27 杭州电子科技大学 Window-adjusting optimization-enhanced intracranial hemorrhage detection model and construction method thereof
CN112053342A (en) * 2020-09-02 2020-12-08 陈燕铭 Method and device for extracting and identifying pituitary magnetic resonance image based on artificial intelligence
CN112200045A (en) * 2020-09-30 2021-01-08 华中科技大学 Remote sensing image target detection model establishing method based on context enhancement and application
CN112200045B (en) * 2020-09-30 2024-03-19 华中科技大学 Remote sensing image target detection model establishment method based on context enhancement and application
CN112183435A (en) * 2020-10-12 2021-01-05 河南威虎智能科技有限公司 Two-stage hand target detection method
CN112287983A (en) * 2020-10-15 2021-01-29 西安电子科技大学 Remote sensing image target extraction system and method based on deep learning
CN112287983B (en) * 2020-10-15 2023-10-10 西安电子科技大学 Remote sensing image target extraction system and method based on deep learning
CN112464743A (en) * 2020-11-09 2021-03-09 西北工业大学 Small sample target detection method based on multi-scale feature weighting
CN112464743B (en) * 2020-11-09 2023-06-02 西北工业大学 Small sample target detection method based on multi-scale feature weighting
CN112418108A (en) * 2020-11-25 2021-02-26 西北工业大学深圳研究院 Remote sensing image multi-class target detection method based on sample reweighing
CN112560956A (en) * 2020-12-16 2021-03-26 珠海格力智能装备有限公司 Target detection method and device, nonvolatile storage medium and electronic equipment
CN112634174A (en) * 2020-12-31 2021-04-09 上海明略人工智能(集团)有限公司 Image representation learning method and system
CN112634174B (en) * 2020-12-31 2023-12-12 上海明略人工智能(集团)有限公司 Image representation learning method and system
CN113128564A (en) * 2021-03-23 2021-07-16 武汉泰沃滋信息技术有限公司 Typical target detection method and system based on deep learning under complex background
CN113536986A (en) * 2021-06-29 2021-10-22 南京逸智网络空间技术创新研究院有限公司 Representative feature-based dense target detection method in remote sensing image
CN113469088A (en) * 2021-07-08 2021-10-01 西安电子科技大学 SAR image ship target detection method and system in passive interference scene
CN114170590A (en) * 2021-10-18 2022-03-11 中科南京人工智能创新研究院 RetinaNet network improvement-based new energy license plate detection and identification method
CN114998603A (en) * 2022-03-15 2022-09-02 燕山大学 Underwater target detection method based on depth multi-scale feature factor fusion
CN114998603B (en) * 2022-03-15 2024-08-16 燕山大学 Underwater target detection method based on depth multi-scale feature factor fusion
CN115937698A (en) * 2022-09-29 2023-04-07 华中师范大学 Self-adaptive tailing pond remote sensing deep learning detection method
CN116310850B (en) * 2023-05-25 2023-08-15 南京信息工程大学 Remote sensing image target detection method based on improved RetinaNet
CN116310850A (en) * 2023-05-25 2023-06-23 南京信息工程大学 Remote sensing image target detection method based on improved RetinaNet
CN117612029A (en) * 2023-12-21 2024-02-27 石家庄铁道大学 Remote sensing image target detection method based on progressive feature smoothing and scale adaptive expansion convolution
CN117612029B (en) * 2023-12-21 2024-05-24 石家庄铁道大学 Remote sensing image target detection method based on progressive feature smoothing and scale adaptive expansion convolution

Also Published As

Publication number Publication date
CN110287927B (en) 2021-07-27

Similar Documents

Publication Publication Date Title
CN110287927B (en) Remote sensing image target detection method based on depth multi-scale and context learning
CN108985334B (en) General object detection system and method for improving active learning based on self-supervision process
CN111860235A (en) Method and system for generating high-low-level feature fused attention remote sensing image description
CN116612120B (en) Two-stage road defect detection method for data unbalance
CN110659601B (en) Depth full convolution network remote sensing image dense vehicle detection method based on central point
CN114998603B (en) Underwater target detection method based on depth multi-scale feature factor fusion
CN111368634B (en) Human head detection method, system and storage medium based on neural network
CN113487600B (en) Feature enhancement scale self-adaptive perception ship detection method
CN116310850B (en) Remote sensing image target detection method based on improved RetinaNet
CN115587964A (en) Entropy screening-based pseudo label cross consistency change detection method
CN116977710A (en) Remote sensing image long tail distribution target semi-supervised detection method
CN114898173A (en) Semi-supervised target detection method for improving quality and class imbalance of pseudo label
CN117217368A (en) Training method, device, equipment, medium and program product of prediction model
CN115439654A (en) Method and system for finely dividing weakly supervised farmland plots under dynamic constraint
CN116561322A (en) Relation extracting method, relation extracting device and medium for network information
CN105787045B (en) A kind of precision Enhancement Method for visual media semantic indexing
CN112579583B (en) Evidence and statement combined extraction method for fact detection
CN117351440A (en) Semi-supervised ship detection method and system based on open text detection
Zhao et al. Recognition and Classification of Concrete Cracks under Strong Interference Based on Convolutional Neural Network.
CN116385876A (en) Optical remote sensing image ground object detection method based on YOLOX
CN115661542A (en) Small sample target detection method based on feature relation migration
CN115471456A (en) Aircraft landing gear detection method based on improved yolov5
CN114782983A (en) Road scene pedestrian detection method based on improved feature pyramid and boundary loss
CN113128559A (en) Remote sensing image target detection method based on cross-scale feature fusion pyramid network
CN114331950A (en) SAR image ship detection method based on dense connection sparse activation network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant