CN110287927B - Remote sensing image target detection method based on depth multi-scale and context learning - Google Patents

Remote sensing image target detection method based on depth multi-scale and context learning

Info

Publication number
CN110287927B
CN110287927B (application number CN201910583811.1A)
Authority
CN
China
Prior art keywords
feature
feature map
scale
enhanced
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910583811.1A
Other languages
Chinese (zh)
Other versions
CN110287927A (en)
Inventor
张向荣
唐旭
王少娜
陈璞花
古晶
马文萍
马晶晶
侯彪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN201910583811.1A priority Critical patent/CN110287927B/en
Publication of CN110287927A publication Critical patent/CN110287927A/en
Application granted granted Critical
Publication of CN110287927B publication Critical patent/CN110287927B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/13Satellite images

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a remote sensing image target detection method based on depth multi-scale and context learning. It mainly addresses two problems of the prior art: the feature fusion strategy is coarse, and context feature information is not exploited, which results in low detection precision. The implementation steps are as follows: acquire training samples and test samples from a remote sensing image target detection data set; construct a multi-scale and context feature enhanced RetinaNet detection model and set the overall loss function of the target classification and target position regression tasks; input the training samples into the constructed detection model and train it to obtain a trained detection model; input the test samples into the trained detection model and predict and output the target categories, target confidence scores and target positions. The method improves the expression capability of the features and the mean average precision of remote sensing image target detection, and can be used to obtain the targets of interest in a remote sensing image together with their positions.

Description

Remote sensing image target detection method based on depth multi-scale and context learning
Technical Field
The invention belongs to the technical field of remote sensing images, and particularly relates to a target detection method for remote sensing images, which can be used to obtain the targets of interest in a remote sensing image and their positions.
Background
Remote sensing image target detection is one of the important research topics in the remote sensing field, and is widely applied in land planning, disaster monitoring, military reconnaissance and other areas. Its purpose is to judge whether targets of interest exist in a remote sensing image and to determine their positions.
Traditional remote sensing image target detection methods include template-matching-based, knowledge-based and detection-object-based methods, which rely to a great extent on extensive feature engineering to detect the targets in remote sensing images. However, these methods do not adapt well to the complicated and changeable background of remote sensing images or to the obvious differences in target scale. In recent years, deep-learning-based methods have been widely adopted for remote sensing image target detection. A deep convolutional neural network does not require hand-designed features for target detection; it extracts features from the remote sensing image data automatically, and its performance exceeds that of traditional algorithms. The RetinaNet model (proposed in "Focal Loss for Dense Object Detection") has the advantages of not needing to generate candidate regions, fast target detection and high precision. However, the RetinaNet model still has limitations. Its architecture is built on a feature pyramid network, which obtains the feature map used for detection by adding and fusing the feature map of the current layer with the adjacent higher-level feature map. This feature fusion is coarse, and it neglects a more effective use of the high-level feature maps as well as the use of context information, which restricts further improvement of the target detection precision on remote sensing images.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a remote sensing image target detection method based on depth multi-scale and context learning so as to improve the target detection precision in a remote sensing image.
The technical scheme of the invention is as follows: a multi-scale feature enhancement module and a context feature enhancement module are introduced into the RetinaNet detection model, so as to fully consider a more effective feature-map fusion strategy and the use of global context feature information, and thereby construct a multi-scale and context feature enhanced RetinaNet detection model. First, feature maps of several levels are obtained from the backbone network and the feature pyramid network of the RetinaNet detection model. Then a multi-scale feature enhancement module is introduced: for these feature maps, the semantic information of each relatively high-level feature map is guided to the adjacent lower-level feature map, enriching the semantic information of each relatively low-level feature map. Next, a context feature enhancement module is applied to the fused multi-scale enhanced pyramid feature maps to obtain the global context features of the remote sensing scene. Finally, the enhanced pyramid feature maps are used in the detection model, and target category determination and target position localization are achieved through multi-task learning. The concrete implementation steps comprise:
1. A remote sensing image target detection method based on depth multi-scale and context learning is characterized by comprising the following steps:
(1) taking 75% of the remote sensing image target detection data set as a training sample, and taking the remaining 25% as a test sample;
(2) constructing a multiscale and context feature enhanced RetinaNet detection model:
(2a) obtaining 3 convolution characteristic graphs C3, C4 and C5 from a backbone network ResNet-101 of a RetinaNet detection model;
(2b) obtaining 4 pyramid feature maps P3, P4, P5 and P6 from a feature pyramid network of a RetinaNet detection model;
(2c) constructing a multi-scale feature enhancement module consisting of 7 feature maps;
(2d) taking the 3 convolution feature maps C3, C4, C5 and the fourth pyramid feature map P6 as the input of the multi-scale feature enhancement module to obtain 3 pyramid feature maps F3, F4 and F5 after fusion multi-scale enhancement;
(2e) constructing a context feature enhancement module consisting of 5 feature graphs;
(2f) taking the 3 fused multi-scale enhanced pyramid feature maps F3, F4 and F5 as the input of the context feature enhancement module to obtain 3 fused multi-scale context feature enhanced pyramid feature maps G3, G4 and G5;
(3) setting an overall loss function L for the target classification and target position regression tasks in the multi-scale and context feature enhanced RetinaNet detection model:
(3a) setting the existing Focal Loss function as the loss function of the target classification task in the multi-scale and context feature enhanced RetinaNet detection model, denoted by Lcls;
(3b) setting the existing Smooth L1 Loss function as the loss function of the target position regression task in the multi-scale and context feature enhanced RetinaNet detection model, denoted by Lreg;
(3c) from the loss function Lcls of the target classification task and the loss function Lreg of the target position regression task, setting the overall loss function L of the multi-scale and context feature enhanced RetinaNet detection model as:
L = L({p_i}, {t_i}) = (1/N_cls) Σ_i Lcls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* Lreg(t_i, t_i*),
wherein L({p_i}, {t_i}) is the overall loss function of the target classification task and the target position regression task, N_cls represents the total number of positive sample anchor boxes in the target classification task, p_i represents the probability that the i-th anchor box is a predicted target, p_i* represents the probability that the i-th anchor box is a true target, Lcls(p_i, p_i*) is the loss function of the target classification task in the multi-scale and context feature enhanced RetinaNet detection model, λ represents the balance weight parameter between the target classification task and the target position regression task, N_reg represents the total number of positive sample anchor boxes in the target position regression task, t_i* indicates the offset of the i-th anchor box relative to the true target box, t_i indicates the offset of the i-th anchor box relative to the predicted target bounding box, Lreg(t_i, t_i*) is the loss function of the target position regression task, i represents the index of an anchor box and ranges from 1 to M, and M is the total number of anchor boxes;
(4) training a multiscale and context feature enhanced RetinaNet detection model constructed in the step (2):
(4a) setting the learning rate to 0.00001, the optimizer to Adam, the number of training steps per round to 2000 and the number of training rounds to 100, and using the classification model parameters obtained by pre-training the backbone network ResNet-101 on the ImageNet data set as the initialization parameters of the multi-scale and context feature enhanced RetinaNet detection model;
(4b) inputting the training samples obtained in the step (1) into a multiscale and context feature enhanced RetinaNet detection model, optimizing the overall loss function L in the step (3c) by using an optimizer Adam, updating weight parameters, and obtaining the multiscale and context feature enhanced RetinaNet detection model containing the weight parameters when the number of training rounds reaches 100;
(5) and inputting the test sample into a multiscale and context feature enhanced RetinaNet detection model containing weight parameters, and predicting and outputting the position of a target boundary box, the target category and the confidence score of the target in the test sample.
Compared with the prior art, the invention has the following advantages:
Firstly, the invention introduces a multi-scale feature enhancement module, which makes more efficient use of the semantic information of the high-level feature maps and guides the fusion of the high-level and low-level feature maps, so that the low-level feature maps gain rich semantic information while their resolution remains unchanged; this enhances the expression of the low-level feature maps and improves the classification confidence of the targets.
Secondly, the invention considers the use of global context feature information and introduces a context feature enhancement module, which exploits the complex nature of remote sensing scenes, establishes the relation between the current position and other positions at the feature level, and obtains the global context features of the remote sensing scene, thereby improving the target detection precision.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is a simulation result image of a baseball field detected using the present invention and the baseline method;
FIG. 3 is a simulation result image of a bridge being tested using the present invention and the baseline method;
FIG. 4 is an image of a simulation result of an aircraft being tested using the present invention and a baseline method.
Detailed Description
The embodiments and effects of the present invention will be described in further detail below with reference to the accompanying drawings.
Referring to fig. 1, the implementation steps of this embodiment are as follows:
step 1, obtaining a training sample and a testing sample.
The public remote sensing image target detection data set NWPU VHR-10-v2 is used; it contains 1172 remote sensing images of 400 × 400 pixels together with the annotated target categories and target positions on these images. In this embodiment, 75% of the data set is used as training samples and the remaining 25% as test samples, i.e. 879 sample images are used for training and the remaining 293 images for testing, as sketched below.
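A minimal sketch of this 75%/25% split is given below; the random seed, the use of random.shuffle and the name image_ids are illustrative assumptions, since the patent does not specify how the split is drawn.

```python
import random

random.seed(0)                                    # illustrative; the split procedure is not specified
random.shuffle(image_ids)                         # image_ids: list of the 1172 NWPU VHR-10-v2 image identifiers
n_train = int(0.75 * len(image_ids))              # 879 training images
train_ids, test_ids = image_ids[:n_train], image_ids[n_train:]   # remaining 293 test images
```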
Step 2, constructing the multi-scale and context feature enhanced RetinaNet detection model.
2.1) obtaining 3 convolution feature maps C3, C4 and C5 from the backbone network of the RetinaNet detection model:
the backbone network of the RetinaNet detection model can be ResNet-50, ResNet-101 or ResNet-152; in this embodiment ResNet-101 is used, i.e. the 3 convolution feature maps C3, C4 and C5 are obtained from the backbone network ResNet-101 of the RetinaNet detection model;
2.2) obtaining 4 pyramid feature maps P3, P4, P5 and P6 from the feature pyramid network of the RetinaNet detection model;
2.3) constructing a multi-scale feature enhancement module consisting of 7 feature maps:
2.3.1) constructing 2 feature maps, the first being a high-level feature map T1 and the second a low-level feature map T2;
2.3.2) applying 2 parallel branch operations to the first, high-level feature map T1:
the first branch passes sequentially through a global average pooling layer, a dimension conversion layer, a first 1 × 1 convolutional layer with a stride of 1 and a first up-sampling layer, giving a low-level feature map T3 that contains global context information;
the second branch passes sequentially through a second 1 × 1 convolutional layer with a stride of 1 and a second up-sampling layer, giving an up-sampled low-level feature map T4;
2.3.3) inputting the second, low-level feature map T2 into a 3 × 3 convolutional layer with a stride of 1, and outputting a channel-transformed low-level feature map T5;
2.3.4) inputting the low-level feature map T3 containing global context information and the channel-transformed low-level feature map T5 into a fusion multiplication layer to obtain a fusion-multiplied low-level feature map T6;
2.3.5) inputting the fusion-multiplied low-level feature map T6 and the up-sampled low-level feature map T4 into a fusion addition layer to obtain a multi-scale enhanced feature map T7, as sketched in the code below;
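The following is a minimal Keras-style sketch of the multi-scale feature enhancement module of steps 2.3.1–2.3.5; the channel count (256), the ×2 up-sampling of the second branch, and all function and argument names are illustrative assumptions, not taken from the patent text.

```python
from tensorflow.keras import layers

def multi_scale_enhance(high, low, low_size, channels=256):
    """high: relatively high-level map T1; low: low-level map T2; low_size: (h, w) of `low`."""
    # Branch 1 of T1: global context pathway -> T3
    t3 = layers.GlobalAveragePooling2D()(high)               # global average pooling layer
    t3 = layers.Reshape((1, 1, -1))(t3)                      # dimension conversion layer
    t3 = layers.Conv2D(channels, 1, strides=1)(t3)           # first 1x1 convolution, stride 1
    t3 = layers.UpSampling2D(size=low_size)(t3)              # first up-sampling layer (broadcast to low resolution)
    # Branch 2 of T1: up-sampled high-level map -> T4
    t4 = layers.Conv2D(channels, 1, strides=1)(high)         # second 1x1 convolution, stride 1
    t4 = layers.UpSampling2D(size=2)(t4)                     # second up-sampling layer (assumes one octave gap)
    # Channel transformation of T2 -> T5
    t5 = layers.Conv2D(channels, 3, strides=1, padding='same')(low)  # 3x3 convolution, stride 1
    # Fusion: multiplication (T6) then addition (T7)
    t6 = layers.Multiply()([t3, t5])                         # fusion multiplication layer
    t7 = layers.Add()([t6, t4])                              # fusion addition layer -> multi-scale enhanced map
    return t7
```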
2.4) taking the 3 convolution feature maps C3, C4 and C5 and the fourth pyramid feature map P6 as the inputs of the multi-scale feature enhancement module to obtain 3 fused multi-scale enhanced pyramid feature maps F3, F4 and F5:
2.4.1) inputting the second convolution feature map C4 as the high-level feature map T1 of the multi-scale feature enhancement module and the first convolution feature map C3 as the low-level feature map T2, and outputting a multi-scale enhanced first feature map E3;
2.4.2) adding and fusing the multi-scale enhanced first feature map E3 and the first pyramid feature map P3 to obtain a fused multi-scale enhanced first pyramid feature map F3;
2.4.3) inputting the third convolution feature map C5 as the high-level feature map T1 of the multi-scale feature enhancement module and the second convolution feature map C4 as the low-level feature map T2, and outputting a multi-scale enhanced second feature map E4;
2.4.4) adding and fusing the multi-scale enhanced second feature map E4 and the second pyramid feature map P4 to obtain a fused multi-scale enhanced second pyramid feature map F4;
2.4.5) inputting the fourth pyramid feature map P6 as the high-level feature map T1 of the multi-scale feature enhancement module and the third convolution feature map C5 as the low-level feature map T2, and outputting a multi-scale enhanced third feature map E5;
2.4.6) adding and fusing the multi-scale enhanced third feature map E5 and the third pyramid feature map P5 to obtain a fused multi-scale enhanced third pyramid feature map F5; these six sub-steps are illustrated by the short sketch below;
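Continuing the sketch above, step 2.4 can be written out as follows; the variables c3–c5 and p3–p6 are assumed to hold the feature maps of steps 2.1 and 2.2, and the spatial sizes assume a 512 × 512 input purely for illustration.

```python
e3 = multi_scale_enhance(high=c4, low=c3, low_size=(64, 64))   # C4 guides C3 -> E3
f3 = layers.Add()([e3, p3])                                     # fuse with P3 -> F3
e4 = multi_scale_enhance(high=c5, low=c4, low_size=(32, 32))   # C5 guides C4 -> E4
f4 = layers.Add()([e4, p4])                                     # fuse with P4 -> F4
e5 = multi_scale_enhance(high=p6, low=c5, low_size=(16, 16))   # P6 guides C5 -> E5
f5 = layers.Add()([e5, p5])                                     # fuse with P5 -> F5
```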
2.5) constructing a context feature enhancement module consisting of 5 feature maps:
2.5.1) constructing a fused multi-scale enhanced pyramid feature map S1, and passing it sequentially through a first 1 × 1 convolutional layer with a stride of 1 and a softmax layer to obtain an activated pyramid feature map S2;
2.5.2) inputting the activated pyramid feature map S2 and the fused multi-scale enhanced pyramid feature map S1 into a first fusion multiplication layer to obtain a fusion-multiplied pyramid feature map S3;
2.5.3) passing the fusion-multiplied pyramid feature map S3 sequentially through a second 1 × 1 convolutional layer with a stride of 1, a rectified linear unit layer and a third 1 × 1 convolutional layer with a stride of 1 to obtain a rectified and fused pyramid feature map S4;
2.5.4) inputting the rectified and fused pyramid feature map S4 and the fused multi-scale enhanced pyramid feature map S1 into a second fusion multiplication layer to obtain a fused context feature enhanced pyramid feature map S5, as sketched below;
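A minimal Keras-style sketch of the context feature enhancement module of steps 2.5.1–2.5.4; the channel count and the softmax axis are illustrative assumptions.

```python
def context_enhance(s1, channels=256):
    s2 = layers.Conv2D(channels, 1, strides=1)(s1)           # first 1x1 convolution, stride 1
    s2 = layers.Softmax(axis=-1)(s2)                         # softmax layer -> activated map S2
    s3 = layers.Multiply()([s2, s1])                         # first fusion multiplication -> S3
    s4 = layers.Conv2D(channels, 1, strides=1)(s3)           # second 1x1 convolution, stride 1
    s4 = layers.ReLU()(s4)                                   # rectified linear unit layer
    s4 = layers.Conv2D(channels, 1, strides=1)(s4)           # third 1x1 convolution, stride 1
    s5 = layers.Multiply()([s4, s1])                         # second fusion multiplication -> S5
    return s5
```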
2.6) using the 3 pyramid feature maps F3, F4 and F5 after the fusion multi-scale enhancement as the input of the context feature enhancement module to obtain 3 pyramid feature maps G3, G4 and G5 after the fusion multi-scale context feature enhancement:
2.6.1) inputting the fused multi-scale enhanced first pyramid feature map F3 as a feature map S1 of a context feature enhancement module to obtain a fused context feature enhanced first pyramid feature map G3;
2.6.2) inputting the fused multi-scale enhanced second pyramid feature map F4 as a feature map S1 of a context feature enhancement module to obtain a fused context feature enhanced second pyramid feature map G4;
2.6.3) inputting the fused multi-scale enhanced third pyramid feature map F5 as the feature map S1 of the context feature enhancement module to obtain a fused context feature enhanced third pyramid feature map G5.
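Using the context_enhance sketch above, step 2.6 amounts to the following usage:

```python
g3 = context_enhance(f3)   # fused context feature enhanced first pyramid feature map G3
g4 = context_enhance(f4)   # fused context feature enhanced second pyramid feature map G4
g5 = context_enhance(f5)   # fused context feature enhanced third pyramid feature map G5
```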
Step 3, setting the overall loss function L of the target classification and target position regression tasks in the constructed multi-scale and context feature enhanced RetinaNet detection model.
3.1) setting the existing Focal Loss function as the loss function of the target classification task in the multi-scale and context feature enhanced RetinaNet detection model, denoted by Lcls and expressed as:
Lcls = FL(p_i),
wherein FL(p_i) = -α(1 - p_i)^γ × log(p_i) is the focal loss function, α represents the balance parameter between positive and negative samples, γ represents the focusing parameter, p_i represents the probability that the i-th anchor box is a predicted target, i represents the index of the anchor box and ranges from 1 to M, and M is the total number of anchor boxes;
in this example, α is set to 0.25 and γ is set to 2.0;
3.2) setting the existing Smooth L1 Loss function as the loss function of the target position regression task in the multi-scale and context feature enhanced RetinaNet detection model, denoted by Lreg and expressed as:
Lreg = SmoothL1(x),
wherein SmoothL1(x) is the smooth L1 loss function,
SmoothL1(x) = 0.5 x^2, if |x| < 1; |x| - 0.5, otherwise,
and x = t_i - t_i* is the difference between the offset t_i of the i-th anchor box relative to the predicted target box and the offset t_i* of the i-th anchor box relative to the true target box; a code sketch is given below;
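A corresponding NumPy sketch of the smooth L1 loss defined above:

```python
import numpy as np

def smooth_l1(x):
    # SmoothL1(x) = 0.5 * x^2 if |x| < 1, |x| - 0.5 otherwise, with x = t_i - t_i*
    x = np.abs(x)
    return np.where(x < 1.0, 0.5 * x ** 2, x - 0.5)
```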
3.3) from the loss function Lcls of the target classification task and the loss function Lreg of the target position regression task, setting the overall loss function L of the multi-scale and context feature enhanced RetinaNet detection model as:
L = L({p_i}, {t_i}) = (1/N_cls) Σ_i Lcls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* Lreg(t_i, t_i*),
wherein L({p_i}, {t_i}) is the overall loss function of the target classification task and the target position regression task, N_cls represents the total number of positive sample anchor boxes in the target classification task, p_i* represents the probability that the i-th anchor box is a true target, Lcls(p_i, p_i*) is the loss function of the target classification task in the multi-scale and context feature enhanced RetinaNet detection model, λ represents the balance weight parameter between the target classification task and the target position regression task, N_reg represents the total number of positive sample anchor boxes in the target position regression task, t_i* indicates the offset of the i-th anchor box relative to the true target box, t_i indicates the offset of the i-th anchor box relative to the predicted target bounding box, and Lreg(t_i, t_i*) is the loss function of the target position regression task in the multi-scale and context feature enhanced RetinaNet detection model;
in this embodiment, λ is set to 1; a sketch combining the two loss terms is given below.
Step 4, training the multi-scale and context feature enhanced RetinaNet detection model constructed in step 2.
4.1) setting training parameters:
in this embodiment, the learning rate is set to 0.00001, Adam is used by the optimizer, the number of training steps is set to 2000, the number of training rounds is set to 100, and classification model parameters obtained by using backbone network ResNet-101 pre-training are used on the ImageNet data set as initialization parameters of a multiscale and context feature enhanced retannet detection model;
4.2) inputting the training samples from step 1 into the multi-scale and context feature enhanced RetinaNet detection model, optimizing the overall loss function L of step 3 with the Adam optimizer and updating the weight parameters; when the number of training rounds reaches 100, the multi-scale and context feature enhanced RetinaNet detection model containing the trained weight parameters is obtained. A minimal Keras-style training sketch is given below.
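A sketch of this training configuration; model, train_generator, focal_loss_fn and smooth_l1_fn are hypothetical placeholders, and the compile/fit form assumes the Keras framework mentioned in the simulation conditions.

```python
from tensorflow.keras.optimizers import Adam

# `model`, `train_generator`, `focal_loss_fn` and `smooth_l1_fn` are hypothetical placeholders;
# the backbone is assumed to already carry the ImageNet-pretrained ResNet-101 weights.
model.compile(optimizer=Adam(learning_rate=1e-5),
              loss={'classification': focal_loss_fn, 'regression': smooth_l1_fn})
model.fit(train_generator, steps_per_epoch=2000, epochs=100)
```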
Step 5, inputting the test samples from step 1 into the multi-scale and context feature enhanced RetinaNet detection model containing the trained weight parameters, and predicting and outputting the target bounding-box positions, target categories and confidence scores of the targets in the test sample images, as sketched below.
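A minimal inference sketch under the assumption of a keras-retinanet-style prediction model that outputs boxes, scores and labels; model, test_image and the 0.5 confidence threshold are illustrative placeholders.

```python
import numpy as np

boxes, scores, labels = model.predict(np.expand_dims(test_image, axis=0))
for box, score, label in zip(boxes[0], scores[0], labels[0]):
    if score > 0.5:                               # assumed confidence threshold
        print(label, score, box)                  # target category, confidence score, bounding-box position
```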
The effect of the invention can be further illustrated by the following simulation experiment:
simulation conditions and contents
The simulation uses the public NWPU VHR-10-v2 data set, which is widely used for evaluating the performance of remote sensing image target detection algorithms, to train and test the multi-scale and context feature enhanced RetinaNet detection model; the baseline method used for comparison is the RetinaNet detection model.
The NWPU VHR-10-v2 data set includes 10 object classes: airplane, ship, oil storage tank, baseball field, basketball court, tennis court, ground track field, harbor, vehicle and bridge.
The processor used for the simulation is an Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz × 40 with 64.00GB of memory and an 8GB GeForce GTX 1080 GPU; the simulation platform is the Ubuntu 16.04 operating system with the Keras deep learning framework and the Python language.
Second, simulation content
Simulation 1: detection of a baseball field using the invention and the existing baseline method. The result is shown in Fig. 2. As can be seen from Fig. 2, the classification confidence score of the baseball field is 0.929 for the baseline method, as shown in Fig. 2(a), while it reaches 1.000 for the invention, as shown in Fig. 2(b); compared with the baseline method, the classification performance of the invention on the baseball field is clearly improved.
Simulation 2: detection of bridges using the invention and the existing baseline method. The result is shown in Fig. 3. The classification confidence scores of the 2 bridges are 0.660 and 0.850 for the baseline method, as shown in Fig. 3(a), while they reach 0.974 and 0.927 for the invention, as shown in Fig. 3(b). Compared with the baseline method, the invention clearly improves the classification confidence scores of the bridges, mainly because bridges depend strongly on the context information of the scene and the introduced context feature enhancement module enhances the expression of context features.
Simulation 3: detection of 5 airplanes using the invention and the existing baseline method. The result is shown in Fig. 4. As can be seen from Fig. 4, the classification confidence scores of the 5 airplanes are all 1.000 for the baseline method, as shown in Fig. 4(a), and are also all 1.000 for the invention, as shown in Fig. 4(b), which indicates that both the baseline method and the invention perform well on airplane classification.
Third, comparing and analyzing simulation experiment results
To verify the effectiveness of the invention, it is compared with 3 existing methods: existing method 1 is the RetinaNet detection model; existing method 2 is a rotation-insensitive and context-enhanced remote sensing image target detection model; existing method 3 is a remote sensing image target detection model with multi-model decision fusion.
The mean average precision (mAP) is used as the evaluation index for detection over all target classes, and the average precision (AP) is used as the evaluation index for the detection of a single target class; the targets in the NWPU VHR-10-v2 test data set are detected with the invention and with the 3 existing methods, and the resulting evaluation index values are compared in Table 1.
TABLE 1 comparison of evaluation index values measured by the present invention and 3 conventional methods
(Table 1 is reproduced as an image in the original publication.)
Table 1 compares the evaluation index values obtained by the invention and by the 3 existing methods; the mean average precision of multi-class detection and the per-class average precision are given as decimals, and bold indicates the highest average precision for that target class among the four methods.
From Table 1, the following 3 conclusions are drawn from the comparison of the evaluation index values obtained by the invention and the 3 existing methods:
1) the mean average precision of existing method 1 is 0.9150 and that of the invention is 0.9551, i.e. the mean average precision of the invention is 0.0401 higher than that of existing method 1;
2) the per-class average precision of the invention is higher than that of existing method 1 for most of the target classes; the improvement is especially obvious for bridges and basketball courts, mainly because these targets depend strongly on context information and the introduced context feature enhancement module enhances the expression of context features; the average precision of ship detection is also improved, mainly because ships vary greatly in scale and the introduced multi-scale feature enhancement module enhances the expression of the multi-scale features of the targets;
3) existing method 2 and existing method 3 are both two-stage target detection models, while the invention is a single-stage target detection model; in general the mean average precision of two-stage models is higher than that of single-stage models, yet the comparison of the evaluation index values shows that the mean average precision of the invention is higher than that of existing method 2 and existing method 3.
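For reference, the mean average precision reported in Table 1 is the mean of the per-class AP values over the 10 classes; a NumPy sketch of a PASCAL-VOC-style AP computation is given below. The all-point interpolation is an assumption, since the patent does not spell out the AP definition.

```python
import numpy as np

def average_precision(recall, precision):
    # All-point interpolated AP; recall/precision are accumulated over
    # detections sorted by descending confidence score.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(mpre) - 2, -1, -1):        # make precision monotonically non-increasing
        mpre[i] = max(mpre[i], mpre[i + 1])
    idx = np.where(mrec[1:] != mrec[:-1])[0]      # recall change points
    return np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1])
```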
In summary, the invention introduces a multi-scale feature enhancement module on the basis of the existing RetinaNet detection model to guide the semantic information of the high-level feature maps to the low-level feature maps and enrich the semantic information of the low-level feature maps, further introduces a context feature enhancement module, and finally applies the RetinaNet detection model with the introduced multi-scale and context feature enhancement modules to target detection and outputs the detection results, thereby improving the precision of remote sensing image target detection.
The foregoing description is only an example of the present invention and is not intended to limit the invention; it will be apparent to those skilled in the art that various changes and modifications in form and detail may be made without departing from the spirit and scope of the invention.

Claims (7)

1. A remote sensing image target detection method based on depth multi-scale and context learning is characterized by comprising the following steps:
(1) taking 75% of the remote sensing image target detection data set as a training sample, and taking the remaining 25% as a test sample;
(2) constructing a multiscale and context feature enhanced RetinaNet detection model:
(2a) obtaining 3 convolution characteristic graphs C3, C4 and C5 from a backbone network ResNet-101 of a RetinaNet detection model;
(2b) obtaining 4 pyramid feature maps P3, P4, P5 and P6 from a feature pyramid network of a RetinaNet detection model;
(2c) constructing a multi-scale feature enhancement module consisting of 7 feature maps;
(2d) taking the 3 convolution feature maps C3, C4, C5 and the fourth pyramid feature map P6 as the input of the multi-scale feature enhancement module to obtain 3 pyramid feature maps F3, F4 and F5 after fusion multi-scale enhancement;
(2e) constructing a context feature enhancement module consisting of 5 feature graphs;
(2f) taking the 3 fused multi-scale enhanced pyramid feature maps F3, F4 and F5 as the input of the context feature enhancement module to obtain 3 fused multi-scale context feature enhanced pyramid feature maps G3, G4 and G5;
(3) setting an integral loss function L of a target classification and target position regression task in a multiscale and context feature enhanced RetinaNet detection model:
(3a) setting the existing Focal Loss function as the loss function of a target classification task in the multi-scale and context feature enhanced RetinaNet detection model, denoted by Lcls;
(3b) setting the existing Smooth L1 Loss function as the loss function of a target position regression task in the multi-scale and context feature enhanced RetinaNet detection model, denoted by Lreg;
(3c) from the loss function Lcls of the target classification task and the loss function Lreg of the target position regression task, setting the overall loss function L of the multi-scale and context feature enhanced RetinaNet detection model as:
L = L({p_i}, {t_i}) = (1/N_cls) Σ_i Lcls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* Lreg(t_i, t_i*),
wherein L({p_i}, {t_i}) is the overall loss function of the target classification task and the target position regression task, N_cls represents the total number of positive sample anchor boxes in the target classification task, p_i represents the probability that the i-th anchor box is a predicted target, p_i* represents the probability that the i-th anchor box is a true target, Lcls(p_i, p_i*) is the loss function of the target classification task in the multi-scale and context feature enhanced RetinaNet detection model, λ represents the balance weight parameter between the target classification task and the target position regression task, N_reg represents the total number of positive sample anchor boxes in the target position regression task, t_i* indicates the offset of the i-th anchor box relative to the true target box, t_i indicates the offset of the i-th anchor box relative to the predicted target bounding box, Lreg(t_i, t_i*) is the loss function of the target position regression task, i represents the index of an anchor box and ranges from 1 to M, and M is the total number of anchor boxes;
(4) training a multiscale and context feature enhanced RetinaNet detection model constructed in the step (2):
(4a) setting the learning rate to be 0.00001, setting Adam by an optimizer, setting the number of training steps to be 2000, setting the number of training rounds to be 100, and using classification model parameters obtained by backbone network ResNet-101 pre-training on an ImageNet data set as initialization parameters of a RetinaNet detection model with multi-scale and context feature enhancement;
(4b) inputting the training samples obtained in the step (1) into a multiscale and context feature enhanced RetinaNet detection model, optimizing the overall loss function L in the step (3c) by using an optimizer Adam, updating weight parameters, and obtaining the multiscale and context feature enhanced RetinaNet detection model containing the weight parameters when the number of training rounds reaches 100;
(5) and inputting the test sample into a multiscale and context feature enhanced RetinaNet detection model containing weight parameters, and predicting and outputting the position of a target boundary box, the target category and the confidence score of the target in the test sample.
2. The method of claim 1, wherein (2c) constructs a multi-scale feature enhancement module consisting of 7 feature maps, which is implemented as follows:
(2c1) constructing 2 feature maps, wherein the first is a high-level feature map T1 and the second is a low-level feature map T2;
(2c2) applying 2 parallel branch operations to the first, high-level feature map T1:
the first branch passes sequentially through a global average pooling layer, a dimension conversion layer, a first 1 × 1 convolutional layer with a stride of 1 and a first up-sampling layer, giving a low-level feature map T3 that contains global context information;
the second branch passes sequentially through a second 1 × 1 convolutional layer with a stride of 1 and a second up-sampling layer, giving an up-sampled low-level feature map T4;
(2c3) inputting the second, low-level feature map T2 into a 3 × 3 convolutional layer with a stride of 1, and outputting a channel-transformed low-level feature map T5;
(2c4) inputting the low-level feature map T3 containing global context information and the channel-transformed low-level feature map T5 into a fusion multiplication layer to obtain a fusion-multiplied low-level feature map T6;
(2c5) inputting the fusion-multiplied low-level feature map T6 and the up-sampled low-level feature map T4 into a fusion addition layer to obtain a multi-scale enhanced feature map T7.
3. The method of claim 1 or 2, wherein (2d) 3 convolution feature maps C3, C4, C5 and a fourth pyramid feature map P6 are used as input to the multi-scale feature enhancement module to obtain 3 fused multi-scale enhanced pyramid feature maps F3, F4 and F5, which are implemented as follows:
(2d1) inputting the second convolution feature map C4 as the high-level feature map T1 of the multi-scale feature enhancement module and the first convolution feature map C3 as the low-level feature map T2, and outputting a multi-scale enhanced first feature map E3;
(2d2) adding and fusing the multi-scale enhanced first feature map E3 and the first pyramid feature map P3 to obtain a fused multi-scale enhanced first pyramid feature map F3;
(2d3) inputting the third convolution feature map C5 as the high-level feature map T1 of the multi-scale feature enhancement module and the second convolution feature map C4 as the low-level feature map T2, and outputting a multi-scale enhanced second feature map E4;
(2d4) adding and fusing the multi-scale enhanced second feature map E4 and the second pyramid feature map P4 to obtain a fused multi-scale enhanced second pyramid feature map F4;
(2d5) inputting the fourth pyramid feature map P6 as the high-level feature map T1 of the multi-scale feature enhancement module and the third convolution feature map C5 as the low-level feature map T2, and outputting a multi-scale enhanced third feature map E5;
(2d6) and adding and fusing the multi-scale enhanced third feature map E5 and the third pyramid feature map P5 to obtain a fused multi-scale enhanced third pyramid feature map F5.
4. The method of claim 1, wherein (2e) constructs a context feature enhancement module consisting of 5 feature maps, which is implemented as follows:
(2e1) constructing a fused multi-scale enhanced pyramid feature map S1, and passing it sequentially through a first 1 × 1 convolutional layer with a stride of 1 and a softmax layer to obtain an activated pyramid feature map S2;
(2e2) inputting the activated pyramid feature map S2 and the fused multi-scale enhanced pyramid feature map S1 into a first fusion multiplication layer to obtain a fusion-multiplied pyramid feature map S3;
(2e3) passing the fusion-multiplied pyramid feature map S3 sequentially through a second 1 × 1 convolutional layer with a stride of 1, a rectified linear unit layer and a third 1 × 1 convolutional layer with a stride of 1 to obtain a rectified and fused pyramid feature map S4;
(2e4) inputting the rectified and fused pyramid feature map S4 and the fused multi-scale enhanced pyramid feature map S1 into a second fusion multiplication layer to obtain a fused context feature enhanced pyramid feature map S5.
5. The method of claim 1, wherein 3 fused multi-scale enhanced pyramid feature maps F3, F4, and F5 are used as input of the context feature enhancement module in (2F), resulting in 3 fused multi-scale context feature enhanced pyramid feature maps G3, G4, and G5, which are implemented as follows:
(2f1) inputting the fused multi-scale enhanced first pyramid feature map F3 as a feature map S1 of a context feature enhancement module to obtain a fused context feature enhanced first pyramid feature map G3;
(2f2) inputting the second pyramid feature map F4 subjected to fusion multi-scale enhancement as a feature map S1 of a context feature enhancement module to obtain a second pyramid feature map G4 subjected to fusion context feature enhancement;
(2f3) and inputting the third pyramid feature map F5 subjected to fusion multi-scale enhancement as a feature map S1 of a context feature enhancement module to obtain a third pyramid feature map G5 subjected to fusion context feature enhancement.
6. The method according to claim 1, wherein in (3a) the existing Focal Loss function is set as the loss function Lcls of the target classification task in the multi-scale and context feature enhanced RetinaNet detection model, expressed as follows:
Lcls = FL(p_i, p_i*),
wherein FL(p_i, p_i*) = -p_i* × α(1 - p_i)^γ × log(p_i) is the focal loss function, α represents the balance parameter between positive and negative samples, γ represents the focusing parameter, and p_i represents the probability that the i-th anchor box is a predicted target.
7. The method according to claim 1, wherein in (3b) the existing Smooth L1 Loss function is set as the loss function Lreg of the target position regression task in the multi-scale and context feature enhanced RetinaNet detection model, expressed as follows:
Lreg = SmoothL1(t_i, t_i*),
wherein SmoothL1(t_i, t_i*) is the smooth L1 loss function:
SmoothL1(t_i, t_i*) = 0.5 (t_i - t_i*)^2, if |t_i - t_i*| < 1; |t_i - t_i*| - 0.5, otherwise.
CN201910583811.1A 2019-07-01 2019-07-01 Remote sensing image target detection method based on depth multi-scale and context learning Active CN110287927B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910583811.1A CN110287927B (en) 2019-07-01 2019-07-01 Remote sensing image target detection method based on depth multi-scale and context learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910583811.1A CN110287927B (en) 2019-07-01 2019-07-01 Remote sensing image target detection method based on depth multi-scale and context learning

Publications (2)

Publication Number Publication Date
CN110287927A CN110287927A (en) 2019-09-27
CN110287927B true CN110287927B (en) 2021-07-27

Family

ID=68021357

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910583811.1A Active CN110287927B (en) 2019-07-01 2019-07-01 Remote sensing image target detection method based on depth multi-scale and context learning

Country Status (1)

Country Link
CN (1) CN110287927B (en)

Families Citing this family (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110782484A (en) * 2019-10-25 2020-02-11 上海浦东临港智慧城市发展中心 Unmanned aerial vehicle video personnel identification and tracking method
CN110991359A (en) * 2019-12-06 2020-04-10 重庆市地理信息和遥感应用中心(重庆市测绘产品质量检验测试中心) Satellite image target detection method based on multi-scale depth convolution neural network
CN111160249A (en) * 2019-12-30 2020-05-15 西北工业大学深圳研究院 Multi-class target detection method of optical remote sensing image based on cross-scale feature fusion
CN111414931B (en) * 2019-12-31 2023-04-25 杭州电子科技大学 Multi-branch multi-scale small target detection method based on image depth
CN111242071B (en) * 2020-01-17 2023-04-07 陕西师范大学 Attention remote sensing image target detection method based on anchor frame
CN111310611B (en) * 2020-01-22 2023-06-06 上海交通大学 Method for detecting cell view map and storage medium
CN111274981B (en) * 2020-02-03 2021-10-08 中国人民解放军国防科技大学 Target detection network construction method and device and target detection method
CN111325116A (en) * 2020-02-05 2020-06-23 武汉大学 Remote sensing image target detection method capable of evolving based on offline training-online learning depth
CN111414880B (en) * 2020-03-26 2022-10-14 电子科技大学 Method for detecting target of active component in microscopic image based on improved RetinaNet
CN111553303B (en) * 2020-05-07 2024-03-29 武汉大势智慧科技有限公司 Remote sensing orthographic image dense building extraction method based on convolutional neural network
CN111833321B (en) * 2020-07-07 2023-10-20 杭州电子科技大学 Intracranial hemorrhage detection model with window adjusting optimization enhancement and construction method thereof
CN112053342A (en) * 2020-09-02 2020-12-08 陈燕铭 Method and device for extracting and identifying pituitary magnetic resonance image based on artificial intelligence
CN112200045B (en) * 2020-09-30 2024-03-19 华中科技大学 Remote sensing image target detection model establishment method based on context enhancement and application
CN112183435B (en) * 2020-10-12 2024-08-06 河南威虎智能科技有限公司 Two-stage hand target detection method
CN112287983B (en) * 2020-10-15 2023-10-10 西安电子科技大学 Remote sensing image target extraction system and method based on deep learning
CN112464743B (en) * 2020-11-09 2023-06-02 西北工业大学 Small sample target detection method based on multi-scale feature weighting
CN112418108B (en) * 2020-11-25 2022-04-26 西北工业大学深圳研究院 Remote sensing image multi-class target detection method based on sample reweighing
CN112560956A (en) * 2020-12-16 2021-03-26 珠海格力智能装备有限公司 Target detection method and device, nonvolatile storage medium and electronic equipment
CN112634174B (en) * 2020-12-31 2023-12-12 上海明略人工智能(集团)有限公司 Image representation learning method and system
CN113128564B (en) * 2021-03-23 2022-03-22 武汉泰沃滋信息技术有限公司 Typical target detection method and system based on deep learning under complex background
CN113536986B (en) * 2021-06-29 2024-06-14 南京逸智网络空间技术创新研究院有限公司 Dense target detection method in remote sensing image based on representative features
CN113469088B (en) * 2021-07-08 2023-05-12 西安电子科技大学 SAR image ship target detection method and system under passive interference scene
CN114170590A (en) * 2021-10-18 2022-03-11 中科南京人工智能创新研究院 RetinaNet network improvement-based new energy license plate detection and identification method
CN114998603B (en) * 2022-03-15 2024-08-16 燕山大学 Underwater target detection method based on depth multi-scale feature factor fusion
CN115937698A (en) * 2022-09-29 2023-04-07 华中师范大学 Self-adaptive tailing pond remote sensing deep learning detection method
CN116310850B (en) * 2023-05-25 2023-08-15 南京信息工程大学 Remote sensing image target detection method based on improved RetinaNet
CN117612029B (en) * 2023-12-21 2024-05-24 石家庄铁道大学 Remote sensing image target detection method based on progressive feature smoothing and scale adaptive expansion convolution

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108304873A (en) * 2018-01-30 2018-07-20 深圳市国脉畅行科技股份有限公司 Object detection method based on high-resolution optical satellite remote-sensing image and its system
CN109117876A (en) * 2018-07-26 2019-01-01 成都快眼科技有限公司 A kind of dense small target deteection model building method, model and detection method
CN109344821A (en) * 2018-08-30 2019-02-15 西安电子科技大学 Small target detecting method based on Fusion Features and deep learning
CN109583340A (en) * 2018-11-15 2019-04-05 中山大学 A kind of video object detection method based on deep learning

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9858479B2 (en) * 2014-07-25 2018-01-02 Digitalglobe, Inc. Global-scale damage detection using satellite imagery
US11587304B2 (en) * 2017-03-10 2023-02-21 Tusimple, Inc. System and method for occluding contour detection

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108304873A (en) * 2018-01-30 2018-07-20 深圳市国脉畅行科技股份有限公司 Object detection method based on high-resolution optical satellite remote-sensing image and its system
CN109117876A (en) * 2018-07-26 2019-01-01 成都快眼科技有限公司 A kind of dense small target deteection model building method, model and detection method
CN109344821A (en) * 2018-08-30 2019-02-15 西安电子科技大学 Small target detecting method based on Fusion Features and deep learning
CN109583340A (en) * 2018-11-15 2019-04-05 中山大学 A kind of video object detection method based on deep learning

Also Published As

Publication number Publication date
CN110287927A (en) 2019-09-27

Similar Documents

Publication Publication Date Title
CN110287927B (en) Remote sensing image target detection method based on depth multi-scale and context learning
CN112634292B (en) Asphalt pavement crack image segmentation method based on deep convolutional neural network
CN109117883B (en) SAR image sea ice classification method and system based on long-time memory network
CN110333554B (en) NRIET rainstorm intelligent similarity analysis method
CN109671071B (en) Underground pipeline defect positioning and grade judging method based on deep learning
CN107506729B (en) Visibility detection method based on deep learning
CN111507371B (en) Method and device for automatically evaluating reliability of label on training image
CN116612120B (en) Two-stage road defect detection method for data unbalance
CN110659601B (en) Depth full convolution network remote sensing image dense vehicle detection method based on central point
Courtial et al. Constraint-based evaluation of map images generalized by deep learning
CN113487600B (en) Feature enhancement scale self-adaptive perception ship detection method
CN114998603A (en) Underwater target detection method based on depth multi-scale feature factor fusion
CN115546485A (en) Construction method of layered self-attention field Jing Yuyi segmentation model
CN116977710A (en) Remote sensing image long tail distribution target semi-supervised detection method
CN115587964A (en) Entropy screening-based pseudo label cross consistency change detection method
CN117217368A (en) Training method, device, equipment, medium and program product of prediction model
CN114331950A (en) SAR image ship detection method based on dense connection sparse activation network
CN115546553A (en) Zero sample classification method based on dynamic feature extraction and attribute correction
CN115439654A (en) Method and system for finely dividing weakly supervised farmland plots under dynamic constraint
CN105787045B (en) A kind of precision Enhancement Method for visual media semantic indexing
CN112579583B (en) Evidence and statement combined extraction method for fact detection
CN113128559A (en) Remote sensing image target detection method based on cross-scale feature fusion pyramid network
CN117351440A (en) Semi-supervised ship detection method and system based on open text detection
CN115661542A (en) Small sample target detection method based on feature relation migration
CN115471456A (en) Aircraft landing gear detection method based on improved yolov5

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant