CN108764063B - Remote sensing image time-sensitive target identification system and method based on characteristic pyramid


Info

Publication number
CN108764063B
Authority
CN
China
Prior art keywords: layer, feature, sub, feature layer, hierarchical
Prior art date
Legal status
Active
Application number
CN201810427107.2A
Other languages
Chinese (zh)
Other versions
CN108764063A (en)
Inventor
杨卫东
金俊波
王祯瑞
习思
黄竞辉
钟胜
陈俊
Current Assignee
Avic Tianhai Wuhan Technology Co ltd
Original Assignee
Huazhong University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN201810427107.2A
Publication of CN108764063A
Application granted
Publication of CN108764063B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/13Satellite images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Abstract

The invention discloses a remote sensing image time-sensitive target identification system and method based on a feature pyramid. The system comprises a target feature extraction sub-network, a feature layer sub-network, a candidate region generation sub-network and a classification regression sub-network. The target feature extraction sub-network performs multi-level convolution processing on the image to be processed and outputs the convolution result of each level as a feature layer. The feature layer sub-network superposes the previous feature layer onto the current feature layer to obtain the current fused feature layer; the topmost fused feature layer is the topmost feature layer itself. The candidate region generation sub-network extracts candidate regions from the fused feature layers of different levels. The classification regression sub-network maps the candidate regions onto the fused feature layers of different levels to obtain a plurality of mapped fused feature layers, performs target judgment on them, and outputs the result. By exploiting the hierarchical structure of the feature pyramid, the features at all scales carry rich semantic information.

Description

Remote sensing image time-sensitive target identification system and method based on characteristic pyramid
Technical Field
The invention belongs to the field of remote sensing image processing, and particularly relates to a remote sensing image time-sensitive target identification system and method based on a feature pyramid.
Background
The detection of moving targets in large-format remote sensing images is an important component of remote sensing image analysis, and is extremely challenging owing to characteristics such as multiple scales, wide coverage and special viewing angles. Its basic task is to determine whether a given aerial or satellite image contains target objects of one or more categories and, if so, the accurate position of each. Time-sensitive target detection in large-format remote sensing images, as an important component of the fields of computer vision and remote sensing image analysis, is therefore of high research significance. With the rapid development of remote sensing, sensor and internet technologies, large-format surveys of ground or ocean areas have become feasible, so remotely sensed data are increasingly comprehensive, covering geographic, humanistic and other information. Remote sensing technology is widely applied to search and rescue of ground aircraft targets and marine vessel targets, monitoring of illegal immigration, territorial protection, environmental monitoring, military intelligence collection, ground resource investigation, and so on.
Object detection has two basic tasks: determining whether an object exists in an image, and determining its position. Target detection in natural scenes is a core research field of computer vision and digital image processing, and has been a hot direction of academic research and industrial application in recent years. Since 2012, convolutional neural networks have made major breakthroughs in image classification, and the strong feature extraction capability of deep learning has made high-precision target detection possible by converting the detection problem into a classification problem. Applying deep learning to target detection has driven the rapid development of the field, and deep-learning-based detection methods are advancing day by day.
Existing deep learning target detection models use only single-scale features from the last layer; although the semantic information of these features is very strong, their position information is very weak, so the detection effect on multi-scale targets is poor. Meanwhile, existing networks are large in scale, with many parameters and a high demand for computing resources.
Disclosure of Invention
The invention provides a remote sensing image time-sensitive target identification system and method based on a feature pyramid, aimed at the difficult problem of multi-scale target detection in large-format remote sensing images. The channel-weighted feature pyramid network SEM-FPN is applied to time-sensitive target detection in remote sensing images and improved according to the problems encountered in practical application of the network model; the algorithm achieves multi-scale target detection while remaining lightweight.
As one aspect of the present invention, the present invention provides a remote sensing image time-sensitive target recognition system based on a feature pyramid, including:
the target feature extraction sub-network is provided with a plurality of hierarchical output ends and is used for carrying out multi-level convolution processing on the image to be processed and outputting each hierarchical convolution processing result as a feature layer;
the feature layer sub-network is provided with a plurality of hierarchical input ends and a plurality of hierarchical output ends, one hierarchical input end is connected with one hierarchical output end of the target feature extraction sub-network and is used for superposing the previous feature layer and the current feature layer to obtain a current fusion feature layer, and the topmost fusion feature layer is the topmost feature layer;
the candidate region generation sub-network is provided with a plurality of hierarchical input ends and a plurality of hierarchical output ends, wherein one hierarchical input end is connected with one hierarchical output end of the feature layer sub-network and is used for extracting candidate regions from different hierarchical fusion feature layers;
and the classification regression sub-network, which is provided with a plurality of hierarchical input ends and an RPN input end, wherein each hierarchical input end is connected with a hierarchical output end of the feature layer sub-network, and the RPN input end is connected with the output end of the candidate region generation sub-network; the classification regression sub-network is used for mapping the candidate regions to the fused feature layers of different levels to obtain a plurality of levels of mapped fused feature layers, performing target judgment on them, and outputting results.
Preferably, the feature layer sub-network comprises a plurality of feature layer sub-modules, denoted as the first feature layer sub-module, the second feature layer sub-module, ..., the ith feature layer sub-module, ..., and the Nth feature layer sub-module, with 1 ≤ i ≤ N-1;
one input end of the ith characteristic layer sub-module is used as a layer input end of the characteristic layer sub-network, the other input end of the ith characteristic layer sub-module is connected with the output end of the (i + 1) th characteristic layer sub-module, and the input end of the Nth characteristic layer sub-module is used as a layer input end of the characteristic layer sub-network;
the first N-1 characteristic layer submodules are used for carrying out superposition processing on the previous characteristic layer and the current characteristic layer to obtain a current fusion characteristic layer; and the Nth characteristic layer submodule is used for outputting the current characteristic layer as a current fusion characteristic layer, wherein the previous characteristic layer is obtained by performing convolution on the current characteristic layer.
Preferably, any one of the first N-1 feature layer sub-modules comprises:
the system comprises a previous layer processing subunit, a current processing subunit and a superposition unit, wherein the output end of the previous layer processing subunit is connected with the first input end of the superposition unit, and the output end of the current processing subunit is connected with the second input end of the superposition unit;
the last layer of processing subunit is used for performing up-sampling processing on the last layer of feature layer and outputting the processed last layer of feature layer, the current processing subunit is used for performing convolution processing on the current feature layer and outputting the processed current feature layer, and the superposition unit is used for performing superposition processing on the processed last layer of feature layer and the processed current feature layer and outputting the current fusion feature layer.
Preferably, the feature layer sub-module further includes an aliasing effect processing unit, an input end of the aliasing effect processing unit is connected to an output end of the superposition unit, and the aliasing effect processing unit is configured to perform convolution processing on the current fusion feature layer and output a final current fusion feature layer.
Preferably, the feature layer sub-network further comprises an N +1 th feature layer sub-module, and the candidate region generation sub-network is provided with an additional hierarchical input end; the input end of the (N + 1) th feature layer submodule is connected with the output end of the Nth feature layer submodule, and the output end of the (N + 1) th feature layer submodule is connected with the additional level input end of the candidate region generation sub-network;
the (N + 1) th feature layer submodule is used for performing up-sampling on the fusion feature layer output by the (N) th feature layer submodule to output an (N + 1) th feature layer, and the candidate region generation sub-network simultaneously extracts the candidate region from the (N + 1) th feature layer.
Preferably, the classification regression sub-network includes a mapping sub-module, a fusion sub-module and a target judgment sub-module, which are connected in sequence, the mapping sub-module is configured to map the candidate region to different levels of fusion feature layers to obtain a plurality of levels of fusion feature layers after mapping, the fusion sub-module is configured to perform fusion processing on the plurality of levels of fusion feature layers after mapping to obtain a target judgment feature layer, and the target judgment sub-module is configured to perform target judgment on the target judgment feature layer to output a target judgment result.
As another aspect of the present invention, the present invention provides a method for identifying a time-sensitive target based on a remote sensing image, including the following steps:
s110, performing multi-level convolution processing on the image to be processed, and taking each level convolution processing result as a level characteristic layer in the pyramid characteristic layer;
s120, overlapping the ith hierarchical feature layer and the (i + 1) th hierarchical feature layer in the pyramid feature layer to obtain an ith hierarchical fusion feature layer; traversing the first N-1 levels in the pyramid feature layer by the step i to obtain N-1 level fusion feature layers; taking the Nth hierarchical feature layer in the pyramid feature layer as an Nth hierarchical fusion feature layer;
s130, extracting candidate regions from the N hierarchical fusion feature layers;
s140 maps the candidate region to N hierarchical fusion feature layers to obtain a plurality of hierarchical fusion feature layers after mapping, and performs target determination on the plurality of hierarchical fusion feature layers after mapping to output a result.
Preferably, the following steps are further included between step S110 and step S120:
performing convolution on the ith level feature layer so that its channels match those of the (i + 1)th level feature layer, and performing up-sampling on the (i + 1)th level feature layer so that its size matches that of the ith level feature layer; 1 ≤ i ≤ N-1.
Preferably, step S140 includes the following sub-steps:
step S141: mapping the candidate region to N hierarchical fusion feature layers to obtain N hierarchical fusion feature layers after mapping processing;
step S142: fusing the N levels of fused feature layers subjected to mapping processing to obtain a target judgment feature layer;
step S143: and the target judgment characteristic layer performs target judgment and outputs a result.
Generally, compared with the existing remote sensing image multi-scale target detection technology, the method has the following advantages:
1. the invention provides a remote sensing image time-sensitive target identification method based on a characteristic pyramid network, which is characterized in that the structural idea of the characteristic pyramid is introduced into the target detection of a remote sensing image, the hierarchical structure of the characteristic pyramid of a convolutional neural network is utilized, information among all levels is fused and applied, the top-down side connection is adopted, and high-level semantic information is transmitted downwards, so that the characteristics of all scales have rich semantic information;
2. The invention provides a remote sensing image time-sensitive target identification method based on a feature pyramid network, and proposes a multi-level candidate region (RoI) pooling feature map fusion method for fusing the RoI pooling features of multiple levels.
3. The invention provides a remote sensing image time-sensitive target identification method based on a characteristic pyramid network, which is characterized in that an SE structure is integrated into a target detection network structure unit, the interdependency relation between channels is explicitly modeled, the characteristic response of the channels is adaptively recalibrated, and the characteristic extraction capability is improved.
Drawings
Fig. 1 is a schematic structural diagram of a remote sensing image multi-scale target recognition system based on a feature pyramid according to the present invention;
FIG. 2 is a flowchart of a method for identifying a multi-scale target of a remote sensing image based on a feature pyramid according to the present invention;
FIG. 3 is a schematic diagram of an SE structure in the remote sensing image multi-scale target recognition system provided by the invention;
FIG. 4 is a schematic diagram illustrating multi-scale feature information fusion in a remote sensing image multi-scale target recognition system provided by the present invention;
FIG. 5 is a schematic diagram of a classification regression subnetwork constructed feature pyramid structure in the remote sensing image multi-scale target recognition system provided by the invention;
FIG. 6 is a schematic diagram illustrating the hierarchical region pooling feature fusion in the remote sensing image multi-scale target recognition system provided by the present invention;
FIG. 7 is a schematic view of the target scale distribution obtained from data set analysis according to the present invention;
FIG. 8 is a diagram illustrating a detection result of a time-sensitive target in a remote sensing image according to an embodiment of the present invention;
fig. 9(a1) shows the Fast RCNN detection result in recognition scenario one, fig. 9(a2) shows the R-FCN detection result in recognition scenario one, fig. 9(a3) shows the FPN detection result in recognition scenario one, fig. 9(b1) shows the Fast RCNN detection result in recognition scenario two, fig. 9(b2) shows the R-FCN detection result in recognition scenario two, fig. 9(b3) shows the FPN detection result in recognition scenario two, fig. 9(c1) shows the Fast RCNN detection result in recognition scenario three, fig. 9(c2) shows the R-FCN detection result in recognition scenario three, and fig. 9(c3) shows the FPN detection result in recognition scenario three.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Example one
The invention provides a remote sensing image time-sensitive target recognition system based on a feature pyramid. The target feature extraction sub-network is provided with a plurality of hierarchical output ends; the classification regression sub-network is provided with a plurality of hierarchical input ends and an RPN input end; the feature layer sub-network and the candidate region generation sub-network are each provided with a plurality of hierarchical input ends and a plurality of hierarchical output ends. Each hierarchical output end of the target feature extraction sub-network is connected with a hierarchical input end of the feature layer sub-network; each hierarchical output end of the feature layer sub-network is connected with a hierarchical input end of the candidate region generation sub-network and with a hierarchical input end of the classification regression sub-network; and the output end of the candidate region generation sub-network is connected with the RPN input end of the classification regression sub-network.
The target feature extraction sub-network is used for carrying out multi-level convolution processing on the image to be processed and outputting the convolution processing result of each level as a feature layer. And the feature layer sub-network superposes the previous feature layer and the current feature layer to obtain a current fused feature layer, wherein the topmost fused feature layer is the topmost feature layer. The candidate region generation sub-network is used for extracting candidate regions from different levels of fusion feature layers, the classification regression sub-network maps the candidate regions to the different levels of fusion feature layers to obtain a plurality of levels of fusion feature layers after mapping processing, and target judgment is carried out on the plurality of levels of fusion feature layers after mapping processing to output results.
Example two
Based on the first embodiment, the feature layer sub-network includes a plurality of feature layer sub-modules, which are denoted as a first feature layer sub-module, a second feature layer sub-module, … …, an i-th feature layer sub-module, … …, and an N-th feature layer sub-module. Wherein, one input end of the ith characteristic layer sub-module is used as a level input end of the characteristic layer sub-network, the other input end of the ith characteristic layer sub-module is connected with the output end of the (i + 1) th characteristic layer sub-module, the input end of the Nth characteristic layer sub-module is used as a level input end of the characteristic layer sub-network, i is more than or equal to 1 and less than or equal to N-1, and the first N-1 characteristic layer sub-modules are used for carrying out superposition processing on the previous characteristic layer and the current characteristic layer to obtain the current fusion characteristic layer; and the Nth characteristic layer submodule is used for outputting the current characteristic layer as a current fusion characteristic layer, and the previous characteristic layer is obtained by performing convolution on the current characteristic layer.
EXAMPLE III
As shown in fig. 2, on the basis of the second embodiment, any one of the first N-1 feature layer sub-modules includes a previous-layer processing sub-unit, a current processing sub-unit, and a superposition unit. The output end of the previous-layer processing sub-unit is connected with the first input end of the superposition unit, and the output end of the current processing sub-unit is connected with the second input end of the superposition unit. The previous-layer processing sub-unit up-samples the previous feature layer to output the processed previous feature layer T2; the current processing sub-unit applies a 1 × 1 convolution to the current feature layer to output the processed current feature layer S2; and the superposition unit superposes the processed previous feature layer and the processed current feature layer to output the current fused feature layer.
The feature layer sub-module provided in this embodiment up-samples the previous feature layer to double its size, convolves the current feature layer so that its number of channels matches that of the previous feature layer, and superposes the two to obtain the current fused feature layer. The current fused feature layer thus combines the strong semantic information of the previous feature layer with the strong position information of the current feature layer.
Example four
On the basis of the third embodiment, the feature layer sub-module further includes an aliasing effect processing unit whose input end is connected with the output end of the superposition unit. The aliasing effect processing unit convolves the current fused feature layer and outputs the final current fused feature layer, removing the aliasing effect introduced by up-sampling.
EXAMPLE five
On the basis of any one of the second to fourth embodiments, the feature layer sub-network further includes an (N + 1)th feature layer sub-module, and the candidate region generation sub-network is provided with an additional hierarchical input end. The input end of the (N + 1)th feature layer sub-module is connected with the output end of the Nth feature layer sub-module, and its output end is connected with the additional hierarchical input end of the candidate region generation sub-network. The (N + 1)th feature layer sub-module down-samples the fused feature layer output by the Nth feature layer sub-module to output the (N + 1)th feature layer, and the candidate region generation sub-network also extracts candidate regions from the (N + 1)th feature layer.
EXAMPLE six
On the basis of any one of the first to fifth embodiments, the classification regression sub-network includes a mapping sub-module, a fusion sub-module, and a target determination sub-module, which are connected in sequence, where the mapping sub-module is configured to map the candidate region to different hierarchical fusion feature layers to obtain a plurality of hierarchical fusion feature layers after mapping, the fusion sub-module is configured to perform fusion processing on the plurality of hierarchical fusion feature layers after mapping to obtain a target determination feature layer, and the target determination sub-module is configured to perform target determination on the target determination feature layer to output a target determination result.
EXAMPLE seven
As shown in fig. 2, a method for identifying a time-sensitive target of a remote sensing image based on a feature pyramid includes the following steps:
s110, performing multi-level convolution processing on the image to be processed, and taking each level convolution processing result as a level characteristic layer in the pyramid characteristic layer;
s120, overlapping the ith hierarchical feature layer and the (i + 1) th hierarchical feature layer in the pyramid feature layer to obtain an ith hierarchical fusion feature layer; traversing the first N-1 levels in the pyramid feature layer by the step i to obtain N-1 level fusion feature layers; taking the Nth hierarchical feature layer in the pyramid feature layer as an Nth hierarchical fusion feature layer;
s130, extracting candidate regions from the N hierarchical fusion feature layers;
s140 maps the candidate region to N hierarchical fusion feature layers to obtain a plurality of hierarchical fusion feature layers after mapping, and performs target determination on the plurality of hierarchical fusion feature layers after mapping to output a result.
Example eight
On the basis of the seventh embodiment, the following steps are further included between step S110 and step S120:
performing convolution on the ith level feature layer so that its channels match those of the (i + 1)th level feature layer, and performing up-sampling on the (i + 1)th level feature layer so that its size matches that of the ith level feature layer; 1 ≤ i ≤ N-1.
Example nine
On the basis of the seventh or eighth embodiment, step S140 includes the following sub-steps:
step S141: mapping the candidate region to N hierarchical fusion feature layers to obtain N hierarchical fusion feature layers after mapping processing;
step S142: fusing the N levels of fused feature layers subjected to mapping processing to obtain a target judgment feature layer;
step S143: and the target judgment characteristic layer performs target judgment and outputs a result.
The invention provides, for the first time, a remote sensing image time-sensitive target identification method based on a feature pyramid network. It designs a channel-weighted feature pyramid target detection network model, improves the expressive power and scale adaptability of target features in both the spatial and channel dimensions, and fuses multi-level region pooling features in the classification regression network, so that the scale of the network model is greatly reduced and a lightweight network is realized.
The embodiment of the remote sensing image time-sensitive target identification method based on the characteristic pyramid network provided by the invention comprises the following specific processes:
step S110: extracting target features in images
As shown in FIG. 3, SE-MobileNet is used as the feature extraction network: the SE structure is added into the MobileNet network structural unit to extract target features from the image.
Step S111: squeeze operation
The input feature U is compressed by global average pooling into a 1 × C sequence of real numbers z; this sequence is the channel descriptor, i.e. the expression of the feature in the channel dimension, with each element summarizing the global spatial information of one channel. The specific formula is:

z_c = F_sq(u_c) = (1/(W × H)) ∑_{i=1}^{W} ∑_{j=1}^{H} u_c(i, j)

where W and H denote the width and height of the feature map, and u_c denotes the c-th channel of the input feature map.
Step S112: excitation operation
The squeezed sequence z is then transformed by the Excitation operation:

s = F_ex(z, W) = σ(g(z, W)) = σ(W2 δ(W1 z))

where δ denotes the ReLU activation function, σ denotes the sigmoid activation function, W1 ∈ R^((C/r) × C) and W2 ∈ R^(C × (C/r)).
two Fully Connected (FC) layers are introduced, namely a dimension reduction layer with parameters W1 and dimension reduction ratio r, then a ReLU is passed, and then a dimension increasing layer with parameters W2. Finally, the real number sequence of 1 × C is combined with the feature graph U to perform Scale operation by the following formula to obtain the final output.
Figure BDA0001652379560000114
Wherein X ═ X1,x2,...,xc]And Fscale(uc,sc) Refers to the feature mapping uc∈RW×HAnd a scalar scCorresponding to the channel product.
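The Squeeze, Excitation and Scale operations above can be sketched end-to-end in plain NumPy. This is a minimal single-sample illustration, not the patented network: bias terms are omitted and the weights W1, W2 are random placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_block(u, w1, w2):
    # Squeeze: global average pooling over W x H gives the 1 x C descriptor z.
    z = u.mean(axis=(1, 2))
    # Excitation: dimension-reduction FC (w1), ReLU, dimension-increasing FC (w2), sigmoid.
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))
    # Scale: channel-wise product of each feature map u_c with the scalar s_c.
    return u * s[:, None, None]

rng = np.random.default_rng(0)
c, r = 64, 16                                # channels C and reduction ratio r
u = rng.standard_normal((c, 12, 12))         # input feature maps U, shape (C, H, W)
w1 = rng.standard_normal((c // r, c)) * 0.1  # (C/r) x C reduction weights
w2 = rng.standard_normal((c, c // r)) * 0.1  # C x (C/r) expansion weights
out = se_block(u, w1, w2)
print(out.shape)  # (64, 12, 12)
```

Each output channel is the input channel rescaled by its learned importance s_c ∈ (0, 1), which is what "adaptively recalibrating the channel feature response" amounts to.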
Step S120: obtaining different levels of fused feature layers
Step S121: bottom-up path
The feature pyramid is constructed level by level: feature maps of the same scale are defined as one pyramid level, and the last feature map of each level serves as that level's final output. Taking the last feature map of the convolutional layers at each level as output, the outputs of the pyramid structure are defined, with reference to feature map scale, as {C2, C3, C4, C5}, which come from convolutional layers conv2, conv3, conv4 and conv5 respectively; these feature layers have strides of {4, 8, 16, 32} pixels with respect to the input image.
Step S122: top-down pyramid network paths and transverse connections
The top-down pyramid network path builds an information flow channel that fuses high-level features with the current-level features. Concretely, the feature of the higher layer of the convolutional network is upsampled and then fused, through a lateral connection, with the current-level feature, which also strengthens the semantic information of the lower layer; the two feature maps joined by a lateral connection must be of the same size, so that the localization detail information of the current feature map can be fully exploited.
Given the higher-layer feature T1, T1 is first upsampled by a factor of 2 to obtain the feature map T2; the channel count of the current-layer feature S1 is then adjusted by a 1 × 1 convolution to be consistent with that of T2; finally, T2 and the channel-adjusted S1 are fused into S2, the fusion being direct pixel-wise addition.
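As an illustration of this fusion step, the following NumPy sketch mirrors the T1/T2/S1/S2 naming of the text; the nearest-neighbour upsampling and the 1 × 1 convolution written as a channel-mixing matrix product are simplifying assumptions of ours:

```python
import numpy as np

def fuse_top_down(T1, S1, W_1x1):
    """Lateral-connection fusion sketch (names follow the text).

    T1    : higher-layer feature map, shape (C_out, H, W)
    S1    : current-layer feature map, shape (C_in, 2H, 2W)
    W_1x1 : 1x1 convolution weights, shape (C_out, C_in)
    """
    # 2x upsampling of the higher-layer feature (nearest neighbour) -> T2
    T2 = T1.repeat(2, axis=1).repeat(2, axis=2)
    # A 1x1 convolution is a per-pixel matrix product over channels:
    # it adjusts the channel count of S1 to match T2
    S1_adj = np.tensordot(W_1x1, S1, axes=([1], [0]))
    # Fusion is direct pixel-wise addition -> S2
    return T2 + S1_adj
```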
Step S130: RPN network constructs characteristic pyramid structure and extracts candidate region
In Faster RCNN, the feature map input to the RPN network has a single fixed scale (downsampled 16-fold); to achieve a multi-scale effect, multi-scale feature maps are required as input. To combine the feature pyramid structure with the RPN network, we use the fused feature layers {P2, P3, P4, P5} as RPN inputs, realizing multi-scale input; to make the scale information richer, P5 is downsampled by max pooling to generate P6, so the input of the RPN network becomes {P2, P3, P4, P5, P6}. In Faster RCNN, the RPN network uses 9 candidate windows (anchors) of different sizes, covering three scales and three aspect ratios; with multi-scale RPN inputs, however, multiple anchor scales per level contribute little, so only the three aspect ratios are retained. {P2, P3, P4, P5, P6} correspond to the anchor scales {32 × 32, 64 × 64, 128 × 128, 256 × 256, 512 × 512}; combined with the three aspect ratios, the FPN-based RPN network therefore has 15 different anchors in total.
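The anchor bookkeeping can be illustrated as follows. The concrete aspect-ratio values (1:2, 1:1, 2:1) are an assumption: the patent states only that three aspect ratios are kept.

```python
# One anchor scale per pyramid level, three aspect ratios.
level_scales = {"P2": 32, "P3": 64, "P4": 128, "P5": 256, "P6": 512}
aspect_ratios = (0.5, 1.0, 2.0)   # assumed 1:2, 1:1, 2:1

anchors = []
for level, scale in level_scales.items():
    area = float(scale * scale)
    for ratio in aspect_ratios:
        # choose width/height so the anchor area stays scale^2 at each ratio
        w = (area / ratio) ** 0.5
        h = w * ratio
        anchors.append((level, w, h))
```

Five levels times three ratios gives the 15 distinct anchors of the FPN-based RPN.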
Step S140: and performing target judgment on the multiple hierarchical fusion feature layers and outputting results.
S141: and mapping the candidate boxes generated by the RPN to the feature maps of various scales respectively according to the sizes of the candidate boxes generated by the RPN. Assuming that the size of the candidate box is w × h, it should be mapped to P by the following calculationkOn a characteristic diagram, whereinAnd k is taken from 2 to 5, and the specific flow is shown in FIG. 5.
k = ⌊k0 + log2(√(w × h) / 224)⌋
Since 224 × 224 is the image size used when pre-training the network model, the size of the reference candidate box is set to 224 and its level is denoted k0; since C4 serves as the input of RoI pooling in Faster RCNN, k0 = 4. Assuming a candidate box with w × h = 112 × 112, then k = k0 − 1 = 3, so the candidate box should be mapped onto the P3 feature map and then undergo RoI pooling. In the concrete implementation only the feature layers P2-P5 are used; the P6 feature layer is not included.
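The level-assignment rule above can be written as a small helper; clamping to P2-P5 reflects the statement that P6 takes no part in RoI pooling (the function name is ours):

```python
import math

def assign_level(w, h, k0=4, k_min=2, k_max=5):
    """Map a w x h candidate box to pyramid level P_k:
    k = floor(k0 + log2(sqrt(w * h) / 224)), clamped to the P2-P5 range."""
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / 224.0))
    return max(k_min, min(k_max, k))
```

For example, a 112 × 112 box lands on P3 (k = k0 − 1), while a 224 × 224 box stays on the reference level P4.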
S142: in a classification regression subnetwork, results after multi-level feature RoI pooling are fused, and then a classification regression prediction network is input, wherein the specific structure is shown in FIG. 6, network parameters are reduced, the network training and testing speed is improved, and the fusion calculation process is shown in the following formula.
Op=K×N×C×H×W
wherein O_p is the size of the fused output, the input feature maps are X = {x_1, x_2, x_3, ..., x_K} with x_k the k-th input, K is the number of fused feature layers (4 in this embodiment), H and W are the length and width of the feature map, C is the number of feature-map channels, and N is the number of feature maps.
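As an illustration of the element count O_p = K × N × C × H × W, the following sketch stacks K levels of RoI-pooled features; the 7 × 7 pooled size and stacking as the concrete fusion operation are assumptions of ours:

```python
import numpy as np

# Illustrative sizes: K pooled levels (P2-P5), N RoIs, C channels, H x W pooled grid
K, N, C, H, W = 4, 2, 8, 7, 7
pooled = [np.zeros((N, C, H, W)) for _ in range(K)]   # one tensor per level

fused = np.stack(pooled, axis=0)   # shape (K, N, C, H, W)
O_p = fused.size                   # K * N * C * H * W elements, as in the formula
```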
Step S143: and the target judgment characteristic layer performs target judgment and outputs a result.
In the embodiment provided by the invention, the remote sensing image time-sensitive target recognition system is trained by adopting the following steps:
step S210: data set production and analysis
One thousand remote sensing images of airports around the world, with a resolution of 0.6 m and sizes above 2000 × 2000, are downloaded through the Google Earth satellite remote sensing imagery software. The samples in the data set are labeled with the LabelImg image annotation tool, following the format of the PASCAL VOC data set.
The data set contains 1000 remote sensing images of size 2000 × 2000, comprising 340 training samples, 330 test samples and 330 validation samples, with more than 14000 aircraft targets in total.
The sizes of all labeled targets are analyzed, as shown in FIG. 7, where the abscissa is k_r and the ordinate is the target count. The statistical calculation formula is:
k_r = ⌊log2(√(w × h) / 16)⌋ + 1
In FIG. 7, k_r = 1, 2 and 3 correspond to scale sizes of 16 × 16, 32 × 32 and 64 × 64, respectively.
Step S220: the scale hyper-parameters of the system are designed from the target scale distribution statistics of the data set in step S210, other reasonable hyper-parameters are designed for system training, and the training data set is input into the remote sensing image time-sensitive target recognition system for training.
Step S230: and inputting the test data set into a remote sensing image time-sensitive target recognition system for forward calculation, and testing the detection performance and generalization capability of the remote sensing image time-sensitive target recognition system.
To verify the effectiveness of the proposed feature pyramid network for aircraft target detection in remote sensing images, three scenes are detected and the results are compared and analyzed against the existing mainstream target detection frameworks Faster RCNN and R-FCN; the detected images are larger than 1000 × 1000. The aircraft target detection results of the invention and the comparison of detection effects with other methods are shown in FIG. 8 and FIGS. 9(a1) to 9(a3), 9(b1) to 9(b3) and 9(c1) to 9(c3); the data sets used by the above methods are identical to that of the invention, and the results are shown in Table 1.
Average precision is used as the model evaluation index; the larger its value, the better the detection performance. The detection time is obtained from test statistics on large-format remote sensing images.
TABLE 1
(Table 1 is reproduced as an image in the original publication; it reports the average precision and detection time of the compared methods.)
The invention first designs SEM-FPN, a channel-weighted feature pyramid target detection network model comprising a channel-weighted target feature extraction sub-network SE-MobileNet, a feature layer sub-network, a candidate region generation sub-network (RPN) and a classification regression sub-network; feature pyramid structures are built on both the RPN and the classification regression sub-network, and multi-level region-pooling feature fusion is adopted in the classification regression sub-network to reduce the network parameters. A remote sensing image data set is then produced, and the target scale distribution within it is analyzed. At the training level, suitable training hyper-parameters are set and the training samples are fed into SEM-FPN for end-to-end training to obtain a time-sensitive target detection model. At the testing level, the test samples of the data set are input into the time-sensitive target detection model and forward prediction calculation is performed.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (8)

1. A remote sensing image time-sensitive target recognition system based on a characteristic pyramid is characterized by comprising:
the target feature extraction sub-network is provided with a plurality of hierarchical output ends and is used for carrying out multi-level convolution processing on the image to be processed and outputting each hierarchical convolution processing result as a feature layer;
the feature layer sub-network is provided with a plurality of hierarchical input ends and a plurality of hierarchical output ends, one hierarchical input end is connected with one hierarchical output end of the target feature extraction sub-network and is used for superposing the previous feature layer and the current feature layer to obtain a current fusion feature layer, and the topmost fusion feature layer is the topmost feature layer;
the candidate region generation sub-network is provided with a plurality of hierarchical input ends and a plurality of hierarchical output ends, wherein one hierarchical input end is connected with one hierarchical output end of the feature layer sub-network and is used for extracting candidate regions from different hierarchical fusion feature layers;
the classification regression sub-network is provided with a plurality of hierarchy input ends and RPN input ends, wherein one hierarchy input end is connected with one hierarchy output end of the feature layer sub-network, and the RPN input end is connected with the output end of the candidate region generation sub-network and is used for mapping the candidate regions to different hierarchy fusion feature layers to obtain a plurality of hierarchy fusion feature layers after mapping processing and performing target judgment on the plurality of hierarchy fusion feature layers after mapping processing to output results;
the feature layer sub-network comprises a plurality of feature layer sub-modules, denoted as a first feature layer sub-module, a second feature layer sub-module, ..., an i-th feature layer sub-module, ..., and an N-th feature layer sub-module, wherein 1 ≤ i ≤ N-1;
one input end of the ith characteristic layer sub-module is used as a layer input end of the characteristic layer sub-network, the other input end of the ith characteristic layer sub-module is connected with the output end of the (i + 1) th characteristic layer sub-module, and the input end of the Nth characteristic layer sub-module is used as a layer input end of the characteristic layer sub-network;
the first N-1 characteristic layer submodules are used for carrying out superposition processing on the previous characteristic layer and the current characteristic layer to obtain a current fusion characteristic layer; and the Nth characteristic layer submodule is used for outputting the current characteristic layer as a current fusion characteristic layer, wherein the previous characteristic layer is obtained by performing convolution on the current characteristic layer.
2. The remote sensing image time-sensitive target recognition system of claim 1, wherein any one of the first N-1 feature layer submodules comprises:
the system comprises a previous layer processing subunit, a current processing subunit and a superposition unit, wherein the output end of the previous layer processing subunit is connected with the first input end of the superposition unit, and the output end of the current processing subunit is connected with the second input end of the superposition unit;
the last layer of processing subunit is used for performing up-sampling processing on the last layer of feature layer and outputting the processed last layer of feature layer, the current processing subunit is used for performing convolution processing on the current feature layer and outputting the processed current feature layer, and the superposition unit is used for performing superposition processing on the processed last layer of feature layer and the processed current feature layer and outputting the current fusion feature layer.
3. The remote sensing image time-sensitive target recognition system of claim 1 or 2, wherein the feature layer sub-module further comprises an aliasing effect processing unit, an input end of the aliasing effect processing unit is connected with an output end of the superposition unit, and the aliasing effect processing unit is configured to perform convolution processing on the current fusion feature layer and output a final current fusion feature layer.
4. The remote sensing image time-sensitive target recognition system of claim 1 or 2, wherein the feature layer sub-network further comprises an N +1 th feature layer sub-module, the candidate region generating sub-network is provided with an additional hierarchical input; the input end of the (N + 1) th feature layer submodule is connected with the output end of the Nth feature layer submodule, and the output end of the (N + 1) th feature layer submodule is connected with the additional level input end of the candidate region generation sub-network;
the (N + 1) th feature layer submodule is used for performing up-sampling on the fusion feature layer output by the (N) th feature layer submodule to output an (N + 1) th feature layer, and the candidate region generation sub-network simultaneously extracts the candidate region from the (N + 1) th feature layer.
5. The remote-sensing image time-sensitive target recognition system of claim 1, wherein the classification regression sub-network comprises a mapping sub-module, a fusion sub-module and a target judgment sub-module which are connected in sequence, the mapping sub-module is used for mapping the candidate region to different levels of fusion feature layers to obtain a plurality of levels of fusion feature layers after mapping, the fusion sub-module is used for performing fusion processing on the plurality of levels of fusion feature layers after mapping to obtain a target judgment feature layer, and the target judgment sub-module is used for performing target judgment on the target judgment feature layer to output a target judgment result.
6. The identification method of the remote sensing image time-sensitive target identification system based on claim 1 is characterized by comprising the following steps:
s110, performing multi-level convolution processing on the image to be processed, and taking each level convolution processing result as a level characteristic layer in the pyramid characteristic layer;
s120, overlapping the ith hierarchical feature layer and the (i + 1) th hierarchical feature layer in the pyramid feature layer to obtain an ith hierarchical fusion feature layer; traversing the first N-1 levels in the pyramid feature layer by the step i to obtain N-1 level fusion feature layers; taking the Nth hierarchical feature layer in the pyramid feature layer as an Nth hierarchical fusion feature layer;
s130, extracting candidate regions from the N hierarchical fusion feature layers;
s140 maps the candidate region to N hierarchical fusion feature layers to obtain a plurality of hierarchical fusion feature layers after mapping, and performs target determination on the plurality of hierarchical fusion feature layers after mapping to output a result.
7. The identification method of claim 6, further comprising the steps between step S110 and step S120 of:
performing convolution processing on the ith level feature layer to enable the channel of the ith level feature layer to be the same as the channel of the (i + 1)th level feature layer; performing upsampling processing on the (i + 1)th level feature layer to enable the size of the ith level feature layer to be the same as that of the (i + 1)th level feature layer; wherein 1 ≤ i ≤ N-1.
8. The identification method according to claim 6 or 7, characterized in that step S140 comprises the following sub-steps:
step S141: mapping the candidate region to N hierarchical fusion feature layers to obtain N hierarchical fusion feature layers after mapping processing;
step S142: fusing the N levels of fused feature layers subjected to mapping processing to obtain a target judgment feature layer;
step S143: and the target judgment characteristic layer performs target judgment and outputs a result.
CN201810427107.2A 2018-05-07 2018-05-07 Remote sensing image time-sensitive target identification system and method based on characteristic pyramid Active CN108764063B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810427107.2A CN108764063B (en) 2018-05-07 2018-05-07 Remote sensing image time-sensitive target identification system and method based on characteristic pyramid

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810427107.2A CN108764063B (en) 2018-05-07 2018-05-07 Remote sensing image time-sensitive target identification system and method based on characteristic pyramid

Publications (2)

Publication Number Publication Date
CN108764063A CN108764063A (en) 2018-11-06
CN108764063B true CN108764063B (en) 2020-05-19

Family

ID=64010105

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810427107.2A Active CN108764063B (en) 2018-05-07 2018-05-07 Remote sensing image time-sensitive target identification system and method based on characteristic pyramid

Country Status (1)

Country Link
CN (1) CN108764063B (en)

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109671070B (en) * 2018-12-16 2021-02-09 华中科技大学 Target detection method based on feature weighting and feature correlation fusion
CN109800793B (en) * 2018-12-28 2023-12-22 广州海昇教育科技有限责任公司 Target detection method and system based on deep learning
CN109886286B (en) * 2019-01-03 2021-07-23 武汉精测电子集团股份有限公司 Target detection method based on cascade detector, target detection model and system
CN111488777A (en) * 2019-01-28 2020-08-04 北京地平线机器人技术研发有限公司 Object identification method, object identification device and electronic equipment
CN109816671B (en) * 2019-01-31 2021-09-24 深兰科技(上海)有限公司 Target detection method, device and storage medium
CN109977963B (en) * 2019-04-10 2021-10-15 京东方科技集团股份有限公司 Image processing method, apparatus, device and computer readable medium
CN109977956B (en) * 2019-04-29 2022-11-18 腾讯科技(深圳)有限公司 Image processing method and device, electronic equipment and storage medium
CN110070072A (en) * 2019-05-05 2019-07-30 厦门美图之家科技有限公司 A method of generating object detection model
CN110321805B (en) * 2019-06-12 2021-08-10 华中科技大学 Dynamic expression recognition method based on time sequence relation reasoning
CN110348311B (en) * 2019-06-13 2021-03-19 中国人民解放军战略支援部队信息工程大学 Deep learning-based road intersection identification system and method
CN110378297B (en) * 2019-07-23 2022-02-11 河北师范大学 Remote sensing image target detection method and device based on deep learning and storage medium
CN110490860A (en) * 2019-08-21 2019-11-22 北京大恒普信医疗技术有限公司 Diabetic retinopathy recognition methods, device and electronic equipment
CN110533105B (en) * 2019-08-30 2022-04-05 北京市商汤科技开发有限公司 Target detection method and device, electronic equipment and storage medium
CN110909642A (en) * 2019-11-13 2020-03-24 南京理工大学 Remote sensing image target detection method based on multi-scale semantic feature fusion
CN111191531A (en) * 2019-12-17 2020-05-22 中南大学 Rapid pedestrian detection method and system
CN111160293A (en) * 2019-12-31 2020-05-15 珠海大横琴科技发展有限公司 Small target ship detection method and system based on characteristic pyramid network
CN113658230B (en) * 2020-05-12 2024-05-28 武汉Tcl集团工业研究院有限公司 Optical flow estimation method, terminal and storage medium
CN111783772A (en) * 2020-06-12 2020-10-16 青岛理工大学 Grabbing detection method based on RP-ResNet network
CN111882581B (en) * 2020-07-21 2022-10-28 青岛科技大学 Multi-target tracking method for depth feature association
CN112330664B (en) * 2020-11-25 2022-02-08 腾讯科技(深圳)有限公司 Pavement disease detection method and device, electronic equipment and storage medium
CN113095356B (en) * 2021-03-03 2023-10-31 北京邮电大学 Light-weight neural network system and image processing method and device
CN113361375B (en) * 2021-06-02 2022-06-07 武汉理工大学 Vehicle target identification method based on improved BiFPN
CN114842365B (en) * 2022-07-04 2022-11-29 中国科学院地理科学与资源研究所 Unmanned aerial vehicle aerial photography target detection and identification method and system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105046197B (en) * 2015-06-11 2018-04-17 西安电子科技大学 Multi-template pedestrian detection method based on cluster
US9904849B2 (en) * 2015-08-26 2018-02-27 Digitalglobe, Inc. System for simplified generation of systems for broad area geospatial object detection
CN105894045B (en) * 2016-05-06 2019-04-26 电子科技大学 A kind of model recognizing method of the depth network model based on spatial pyramid pond
CN106683091B (en) * 2017-01-06 2019-09-24 北京理工大学 A kind of target classification and attitude detecting method based on depth convolutional neural networks

Also Published As

Publication number Publication date
CN108764063A (en) 2018-11-06

Similar Documents

Publication Publication Date Title
CN108764063B (en) Remote sensing image time-sensitive target identification system and method based on characteristic pyramid
Sun et al. RSOD: Real-time small object detection algorithm in UAV-based traffic monitoring
CN112818903B (en) Small sample remote sensing image target detection method based on meta-learning and cooperative attention
Guo et al. CDnetV2: CNN-based cloud detection for remote sensing imagery with cloud-snow coexistence
CN109086668B (en) Unmanned aerial vehicle remote sensing image road information extraction method based on multi-scale generation countermeasure network
CN111291809B (en) Processing device, method and storage medium
Hang et al. Multiscale progressive segmentation network for high-resolution remote sensing imagery
CN113780211A (en) Lightweight aircraft detection method based on improved yolk 4-tiny
CN111461083A (en) Rapid vehicle detection method based on deep learning
CN110348447B (en) Multi-model integrated target detection method with abundant spatial information
Doi et al. The effect of focal loss in semantic segmentation of high resolution aerial image
CN115035361A (en) Target detection method and system based on attention mechanism and feature cross fusion
CN111985325A (en) Aerial small target rapid identification method in extra-high voltage environment evaluation
CN112766409A (en) Feature fusion method for remote sensing image target detection
CN116740344A (en) Knowledge distillation-based lightweight remote sensing image semantic segmentation method and device
CN114926693A (en) SAR image small sample identification method and device based on weighted distance
CN113538347A (en) Image detection method and system based on efficient bidirectional path aggregation attention network
CN116740516A (en) Target detection method and system based on multi-scale fusion feature extraction
CN112084897A (en) Rapid traffic large-scene vehicle target detection method of GS-SSD
CN115527098A (en) Infrared small target detection method based on global mean contrast space attention
Yang et al. Real-Time object detector based MobileNetV3 for UAV applications
Khosravian et al. Multi‐domain autonomous driving dataset: Towards enhancing the generalization of the convolutional neural networks in new environments
Wang Remote sensing image semantic segmentation algorithm based on improved ENet network
Singh et al. An enhanced YOLOv5 based on color harmony algorithm for object detection in unmanned aerial vehicle captured images
CN114494893B (en) Remote sensing image feature extraction method based on semantic reuse context feature pyramid

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240315

Address after: Room A348, 4th Floor, Building 1, Phase III, International Enterprise Center, No. 1 Guanggu Avenue, Donghu New Technology Development Zone, Wuhan City, Hubei Province, 430073 (Wuhan Free Trade Zone)

Patentee after: AVIC Tianhai (Wuhan) Technology Co.,Ltd.

Country or region after: China

Address before: 430074 Hubei Province, Wuhan city Hongshan District Luoyu Road No. 1037

Patentee before: HUAZHONG University OF SCIENCE AND TECHNOLOGY

Country or region before: China
