CN108764063B - Remote sensing image time-sensitive target identification system and method based on characteristic pyramid


Info

Publication number
CN108764063B
Authority
CN
China
Prior art keywords: layer, feature, sub, feature layer, hierarchical
Prior art date
Legal status
Active
Application number
CN201810427107.2A
Other languages
Chinese (zh)
Other versions
CN108764063A (en)
Inventor
杨卫东
金俊波
王祯瑞
习思
黄竞辉
钟胜
陈俊
Current Assignee
Avic Tianhai Wuhan Technology Co ltd
Original Assignee
Huazhong University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN201810427107.2A
Publication of CN108764063A
Application granted
Publication of CN108764063B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/13Satellite images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Abstract

The invention discloses a remote sensing image time-sensitive target identification system and method based on a feature pyramid. The system comprises a target feature extraction sub-network, a feature layer sub-network, a candidate region generation sub-network and a classification regression sub-network. The target feature extraction sub-network performs multi-level convolution processing on the image to be processed and outputs the convolution result of each level as a feature layer. The feature layer sub-network superposes the previous feature layer onto the current feature layer to obtain the current fused feature layer; the topmost fused feature layer is the topmost feature layer itself. The candidate region generation sub-network extracts candidate regions from the fused feature layers of different levels. The classification regression sub-network maps the candidate regions onto the fused feature layers of different levels to obtain a plurality of mapped fused feature layers, performs target judgment on them, and outputs the result. By exploiting the hierarchical structure of the feature pyramid, the features at all scales carry rich semantic information.

Description

Remote sensing image time-sensitive target identification system and method based on characteristic pyramid
Technical Field
The invention belongs to the field of remote sensing image processing, and particularly relates to a remote sensing image time-sensitive target identification system and method based on a feature pyramid.
Background
The detection of moving targets in large-format remote sensing images is an important component of remote sensing image analysis, and is extremely challenging owing to characteristics such as multiple scales, wide coverage and special viewing angles. Its basic task is to determine whether a given aerial or satellite image contains target objects of one or more categories and, if so, the accurate position of each. Time-sensitive target detection in large-format remote sensing images, as an important component of the fields of computer vision and remote sensing image analysis, is therefore of high research significance. With the rapid development of remote sensing, sensor and internet technologies, large-format surveys of ground or ocean areas have become feasible, so remotely sensed data are increasingly comprehensive, covering geographic, humanistic and other information. Remote sensing technology is widely applied to search and rescue of ground aircraft targets and marine vessel targets, monitoring of illegal immigration, territorial protection, environmental monitoring, military intelligence collection, ground resource investigation, and so on.
Object detection has two basic tasks: determining whether an object exists in an image, and determining its position. Target detection in natural scenes is a core research field of computer vision and digital image processing, and has been a hot direction of academic research and industrial application in recent years. Since 2012, convolutional neural networks have made major breakthroughs in image classification, and the strong feature extraction capability of deep learning has made high-precision target detection possible by converting the detection problem into a classification problem. Applying deep learning to target detection has driven the rapid development of the field, and deep-learning-based detection methods are advancing day by day.
Existing deep learning target detection models use only single-scale features from the last layer; although the semantic information of these features is very strong, their position information is very weak, so the detection effect on multi-scale targets is poor. Meanwhile, existing networks are large in scale, with many parameters and a high demand for computing resources.
Disclosure of Invention
The invention provides a remote sensing image time-sensitive target identification system and method based on a feature pyramid, aimed at the difficult problem of multi-scale target detection in large-format remote sensing images. The channel-weighted feature pyramid network SEM-FPN is applied to time-sensitive target detection in remote sensing images and improved according to the problems encountered in practical application of the network model; the algorithm achieves multi-scale target detection while remaining lightweight.
As one aspect of the present invention, the present invention provides a remote sensing image time-sensitive target recognition system based on a feature pyramid, including:
the target feature extraction sub-network is provided with a plurality of hierarchical output ends and is used for carrying out multi-level convolution processing on the image to be processed and outputting each hierarchical convolution processing result as a feature layer;
the feature layer sub-network is provided with a plurality of hierarchical input ends and a plurality of hierarchical output ends, one hierarchical input end is connected with one hierarchical output end of the target feature extraction sub-network and is used for superposing the previous feature layer and the current feature layer to obtain a current fusion feature layer, and the topmost fusion feature layer is the topmost feature layer;
the candidate region generation sub-network is provided with a plurality of hierarchical input ends and a plurality of hierarchical output ends, wherein one hierarchical input end is connected with one hierarchical output end of the feature layer sub-network and is used for extracting candidate regions from different hierarchical fusion feature layers;
and the classification regression sub-network, which is provided with a plurality of hierarchical input ends and an RPN input end, wherein each hierarchical input end is connected with a hierarchical output end of the feature layer sub-network, and the RPN input end is connected with the output end of the candidate region generation sub-network; the classification regression sub-network is used for mapping the candidate regions to the fused feature layers of different levels to obtain a plurality of levels of mapped fused feature layers, performing target judgment on them, and outputting results.
Preferably, the feature layer sub-network comprises a plurality of feature layer sub-modules, denoted as the first feature layer sub-module, the second feature layer sub-module, ..., the ith feature layer sub-module, ..., and the Nth feature layer sub-module, with 1 ≤ i ≤ N-1;
one input end of the ith characteristic layer sub-module is used as a layer input end of the characteristic layer sub-network, the other input end of the ith characteristic layer sub-module is connected with the output end of the (i + 1) th characteristic layer sub-module, and the input end of the Nth characteristic layer sub-module is used as a layer input end of the characteristic layer sub-network;
the first N-1 characteristic layer submodules are used for carrying out superposition processing on the previous characteristic layer and the current characteristic layer to obtain a current fusion characteristic layer; and the Nth characteristic layer submodule is used for outputting the current characteristic layer as a current fusion characteristic layer, wherein the previous characteristic layer is obtained by performing convolution on the current characteristic layer.
Preferably, any one of the first N-1 feature layer sub-modules comprises:
the system comprises a previous layer processing subunit, a current processing subunit and a superposition unit, wherein the output end of the previous layer processing subunit is connected with the first input end of the superposition unit, and the output end of the current processing subunit is connected with the second input end of the superposition unit;
the last layer of processing subunit is used for performing up-sampling processing on the last layer of feature layer and outputting the processed last layer of feature layer, the current processing subunit is used for performing convolution processing on the current feature layer and outputting the processed current feature layer, and the superposition unit is used for performing superposition processing on the processed last layer of feature layer and the processed current feature layer and outputting the current fusion feature layer.
Preferably, the feature layer sub-module further includes an aliasing effect processing unit, an input end of the aliasing effect processing unit is connected to an output end of the superposition unit, and the aliasing effect processing unit is configured to perform convolution processing on the current fusion feature layer and output a final current fusion feature layer.
Preferably, the feature layer sub-network further comprises an N +1 th feature layer sub-module, and the candidate region generation sub-network is provided with an additional hierarchical input end; the input end of the (N + 1) th feature layer submodule is connected with the output end of the Nth feature layer submodule, and the output end of the (N + 1) th feature layer submodule is connected with the additional level input end of the candidate region generation sub-network;
the (N + 1) th feature layer submodule is used for performing up-sampling on the fusion feature layer output by the (N) th feature layer submodule to output an (N + 1) th feature layer, and the candidate region generation sub-network simultaneously extracts the candidate region from the (N + 1) th feature layer.
Preferably, the classification regression sub-network includes a mapping sub-module, a fusion sub-module and a target judgment sub-module, which are connected in sequence, the mapping sub-module is configured to map the candidate region to different levels of fusion feature layers to obtain a plurality of levels of fusion feature layers after mapping, the fusion sub-module is configured to perform fusion processing on the plurality of levels of fusion feature layers after mapping to obtain a target judgment feature layer, and the target judgment sub-module is configured to perform target judgment on the target judgment feature layer to output a target judgment result.
As another aspect of the present invention, the present invention provides a method for identifying a time-sensitive target based on a remote sensing image, including the following steps:
s110, performing multi-level convolution processing on the image to be processed, and taking each level convolution processing result as a level characteristic layer in the pyramid characteristic layer;
s120, overlapping the ith hierarchical feature layer and the (i + 1) th hierarchical feature layer in the pyramid feature layer to obtain an ith hierarchical fusion feature layer; traversing the first N-1 levels in the pyramid feature layer by the step i to obtain N-1 level fusion feature layers; taking the Nth hierarchical feature layer in the pyramid feature layer as an Nth hierarchical fusion feature layer;
s130, extracting candidate regions from the N hierarchical fusion feature layers;
s140 maps the candidate region to N hierarchical fusion feature layers to obtain a plurality of hierarchical fusion feature layers after mapping, and performs target determination on the plurality of hierarchical fusion feature layers after mapping to output a result.
Preferably, the following steps are further included between step S110 and step S120:
performing convolution on the ith level feature layer so that its channels match those of the (i + 1)th level feature layer, and performing up-sampling on the (i + 1)th level feature layer so that its size matches that of the ith level feature layer; 1 ≤ i ≤ N-1.
Preferably, step S140 includes the following sub-steps:
step S141: mapping the candidate region to N hierarchical fusion feature layers to obtain N hierarchical fusion feature layers after mapping processing;
step S142: fusing the N levels of fused feature layers subjected to mapping processing to obtain a target judgment feature layer;
step S143: and the target judgment characteristic layer performs target judgment and outputs a result.
Generally, compared with the existing remote sensing image multi-scale target detection technology, the method has the following advantages:
1. the invention provides a remote sensing image time-sensitive target identification method based on a characteristic pyramid network, which is characterized in that the structural idea of the characteristic pyramid is introduced into the target detection of a remote sensing image, the hierarchical structure of the characteristic pyramid of a convolutional neural network is utilized, information among all levels is fused and applied, the top-down side connection is adopted, and high-level semantic information is transmitted downwards, so that the characteristics of all scales have rich semantic information;
2. The invention provides a remote sensing image time-sensitive target identification method based on a feature pyramid network, and proposes a multi-level candidate region (RoI) pooling feature map fusion method for fusing the RoI pooling features of multiple levels.
3. The invention provides a remote sensing image time-sensitive target identification method based on a characteristic pyramid network, which is characterized in that an SE structure is integrated into a target detection network structure unit, the interdependency relation between channels is explicitly modeled, the characteristic response of the channels is adaptively recalibrated, and the characteristic extraction capability is improved.
Drawings
Fig. 1 is a schematic structural diagram of a remote sensing image multi-scale target recognition system based on a feature pyramid according to the present invention;
FIG. 2 is a flowchart of a method for identifying a multi-scale target of a remote sensing image based on a feature pyramid according to the present invention;
FIG. 3 is a schematic diagram of an SE structure in the remote sensing image multi-scale target recognition system provided by the invention;
FIG. 4 is a schematic diagram illustrating multi-scale feature information fusion in a remote sensing image multi-scale target recognition system provided by the present invention;
FIG. 5 is a schematic diagram of a classification regression subnetwork constructed feature pyramid structure in the remote sensing image multi-scale target recognition system provided by the invention;
FIG. 6 is a schematic diagram illustrating the hierarchical region pooling feature fusion in the remote sensing image multi-scale target recognition system provided by the present invention;
FIG. 7 is a schematic view of the target scale distribution obtained from data set analysis according to the present invention;
FIG. 8 is a diagram illustrating a detection result of a time-sensitive target in a remote sensing image according to an embodiment of the present invention;
fig. 9(a1) shows the Fast RCNN detection result in recognition scenario one, fig. 9(a2) shows the R-FCN detection result in recognition scenario one, fig. 9(a3) shows the FPN detection result in recognition scenario one, fig. 9(b1) shows the Fast RCNN detection result in recognition scenario two, fig. 9(b2) shows the R-FCN detection result in recognition scenario two, fig. 9(b3) shows the FPN detection result in recognition scenario two, fig. 9(c1) shows the Fast RCNN detection result in recognition scenario three, fig. 9(c2) shows the R-FCN detection result in recognition scenario three, and fig. 9(c3) shows the FPN detection result in recognition scenario three.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Example one
The invention provides a remote sensing image time-sensitive target recognition system based on a feature pyramid. The target feature extraction sub-network is provided with a plurality of hierarchical output ends; the classification regression sub-network is provided with a plurality of hierarchical input ends and an RPN input end; the feature layer sub-network and the candidate region generation sub-network are each provided with a plurality of hierarchical input ends and a plurality of hierarchical output ends. Each hierarchical output end of the target feature extraction sub-network is connected with a hierarchical input end of the feature layer sub-network; each hierarchical output end of the feature layer sub-network is connected with a hierarchical input end of the candidate region generation sub-network and with a hierarchical input end of the classification regression sub-network; and the output end of the candidate region generation sub-network is connected with the RPN input end of the classification regression sub-network.
The target feature extraction sub-network is used for carrying out multi-level convolution processing on the image to be processed and outputting the convolution processing result of each level as a feature layer. And the feature layer sub-network superposes the previous feature layer and the current feature layer to obtain a current fused feature layer, wherein the topmost fused feature layer is the topmost feature layer. The candidate region generation sub-network is used for extracting candidate regions from different levels of fusion feature layers, the classification regression sub-network maps the candidate regions to the different levels of fusion feature layers to obtain a plurality of levels of fusion feature layers after mapping processing, and target judgment is carried out on the plurality of levels of fusion feature layers after mapping processing to output results.
Example two
Based on the first embodiment, the feature layer sub-network includes a plurality of feature layer sub-modules, which are denoted as a first feature layer sub-module, a second feature layer sub-module, … …, an i-th feature layer sub-module, … …, and an N-th feature layer sub-module. Wherein, one input end of the ith characteristic layer sub-module is used as a level input end of the characteristic layer sub-network, the other input end of the ith characteristic layer sub-module is connected with the output end of the (i + 1) th characteristic layer sub-module, the input end of the Nth characteristic layer sub-module is used as a level input end of the characteristic layer sub-network, i is more than or equal to 1 and less than or equal to N-1, and the first N-1 characteristic layer sub-modules are used for carrying out superposition processing on the previous characteristic layer and the current characteristic layer to obtain the current fusion characteristic layer; and the Nth characteristic layer submodule is used for outputting the current characteristic layer as a current fusion characteristic layer, and the previous characteristic layer is obtained by performing convolution on the current characteristic layer.
EXAMPLE III
As shown in fig. 2, on the basis of the second embodiment, any one of the first N-1 feature layer sub-modules includes a previous-layer processing sub-unit, a current processing sub-unit, and a superposition unit. The output end of the previous-layer processing sub-unit is connected with the first input end of the superposition unit, and the output end of the current processing sub-unit is connected with the second input end of the superposition unit. The previous-layer processing sub-unit up-samples the previous feature layer to output the processed previous feature layer T2; the current processing sub-unit applies a 1 × 1 convolution to the current feature layer to output the processed current feature layer S2; and the superposition unit superposes the processed previous feature layer and the processed current feature layer to output the current fused feature layer.
The feature layer sub-module provided in this embodiment up-samples the previous feature layer to double its size, convolves the current feature layer so that its number of channels matches that of the previous feature layer, and superposes the two to obtain the current fused feature layer. The current fused feature layer thus combines the strong semantic information of the previous feature layer with the strong position information of the current feature layer.
Example four
On the basis of the third embodiment, the feature layer sub-module further includes an aliasing effect processing unit whose input end is connected with the output end of the superposition unit. The aliasing effect processing unit convolves the current fused feature layer and outputs the final current fused feature layer, removing the aliasing effect introduced by up-sampling.
EXAMPLE five
On the basis of any one of the second to fourth embodiments, the feature layer sub-network further includes an (N + 1)th feature layer sub-module, and the candidate region generation sub-network is provided with an additional hierarchical input end. The input end of the (N + 1)th feature layer sub-module is connected with the output end of the Nth feature layer sub-module, and its output end is connected with the additional hierarchical input end of the candidate region generation sub-network. The (N + 1)th feature layer sub-module down-samples the fused feature layer output by the Nth feature layer sub-module to output the (N + 1)th feature layer, and the candidate region generation sub-network also extracts candidate regions from the (N + 1)th feature layer.
EXAMPLE six
On the basis of any one of the first to fifth embodiments, the classification regression sub-network includes a mapping sub-module, a fusion sub-module, and a target determination sub-module, which are connected in sequence, where the mapping sub-module is configured to map the candidate region to different hierarchical fusion feature layers to obtain a plurality of hierarchical fusion feature layers after mapping, the fusion sub-module is configured to perform fusion processing on the plurality of hierarchical fusion feature layers after mapping to obtain a target determination feature layer, and the target determination sub-module is configured to perform target determination on the target determination feature layer to output a target determination result.
EXAMPLE seven
As shown in fig. 2, a method for identifying a time-sensitive target of a remote sensing image based on a feature pyramid includes the following steps:
s110, performing multi-level convolution processing on the image to be processed, and taking each level convolution processing result as a level characteristic layer in the pyramid characteristic layer;
s120, overlapping the ith hierarchical feature layer and the (i + 1) th hierarchical feature layer in the pyramid feature layer to obtain an ith hierarchical fusion feature layer; traversing the first N-1 levels in the pyramid feature layer by the step i to obtain N-1 level fusion feature layers; taking the Nth hierarchical feature layer in the pyramid feature layer as an Nth hierarchical fusion feature layer;
s130, extracting candidate regions from the N hierarchical fusion feature layers;
s140 maps the candidate region to N hierarchical fusion feature layers to obtain a plurality of hierarchical fusion feature layers after mapping, and performs target determination on the plurality of hierarchical fusion feature layers after mapping to output a result.
Example eight
On the basis of the seventh embodiment, the following steps are further included between step S110 and step S120:
performing convolution on the ith level feature layer so that its channels match those of the (i + 1)th level feature layer, and performing up-sampling on the (i + 1)th level feature layer so that its size matches that of the ith level feature layer; 1 ≤ i ≤ N-1.
Example nine
On the basis of the seventh or eighth embodiment, step S140 includes the following sub-steps:
step S141: mapping the candidate region to N hierarchical fusion feature layers to obtain N hierarchical fusion feature layers after mapping processing;
step S142: fusing the N levels of fused feature layers subjected to mapping processing to obtain a target judgment feature layer;
step S143: and the target judgment characteristic layer performs target judgment and outputs a result.
The invention provides, for the first time, a remote sensing image time-sensitive target identification method based on a feature pyramid network. It designs a channel-weighted feature pyramid target detection network model, improves the expressive power and scale adaptability of target features in both the spatial and channel dimensions, and fuses multi-level region pooling features in the classification regression network, so that the scale of the network model is greatly reduced and a lightweight network is realized.
The embodiment of the remote sensing image time-sensitive target identification method based on the characteristic pyramid network provided by the invention comprises the following specific processes:
step S110: extracting target features in images
As shown in FIG. 3, SE-MobileNet is used as the feature extraction network: the SE structure is added into the MobileNet network structural unit to extract target features from the image.
Step S111: squeeze operation
The input feature U is compressed by global average pooling into a 1 × C sequence of real numbers z; this sequence is the channel descriptor, i.e. the expression of the feature in the channel dimension, with each element summarizing the global spatial information of one channel. The specific formula is:

z_c = F_sq(u_c) = (1/(W × H)) ∑_{i=1}^{W} ∑_{j=1}^{H} u_c(i, j)

where W and H denote the width and height of the feature map, and u_c denotes the c-th channel of the input feature map.
Step S112: excitation operation
The squeezed sequence z is then transformed by the Excitation operation:

s = F_ex(z, W) = σ(g(z, W)) = σ(W2 δ(W1 z))

where δ denotes the ReLU activation function, σ denotes the sigmoid activation function, W1 ∈ R^((C/r) × C) and W2 ∈ R^(C × (C/r)).
two Fully Connected (FC) layers are introduced, namely a dimension reduction layer with parameters W1 and dimension reduction ratio r, then a ReLU is passed, and then a dimension increasing layer with parameters W2. Finally, the real number sequence of 1 × C is combined with the feature graph U to perform Scale operation by the following formula to obtain the final output.
Figure BDA0001652379560000114
Wherein X ═ X1,x2,...,xc]And Fscale(uc,sc) Refers to the feature mapping uc∈RW×HAnd a scalar scCorresponding to the channel product.
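The Squeeze, Excitation and Scale operations above can be sketched end-to-end in plain NumPy. This is a minimal single-sample illustration, not the patented network: bias terms are omitted and the weights W1, W2 are random placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_block(u, w1, w2):
    # Squeeze: global average pooling over W x H gives the 1 x C descriptor z.
    z = u.mean(axis=(1, 2))
    # Excitation: dimension-reduction FC (w1), ReLU, dimension-increasing FC (w2), sigmoid.
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))
    # Scale: channel-wise product of each feature map u_c with the scalar s_c.
    return u * s[:, None, None]

rng = np.random.default_rng(0)
c, r = 64, 16                                # channels C and reduction ratio r
u = rng.standard_normal((c, 12, 12))         # input feature maps U, shape (C, H, W)
w1 = rng.standard_normal((c // r, c)) * 0.1  # (C/r) x C reduction weights
w2 = rng.standard_normal((c, c // r)) * 0.1  # C x (C/r) expansion weights
out = se_block(u, w1, w2)
print(out.shape)  # (64, 12, 12)
```

Each output channel is the input channel rescaled by its learned importance s_c ∈ (0, 1), which is what "adaptively recalibrating the channel feature response" amounts to.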
Step S120: obtaining different levels of fused feature layers
Step S121: bottom-up path
The feature pyramid is constructed level by level: feature maps of the same scale are defined as one pyramid level, and the last feature map of each level serves as that level's final output. Taking the last feature map of the convolutional layers at each level as output, the outputs of the pyramid structure are defined, with reference to feature map scale, as {C2, C3, C4, C5}, which come from convolutional layers conv2, conv3, conv4 and conv5 respectively; these feature layers have strides of {4, 8, 16, 32} pixels with respect to the input image.
Step S122: top-down pyramid network paths and transverse connections
The top-down pyramid network path builds an information flow channel that fuses high-level features with the current-level features. Concretely, the feature of the higher layer of the convolutional network is upsampled and then fused, through a lateral connection, with the current-level feature, which also strengthens the semantic information of the lower layer; the two feature maps joined by a lateral connection must be of the same size, so that the localization detail information of the current feature map can be fully exploited.
Given the higher-layer feature T1, T1 is first upsampled by a factor of 2 to obtain the feature map T2; the channel count of the current-layer feature S1 is then adjusted by a 1 × 1 convolution to be consistent with that of T2; finally, T2 and the channel-adjusted S1 are fused into S2, the fusion being direct pixel-wise addition.
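As an illustration of this fusion step, the following NumPy sketch mirrors the T1/T2/S1/S2 naming of the text; the nearest-neighbour upsampling and the 1 × 1 convolution written as a channel-mixing matrix product are simplifying assumptions of ours:

```python
import numpy as np

def fuse_top_down(T1, S1, W_1x1):
    """Lateral-connection fusion sketch (names follow the text).

    T1    : higher-layer feature map, shape (C_out, H, W)
    S1    : current-layer feature map, shape (C_in, 2H, 2W)
    W_1x1 : 1x1 convolution weights, shape (C_out, C_in)
    """
    # 2x upsampling of the higher-layer feature (nearest neighbour) -> T2
    T2 = T1.repeat(2, axis=1).repeat(2, axis=2)
    # A 1x1 convolution is a per-pixel matrix product over channels:
    # it adjusts the channel count of S1 to match T2
    S1_adj = np.tensordot(W_1x1, S1, axes=([1], [0]))
    # Fusion is direct pixel-wise addition -> S2
    return T2 + S1_adj
```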
Step S130: RPN network constructs characteristic pyramid structure and extracts candidate region
In Faster RCNN, the feature map input to the RPN network has a single fixed scale (downsampled 16-fold); to achieve a multi-scale effect, multi-scale feature maps are required as input. To combine the feature pyramid structure with the RPN network, we use the fused feature layers {P2, P3, P4, P5} as RPN inputs, realizing multi-scale input; to make the scale information richer, P5 is downsampled by max pooling to generate P6, so the input of the RPN network becomes {P2, P3, P4, P5, P6}. In Faster RCNN, the RPN network uses 9 candidate windows (anchors) of different sizes, covering three scales and three aspect ratios; with multi-scale RPN inputs, however, multiple anchor scales per level contribute little, so only the three aspect ratios are retained. {P2, P3, P4, P5, P6} correspond to the anchor scales {32 × 32, 64 × 64, 128 × 128, 256 × 256, 512 × 512}; combined with the three aspect ratios, the FPN-based RPN network therefore has 15 different anchors in total.
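The anchor bookkeeping can be illustrated as follows. The concrete aspect-ratio values (1:2, 1:1, 2:1) are an assumption: the patent states only that three aspect ratios are kept.

```python
# One anchor scale per pyramid level, three aspect ratios.
level_scales = {"P2": 32, "P3": 64, "P4": 128, "P5": 256, "P6": 512}
aspect_ratios = (0.5, 1.0, 2.0)   # assumed 1:2, 1:1, 2:1

anchors = []
for level, scale in level_scales.items():
    area = float(scale * scale)
    for ratio in aspect_ratios:
        # choose width/height so the anchor area stays scale^2 at each ratio
        w = (area / ratio) ** 0.5
        h = w * ratio
        anchors.append((level, w, h))
```

Five levels times three ratios gives the 15 distinct anchors of the FPN-based RPN.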
Step S140: and performing target judgment on the multiple hierarchical fusion feature layers and outputting results.
S141: and mapping the candidate boxes generated by the RPN to the feature maps of various scales respectively according to the sizes of the candidate boxes generated by the RPN. Assuming that the size of the candidate box is w × h, it should be mapped to P by the following calculationkOn a characteristic diagram, whereinAnd k is taken from 2 to 5, and the specific flow is shown in FIG. 5.
k = ⌊k0 + log2(√(w × h) / 224)⌋
Since 224 × 224 is the image size used when pre-training the network model, the size of the reference candidate box is set to 224 and its level is denoted k0; since C4 serves as the input of RoI pooling in Faster RCNN, k0 = 4. Assuming a candidate box with w × h = 112 × 112, then k = k0 − 1 = 3, so the candidate box should be mapped onto the P3 feature map and then undergo RoI pooling. In the concrete implementation only the feature layers P2-P5 are used; the P6 feature layer is not included.
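The level-assignment rule above can be written as a small helper; clamping to P2-P5 reflects the statement that P6 takes no part in RoI pooling (the function name is ours):

```python
import math

def assign_level(w, h, k0=4, k_min=2, k_max=5):
    """Map a w x h candidate box to pyramid level P_k:
    k = floor(k0 + log2(sqrt(w * h) / 224)), clamped to the P2-P5 range."""
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / 224.0))
    return max(k_min, min(k_max, k))
```

For example, a 112 × 112 box lands on P3 (k = k0 − 1), while a 224 × 224 box stays on the reference level P4.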
S142: in a classification regression subnetwork, results after multi-level feature RoI pooling are fused, and then a classification regression prediction network is input, wherein the specific structure is shown in FIG. 6, network parameters are reduced, the network training and testing speed is improved, and the fusion calculation process is shown in the following formula.
Op=K×N×C×H×W
wherein O_p is the size of the fused output, the input feature maps are X = {x_1, x_2, x_3, ..., x_K} with x_k the k-th input, K is the number of fused feature layers (4 in this embodiment), H and W are the length and width of the feature map, C is the number of feature-map channels, and N is the number of feature maps.
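As an illustration of the element count O_p = K × N × C × H × W, the following sketch stacks K levels of RoI-pooled features; the 7 × 7 pooled size and stacking as the concrete fusion operation are assumptions of ours:

```python
import numpy as np

# Illustrative sizes: K pooled levels (P2-P5), N RoIs, C channels, H x W pooled grid
K, N, C, H, W = 4, 2, 8, 7, 7
pooled = [np.zeros((N, C, H, W)) for _ in range(K)]   # one tensor per level

fused = np.stack(pooled, axis=0)   # shape (K, N, C, H, W)
O_p = fused.size                   # K * N * C * H * W elements, as in the formula
```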
Step S143: and the target judgment characteristic layer performs target judgment and outputs a result.
In the embodiment provided by the invention, the remote sensing image time-sensitive target recognition system is trained by adopting the following steps:
step S210: data set production and analysis
One thousand remote sensing images of airports around the world, with a resolution of 0.6 m and sizes above 2000 × 2000, are downloaded through the Google Earth satellite remote sensing imagery software. The samples in the data set are labeled with the LabelImg image annotation tool, following the format of the PASCAL VOC data set.
The data set contains 1000 remote sensing images of size 2000 × 2000, comprising 340 training samples, 330 test samples and 330 validation samples, with more than 14000 aircraft targets in total.
The sizes of all labeled targets are analyzed, as shown in FIG. 7, where the abscissa is k_r and the ordinate is the target count. The statistical calculation formula is:
k_r = ⌊log2(√(w × h) / 16)⌋ + 1
In FIG. 7, k_r = 1, 2 and 3 correspond to scale sizes of 16 × 16, 32 × 32 and 64 × 64, respectively.
Step S220: the scale hyper-parameters of the system are designed from the target scale distribution statistics of the data set in step S210, other reasonable hyper-parameters are designed for system training, and the training data set is input into the remote sensing image time-sensitive target recognition system for training.
Step S230: and inputting the test data set into a remote sensing image time-sensitive target recognition system for forward calculation, and testing the detection performance and generalization capability of the remote sensing image time-sensitive target recognition system.
To verify the effectiveness of the proposed feature pyramid network for aircraft target detection in remote sensing images, three scenes are detected and the results are compared and analyzed against the existing mainstream target detection frameworks Faster RCNN and R-FCN; the detected images are larger than 1000 × 1000. The aircraft target detection results of the invention and the comparison of detection effects with other methods are shown in FIG. 8 and FIGS. 9(a1) to 9(a3), 9(b1) to 9(b3) and 9(c1) to 9(c3); the data sets used by the above methods are identical to that of the invention, and the results are shown in Table 1.
Average precision is used as the model evaluation index; the larger its value, the better the detection performance. The detection time is obtained from test statistics on large-format remote sensing images.
TABLE 1
(Table 1 is reproduced as an image in the original publication; it reports the average precision and detection time of the compared methods.)
The invention first designs SEM-FPN, a channel-weighted feature pyramid target detection network model comprising a channel-weighted target feature extraction sub-network SE-MobileNet, a feature layer sub-network, a candidate region generation sub-network (RPN) and a classification regression sub-network; feature pyramid structures are built on both the RPN and the classification regression sub-network, and multi-level region-pooling feature fusion is adopted in the classification regression sub-network to reduce the network parameters. A remote sensing image data set is then produced, and the target scale distribution within it is analyzed. At the training level, suitable training hyper-parameters are set and the training samples are fed into SEM-FPN for end-to-end training to obtain a time-sensitive target detection model. At the testing level, the test samples of the data set are input into the time-sensitive target detection model and forward prediction calculation is performed.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (8)

1. A remote sensing image time-sensitive target recognition system based on a characteristic pyramid is characterized by comprising:
the target feature extraction sub-network is provided with a plurality of hierarchical output ends and is used for carrying out multi-level convolution processing on the image to be processed and outputting each hierarchical convolution processing result as a feature layer;
the feature layer sub-network is provided with a plurality of hierarchical input ends and a plurality of hierarchical output ends, one hierarchical input end is connected with one hierarchical output end of the target feature extraction sub-network and is used for superposing the previous feature layer and the current feature layer to obtain a current fusion feature layer, and the topmost fusion feature layer is the topmost feature layer;
the candidate region generation sub-network is provided with a plurality of hierarchical input ends and a plurality of hierarchical output ends, wherein one hierarchical input end is connected with one hierarchical output end of the feature layer sub-network and is used for extracting candidate regions from different hierarchical fusion feature layers;
the classification regression sub-network is provided with a plurality of hierarchy input ends and RPN input ends, wherein one hierarchy input end is connected with one hierarchy output end of the feature layer sub-network, and the RPN input end is connected with the output end of the candidate region generation sub-network and is used for mapping the candidate regions to different hierarchy fusion feature layers to obtain a plurality of hierarchy fusion feature layers after mapping processing and performing target judgment on the plurality of hierarchy fusion feature layers after mapping processing to output results;
the feature layer sub-network comprises a plurality of feature layer sub-modules, denoted as a first feature layer sub-module, a second feature layer sub-module, ..., an i-th feature layer sub-module, ..., and an N-th feature layer sub-module, wherein 1 ≤ i ≤ N-1;
one input end of the ith characteristic layer sub-module is used as a layer input end of the characteristic layer sub-network, the other input end of the ith characteristic layer sub-module is connected with the output end of the (i + 1) th characteristic layer sub-module, and the input end of the Nth characteristic layer sub-module is used as a layer input end of the characteristic layer sub-network;
the first N-1 characteristic layer submodules are used for carrying out superposition processing on the previous characteristic layer and the current characteristic layer to obtain a current fusion characteristic layer; and the Nth characteristic layer submodule is used for outputting the current characteristic layer as a current fusion characteristic layer, wherein the previous characteristic layer is obtained by performing convolution on the current characteristic layer.
2. The remote sensing image time-sensitive target recognition system of claim 1, wherein any one of the first N-1 feature layer submodules comprises:
the system comprises a previous layer processing subunit, a current processing subunit and a superposition unit, wherein the output end of the previous layer processing subunit is connected with the first input end of the superposition unit, and the output end of the current processing subunit is connected with the second input end of the superposition unit;
the last layer of processing subunit is used for performing up-sampling processing on the last layer of feature layer and outputting the processed last layer of feature layer, the current processing subunit is used for performing convolution processing on the current feature layer and outputting the processed current feature layer, and the superposition unit is used for performing superposition processing on the processed last layer of feature layer and the processed current feature layer and outputting the current fusion feature layer.
3. The remote sensing image time-sensitive target recognition system of claim 1 or 2, wherein the feature layer sub-module further comprises an aliasing effect processing unit, an input end of the aliasing effect processing unit is connected with an output end of the superposition unit, and the aliasing effect processing unit is configured to perform convolution processing on the current fusion feature layer and output a final current fusion feature layer.
4. The remote sensing image time-sensitive target recognition system of claim 1 or 2, wherein the feature layer sub-network further comprises an N +1 th feature layer sub-module, the candidate region generating sub-network is provided with an additional hierarchical input; the input end of the (N + 1) th feature layer submodule is connected with the output end of the Nth feature layer submodule, and the output end of the (N + 1) th feature layer submodule is connected with the additional level input end of the candidate region generation sub-network;
the (N + 1) th feature layer submodule is used for performing up-sampling on the fusion feature layer output by the (N) th feature layer submodule to output an (N + 1) th feature layer, and the candidate region generation sub-network simultaneously extracts the candidate region from the (N + 1) th feature layer.
5. The remote-sensing image time-sensitive target recognition system of claim 1, wherein the classification regression sub-network comprises a mapping sub-module, a fusion sub-module and a target judgment sub-module which are connected in sequence, the mapping sub-module is used for mapping the candidate region to different levels of fusion feature layers to obtain a plurality of levels of fusion feature layers after mapping, the fusion sub-module is used for performing fusion processing on the plurality of levels of fusion feature layers after mapping to obtain a target judgment feature layer, and the target judgment sub-module is used for performing target judgment on the target judgment feature layer to output a target judgment result.
6. The identification method of the remote sensing image time-sensitive target identification system based on claim 1 is characterized by comprising the following steps:
s110, performing multi-level convolution processing on the image to be processed, and taking each level convolution processing result as a level characteristic layer in the pyramid characteristic layer;
s120, overlapping the ith hierarchical feature layer and the (i + 1) th hierarchical feature layer in the pyramid feature layer to obtain an ith hierarchical fusion feature layer; traversing the first N-1 levels in the pyramid feature layer by the step i to obtain N-1 level fusion feature layers; taking the Nth hierarchical feature layer in the pyramid feature layer as an Nth hierarchical fusion feature layer;
s130, extracting candidate regions from the N hierarchical fusion feature layers;
s140 maps the candidate region to N hierarchical fusion feature layers to obtain a plurality of hierarchical fusion feature layers after mapping, and performs target determination on the plurality of hierarchical fusion feature layers after mapping to output a result.
7. The identification method of claim 6, further comprising the steps between step S110 and step S120 of:
performing convolution processing on the ith level feature layer to enable the channel of the ith level feature layer to be the same as the channel of the (i + 1)th level feature layer; performing upsampling processing on the (i + 1)th level feature layer to enable the size of the ith level feature layer to be the same as that of the (i + 1)th level feature layer; wherein 1 ≤ i ≤ N-1.
8. The identification method according to claim 6 or 7, characterized in that step S140 comprises the following sub-steps:
step S141: mapping the candidate region to N hierarchical fusion feature layers to obtain N hierarchical fusion feature layers after mapping processing;
step S142: fusing the N levels of fused feature layers subjected to mapping processing to obtain a target judgment feature layer;
step S143: and the target judgment characteristic layer performs target judgment and outputs a result.
CN201810427107.2A 2018-05-07 2018-05-07 Remote sensing image time-sensitive target identification system and method based on characteristic pyramid Active CN108764063B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810427107.2A CN108764063B (en) 2018-05-07 2018-05-07 Remote sensing image time-sensitive target identification system and method based on characteristic pyramid

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810427107.2A CN108764063B (en) 2018-05-07 2018-05-07 Remote sensing image time-sensitive target identification system and method based on characteristic pyramid

Publications (2)

Publication Number Publication Date
CN108764063A CN108764063A (en) 2018-11-06
CN108764063B true CN108764063B (en) 2020-05-19

Family

ID=64010105

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810427107.2A Active CN108764063B (en) 2018-05-07 2018-05-07 Remote sensing image time-sensitive target identification system and method based on characteristic pyramid

Country Status (1)

Country Link
CN (1) CN108764063B (en)

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109671070B (en) * 2018-12-16 2021-02-09 华中科技大学 Target detection method based on feature weighting and feature correlation fusion
CN109800793B (en) * 2018-12-28 2023-12-22 广州海昇教育科技有限责任公司 Target detection method and system based on deep learning
CN109886286B (en) * 2019-01-03 2021-07-23 武汉精测电子集团股份有限公司 Target detection method based on cascade detector, target detection model and system
CN111488777A (en) * 2019-01-28 2020-08-04 北京地平线机器人技术研发有限公司 Object identification method, object identification device and electronic equipment
CN109816671B (en) * 2019-01-31 2021-09-24 深兰科技(上海)有限公司 Target detection method, device and storage medium
CN109977963B (en) * 2019-04-10 2021-10-15 京东方科技集团股份有限公司 Image processing method, apparatus, device and computer readable medium
CN109977956B (en) * 2019-04-29 2022-11-18 腾讯科技(深圳)有限公司 Image processing method and device, electronic equipment and storage medium
CN110070072A (en) * 2019-05-05 2019-07-30 厦门美图之家科技有限公司 A method of generating object detection model
CN110321805B (en) * 2019-06-12 2021-08-10 华中科技大学 Dynamic expression recognition method based on time sequence relation reasoning
CN110348311B (en) * 2019-06-13 2021-03-19 中国人民解放军战略支援部队信息工程大学 Deep learning-based road intersection identification system and method
CN110378297B (en) * 2019-07-23 2022-02-11 河北师范大学 Remote sensing image target detection method and device based on deep learning and storage medium
CN110490860A (en) * 2019-08-21 2019-11-22 北京大恒普信医疗技术有限公司 Diabetic retinopathy recognition methods, device and electronic equipment
CN110533105B (en) * 2019-08-30 2022-04-05 北京市商汤科技开发有限公司 Target detection method and device, electronic equipment and storage medium
CN110909642A (en) * 2019-11-13 2020-03-24 南京理工大学 Remote sensing image target detection method based on multi-scale semantic feature fusion
CN111191531A (en) * 2019-12-17 2020-05-22 中南大学 Rapid pedestrian detection method and system
CN111160293A (en) * 2019-12-31 2020-05-15 珠海大横琴科技发展有限公司 Small target ship detection method and system based on characteristic pyramid network
CN113658230B (en) * 2020-05-12 2024-05-28 武汉Tcl集团工业研究院有限公司 Optical flow estimation method, terminal and storage medium
CN111783772A (en) * 2020-06-12 2020-10-16 青岛理工大学 Grabbing detection method based on RP-ResNet network
CN111882581B (en) * 2020-07-21 2022-10-28 青岛科技大学 Multi-target tracking method for depth feature association
CN112330664B (en) * 2020-11-25 2022-02-08 腾讯科技(深圳)有限公司 Pavement disease detection method and device, electronic equipment and storage medium
CN113095356B (en) * 2021-03-03 2023-10-31 北京邮电大学 Light-weight neural network system and image processing method and device
CN113361375B (en) * 2021-06-02 2022-06-07 武汉理工大学 Vehicle target identification method based on improved BiFPN
CN114842365B (en) * 2022-07-04 2022-11-29 中国科学院地理科学与资源研究所 Unmanned aerial vehicle aerial photography target detection and identification method and system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105046197B (en) * 2015-06-11 2018-04-17 西安电子科技大学 Multi-template pedestrian detection method based on cluster
US9904849B2 (en) * 2015-08-26 2018-02-27 Digitalglobe, Inc. System for simplified generation of systems for broad area geospatial object detection
CN105894045B (en) * 2016-05-06 2019-04-26 电子科技大学 A kind of model recognizing method of the depth network model based on spatial pyramid pond
CN106683091B (en) * 2017-01-06 2019-09-24 北京理工大学 A kind of target classification and attitude detecting method based on depth convolutional neural networks

Also Published As

Publication number Publication date
CN108764063A (en) 2018-11-06

Similar Documents

Publication Publication Date Title
CN108764063B (en) Remote sensing image time-sensitive target identification system and method based on characteristic pyramid
Sun et al. RSOD: Real-time small object detection algorithm in UAV-based traffic monitoring
CN112818903B (en) Small sample remote sensing image target detection method based on meta-learning and cooperative attention
Guo et al. CDnetV2: CNN-based cloud detection for remote sensing imagery with cloud-snow coexistence
CN109086668B (en) Unmanned aerial vehicle remote sensing image road information extraction method based on multi-scale generation countermeasure network
CN111291809B (en) Processing device, method and storage medium
Hang et al. Multiscale progressive segmentation network for high-resolution remote sensing imagery
CN113780211A (en) Lightweight aircraft detection method based on improved yolk 4-tiny
CN111461083A (en) Rapid vehicle detection method based on deep learning
CN110348447B (en) Multi-model integrated target detection method with abundant spatial information
Doi et al. The effect of focal loss in semantic segmentation of high resolution aerial image
CN115035361A (en) Target detection method and system based on attention mechanism and feature cross fusion
CN111985325A (en) Aerial small target rapid identification method in extra-high voltage environment evaluation
CN112766409A (en) Feature fusion method for remote sensing image target detection
CN116740344A (en) Knowledge distillation-based lightweight remote sensing image semantic segmentation method and device
CN114926693A (en) SAR image small sample identification method and device based on weighted distance
CN113538347A (en) Image detection method and system based on efficient bidirectional path aggregation attention network
CN116740516A (en) Target detection method and system based on multi-scale fusion feature extraction
CN112084897A (en) Rapid traffic large-scene vehicle target detection method of GS-SSD
CN115527098A (en) Infrared small target detection method based on global mean contrast space attention
Yang et al. Real-Time object detector based MobileNetV3 for UAV applications
Khosravian et al. Multi‐domain autonomous driving dataset: Towards enhancing the generalization of the convolutional neural networks in new environments
Wang Remote sensing image semantic segmentation algorithm based on improved ENet network
Singh et al. An enhanced YOLOv5 based on color harmony algorithm for object detection in unmanned aerial vehicle captured images
CN114494893B (en) Remote sensing image feature extraction method based on semantic reuse context feature pyramid

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240315

Address after: Room A348, 4th Floor, Building 1, Phase III, International Enterprise Center, No. 1 Guanggu Avenue, Donghu New Technology Development Zone, Wuhan City, Hubei Province, 430073 (Wuhan Free Trade Zone)

Patentee after: AVIC Tianhai (Wuhan) Technology Co.,Ltd.

Country or region after: China

Address before: 430074 Hubei Province, Wuhan city Hongshan District Luoyu Road No. 1037

Patentee before: HUAZHONG University OF SCIENCE AND TECHNOLOGY

Country or region before: China
