CN110532859B - Remote sensing image target detection method based on deep evolution pruning convolution net - Google Patents

Remote sensing image target detection method based on deep evolution pruning convolution net

Info

Publication number
CN110532859B
Authority
CN
China
Prior art keywords
convolution
layer
network
deep
target detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910648586.5A
Other languages
Chinese (zh)
Other versions
CN110532859A (en)
Inventor
Jiao Licheng
Li Lingling
Jiang Sheng
Guo Yuwei
Cheng Xina
Ding Jingyi
Zhang Mengxuan
Yang Shuyuan
Hou Biao
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN201910648586.5A
Publication of CN110532859A
Application granted
Publication of CN110532859B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G06V20/13 Satellite images

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a remote sensing image target detection method based on a deep evolution pruning convolution network, which solves the problem that existing remote sensing image target detection cannot simultaneously, globally and effectively optimize detection speed and detection precision. The method comprises the following specific steps: processing the data sets; constructing a deep convolution feature extraction sub-network; constructing a full convolution FCN detection sub-network; constructing and training a deep convolution target detection network; constructing and training a target detection network based on the deep evolution pruning convolution network; carrying out target detection on the test data set with the trained model; and outputting the test result. The method builds an inverse residual structure from depth separable convolutions, which greatly reduces the model parameters while keeping detection precision high, and combines the target detection network with evolutionary pruning to achieve overall acceleration. The method greatly reduces the amount of computation, markedly improves the target detection speed with high detection precision, and is used for the fast and accurate detection of small targets such as airplanes and ships in remote sensing images.

Description

Remote sensing image target detection method based on deep evolution pruning convolution net
Technical Field
The invention belongs to the technical field of image processing and further relates to remote sensing image target detection, in particular to a remote sensing image target detection method based on a deep evolution pruning convolution network, which can be applied to detecting airplane and ship targets in different areas of a remote sensing image.
Background
The target detection technology is one of core problems in the field of computer vision, and the remote sensing image target detection means that an image captured by a remote sensing satellite is used as a data source, and an image processing technology is adopted to position and classify an interested target in the image. The remote sensing image target detection is used as a key technology in the application of the remote sensing image, can capture an attack target in high-tech military countermeasure, provides accurate position and category information and the like, has great influence on the military field, and has important application and research values.
In the prior art, because remote sensing images are large, have low resolution, contain small targets and have fuzzy target edges, existing methods often cannot learn the characteristics of the targets well, so the accuracy of target detection is low; moreover, the huge data volume of remote sensing images and the huge parameters of network models greatly limit the detection speed.
The efficiency and accuracy of existing target detection techniques are often incompatible: two-stage detection models such as Faster R-CNN achieve high accuracy but bring a huge amount of calculation, while one-stage detection models such as YOLO and SSD are fast but their accuracy is not satisfactory.
Tsung-Yi Lin et al. proposed the general one-stage target detection model RetinaNet in the paper "Focal Loss for Dense Object Detection" (ICCV 2017). The model uses the residual network ResNet for the primary extraction of image features, and adds a feature pyramid network FPN to fuse the feature maps of different layers generated by the residual network, which enhances the semantic information of the output features, makes small targets easier to identify and further improves detection performance; classification and regression prediction are then carried out on each pyramid layer. Finally, the Focal Loss function is used to solve the class-imbalance problem caused by excessive background, which affects the accuracy of one-stage target detection models; the detection result of this one-stage model on the COCO data set was the first to exceed the most advanced two-stage target detection models of the time. However, the method still has the defect that a large amount of redundant information exists in the residual network ResNet and the feature pyramid network FPN; the parameter and operation amounts are large, which affects the computational complexity and speed of the model and does not meet the requirements for deployment in embedded devices.
Shaohui Lin et al. proposed the global dynamic pruning method GDP in the paper "Accelerating Convolutional Networks via Global & Dynamic Filter Pruning" (IJCAI 2018). The method first provides a global discriminant function based on prior knowledge of each filter and prunes filters with low saliency across all layers in a global range; it then dynamically updates the saliency of the filters of the whole pruned sparse network, re-codes and retrains wrongly pruned filters to recover model accuracy, and performs global optimization with a stochastic gradient descent method based on a greedy algorithm. However, the method still has the disadvantage that the global discriminant function based on filter prior knowledge needs to be designed for the specific task, and using the same global discriminant function in different applications may introduce discriminant bias and lose overall accuracy.
When existing target detection algorithms are applied to large, low-resolution optical remote sensing images, the huge data volume and model parameter count, together with the small target size and fuzzy target edges, prevent the detection accuracy and detection speed of the prior art from being optimal at the same time, making fast and accurate detection of optical remote sensing images difficult.
Disclosure of Invention
The invention aims to provide a remote sensing image target detection method based on a deep evolution pruning convolution network, which maintains higher accuracy, greatly reduces the computation complexity and greatly improves the overall network operation speed, aiming at the defects of the prior art.
The invention relates to a remote sensing image target detection method based on a deep evolution pruning convolution net, which is characterized by comprising the following steps:
(1) processing the training dataset and the validation dataset: selecting a plurality of optical remote sensing images containing various targets and processing them into 512 × 512 image blocks, of which 70% form the training data set and 30% form the verification data set, and performing data enhancement on the training data set;

(2) processing the test data set: inputting another plurality of optical remote sensing images containing various targets and processing them into 512 × 512 image blocks to form the test data set;
(3) constructing a deep convolution feature extraction sub-network: respectively constructing a depth separable convolution inverse residual connecting module and a feature pyramid convolution module, then sequentially using a 7 × 7 convolution layer and a maximum pooling layer and alternately connecting the depth separable convolution inverse residual connecting modules and the feature pyramid convolution modules to form the deep convolution feature extraction sub-network;
the specific structure of the sub-network for extracting the depth convolution features is that an original image input layer → 7 × 7 convolution layer → a first maximum pooling layer → a first depth separable convolution inverse residual connection module C1 → a second depth separable convolution inverse residual connection module C2 → a first feature pyramid convolution module P1 → a third depth separable convolution inverse residual connection module C3 → a second feature pyramid convolution module P2 → a fourth depth separable convolution inverse residual connection module C4 → a third feature pyramid convolution module P3 → a second maximum pooling layer → a fourth feature pyramid convolution module P4 → a third maximum pooling layer → a fifth feature pyramid convolution module P5 → a current stage feature map output layer;
(4) constructing a fully-convoluted FCN detection subnetwork:
(4a) constructing a full convolution FCN classification subnet with the structure: classification subnet input layer → first 3 × 3 convolution layer → second 3 × 3 convolution layer → third 3 × 3 convolution layer → fourth 3 × 3 convolution layer → fifth 3 × 3 convolution layer → classification subnet output layer; the classification subnet input layer takes the feature map of each feature pyramid convolution module in turn as the input of the classification subnet and performs classification detection in turn;

(4b) constructing a full convolution FCN regression subnet with the structure: regression subnet input layer → first 3 × 3 convolution layer → second 3 × 3 convolution layer → third 3 × 3 convolution layer → fourth 3 × 3 convolution layer → fifth 3 × 3 convolution layer → regression subnet output layer; the regression subnet input layer takes the feature map of each feature pyramid convolution module in turn as the input of the regression subnet and performs regression detection in turn;
(5) constructing and training a deep convolution target detection network:
(5a) constructing a deep convolution target detection network: sequentially constructing the deep convolution target detection network from the deep convolution feature extraction sub-network, the full convolution FCN classification subnet and the full convolution FCN regression subnet, with the structure: original image input layer → deep convolution feature extraction sub-network → full convolution FCN classification and regression subnets;

(5b) training the deep convolution target detection network: training the deep convolution target detection network with the training data set and the verification data set as input to obtain the trained deep convolution target detection network, and saving its weight file;
(6) constructing and training a target detection network based on a deep evolution pruning convolution network:
(6a) performing layer-by-layer DNA coding on the convolution filters participating in pruning in the trained deep convolution target detection network, and recording the coding as DNA_{1,...,l-1,l};

(6b) optimizing the DNA_{1,...,l-1,l} coding with an evolutionary algorithm to obtain the final optimized coding DNA'_{1,...,l-1,l};

(6c) combining the optimized coding DNA'_{1,...,l-1,l} with the pruning rule to construct the target detection network based on the deep evolution pruning convolution network, the pruning rule being that a code of 0 means the convolution filter is finally pruned and a code of 1 means the convolution filter is finally retained; fine-tuning with the training data set to obtain the trained target detection network based on the deep evolution pruning convolution network, i.e. the trained model, and saving the trained model weight file;
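By way of illustration only, the pruning rule of step (6c) can be sketched as follows; PyTorch and the helper name prune_conv_by_dna are assumptions of this sketch, not part of the patent:

```python
import torch
import torch.nn as nn

def prune_conv_by_dna(conv: nn.Conv2d, dna_bits):
    """Keep only the output filters whose DNA bit is 1 (0 = pruned)."""
    keep = [i for i, bit in enumerate(dna_bits) if bit == 1]
    pruned = nn.Conv2d(conv.in_channels, len(keep),
                       kernel_size=conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    with torch.no_grad():
        pruned.weight.copy_(conv.weight[keep])   # copy the surviving filters
        if conv.bias is not None:
            pruned.bias.copy_(conv.bias[keep])
    return pruned

# Example: a layer with 8 filters, of which 5 survive pruning.
layer = nn.Conv2d(16, 8, 3, padding=1)
dna = [1, 0, 1, 1, 0, 1, 0, 1]
print(prune_conv_by_dna(layer, dna))  # Conv2d(16, 5, ...)
```

In a real network the input channels of the following layer would also be pruned to match; that bookkeeping, and the subsequent fine-tuning, are omitted here.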
(7) and carrying out target detection on the test data set by using the trained model:
(7a) sequentially inputting the data blocks in the test data set into a trained target detection network based on a deep evolution pruning convolutional network to obtain a candidate frame of each data block in the test data set, a classification confidence score corresponding to the candidate frame and a target category corresponding to the candidate frame;
(7b) discarding all candidate frames whose classification confidence score for the target category is lower than the threshold 0.3, and performing non-maximum suppression on the retained candidate frames;
(7c) mapping the coordinates of all the retained candidate frames onto the optical remote sensing image before cutting, and performing a second non-maximum suppression to obtain the final detection result image of the optical remote sensing image.
The invention discloses a remote sensing image target detection method based on a deep evolution pruning convolution network, which mainly solves the problem that existing remote sensing image target detection technology cannot simultaneously, globally and effectively optimize detection speed and target detection precision.
Compared with the prior art, the invention has the following advantages:
an optimization scheme is provided: the method provided by the invention optimizes the model accuracy and the operation speed simultaneously, has obvious advantages in the aspects of calculation complexity and operation speed, and improves the model accuracy compared with the prior art.
The target detection network is combined with a global dynamic pruning method based on an evolutionary algorithm to realize network acceleration: a brand-new global and dynamic pruning scheme based on an evolutionary algorithm is provided, and redundant filters are removed by pruning to accelerate the CNN. Most previous approaches prune filters sequentially in a fixed layer-by-layer manner, which cannot dynamically recover previously removed filters, ignores complex associations between filters, offers poor flexibility and can significantly degrade network evaluation performance. The invention jointly codes the filters to be pruned in all layers, optimizes the network to be pruned with an evolutionary algorithm, uses the performance of the network on a test data set as the fitness of the evolutionary algorithm, and completes the iterative optimization of the network structure by retraining, so that the model achieves an ideal acceleration effect while its performance is guaranteed.

The model parameter amount is greatly reduced: the remote sensing image target detection method based on the deep evolution pruning convolution network replaces the standard convolutions in a ResNet network with depth separable convolutions, and after the 1 × 1 point-by-point convolution in the depth separable convolution unit uses a linear activation function rather than a ReLU activation function so that the feature information is not damaged. This maintains the detection precision of the model while reducing the parameter amount and computation needed to fit the data, accelerates the convergence of the network model, overcomes the loss of running speed caused by the large parameter counts of prior-art networks, and makes the method applicable to computing devices with limited computing and storage resources.

High accuracy is maintained while the parameter amount is reduced: a traditional residual connecting structure uses a 1 × 1 convolution layer to reduce and then restore the channel dimension of the input feature map, but the dimension reduction compresses the features and removes part of the useful feature information in the image, and the reduced target feature information lowers the detection accuracy of the model. The invention designs an inverse residual connecting module which first doubles the number of channels of the input feature map to obtain more image feature information and then applies a depth separable unit to extract features and reduce the channel dimension, so that high accuracy is maintained while the parameter amount is reduced and the network runs faster.
Drawings
FIG. 1 is a flow chart of the present invention;
fig. 2 is a block diagram of a depth separable convolution element of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings.
Example 1:
Remote sensing image target detection is an application of great interest in the field of remote sensing image processing and analysis, for example judging whether targets such as airplanes and ships exist in a remote sensing image and identifying, classifying and accurately positioning them. With the continuous development of satellite technology, the data volume of optical remote sensing images grows ever larger, while airplane and ship targets are small and sparse compared with the vast sea areas around them, so detecting them quickly and accurately in massive optical remote sensing images is a challenging task. Existing remote sensing image target detection techniques usually focus on learning the feature information of the target better in order to improve detection accuracy, but the huge data volume of remote sensing images and the huge parameters of network models greatly limit the detection speed.
The invention develops research aiming at the current situation, provides a remote sensing image target detection method taking detection accuracy and detection speed into consideration, in particular to a remote sensing image target detection method based on a deep evolution pruning convolution network, and the method is shown in figure 1 and comprises the following steps:
(1) processing the training dataset and the validation dataset: selecting a plurality of optical remote sensing images containing various targets and cutting them into 512 × 512 image blocks, of which 70% form the training data set and 30% form the verification data set, and performing data enhancement on the training data set.
(1a) Inputting a plurality of large-scale optical remote sensing images containing various targets to be processed.
(1b) And marking the targets by using a marking tool for a plurality of large-amplitude optical remote sensing images containing various targets.
(1c) cutting the optical remote sensing image into 512 × 512 image blocks centered on each target.
(1d) And naming each cut image block according to a data set naming rule, and forming a training data set and a verification data set by all named image blocks, wherein the training data set accounts for 70%, the verification data set accounts for 30%, and data enhancement is performed on the training data set.
The data set naming rule is that the file name of each remote sensing image to be cut is joined, with an underscore '_' character, to the sliding-window step number of the corresponding cut data block, generating a .jpg file.
(2) Processing the test data set: inputting a plurality of optical remote sensing images containing various targets, and cutting the images into image blocks with 512 x 512 pixels to form a test data set.
(2a) Inputting another plurality of large-scale optical remote sensing images containing various targets to be processed.
(2b) And marking the target by using a marking tool for the large-amplitude optical remote sensing image to be tested.
(2c) setting the overlap to 100 pixels in an overlapping sliding-window manner, the picture is sequentially cut into 512 × 512 image blocks, as sketched after step (2d).
(2d) And naming each cut image block according to a data set naming rule, and forming a test data set by all the named image blocks.
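A minimal sketch of the overlapping sliding-window cutting of steps (2c)-(2d), assuming NumPy arrays and illustrative function names:

```python
import numpy as np

def origins(size, block, stride):
    """Window origins along one axis; the last window stays flush with the border."""
    xs = list(range(0, size - block + 1, stride))
    if xs[-1] != size - block:
        xs.append(size - block)
    return xs

def crop_with_overlap(image, block=512, overlap=100):
    """Cut an H x W x C image into block x block tiles whose windows overlap by `overlap` pixels."""
    stride = block - overlap   # 412-pixel step between window origins
    h, w = image.shape[:2]
    return [((top, left), image[top:top + block, left:left + block])
            for top in origins(h, block, stride)
            for left in origins(w, block, stride)]

tiles = crop_with_overlap(np.zeros((2048, 2048, 3), dtype=np.uint8))
print(len(tiles))  # 25 tiles for a 2048 x 2048 image
```

Keeping each tile's origin allows the detection boxes to be mapped back onto the uncut image in step (7c).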
(3) Constructing a deep convolution feature extraction sub-network: respectively constructing a depth separable convolution inverse residual connecting module and a feature pyramid convolution module, then sequentially using a 7 × 7 convolution layer and a maximum pooling layer and alternately connecting the depth separable convolution inverse residual connecting modules and the feature pyramid convolution modules to form the deep convolution feature extraction sub-network.
The specific structure of the sub-network for extracting the depth convolution features is that an original image input layer → 7 × 7 convolution layer → a first maximum pooling layer → a first depth separable convolution inverse residual connection module C1 → a second depth separable convolution inverse residual connection module C2 → a first feature pyramid convolution module P1 → a third depth separable convolution inverse residual connection module C3 → a second feature pyramid convolution module P2 → a fourth depth separable convolution inverse residual connection module C4 → a third feature pyramid convolution module P3 → a second maximum pooling layer → a fourth feature pyramid convolution module P4 → a third maximum pooling layer → a fifth feature pyramid convolution module P5 → a current stage feature map output layer.
(4) Constructing a fully-convoluted FCN detection subnetwork:
(4a) constructing a full convolution FCN classification subnet with the structure: classification subnet input layer → first 3 × 3 convolution layer → second 3 × 3 convolution layer → third 3 × 3 convolution layer → fourth 3 × 3 convolution layer → fifth 3 × 3 convolution layer → classification subnet output layer; the classification subnet input layer takes the feature map of each feature pyramid convolution module in turn as the input of the classification subnet and performs classification detection, the sizes of the input feature maps being 64 × 64, 32 × 32, 16 × 16, 8 × 8 and 4 × 4 respectively.
Calculating the classification features output by the fifth 3 × 3 convolutional layer to obtain the classification confidence of each default frame in each classification category, inputting the feature values of the feature map output by the fifth 3 × 3 convolutional layer into a sigmoid function, and outputting the probability that the default frame belongs to the corresponding category, namely the classification confidence of the default frame in each category, wherein the calculation formula of the sigmoid function is as follows:
σ(x) = 1 / (1 + e^(−x))
wherein x represents a characteristic value of a characteristic diagram input into the sigmoid function.
(4b) constructing a full convolution FCN regression subnet with the structure: regression subnet input layer → first 3 × 3 convolution layer → second 3 × 3 convolution layer → third 3 × 3 convolution layer → fourth 3 × 3 convolution layer → fifth 3 × 3 convolution layer → regression subnet output layer; the regression subnet input layer takes the feature map of each feature pyramid convolution module in turn as the input of the regression subnet and performs regression detection, the sizes of the input feature maps being 64 × 64, 32 × 32, 16 × 16, 8 × 8 and 4 × 4 respectively.
(5) Constructing and training a deep convolution target detection network:
(5a) constructing a deep convolution target detection network: the method comprises the steps of sequentially building a deep convolution target detection network by using a deep convolution feature extraction sub-network, a full convolution FCN classification sub-network and a full convolution FCN regression sub-network, wherein the structure of the deep convolution target detection network is that an original image input layer → the deep convolution feature extraction sub-network → the full convolution FCN classification regression sub-network.
(5b) Training a deep convolution target detection network: and training the deep convolution target detection network by using the training data set and the verification data set as input to obtain the trained deep convolution target detection network, and storing a weight file of the trained deep convolution target detection network.
(6) Constructing and training a target detection network based on a deep evolution pruning convolution network:
(6a) performing layer-by-layer DNA coding on the convolution filters participating in pruning in the trained deep convolution target detection network, and recording the coding as DNA_{1,...,l-1,l}.

(6b) optimizing the DNA_{1,...,l-1,l} coding with an evolutionary algorithm to obtain the final optimized coding DNA'_{1,...,l-1,l}.

(6c) combining the optimized coding DNA'_{1,...,l-1,l} with the pruning rule to construct the target detection network based on the deep evolution pruning convolutional network, the pruning rule being that a code of 0 means the convolution filter is finally pruned and a code of 1 means it is finally retained; fine-tuning with the training data set to obtain the trained target detection network based on the deep evolution pruning convolutional network, i.e. the trained model, and saving the trained model weight file.
(7) And carrying out target detection on the test data set by using the trained model:
(7a) and sequentially inputting the data blocks in the test data set into a trained target detection network based on the deep evolution pruning convolutional network to obtain a candidate frame of each data block in the test data set, a classification confidence score corresponding to the candidate frame and a target category corresponding to the candidate frame.
(7b) discarding all candidate frames whose classification confidence score for the target category is lower than the threshold 0.3, and performing non-maximum suppression on the retained candidate frames.
(7c) mapping the coordinates of all the retained candidate frames onto the optical remote sensing image before cutting, and performing a second non-maximum suppression to obtain the final detection result image of the optical remote sensing image.
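A simplified sketch of steps (7b)-(7c); the greedy IoU-based non-maximum suppression and the IoU threshold of 0.5 are common-practice assumptions, since the patent fixes only the 0.3 confidence threshold:

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression; boxes are (x1, y1, x2, y2) rows."""
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0.0, x2 - x1) * np.maximum(0.0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        order = rest[inter / (area_i + areas - inter) <= iou_thresh]
    return keep

def postprocess_tile(boxes, scores, tile_origin, conf_thresh=0.3):
    """Step (7b): drop boxes scoring below 0.3 and suppress duplicates; step (7c):
    shift the survivors by the tile origin to map them onto the uncut image."""
    m = scores >= conf_thresh
    boxes, scores = boxes[m], scores[m]
    keep = nms(boxes, scores)
    top, left = tile_origin
    return boxes[keep] + np.array([left, top, left, top]), scores[keep]

boxes = np.array([[10, 10, 50, 50], [12, 12, 52, 52], [200, 200, 240, 260]], float)
scores = np.array([0.9, 0.6, 0.2])
print(*postprocess_tile(boxes, scores, (512, 412)), sep="\n")
```

In the full pipeline, the boxes mapped back from all tiles would then be pooled and the second non-maximum suppression applied over the whole image.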
The invention optimizes the detection accuracy and the running speed of the target detection network at the same time, providing an optimization scheme that achieves both; it has obvious advantages in computational complexity and running speed and improves model accuracy compared with the prior art.
The idea of the invention is as follows: firstly, constructing an inverse residual error network based on a depth separable convolution unit capable of greatly reducing the parameter quantity of a model to extract the basic characteristics of an input image, taking the basic characteristics as the input of a characteristic pyramid convolution network to perform more precise characteristic extraction, using a full convolution FCN classification subnet and a full convolution FCN regression subnet to perform detection, and finally performing global evolution pruning optimization to realize network acceleration. The extracted features are more suitable for the remote sensing image target detection task, the accuracy rate of remote sensing image target detection can be improved, and the network operation speed is greatly improved.
Example 2:
the remote sensing image target detection method based on the deep evolution pruning convolution network is the same as the method in the embodiment 1, and the step (3) of constructing the deep convolution feature extraction sub-network comprises the following specific steps:
(3a) constructing a depth separable convolution inverse residual connecting module: the module structure is that the characteristic diagram input layer of the previous stage → 1 × 1 convolution layer → depth separable convolution unit → point-by-point addition layer → characteristic diagram output layer of the current stage.
The 1 × 1 convolution layer in the inverse residual connecting module and the depth separable convolution unit appear as a pair, and the point-by-point addition layer is a feature processing layer formed by adding, point by point, the output feature map of the preceding depth separable convolution unit and the feature map from the input layer of the inverse residual connecting module.

"Inverse residual" means that while the traditional residual connecting structure first reduces and then increases the channel dimension of the input feature map, the inverse residual connecting module first increases and then reduces it: the 1 × 1 convolution layer increases the channel dimension of the input feature map 2 times, and the depth separable convolution unit performs feature extraction and a 2-times channel dimension reduction, so that the number of channels of the output feature map of the constructed depth separable convolution inverse residual connecting module is consistent with that of the input feature map.
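A minimal PyTorch-style sketch of the inverse residual connecting module described above; the class and layer names are illustrative, while the 2-times expansion and the linear activation after the point-by-point convolution follow the text:

```python
import torch
import torch.nn as nn

class DepthSeparableUnit(nn.Module):
    """3x3 depth conv -> BN -> ReLU -> 1x1 point-by-point conv -> BN -> linear activation."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch, bias=False)
        self.bn1 = nn.BatchNorm2d(in_ch)
        self.relu = nn.ReLU(inplace=True)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)  # 2x channel reduction here
        self.bn2 = nn.BatchNorm2d(out_ch)                         # linear: no ReLU after this

    def forward(self, x):
        return self.bn2(self.pointwise(self.relu(self.bn1(self.depthwise(x)))))

class InverseResidual(nn.Module):
    """1x1 conv expands channels 2x, the separable unit extracts features and reduces
    channels back, and the input is added point by point (channel count unchanged)."""
    def __init__(self, channels):
        super().__init__()
        self.expand = nn.Conv2d(channels, 2 * channels, 1, bias=False)  # 2x dimension increase
        self.unit = DepthSeparableUnit(2 * channels, channels)          # 2x dimension reduction

    def forward(self, x):
        return x + self.unit(self.expand(x))   # point-by-point addition layer

x = torch.randn(1, 64, 128, 128)
print(InverseResidual(64)(x).shape)  # torch.Size([1, 64, 128, 128])
```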
(3b) Constructing a characteristic pyramid convolution module: the feature pyramid convolution module is a two-layer input and single-layer output structure, and the module structure is input feature map 1 → the first convolution layer of input feature map 1 → two times of upsampling layer → output feature map 1, input feature map 2 → the first convolution layer of input feature map 2 → output feature map 2, point-by-point addition layer → second convolution layer → current stage feature map output layer.
The input feature map 1 is a stage feature map with the same size as an input feature map and an output feature map in a depth separable convolution inverse residual connection module, the input feature map 2 is a feature map with the same spatial size as the output feature map 1 in the inverse residual connection module, the point-by-point addition layer is a feature processing layer formed by point-by-point addition of the output feature map 1 and the output feature map 2, and the double-up sampling layer amplifies the scale of the input feature map 1 processed by the first convolution layer through a bilinear interpolation algorithm.
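A sketch of this two-input, one-output module, assuming PyTorch; the 1 × 1 lateral convolutions and the 3 × 3 output convolution match the parameter settings given later in embodiment 6, while the channel numbers are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeaturePyramidModule(nn.Module):
    def __init__(self, ch1, ch2, out_ch):
        super().__init__()
        self.lateral1 = nn.Conv2d(ch1, out_ch, 1)   # first conv layer of input feature map 1
        self.lateral2 = nn.Conv2d(ch2, out_ch, 1)   # first conv layer of input feature map 2
        self.smooth = nn.Conv2d(out_ch, out_ch, 3, padding=1)  # second conv layer

    def forward(self, feat1, feat2):
        up = F.interpolate(self.lateral1(feat1), scale_factor=2,
                           mode='bilinear', align_corners=False)  # two-times upsampling
        return self.smooth(up + self.lateral2(feat2))             # point-by-point addition

m = FeaturePyramidModule(256, 128, 256)
out = m(torch.randn(1, 256, 16, 16), torch.randn(1, 128, 32, 32))
print(out.shape)  # torch.Size([1, 256, 32, 32])
```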
(3c) Starting from a 7 × 7 convolution layer and a maximum pooling layer, the depth separable convolution inverse residual connecting modules are alternately connected with the feature pyramid convolution modules to construct the deep convolution feature extraction sub-network, whose specific structure is: original image input layer → 7 × 7 convolution layer → first maximum pooling layer → first depth separable convolution inverse residual connection module C1 → second depth separable convolution inverse residual connection module C2 → first feature pyramid convolution module P1 → third depth separable convolution inverse residual connection module C3 → second feature pyramid convolution module P2 → fourth depth separable convolution inverse residual connection module C4 → third feature pyramid convolution module P3 → second maximum pooling layer → fourth feature pyramid convolution module P4 → third maximum pooling layer → fifth feature pyramid convolution module P5 → current stage feature map output layer.
High accuracy is maintained while the parameter amount is reduced: the invention provides an inverse residual connecting module. A traditional residual connecting structure uses a 1 × 1 convolution layer to reduce and then restore the channel dimension of the input feature map, but the dimension reduction compresses the features and removes part of the useful feature information in the image, and the reduced target feature information lowers the detection accuracy of the model. The inverse residual connecting module of the invention first doubles the number of channels of the input feature map to obtain more image feature information and then applies a depth separable unit to extract features and reduce the channel dimension, so that a higher detection accuracy is maintained while the parameter amount is reduced and the network runs faster.
Example 3:
the remote sensing image target detection method based on the deep evolution pruning convolutional network is the same as that in embodiment 1-2, and the deep separable convolution unit in the step (3a) is shown in fig. 2, and the unit structure is that a feature map input layer in the last stage → 3 × 3 deep convolutional layer → first batch normalization layer → ReLU activation function layer → 1 × 1 point-by-point convolutional layer → second batch normalization layer → linear activation function layer → output feature map layer.
The depth separable convolution unit divides the standard convolution into depth convolution and point-by-point convolution to realize the space and channel separation and respective processing of the features, thereby greatly reducing the parameter quantity and the calculation complexity.
After the 1 × 1 point-by-point convolution layer, a ReLU activation function is no longer used; a linear activation function is used instead, which prevents the ReLU activation function from causing a large information loss on tensors with a low channel number and from damaging the feature information.
Assume the input feature map size is H_in × W_in × C_in, where H_in, W_in and C_in are the height, width and channel number of the input feature map, a convolution kernel of size K × K is used, and the output feature map size is H_out × W_out × C_out, where H_out, W_out and C_out are the height, width and channel number of the output feature map. Standard convolution considers the spatial and channel information of the input feature map simultaneously, and its computation amount is:

K × K × C_in × H_out × W_out × C_out

The depth separable convolution separates the channels of the input feature map from space using a 3 × 3 depth convolution and a 1 × 1 point-by-point convolution, which are processed separately, and its computation amount is:

K × K × C_in × H_out × W_out + C_in × H_out × W_out × C_out

The ratio of the computation amount of the depth separable convolution of the invention to that of the standard convolution is therefore:

(K × K × C_in × H_out × W_out + C_in × H_out × W_out × C_out) / (K × K × C_in × H_out × W_out × C_out) = 1/C_out + 1/K²

For a convolution kernel of size 3 × 3, the computation amount is reduced by a factor of about 9.
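This ratio can be checked numerically; the channel counts and map size below are arbitrary examples:

```python
K, C_in, C_out, H_out, W_out = 3, 256, 256, 64, 64

standard  = K * K * C_in * H_out * W_out * C_out                         # standard convolution
separable = K * K * C_in * H_out * W_out + C_in * H_out * W_out * C_out  # depthwise + pointwise

print(separable / standard)   # 0.115..., i.e. 1/C_out + 1/K^2
print(standard / separable)   # about 8.7x fewer multiply-accumulates
```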
The model parameter amount is greatly reduced: the remote sensing image target detection method based on the deep evolution pruning convolution network replaces the standard convolutions in a ResNet network with depth separable convolutions, and after the 1 × 1 point-by-point convolution in the depth separable convolution unit uses a linear activation function rather than a ReLU activation function so that the feature information is not damaged. This maintains the detection precision of the model while reducing the parameter amount and computation needed to fit the data, accelerates the convergence of the network model, overcomes the loss of running speed caused by the large parameter counts of prior-art networks, and makes the method applicable to computing devices with limited computing and storage resources.
Example 4:
the remote sensing image target detection method based on the deep evolution pruning convolutional network is the same as that of the embodiment 1-3, and the step (6a) of performing layer-by-layer DNA coding on the convolutional filter participating in pruning in the trained deep convolutional target detection network refers to the following steps:
(6a1) performing the convolution operation on the output feature map: denote the l-th layer feature map of the trained deep convolution target detection network, with height H_l, width W_l and channel number C_l, as the output feature map Z_l ∈ R^(H_l × W_l × C_l), and denote the feature sub-map of the k-th channel of the l-th layer feature map as Z_l^(k). Z_l^(k) is obtained by a convolution operation (*) between the parameters W_l^(k) of the corresponding convolution filter and the preceding layer feature map Z_(l-1), with f denoting the activation function; the calculation formula of Z_l^(k) is:

Z_l^(k) = f(Z_(l-1) * W_l^(k))

In common deep learning frameworks such as TensorFlow and Caffe, the convolution operation on tensors is converted into a matrix multiplication by reshaping the input and transposing the convolution filter; after this transformation the l-th layer output feature map Z_l* is given by:

Z_l* = f(Z_(l-1)* W_l*)

where Z_(l-1)* is the (l-1)-th layer feature map after the transformation of the convolution operation and W_l* is the parameter matrix of the convolution filters corresponding to the l-th layer feature map after the transformation.
(6a2) mask-coding the convolution filters that need pruning or retention: for the C_l convolution filters that produce the l-th layer output feature map of the trained deep convolution target detection network, introduce a mask m_l ∈ {0, 1}^(C_l) coding the filters that need pruning or retention, where a code of 0 means the convolution filter is pruned and a code of 1 means the convolution filter is retained; with ⊙ denoting the element-wise product, the convolution formula with global feature channel pruning becomes:

Z_l* = f((m_l ⊙ W_l*) Z_(l-1)*)
(6a3) coding the convolution filters participating in pruning layer by layer: using the trained target detection network based on depth separable convolution, the convolution filters participating in pruning are coded layer by layer; the coding of all layers to be pruned is recorded as DNA_{1,...,l-1,l} = {m_1, ..., m_(l-1), m_l}, where DNA_{1,...,l-1,l} denotes the DNA coding of the 1st to l-th layers to be pruned and m_(l-1) denotes the coding symbols of the (l-1)-th layer feature map.
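A small sketch of the mask coding of steps (6a2)-(6a3), assuming PyTorch and illustrative shapes; the mask silences pruned filters without removing them, so a wrongly pruned filter can later be restored by flipping its bit:

```python
import torch
import torch.nn.functional as F

def masked_conv(x, weight, mask):
    """Convolution with global feature channel pruning: mask is a {0, 1} vector
    over the output filters; code 0 silences (prunes) a filter, code 1 keeps it."""
    w = weight * mask.view(-1, 1, 1, 1)   # zero out the pruned filters' parameters
    return F.relu(F.conv2d(x, w, padding=1))

C_l = 8
weight = torch.randn(C_l, 16, 3, 3)      # W_l: C_l filters over 16 input channels
mask = torch.tensor([1., 0., 1., 1., 0., 1., 0., 1.])   # m_l, this layer's DNA segment
x = torch.randn(1, 16, 32, 32)
y = masked_conv(x, weight, mask)
print(y.shape, (y[0, 1] == 0).all().item())  # pruned channel 1 outputs all zeros
```

The DNA of the whole network is then the concatenation of the per-layer masks m_1, ..., m_l.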
According to the remote sensing image target detection method based on the deep evolution pruning convolution network, a pruning algorithm is used for the trained deep convolution target detection network to remove a large number of redundant convolution filters existing in the target detection network, so that the overfitting risk of the network is reduced, the network structure is greatly simplified, the remote sensing image target detection method is easier to deploy in embedded equipment due to less parameter quantity, and meanwhile, the reasoning speed is remarkably accelerated.
Most previous approaches prune filters sequentially in a fixed layer-by-layer manner, which cannot dynamically recover previously removed filters, ignores complex associations between filters, offers poor flexibility and can significantly degrade network evaluation performance. The invention jointly codes the filters to be pruned in all layers, which gives strong flexibility and makes full use of the associations between filters, improving network performance while accelerating the network.
Example 5:
The remote sensing image target detection method based on the deep evolution pruning convolution net is the same as in embodiments 1-4; the step (6b) of optimizing the DNA_{1,...,l-1,l} coding with an evolutionary algorithm to obtain the final optimized coding DNA'_{1,...,l-1,l} proceeds as follows:

(6b1) initialization: set the evolution generation counter t = 0, set the maximum number of generations T, and set the pruning ratio ratio_cut = 0.5; according to ratio_cut, randomly generate M individuals with DNA_{1,...,l-1,l} coding as the initial population P_0 = {DNA^1, ..., DNA^(M-1), DNA^M}, where DNA^(M-1) denotes the DNA coding of the (M-1)-th individual over the filters of layers 1 to l.
(6b2) adjusting network parameters with the training data set: for the population P_t of the t-th round, the networks generated from the individuals are retrained and their parameters adjusted, using the training data set and the masked convolution formula of global feature channel pruning Z_l* = f((m_l ⊙ W_l*) Z_(l-1)*).

(6b3) fitness calculation: using the verification data set and the masked convolution formula of global feature channel pruning, compute the fitness fitness^m of each individual of the population P_t, where the fitness is determined by L_val, the loss on the verification data set.
(6b4) generating new individuals: according to the fitness fitness^m, individuals with higher fitness are selected for crossover and mutation to generate new individuals; the crossover operation randomly crosses parent individuals according to the crossover probability p_c = 0.9, and the mutation operation randomly mutates parent individuals according to the mutation probability p_m = 0.9; through steps (6b1) to (6b4), the population P_t yields the next-generation population P_(t+1) after the selection, crossover and mutation operations.

(6b5) judging whether to terminate the evolution: if t = T, the individual with the maximum fitness obtained during the evolution is output as the optimal solution, the calculation terminates, its coding is recorded as DNA'_{1,...,l-1,l}, and step (6c) is executed to construct the target detection network based on the deep evolution pruning convolution network; otherwise, if t < T, return to step (6b2) and repeat steps (6b2) to (6b5) to continue the evolutionary optimization of the coding.
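The loop of steps (6b1)-(6b5) can be summarized by the following genetic-algorithm skeleton; the fitness function shown is a toy stand-in, whereas the patent derives fitness from the verification loss of the retrained masked network:

```python
import random

def evolve(num_filters, fitness, M=20, T=50, ratio_cut=0.5, p_c=0.9, p_m=0.9):
    """Optimize a binary filter-pruning code with a genetic algorithm.
    `fitness(dna)` should retrain the masked network and return a score
    that decreases with the verification loss L_val."""
    def random_dna():
        return [1 if random.random() > ratio_cut else 0 for _ in range(num_filters)]

    pop = [random_dna() for _ in range(M)]                  # (6b1) initial population
    for t in range(T):
        scored = sorted(pop, key=fitness, reverse=True)     # (6b2)-(6b3) retrain + score
        parents = scored[:M // 2]                           # (6b4) select the fitter half
        children = []
        while len(children) < M - len(parents):
            a, b = random.sample(parents, 2)
            if random.random() < p_c:                       # crossover at a random cut point
                cut = random.randrange(1, num_filters)
                a = a[:cut] + b[cut:]
            if random.random() < p_m:                       # mutation: flip one random bit
                i = random.randrange(num_filters)
                a = a[:i] + [1 - a[i]] + a[i + 1:]
            children.append(a)
        pop = parents + children
    return max(pop, key=fitness)                            # (6b5) best code DNA'

# toy fitness: prefer keeping about half of the filters (stands in for -L_val)
best = evolve(32, fitness=lambda d: -abs(sum(d) - 16))
print(sum(best), "filters kept out of 32")
```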
The invention combines the target detection network with a global dynamic pruning method based on an evolutionary algorithm and realizes network acceleration: redundant filters are removed by pruning, so the CNN is accelerated. Unlike the GDP method discussed in the background, no prior function needs to be designed in advance; the pruning process is optimized by a global dynamic evolutionary algorithm, which reduces the implementation difficulty. The invention jointly codes the filters to be pruned in all layers, optimizes the network to be pruned with the evolutionary algorithm, uses the performance of the network on the test set as the fitness of the evolutionary algorithm, and completes the iterative optimization of the network structure by retraining, so that the model finally achieves an ideal acceleration effect while its performance is guaranteed.
A more complete and thorough example is given below to further describe the present invention.
Example 6:
The remote sensing image target detection method based on the deep evolution pruning convolution network is the same as in embodiments 1 to 5; referring to FIG. 1,
step 1, processing and determining a training data set and a verification data set:
Inputting a plurality of large optical remote sensing images to be processed, which contain a plurality of targets, and marking the targets with a marking tool; cutting the optical remote sensing image into 512 × 512 image blocks centered on each target; naming each cut image block according to the data set naming rule, and forming a training data set and a verification data set from all named image blocks, the training data set accounting for 70% and the verification data set for 30%; and performing data enhancement operations on the image blocks in the training data set, such as image scale transformation, image translation, image rotation, image mirroring, image contrast and brightness adjustment and image noise addition (sketched at the end of this step), to form the final training data set.
The data set naming rule is that the file name of each remote sensing image to be cut is joined, with an underscore '_' character, to the sliding-window step number of the corresponding cut data block, generating a .jpg file.
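Some of the listed enhancement operations can be sketched with common array transforms (NumPy assumed; scale transformation and translation are omitted for brevity):

```python
import numpy as np

def augment(img: np.ndarray, rng: np.random.Generator):
    """Randomly mirror, rotate by a multiple of 90 degrees, jitter contrast and
    brightness, and add Gaussian noise to one 512 x 512 training block."""
    if rng.random() < 0.5:
        img = img[:, ::-1]                                   # horizontal mirror
    img = np.rot90(img, k=rng.integers(0, 4))                # rotation
    alpha, beta = rng.uniform(0.8, 1.2), rng.uniform(-20, 20)
    img = np.clip(alpha * img.astype(np.float32) + beta, 0, 255)  # contrast/brightness
    img = np.clip(img + rng.normal(0, 5, img.shape), 0, 255)      # Gaussian noise
    return img.astype(np.uint8)

rng = np.random.default_rng(0)
block = rng.integers(0, 256, (512, 512, 3), dtype=np.uint8)
print(augment(block, rng).shape)  # (512, 512, 3)
```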
Step 2, processing and determining a test data set:
Inputting another plurality of large optical remote sensing images to be processed, which contain various targets, and marking the targets on the large optical remote sensing images to be tested with a marking tool; setting the overlap to 100 pixels in an overlapping sliding-window manner, sequentially cutting the picture into 512 × 512 image blocks; and naming each cut image block according to the data set naming rule, all named image blocks forming the test data set.
And 3, constructing a deep convolution feature extraction sub-network, which comprises the following specific steps:
(3a) constructing a depth separable convolution inverse residual connecting module: the module structure is that the characteristic diagram input layer of the previous stage → 1 × 1 convolution layer → depth separable convolution unit → point-by-point addition layer → characteristic diagram output layer of the current stage.
The unit structure of the depth separable convolution unit is: previous stage feature map input layer → 3 × 3 depth convolution layer → first batch normalization layer → ReLU activation function layer → 1 × 1 point-by-point convolution layer → second batch normalization layer → linear activation function layer → output feature map layer; after the 1 × 1 point-by-point convolution layer a ReLU activation function is no longer used, a linear activation function being used instead to prevent the ReLU activation function from damaging the feature information.
Assume the input feature map size is denoted H_in × W_in × C_in, where H_in, W_in and C_in are the height, width and channel number of the input feature map, a convolution kernel of size K × K is used, and the output feature map size is denoted H_out × W_out × C_out, where H_out, W_out and C_out are the height, width and channel number of the output feature map. Standard convolution considers the spatial and channel information of the input feature map simultaneously, and its computation amount is:

K × K × C_in × H_out × W_out × C_out

The depth separable convolution separates the channels of the input feature map from space using a 3 × 3 depth convolution and a 1 × 1 point-by-point convolution, which are processed separately, and its computation amount is:

K × K × C_in × H_out × W_out + C_in × H_out × W_out × C_out

The ratio of the computation amount of the depth separable convolution of the invention to that of the standard convolution is therefore:

(K × K × C_in × H_out × W_out + C_in × H_out × W_out × C_out) / (K × K × C_in × H_out × W_out × C_out) = 1/C_out + 1/K²

For a convolution kernel of size 3 × 3, the computation amount is reduced by a factor of about 9, so a speed increase of 7 to 9 times can be achieved.
The 1 × 1 convolution layer in the inverse residual connecting module and the depth separable convolution unit appear as a pair, and the point-by-point addition layer is a feature processing layer formed by adding, point by point, the output feature map of the preceding depth separable convolution unit and the feature map from the input layer of the inverse residual connecting module.
The inverse residual connecting module means that the conventional residual connecting structure firstly reduces and then increases the dimension of the channel of the input feature map, and the inverse residual connecting module firstly increases and then decreases the dimension of the channel of the input feature map, wherein the 1 × 1 convolutional layer performs 2 times of dimension increase on the channel of the input feature map, the 3 × 3 depth convolutional layer in the depth separable convolution unit performs feature extraction, and the 1 × 1 point-by-point convolutional layer in the depth separable convolution unit performs 2 times of dimension reduction on the channel of the input feature map, so that the number of channels of the output feature map of the constructed depth separable convolution inverse residual connecting module is consistent with that of the input feature map.
(3b) Constructing a characteristic pyramid convolution module: the module is a double-layer input single-layer output structure, and the module structure is as follows, input feature diagram 1 → the first convolution layer of input feature diagram 1 → two times of upsampling layer → output feature diagram 1, input feature diagram 2 → the first convolution layer of input feature diagram 2 → output feature diagram 2, point-by-point addition layer → second convolution layer → current stage feature diagram output layer.
The input feature map 1 is a stage feature map with the same size as an input feature map and an output feature map in a depth separable convolution inverse residual connection module, the input feature map 2 is a feature map with the same spatial size as the output feature map 1 in the inverse residual connection module, the point-by-point addition layer is a feature processing layer formed by point-by-point addition of the output feature map 1 and the output feature map 2, and the double-up sampling layer amplifies the scale of the input feature map 1 processed by the first convolution layer through a bilinear interpolation algorithm.
The specific parameters of the feature pyramid convolution module are set as follows: the filter size of the first convolution layer for input feature map 1 is 1 × 1 with convolution step 1; the filter size of the first convolution layer for input feature map 2 is 1 × 1 with convolution step 1; and the filter size of the second convolution layer is 3 × 3 with convolution step 1.
The feature pyramid convolution module effectively takes the depth separable inverse residual convolutions as input and extracts features stage by stage, and it merges semantic features from higher layers by upsampling, so the network can effectively combine deep and shallow features and overcome the semantic gaps between feature maps of different stages; the deep and shallow features can then be applied effectively and simultaneously to classification and regression, which improves the overall accuracy of detecting small targets such as airplanes and ships in remote sensing images.
(3c) Starting from a 7 × 7 convolution layer and a maximum pooling layer, the depth separable convolution inverse residual connecting modules are alternately connected with the feature pyramid convolution modules to construct the deep convolution feature extraction sub-network, whose specific structure is: original image input layer → 7 × 7 convolution layer → first maximum pooling layer → first depth separable convolution inverse residual connection module C1 → second depth separable convolution inverse residual connection module C2 → first feature pyramid convolution module P1 → third depth separable convolution inverse residual connection module C3 → second feature pyramid convolution module P2 → fourth depth separable convolution inverse residual connection module C4 → third feature pyramid convolution module P3 → second maximum pooling layer → fourth feature pyramid convolution module P4 → third maximum pooling layer → fifth feature pyramid convolution module P5 → current stage feature map output layer.
And 4, constructing a full convolution FCN detection sub-network:
constructing a full convolution FCN classification subnet: the feature map of each feature pyramid convolution module is selected in turn as the input; the structure is: classification subnet input layer → first 3 × 3 convolution layer → second 3 × 3 convolution layer → third 3 × 3 convolution layer → fourth 3 × 3 convolution layer → fifth 3 × 3 convolution layer → classification subnet output layer.
The classification subnet input layer takes the feature map of each feature pyramid convolution module in turn as the input of the classification subnet for classification detection; the input feature map sizes are 64 × 64, 32 × 32, 16 × 16, 8 × 8 and 4 × 4 respectively.
The parameters of the full convolution FCN classification subnet are set as follows:

The first four layers each perform a 3 × 3 convolution with convolution stride 1.

The fifth layer performs a convolution on the output of the fourth 3 × 3 convolutional layer to obtain the classification features; the convolution stride is 1 and the number of filters is 9 × 2, where "9" is the number of default frames corresponding to each pixel of the feature map output by the fourth 3 × 3 convolutional layer and "2" is the number of classification categories of the classification subnet.
The classification features output by the fifth 3 × 3 convolutional layer are used to obtain the classification confidence of each default frame for each classification category: each feature value of the feature map output by the fifth 3 × 3 convolutional layer is fed into a sigmoid function, which outputs the probability that the default frame belongs to the corresponding category, i.e., the classification confidence of the default frame for that category. The sigmoid function is calculated as:

sigmoid(x) = 1 / (1 + e^(-x))

where x denotes a feature value of the feature map input into the sigmoid function.
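For concreteness, a hedged PyTorch sketch of this classification subnet is given below; the 256-channel width and function names are assumptions, while the four stride-1 3 × 3 convolutions, the 9 × 2 output filters and the sigmoid follow the text:

import torch
import torch.nn as nn

def make_cls_subnet(in_ch: int = 256, num_anchors: int = 9, num_classes: int = 2):
    layers = []
    for _ in range(4):                        # first four 3x3 convolutions, stride 1
        layers += [nn.Conv2d(in_ch, in_ch, 3, stride=1, padding=1), nn.ReLU()]
    # fifth 3x3 convolution: 9 * 2 filters -> per-pixel, per-default-frame class scores
    layers.append(nn.Conv2d(in_ch, num_anchors * num_classes, 3, stride=1, padding=1))
    return nn.Sequential(*layers)

cls_head = make_cls_subnet()
scores = torch.sigmoid(cls_head(torch.randn(1, 256, 64, 64)))  # confidences in (0, 1)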
Constructing a full convolution FCN regression subnet: the feature map of each feature pyramid convolution module is selected in turn as the input; the structure is: regression subnet input layer → first 3 × 3 convolution layer → second 3 × 3 convolution layer → third 3 × 3 convolution layer → fourth 3 × 3 convolution layer → fifth 3 × 3 convolution layer → regression subnet output layer.
The regression subnet input layer takes the feature map of each feature pyramid convolution module in turn as the input of the regression subnet for regression detection; the input feature map sizes are 64 × 64, 32 × 32, 16 × 16, 8 × 8 and 4 × 4 respectively.
The parameters of the full convolution FCN regression subnet are set as follows:

The first four layers each perform a 3 × 3 convolution with convolution stride 1.

The fifth layer performs a convolution on the output of the fourth 3 × 3 convolutional layer to obtain the default frame position offsets; the convolution stride is 1 and the number of filters is 9 × 4, where "9" is the number of default frames corresponding to each pixel of the feature map output by the fourth 3 × 3 convolutional layer and "4" is the number of position offsets for the four coordinate values of the default frame's top-left and bottom-right corners.
The two full convolution FCN subnets are independent of each other and do not share parameters.
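A companion sketch for the regression subnet follows, under the same assumptions as the classification sketch above; the fifth 3 × 3 convolution emits 9 × 4 filters (9 default frames per pixel, 4 coordinate offsets each), and the head is built as a separate module so that no parameters are shared with the classification subnet:

import torch
import torch.nn as nn

def make_reg_subnet(in_ch: int = 256, num_anchors: int = 9, num_offsets: int = 4):
    layers = []
    for _ in range(4):                        # first four 3x3 convolutions, stride 1
        layers += [nn.Conv2d(in_ch, in_ch, 3, stride=1, padding=1), nn.ReLU()]
    # fifth 3x3 convolution: 9 * 4 filters -> per-pixel, per-default-frame offsets
    layers.append(nn.Conv2d(in_ch, num_anchors * num_offsets, 3, stride=1, padding=1))
    return nn.Sequential(*layers)

reg_head = make_reg_subnet()
offsets = reg_head(torch.randn(1, 256, 64, 64))   # shape (1, 36, 64, 64)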
Step 5, constructing and training a deep convolution target detection network:
constructing a deep convolution target detection network: the deep convolution feature extraction sub-network, the full convolution FCN classification subnet and the full convolution FCN regression subnet are assembled in sequence into the deep convolution target detection network, whose structure is: original image input layer → deep convolution feature extraction sub-network → full convolution FCN classification and regression subnets.
Training the deep convolution target detection network: the network is trained with the training data set and the verification data set as input, yielding the trained deep convolution target detection network, whose weight file is saved.
Step 6, constructing and training the target detection network based on the deep evolution pruning convolution network:
(6a) Performing layer-by-layer DNA coding on the convolution filters participating in pruning in the trained deep convolution target detection network, recording the coding as DNA_{1,...,l-1,l}, as follows:
(6a1) Expressing the output feature map by the convolution operation: record the l-th layer feature map of the trained deep convolution target detection network, with height H_l, width W_l and C_l channels, as

Z_l ∈ R^(H_l × W_l × C_l)

and record the feature sub-map of the k-th channel of the l-th layer as Z_l^(k). Then Z_l^(k) is obtained by convolving (*) the previous layer's feature map Z_{l-1} with the parameters W_l^(k) of the corresponding convolution filter, with f denoting the activation function:

Z_l^(k) = f(Z_{l-1} * W_l^(k))
In common deep learning frameworks such as TensorFlow and Caffe, the tensor convolution is converted into a matrix multiplication by reshaping the input and transposing the convolution filter. Writing Z*_{l-1} for the (l-1)-th layer feature map after this transformation and W*_l for the parameter matrix of the convolution filters corresponding to the l-th layer, the l-th layer feature map Z*_l is given by:

Z*_l = f(Z*_{l-1} W*_l)
(6a2) Mask-coding each convolution filter to be pruned or retained: for the l-th layer of the trained deep convolution target detection network, with C_l output channels, introduce a mask m_l^(k) ∈ {0, 1} for every convolution filter that is to be pruned or retained; a code of 0 indicates that the convolution filter is pruned, and a code of 1 indicates that it is retained. With ⊙ denoting the element-wise (inner) product of the mask with the filter parameters, the global feature channel pruning convolution formula Z*_l = f(Z*_{l-1} W*_l) changes to:

Z*_l = f(Z*_{l-1} (m_l ⊙ W*_l))
(6a3) Coding the convolution filters participating in pruning layer by layer: using the trained deep convolution target detection network, the filters participating in pruning are coded layer by layer; the coding containing all layers to be pruned is recorded as

DNA_{1,...,l-1,l} = {m_1, m_2, ..., m_{l-1}, m_l}

where DNA_{1,...,l-1,l} denotes the DNA codes of the 1st to l-th layers to be pruned, and m_{l-1} denotes the coding symbols of layer l-1.
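A toy Python sketch of this mask/DNA representation is given below; all function names, shapes and the example layer sizes are assumptions:

import numpy as np

def random_dna(filters_per_layer, prune_ratio=0.5, rng=np.random.default_rng(0)):
    """One 0/1 gene per convolution filter; roughly prune_ratio of genes are 0 (pruned)."""
    return [(rng.random(n) >= prune_ratio).astype(np.int8) for n in filters_per_layer]

def apply_mask(feature_map, mask):
    """feature_map: (H, W, C_l); mask: (C_l,) of 0/1 -> zero out pruned channels."""
    return feature_map * mask[None, None, :]

dna = random_dna([16, 32, 64])          # DNA_{1,...,l} for a 3-layer example
z = np.random.randn(8, 8, 16)
z_masked = apply_mask(z, dna[0])        # channels with gene 0 are suppressed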
(6b) Optimizing the DNA_{1,...,l-1,l} coding with an evolutionary algorithm to obtain the final optimized result coding DNA'_{1,...,l-1,l}, as follows:
(6b1) Initialization: set the evolution generation counter t = 0, set the maximum number of generations T, and set the pruning ratio ratio_cut = 0.5; according to ratio_cut, randomly generate M individuals with the DNA_{1,...,l-1,l} coding as the initial population

P_0 = {DNA^1, DNA^2, ..., DNA^M}

where DNA^m denotes the DNA coding of the m-th individual over the filters of layers 1 to l.
(6b2) Adjusting network parameters with the training data set: for the t-th round population P_t, retrain each generated network and adjust its parameters using the training data set and the masked global feature channel pruning convolution formula Z*_l = f(Z*_{l-1} (m_l ⊙ W*_l)).
(6b3) Fitness calculation: using the verification data set and the masked global feature channel pruning convolution formula, compute the fitness of each individual in the population P_t from L_val, the loss of the corresponding pruned network on the verification data set; lower verification loss corresponds to higher fitness.
(6b4) Generating new individuals: based on individual fitness, select the individuals with higher fitness for crossover and mutation to generate new individuals; the crossover operation randomly crosses parent individuals with crossover probability p_c = 0.9, and the mutation operation randomly mutates parent individuals with mutation probability p_m = 0.9. Through steps (6b1) to (6b4), the population P_t undergoes selection, crossover and mutation to yield the next generation population P_{t+1}.
(6b5) Judging whether to terminate the evolution: if t = T, output the individual with the maximum fitness obtained during the evolution as the optimal solution, terminate the calculation, record its coding as DNA'_{1,...,l-1,l}, and execute step (6c) to construct the target detection network based on the deep evolution pruning convolution network; otherwise, if t < T, return to step (6b2) and repeat steps (6b2) to (6b5) to continue the evolutionary optimization of the coding.
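To make steps (6b1)-(6b5) concrete, here is a compact Python sketch of the evolutionary loop; the 0.5 pruning ratio and the 0.9 crossover and mutation probabilities follow the text, while the population size, fitness-proportional selection, single-point crossover and per-gene mutation scaling are illustrative assumptions (a real run would retrain the masked network inside eval_val_loss as in step (6b2)):

import numpy as np

rng = np.random.default_rng(0)

def evolve(n_genes, eval_val_loss, M=20, T=10, p_c=0.9, p_m=0.9, ratio_cut=0.5):
    pop = (rng.random((M, n_genes)) >= ratio_cut).astype(np.int8)   # (6b1)
    best, best_fit = None, -np.inf
    for t in range(T):
        fit = np.array([1.0 / (eval_val_loss(ind) + 1e-8) for ind in pop])  # (6b3)
        if fit.max() > best_fit:
            best, best_fit = pop[fit.argmax()].copy(), fit.max()
        # (6b4) fitness-proportional selection of parents
        parents = pop[rng.choice(M, size=M, p=fit / fit.sum())]
        children = parents.copy()
        for i in range(0, M - 1, 2):                # single-point crossover
            if rng.random() < p_c:
                cut = rng.integers(1, n_genes)
                children[i, cut:], children[i + 1, cut:] = \
                    parents[i + 1, cut:].copy(), parents[i, cut:].copy()
        flip = rng.random(children.shape) < p_m / n_genes  # sparse bit-flip mutation
        pop = np.where(flip, 1 - children, children)
    return best                                      # coding DNA' at termination (6b5)

# eval_val_loss(dna) would briefly retrain the masked network (6b2) and return its
# verification loss; here a stand-in rewards keeping about half of the filters.
dna_opt = evolve(64, lambda d: abs(d.mean() - 0.5) + 0.1)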
(6c) Combining the optimized result coding DNA'_{1,...,l-1,l} with the pruning rule, construct the target detection network based on the deep evolution pruning convolution network; the pruning rule is that a code of 0 means the convolution filter is finally pruned and a code of 1 means it is finally retained. Fine-tune the pruned network on the training data set to obtain the trained target detection network based on the deep evolution pruning convolution network, i.e., the trained model, and save the trained model's weight file.
Step 7, performing target detection on the test data set with the trained model:
The data blocks in the test data set are input in sequence into the trained target detection network based on the deep evolution pruning convolutional network, yielding for each data block its candidate frames, the classification confidence score of each candidate frame, and the target category of each candidate frame.
All candidate frames whose classification confidence score for their target category is below the threshold 0.3 are discarded, and non-maximum suppression is applied to the remaining frames. Non-maximum suppression means: sort all detection frames by classification confidence score from high to low; keep candidate frames with low mutual overlap and high scores, and discard candidate frames with high mutual overlap and low scores; repeat until the lowest-confidence frame in the current detection sequence has been processed. This gives the detection results higher accuracy and a lower false-alarm rate. The threshold can be adjusted according to the actual situation.
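The procedure described here is standard greedy non-maximum suppression; the following Python sketch implements it (the IoU cutoff of 0.5 is an assumed value):

import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """boxes: (N, 4) as [x1, y1, x2, y2]; scores: (N,). Returns indices of kept frames."""
    order = scores.argsort()[::-1]                  # highest confidence first
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        # IoU of the top frame with every remaining frame
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_o = ((boxes[order[1:], 2] - boxes[order[1:], 0]) *
                  (boxes[order[1:], 3] - boxes[order[1:], 1]))
        iou = inter / (area_i + area_o - inter)
        order = order[1:][iou <= iou_thresh]        # drop high-overlap, lower-score frames
    return keep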
The coordinates of all retained candidate frames are then mapped back onto the optical remote sensing image before cutting, and a second round of non-maximum suppression is performed to obtain the final detection result image of the optical remote sensing image.
By constructing the inverse residual connection structure with depth separable convolutions, the method greatly reduces model parameters and computation while maintaining high detection accuracy; global evolutionary pruning on top of the deep convolution target detection network achieves network acceleration, greatly improving the model's overall detection speed while effectively improving the detection precision for airplanes and ships in optical remote sensing images.
The effect of the present invention is further illustrated by the following simulation experiment.
Simulation conditions:
The simulation experiment was carried out on two Intel Xeon E5-2697 v4 CPUs with a main frequency of 2.4 GHz, 64 GB of memory and two GeForce GTX 1080 GPUs, using the Darknet framework under a Linux system.
Simulation content and result analysis:
The simulation experiment applies the method of the present invention and the prior-art RetinaNet method with global dynamic pruning (GDP) to target detection on a Quickbird satellite remote sensing image of the Hong Kong International Airport area.
Two indexes, precision and mean average precision (mAP), are used to evaluate the optical remote sensing image target detection results of the present invention and of the prior-art RetinaNet + GDP, computed as follows:

Recall = number of correctly detected targets / total number of actual targets

Precision = number of correctly detected targets / total number of detected targets

A precision-recall curve is drawn; the detection precision AP of the target detection is the area under the curve, and averaging the APs over the categories gives the mean average precision mAP.
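A short numpy sketch of this AP computation follows; it assumes the matching of detections to ground truth has already been done and integrates the precision-recall curve point by point:

import numpy as np

def average_precision(confidences, is_correct, n_actual):
    """confidences: (N,) scores; is_correct: (N,) 1 if a detection matches a
    ground-truth target; n_actual: total number of actual targets."""
    order = np.argsort(confidences)[::-1]
    tp = np.cumsum(np.asarray(is_correct)[order])
    recall = tp / n_actual                          # correct detections / actual targets
    precision = tp / np.arange(1, len(tp) + 1)      # correct detections / detections
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recall, precision):             # area under the P-R curve
        ap += (r - prev_r) * p
        prev_r = r
    return ap

ap = average_precision([0.9, 0.8, 0.6], [1, 0, 1], n_actual=3)
mAP = np.mean([ap])                                 # average the APs over all categories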
Table 1 lists the airplane detection precision, ship detection precision and mAP of the present invention and of the prior-art RetinaNet + GDP.
Table 1. Detection precision results of the simulation experiment

Method                         RetinaNet+GDP    The method of the invention
Aircraft detection precision   0.9236           0.9575
Ship detection precision       0.6319           0.6508
Average precision mAP          0.7778           0.8042
As Table 1 shows, the mAP of the prior-art RetinaNet + GDP is 77.78% while the mAP of the method of the present invention is 80.42%, so the method of the present invention achieves higher detection precision on airplane and ship targets. Existing remote sensing image target detection techniques have reached a bottleneck in improving detection precision, and it is difficult to effectively improve the detection precision of small targets in remote sensing images, particularly airplanes and ships; the proposed method greatly improves the detection speed while also effectively improving the detection precision of such small targets.
Table 2 lists the detection speed, in frames per second (FPS), of the present invention and of the prior-art RetinaNet + GDP.
Table 2. Simulation detection speed (FPS) results

Method                         RetinaNet+GDP    The method of the invention
Detection speed (FPS)          23               35
As Table 2 shows, the detection speed of the prior-art RetinaNet + GDP is 23 FPS while the method of the present invention reaches 35 FPS, so the method of the present invention is faster at detecting airplane and ship targets. Existing remote sensing image target detection techniques usually sacrifice detection speed when detecting targets, yet real-time, rapid detection of remote sensing images is very important in practical applications, particularly in the military field.
In summary, the remote sensing image target detection method based on the deep evolution pruning convolution network disclosed by the present invention mainly solves the problem that existing remote sensing image target detection acceleration techniques do not globally and simultaneously optimize detection speed and detection precision. The specific steps are: construct the training and verification data sets; construct the test data set; construct the deep convolution feature extraction sub-network; construct the full convolution FCN detection sub-network; construct and train the deep convolution target detection network; construct and train the target detection network based on the deep evolution pruning convolution network; perform target detection on the test data set with the trained model; and output the test results. Building the inverse residual connection structure from depth separable convolutions greatly reduces model parameters and computation while maintaining high detection accuracy, and combining the target detection network with a global dynamic pruning method based on an evolutionary algorithm achieves network acceleration. The method greatly reduces computational complexity and model parameters, markedly improves the target detection speed on optical remote sensing images while retaining high target detection precision, and can be used to quickly and accurately detect airplane and ship targets in different regions of remote sensing images.

Claims (5)

1. A remote sensing image target detection method based on a deep evolution pruning convolution net is characterized by comprising the following steps:
(1) processing the training dataset and the validation dataset: selecting a plurality of optical remote sensing images containing various targets, cutting the images into image blocks with 512 x 512 pixels, wherein 70% of the image blocks of the optical remote sensing images form training data, 30% of the image blocks form a verification data set, and performing data enhancement on the training data set;
(2) processing the test data set: inputting another plurality of optical remote sensing images containing various targets, and cutting the images into image blocks with 512 x 512 pixels to form a test data set;
(3) constructing a deep convolution feature extraction sub-network: respectively constructing a depth separable convolution inverse residual error connection module and a characteristic pyramid convolution module, and alternately connecting the depth separable convolution inverse residual error connection module and the characteristic pyramid convolution module by using a 7 multiplied by 7 convolution layer and a maximum pooling layer in sequence to form a depth convolution characteristic extraction sub-network;
the specific structure of the sub-network for extracting the depth convolution features is that an original image input layer → 7 × 7 convolution layer → a first maximum pooling layer → a first depth separable convolution inverse residual connection module C1 → a second depth separable convolution inverse residual connection module C2 → a first feature pyramid convolution module P1 → a third depth separable convolution inverse residual connection module C3 → a second feature pyramid convolution module P2 → a fourth depth separable convolution inverse residual connection module C4 → a third feature pyramid convolution module P3 → a second maximum pooling layer → a fourth feature pyramid convolution module P4 → a third maximum pooling layer → a fifth feature pyramid convolution module P5 → a current stage feature map output layer;
(4) constructing a full convolution FCN detection sub-network:
(4a) constructing a full convolution FCN classification subnet: the structure is, classifying the subnet input layer → the first 3 x 3 convolution layer → the second 3 x 3 convolution layer → the third 3 x 3 convolution layer → the fourth 3 x 3 convolution layer → the fifth 3 x 3 convolution layer → classifying the subnet output layer; the classified subnet input layer takes the characteristic graph of each characteristic pyramid convolution module as the input of the classified subnet in turn, and carries out classification detection in turn;
(4b) constructing a fully-convoluted FCN regression subnet: the structure is, the regression subnet input layer → the first 3 x 3 convolution layer → the second 3 x 3 convolution layer → the third 3 x 3 convolution layer → the fourth 3 x 3 convolution layer → the fifth 3 x 3 convolution layer → the regression subnet output layer; the regression subnet input layer takes the characteristic graph of each characteristic pyramid convolution module as the input of the regression subnet in turn, and carries out regression detection in turn;
(5) constructing and training a deep convolution target detection network:
(5a) constructing a deep convolution target detection network: sequentially constructing a deep convolution target detection network by using a deep convolution feature extraction sub-network, a full convolution FCN classification sub-network and a full convolution FCN regression sub-network, wherein the structure of the deep convolution target detection network is that an original image input layer → the deep convolution feature extraction sub-network → the full convolution FCN classification regression sub-network;
(5b) training a deep convolution target detection network: training the deep convolution target detection network by using the training data set and the verification data set as input to obtain a trained deep convolution target detection network, and storing a weight file of the trained deep convolution target detection network;
(6) constructing and training a target detection network based on a deep evolution pruning convolution network:
(6a) performing layer-by-layer DNA coding on the convolution filters participating in pruning in the trained deep convolution target detection network, recording the coding as DNA_{1,...,l-1,l};
(6b) optimizing the DNA_{1,...,l-1,l} coding using an evolutionary algorithm to obtain the final optimized result coding DNA'_{1,...,l-1,l};
(6c) combining the optimized result coding DNA'_{1,...,l-1,l} with the pruning rule to construct the target detection network based on the deep evolution pruning convolutional network, wherein the pruning rule is that a code of 0 means the convolution filter is finally pruned and a code of 1 means it is finally retained; fine-tuning with the training data set to obtain the trained target detection network based on the deep evolution pruning convolutional network, i.e., the trained model, and saving the trained model weight file;
(7) and carrying out target detection on the test data set by using the trained model:
(7a) sequentially inputting the data blocks in the test data set into a trained target detection network based on a deep evolution pruning convolutional network to obtain a candidate frame of each data block in the test data set, a classification confidence score corresponding to the candidate frame and a target category corresponding to the candidate frame;
(7b) discarding all candidate frames whose classification confidence score for their target category is below the threshold 0.3, and performing non-maximum suppression processing on the remaining candidate frames;
(7c) and mapping the coordinates of all the reserved candidate frames, mapping the coordinates onto the optical remote sensing image before cutting, and performing secondary non-maximum suppression processing to obtain a final detection result image of the optical remote sensing image.
2. The method for detecting the target of the remote sensing image based on the deep evolution pruning convolution network as claimed in claim 1, wherein the step (3) of constructing the deep convolution feature extraction sub-network comprises the following specific steps:
(3a) constructing a depth separable convolution inverse residual connecting module: the module structure is that the characteristic diagram input layer of the previous stage → 1 multiplied by 1 convolution layer → depth separable convolution unit → point-by-point addition layer → characteristic diagram output layer of the current stage;
the 1 x 1 convolution layer and the depth separable convolution unit in the inverse residual connecting module are grouped, and the point-by-point addition layer is a feature processing layer formed by performing point-by-point addition on the output feature map of the depth separable convolution unit in the previous layer and the feature map from the input layer of the inverse residual connecting module;
the inverse residual connection module first increases and then decreases the channel dimension of the input feature map, wherein the 1 × 1 convolution layer increases the number of channels of the input feature map by 2 times, and the depth separable convolution unit performs feature extraction and reduces the number of channels by 2 times, so that the number of channels of the output feature map of the constructed depth separable convolution inverse residual connection module is consistent with that of the input feature map;
(3b) constructing a characteristic pyramid convolution module: the module is a double-layer input single-layer output structure, and the module structure is that an input feature diagram 1 → a first convolution layer of the input feature diagram 1 → a double upsampling layer → an output feature diagram 1, an input feature diagram 2 → a first convolution layer of the input feature diagram 2 → an output feature diagram 2, point-by-point addition layers → a second convolution layer → a feature diagram output layer at the current stage;
the input feature map 1 is the stage feature map whose input and output feature maps have the same size in a depth separable convolution inverse residual connection module, the input feature map 2 is a feature map with the same spatial size as the output feature map 1, the point-by-point addition layer is a feature processing layer formed by adding the output feature map 1 and the output feature map 2 point by point, and the double upsampling layer enlarges the scale of the input feature map 1 processed by its first convolution layer through a bilinear interpolation algorithm;
(3c) and (3) sequentially using the 7 x 7 convolution layer and the maximum pooling layer, and alternately connecting the depth separable convolution inverse residual error connecting module and the feature pyramid convolution module to construct a depth convolution feature extraction sub-network.
3. The remote sensing image target detection method based on the deep evolution pruning convolution network according to claim 2, characterized in that the depth separable convolution unit in step (3a) has the unit structure: previous-stage feature map input layer → 3 × 3 depth convolution layer → first batch normalization layer → ReLU activation function layer → 1 × 1 point-by-point convolution layer → second batch normalization layer → linear activation function layer → output feature map layer;
the depth separable convolution unit divides the standard convolution into depth convolution and point-by-point convolution so as to realize the separation and the respective processing of the space and the channel of the characteristics;
the ReLU activation function is no longer used after the activation function after the 1 x 1 point-by-point convolution layer, but instead a linear activation function is used.
4. The remote sensing image target detection method based on the deep evolution pruning convolutional network according to claim 1, wherein the layer-by-layer DNA coding of the convolution filters participating in pruning in the trained deep convolution target detection network in step (6a) is performed as follows:
(6a1) expressing the output feature map by the convolution operation: record the l-th layer feature map of the trained deep convolution target detection network, with height H_l, width W_l and C_l channels, as

Z_l ∈ R^(H_l × W_l × C_l)

and record the feature sub-map of the k-th channel of the l-th layer as Z_l^(k); then Z_l^(k) is obtained by convolving (*) the previous layer's feature map Z_{l-1} with the parameters W_l^(k) of the corresponding convolution filter, f denoting the activation function:

Z_l^(k) = f(Z_{l-1} * W_l^(k))
in common deep learning frameworks such as TensorFlow and Caffe, the tensor convolution is converted into a matrix multiplication by reshaping the input and transposing the convolution filter; writing Z*_{l-1} for the (l-1)-th layer feature map after this transformation and W*_l for the parameter matrix of the convolution filters corresponding to the l-th layer, the l-th layer feature map Z*_l is given by:

Z*_l = f(Z*_{l-1} W*_l);
(6a2) mask-coding each convolution filter to be pruned or retained: for the l-th layer of the trained deep convolution target detection network, with C_l output channels, introduce a mask m_l^(k) ∈ {0, 1} for every convolution filter that is to be pruned or retained; a code of 0 indicates that the convolution filter is pruned, and a code of 1 indicates that it is retained; with ⊙ denoting the element-wise (inner) product of the mask with the filter parameters, the global feature channel pruning convolution formula Z*_l = f(Z*_{l-1} W*_l) changes to:

Z*_l = f(Z*_{l-1} (m_l ⊙ W*_l));
(6a3) coding the convolution filters participating in pruning layer by layer: using the trained deep convolution target detection network, the filters participating in pruning are coded layer by layer; the coding containing all layers to be pruned is recorded as

DNA_{1,...,l-1,l} = {m_1, m_2, ..., m_{l-1}, m_l}

where DNA_{1,...,l-1,l} denotes the DNA codes of layers 1 to l, and m_{l-1} denotes the coding symbols of the (l-1)-th layer feature map.
5. The remote sensing image target detection method based on the deep evolution pruning convolution network according to claim 1, characterized in that the optimization of the DNA_{1,...,l-1,l} coding with the evolutionary algorithm in step (6b), yielding the final optimized result DNA'_{1,...,l-1,l}, is performed as follows:
(6b1) initialization: set the evolution generation counter t = 0, set the maximum number of generations T, and set the pruning ratio ratio_cut = 0.5; according to ratio_cut, randomly generate M individuals with the DNA_{1,...,l-1,l} coding as the initial population

P_0 = {DNA^1, DNA^2, ..., DNA^M}

where DNA^m denotes the DNA coding of the m-th individual over the filters of layers 1 to l;
(6b2) adjusting network parameters using the training data set: for the t-th round population P_t, retraining each generated network and adjusting its parameters using the training data set and the masked global feature channel pruning convolution formula Z*_l = f(Z*_{l-1} (m_l ⊙ W*_l));
(6b3) fitness calculation: using the verification data set and the masked global feature channel pruning convolution formula, computing the fitness of each individual in the population P_t from L_val, the loss of the corresponding pruned network on the verification data set, with lower verification loss corresponding to higher fitness;
(6b4) generating new individuals: based on individual fitness, selecting the individuals with higher fitness for crossover and mutation to generate new individuals, wherein the crossover operation randomly crosses parent individuals with crossover probability p_c = 0.9 and the mutation operation randomly mutates parent individuals with mutation probability p_m = 0.9; through steps (6b1) to (6b4), the population P_t undergoes selection, crossover and mutation to obtain the next generation population P_{t+1};
(6b5) judging whether to terminate the evolution: if t = T, taking the individual with the maximum fitness obtained during the evolution as the optimal solution output, terminating the calculation, and recording its coding as DNA'_{1,...,l-1,l}, then executing step (6c) to construct the target detection network based on the deep evolution pruning convolution network; otherwise, if t < T, returning to step (6b2) and repeating steps (6b2) to (6b5) to continue the evolutionary optimization of the coding.
CN201910648586.5A 2019-07-18 2019-07-18 Remote sensing image target detection method based on deep evolution pruning convolution net Active CN110532859B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910648586.5A CN110532859B (en) 2019-07-18 2019-07-18 Remote sensing image target detection method based on deep evolution pruning convolution net

Publications (2)

Publication Number Publication Date
CN110532859A CN110532859A (en) 2019-12-03
CN110532859B true CN110532859B (en) 2021-01-22

Family

ID=68660598

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910648586.5A Active CN110532859B (en) 2019-07-18 2019-07-18 Remote sensing image target detection method based on deep evolution pruning convolution net

Country Status (1)

Country Link
CN (1) CN110532859B (en)

Families Citing this family (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111062321B (en) * 2019-12-17 2023-05-30 佛山科学技术学院 SAR detection method and system based on deep convolutional network
CN111126303B (en) * 2019-12-25 2023-06-09 北京工业大学 Multi-parking-place detection method for intelligent parking
CN111222457B (en) * 2020-01-06 2023-06-16 电子科技大学 Detection method for identifying authenticity of video based on depth separable convolution
CN111259603B (en) * 2020-01-17 2024-01-30 南京星火技术有限公司 Electronic device, model design apparatus, and computer-readable medium
CN111582446B (en) * 2020-04-28 2022-12-06 北京达佳互联信息技术有限公司 System for neural network pruning and neural network pruning processing method
CN111640116B (en) * 2020-05-29 2023-04-18 广西大学 Aerial photography graph building segmentation method and device based on deep convolutional residual error network
CN111783794B (en) * 2020-06-08 2023-08-22 湖北工业大学 Multi-scale target detection method based on depth separable convolution residual block and improved NMS (network management system)
CN111783576B (en) * 2020-06-18 2023-08-18 西安电子科技大学 Pedestrian re-identification method based on improved YOLOv3 network and feature fusion
CN111539224B (en) * 2020-06-25 2023-08-25 北京百度网讯科技有限公司 Pruning method and device of semantic understanding model, electronic equipment and storage medium
CN111753787A (en) * 2020-07-01 2020-10-09 江苏金海星导航科技有限公司 Separated traffic sign detection and identification method
CN111833360B (en) * 2020-07-14 2024-03-26 腾讯科技(深圳)有限公司 Image processing method, device, equipment and computer readable storage medium
CN111832576A (en) * 2020-07-17 2020-10-27 济南浪潮高新科技投资发展有限公司 Lightweight target detection method and system for mobile terminal
CN111914924B (en) * 2020-07-28 2024-02-06 西安电子科技大学 Rapid ship target detection method, storage medium and computing equipment
CN112564716B (en) * 2020-08-05 2022-12-13 新疆大学 PC-SCMA system joint decoding method based on pruning iteration
CN112102241B (en) * 2020-08-11 2023-10-20 中山大学 Single-stage remote sensing image target detection algorithm
CN111783974A (en) * 2020-08-12 2020-10-16 成都佳华物链云科技有限公司 Model construction and image processing method and device, hardware platform and storage medium
CN112164035B (en) * 2020-09-15 2023-04-28 郑州金惠计算机系统工程有限公司 Image-based defect detection method and device, electronic equipment and storage medium
CN113159082B (en) * 2020-09-30 2023-06-02 北京理工大学 Incremental learning target detection network model construction and weight updating method
CN112287881A (en) * 2020-11-19 2021-01-29 国网湖南省电力有限公司 Satellite remote sensing image smoke scene detection method and system and computer storage medium
CN112434745B (en) * 2020-11-27 2023-01-24 西安电子科技大学 Occlusion target detection and identification method based on multi-source cognitive fusion
CN112508625B (en) * 2020-12-18 2022-10-21 国网河南省电力公司经济技术研究院 Intelligent inspection modeling method based on multi-branch residual attention network
CN112580381A (en) * 2020-12-23 2021-03-30 成都数之联科技有限公司 Two-dimensional code super-resolution reconstruction enhancing method and system based on deep learning
CN112668584A (en) * 2020-12-24 2021-04-16 山东大学 Intelligent detection method for portrait of air conditioner external unit based on visual attention and multi-scale convolutional neural network
CN112784839A (en) * 2021-02-03 2021-05-11 华南理工大学 Scene character detection model lightweight method based on mobile terminal, electronic equipment and storage medium
CN112818871B (en) * 2021-02-04 2024-03-29 南京师范大学 Target detection method of full fusion neural network based on half-packet convolution
CN113160062B (en) * 2021-05-25 2023-06-06 烟台艾睿光电科技有限公司 Infrared image target detection method, device, equipment and storage medium
CN113487550B (en) * 2021-06-30 2024-01-16 佛山市南海区广工大数控装备协同创新研究院 Target detection method and device based on improved activation function
CN113487551B (en) * 2021-06-30 2024-01-16 佛山市南海区广工大数控装备协同创新研究院 Gasket detection method and device for improving dense target performance based on deep learning
CN113723472B (en) * 2021-08-09 2023-11-24 北京大学 Image classification method based on dynamic filtering constant-variation convolutional network model
CN113537399A (en) * 2021-08-11 2021-10-22 西安电子科技大学 Polarized SAR image classification method and system of multi-target evolutionary graph convolution neural network
CN113744220B (en) * 2021-08-25 2024-03-26 中国科学院国家空间科学中心 PYNQ-based detection system without preselection frame
CN113837284B (en) * 2021-09-26 2023-09-15 天津大学 Double-branch filter pruning method based on deep learning
CN113807466B (en) * 2021-10-09 2023-12-22 中山大学 Logistics package autonomous detection method based on deep learning
TWI790789B (en) * 2021-10-22 2023-01-21 大陸商星宸科技股份有限公司 Convolution operation method
CN114025198B (en) * 2021-11-08 2023-06-27 深圳万兴软件有限公司 Video cartoon method, device, equipment and medium based on attention mechanism
CN114220019B (en) * 2021-11-10 2024-03-29 华南理工大学 Lightweight hourglass type remote sensing image target detection method and system
CN114627282A (en) * 2022-03-15 2022-06-14 平安科技(深圳)有限公司 Target detection model establishing method, target detection model application method, target detection model establishing device, target detection model application device and target detection model establishing medium
CN116597486A (en) * 2023-05-16 2023-08-15 暨南大学 Facial expression balance recognition method based on increment technology and mask pruning

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11093832B2 (en) * 2017-10-19 2021-08-17 International Business Machines Corporation Pruning redundant neurons and kernels of deep convolutional neural networks
US11250325B2 (en) * 2017-12-12 2022-02-15 Samsung Electronics Co., Ltd. Self-pruning neural networks for weight parameter reduction

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106875373A (en) * 2016-12-14 2017-06-20 浙江大学 Mobile phone screen MURA defect inspection methods based on convolutional neural networks pruning algorithms
CN107609525A (en) * 2017-09-19 2018-01-19 吉林大学 Remote Sensing Target detection method based on Pruning strategy structure convolutional neural networks
CN108491854A (en) * 2018-02-05 2018-09-04 西安电子科技大学 Remote sensing image object detection method based on SF-RCNN
CN109544621A (en) * 2018-11-21 2019-03-29 马浩鑫 Light field depth estimation method, system and medium based on convolutional neural networks
CN109711288A (en) * 2018-12-13 2019-05-03 西安电子科技大学 Remote sensing ship detecting method based on feature pyramid and distance restraint FCN
CN109919008A (en) * 2019-01-23 2019-06-21 平安科技(深圳)有限公司 Moving target detecting method, device, computer equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Hongxiang Fan et al., "A Real-Time Object Detection Accelerator with Compressed SSDLite on FPGA," 2018 International Conference on Field-Programmable Technology (FPT), 2019-06-20, full text. *
刘万军 (Liu Wanjun) et al., "Lightweight multi-object detection network based on an inverse residual structure" (基于反残差结构的轻量级多目标检测网络), 《激光与光电子学进展》 (Laser & Optoelectronics Progress), 2019-05-21, full text. *

Also Published As

Publication number Publication date
CN110532859A (en) 2019-12-03

Similar Documents

Publication Publication Date Title
CN110532859B (en) Remote sensing image target detection method based on deep evolution pruning convolution net
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN110378844B (en) Image blind motion blur removing method based on cyclic multi-scale generation countermeasure network
CN111126359B (en) High-definition image small target detection method based on self-encoder and YOLO algorithm
CN110348399B (en) Hyperspectral intelligent classification method based on prototype learning mechanism and multidimensional residual error network
CN108230278B (en) Image raindrop removing method based on generation countermeasure network
CN112288008B (en) Mosaic multispectral image disguised target detection method based on deep learning
CN110163213B (en) Remote sensing image segmentation method based on disparity map and multi-scale depth network model
CN112163602A (en) Target detection method based on deep neural network
CN109191418B (en) Remote sensing image change detection method based on feature learning of contraction self-encoder
CN109492596B (en) Pedestrian detection method and system based on K-means clustering and regional recommendation network
CN113435253B (en) Multi-source image combined urban area ground surface coverage classification method
CN113628294A (en) Image reconstruction method and device for cross-modal communication system
CN113240683B (en) Attention mechanism-based lightweight semantic segmentation model construction method
CN112489168A (en) Image data set generation and production method, device, equipment and storage medium
CN112200123B (en) Hyperspectral open set classification method combining dense connection network and sample distribution
CN111126278A (en) Target detection model optimization and acceleration method for few-category scene
CN112381733B (en) Image recovery-oriented multi-scale neural network structure searching method and network application
CN116469020A (en) Unmanned aerial vehicle image target detection method based on multiscale and Gaussian Wasserstein distance
CN115048870A (en) Target track identification method based on residual error network and attention mechanism
CN112132207A (en) Target detection neural network construction method based on multi-branch feature mapping
CN116385281A (en) Remote sensing image denoising method based on real noise model and generated countermeasure network
CN114494893B (en) Remote sensing image feature extraction method based on semantic reuse context feature pyramid
CN115375966A (en) Image countermeasure sample generation method and system based on joint loss function
CN111860668B (en) Point cloud identification method for depth convolution network of original 3D point cloud processing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant