CN114005045A - Rotating frame remote sensing target detection method based on lightweight deep neural network - Google Patents

Rotating frame remote sensing target detection method based on lightweight deep neural network

Info

Publication number
CN114005045A
CN114005045A (application CN202111283479.0A)
Authority
CN
China
Prior art keywords
target detection
network model
rotating frame
remote sensing
detection network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111283479.0A
Other languages
Chinese (zh)
Inventor
贾海鹏
范达
贺杨
史也
令狐鼎
刘晓磊
刘翠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Academy of Space Technology CAST
Original Assignee
China Academy of Space Technology CAST
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Academy of Space Technology CAST filed Critical China Academy of Space Technology CAST
Priority to CN202111283479.0A priority Critical patent/CN114005045A/en
Publication of CN114005045A publication Critical patent/CN114005045A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/29 Graphical models, e.g. Bayesian networks

Abstract

The invention relates to a rotating frame remote sensing target detection method based on a lightweight deep neural network, which comprises the following steps: acquiring a remote sensing target detection data set, preprocessing the data set, and dividing it into a training set and a test set; building a single-stage target detection network model based on a rotating frame; iteratively training the rotating-frame-based single-stage target detection network model on the training set until it converges; inputting the test set into the rotating-frame-based single-stage target detection network model and verifying its performance; and applying structured pruning compression to the rotating-frame-based single-stage target detection network model to make it lightweight. The method improves the precision and speed of detecting and identifying slender, closely-spaced targets, such as ships in harbor areas of remote sensing images.

Description

Rotating frame remote sensing target detection method based on lightweight deep neural network
Technical Field
The invention relates to the technical field of deep learning and remote sensing image processing, in particular to a rotating frame remote sensing target detection method based on a lightweight deep neural network.
Background
With the rapid development of remote sensing earth observation technology, a large number of high-resolution optical remote sensing satellites have emerged, promoting the application of space remote sensing technology in many fields such as the military, the national economy, and environmental protection. The rise of artificial intelligence brings a new paradigm for processing massive remote sensing data: using artificial intelligence algorithms to improve the on-orbit processing capability of remote sensing satellite data can realize real-time, intelligent space-based information services.
However, the accuracy of artificial intelligence algorithms at the fine-grained level is insufficient, and the computing power of satellite-borne computing platforms is limited, which restricts the formation and development of space intelligent perception capability. Although current deep learning technology is greatly improved over traditional methods and performs well on detection tasks with large inter-class differences, such as identifying airplanes, ships and bridges, at the fine-grained level, such as detection and identification at ship-type granularity, its mAP value has not crossed the threshold for practical application and falls well short of the indices users expect. Therefore, first, it is necessary to develop a high-performance fine-grained intelligent detection and recognition algorithm with on-orbit application capability. Secondly, the on-orbit resources (computation, storage, energy consumption, etc.) of a spacecraft are limited: its general processing capability is 1-2 orders of magnitude lower than that on the ground, while deep neural network models are large in scale and are typically compute-intensive and memory-access-intensive, so directly applying conventional deep learning methods to on-orbit information processing cannot meet the real-time requirements of tasks. Therefore, as deep learning enters the aerospace field, it becomes increasingly important to keep the performance of an existing neural network basically unchanged while greatly reducing its computation and model storage, so that the neural network model can run efficiently on resource-limited satellite-borne computing equipment.
Disclosure of Invention
The invention provides a rotating frame remote sensing target detection method based on a lightweight deep neural network, which aims to improve the detection and identification capability of a slender and close-proximity ship target in a harbor area of a remote sensing image.
In order to achieve the purpose, the technical scheme of the invention is as follows:
the invention provides a rotating frame remote sensing target detection method based on a lightweight deep neural network, which comprises the following steps: acquiring a remote sensing target detection data set, preprocessing the data set, and dividing the data set into a training set and a test set; building a single-stage target detection network model based on a rotating frame; performing iterative training on the single-stage target detection network model based on the rotating frame by using the training set to make the single-stage target detection network model converge; inputting the test set into the single-stage target detection network model based on the rotating frame, and verifying the performance of the network model; and carrying out structured pruning compression treatment on the network model to form a lightweight single-stage target detection network model based on a rotating frame.
According to one aspect of the invention, the pre-processing comprises:
carrying out data annotation on the images in the data set to obtain corresponding labels and annotation files;
carrying out overlapped cutting on the image, and enabling the cut image to correspond to the label;
converting the processed label format into a six-tuple format label;
converting the updated data set into a database file in an LMDB format;
carrying out target characteristic analysis on the data set used for training.
According to one aspect of the invention, the process of data annotation comprises: a four-point method is used to label the four corner point coordinates (p1, p2, p3 and p4) of the minimum bounding rectangle of an image target to determine the position of the target, and to label the image category label, wherein p1, p2, p3 and p4 respectively represent the upper left, upper right, lower right and lower left vertices of the bounding rectangle, arranged in the clockwise direction.
According to one aspect of the invention, the process of overlapping cuts comprises: and cutting the sub-images of the image from the upper left corner in a sliding window mode, setting an overlapping area between the adjacent sub-images, and removing incomplete targets with the intersection ratio (IoU) of the residual targets and the original targets being smaller than a threshold value.
According to an aspect of the invention, the process of converting the six-element format label comprises: and encoding a rotating bounding box by using a six-tuple (cls, cx, cy, w, h, angle) for the image target, and determining the category information and the position information of the rotating bounding box in the original image, wherein cls represents the category of the target, cx and cy represent the coordinates of the central point of the real bounding box of the target in the original image, w and h respectively represent the width and the height of the real bounding box of the target, and angle represents the included angle between the width w corresponding to the positive direction of the real bounding box of the target and the positive direction of an x axis, and the range of the included angle is in a [0,2 pi ] interval.
According to one aspect of the invention, the process of target characteristic analysis comprises: analyzing the class distribution of the targets through statistics, and learning the size distribution of the targets through clustering and averaging, wherein the size statistics include the maximum width, maximum height, average width and aspect ratio.
According to one aspect of the invention, the process of building the rotating frame-based single-stage target detection network model comprises the following steps:
extracting the image features after preprocessing by using an improved VGG16 neural network as a backbone network to obtain feature maps with different sizes to form a feature pyramid network model;
using an SSD detection algorithm based on rotated boxes, setting rotated target preselection boxes (prior rotation boxes) with feature points on the feature maps as center points to predict target categories and regress position information;
and building a rotating-frame-based single-stage target detection network model using the Caffe convolutional neural network deep learning framework.
According to an aspect of the invention, the process of iteratively training the network model using the training set comprises: and training the single-stage target detection network model based on the rotating frame by using a training set image and an annotation file obtained by the overlapped segmentation, and stopping iterative training when the total loss function of the single-stage target detection network model based on the rotating frame is converged.
According to one aspect of the invention, the process of verifying the performance of the rotating-box-based single-stage target detection network model comprises:
carrying out overlapped cutting on the images of the test set according to the input size to obtain sub-images, inputting the sub-images into a trained single-stage target detection network model based on a rotating frame to obtain target detection results with rotating information, and merging all the results;
and eliminating redundant detection boxes using rotated-box non-maximum suppression (NMS) to obtain the target detection result information of the original test image.
According to an aspect of the present invention, the process of performing structured pruning compression processing on the rotating-frame-based single-stage target detection network model to form a lightweight network model includes:
selecting the best-performing rotating-frame-based single-stage target detection network model, and readjusting the numbers of channels and layers of the network model to reduce them;
carrying out sensitivity analysis on parameters of the rotating frame-based single-stage target detection network model to measure the importance degree of neurons;
pruning the single-stage target detection network model based on the rotating frame and cutting unimportant neurons;
fine-tuning the pruned rotating-frame-based single-stage target detection network model so that it regains the lost performance;
and post-processing the fine-tuned rotating-frame-based single-stage target detection network model, deleting the parts whose parameters are 0 during compression.
Advantageous effects:
according to the scheme of the invention, the cross interference of adjacent targets and the cross interference between the targets and the background are reduced by using the rotating frame instead of the horizontal frame as the preset frame. The target characteristic analysis is carried out to obtain the target scale distribution of a given data set, so that the parameters of a single-stage target detection network model based on a rotating frame are designed in a self-adaptive mode, and the target detection precision is improved. By constructing a characteristic pyramid network model, classification and position regression tasks are simultaneously carried out on a plurality of different characteristic graphs, and the identification of targets with different scales is realized. In order to deploy the network model to hardware equipment, a Caffe framework is adopted to build a single-segment target detection network model based on a rotating frame, meanwhile, the trained network model is subjected to structured pruning compression, and the lightweight model is used for accelerating the reasoning speed of the network model.
Drawings
FIG. 1 is a flow chart of a rotating frame remote sensing target detection method based on a lightweight deep neural network according to an embodiment of the invention;
FIG. 2 is a schematic diagram of a data set labeling format of a rotating frame remote sensing target detection method based on a lightweight deep neural network according to an embodiment of the present invention;
FIG. 3 is a schematic diagram showing a data set target characteristic analysis result of a rotating frame remote sensing target detection method based on a lightweight deep neural network according to an embodiment of the present invention;
FIG. 4 is a schematic diagram showing a design of a rotating default box of a rotating box remote sensing target detection method based on a lightweight deep neural network according to an embodiment of the invention;
FIG. 5 is a schematic diagram showing an overall structure of a target detection network model of a rotating frame remote sensing target detection method based on a lightweight deep neural network according to an embodiment of the present invention;
fig. 6 schematically shows a structured pruning flow chart of a rotating frame remote sensing target detection method based on a lightweight deep neural network according to an embodiment of the present invention.
Detailed Description
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
The present invention is described in detail below with reference to the drawings and the specific embodiments, which are not repeated herein, but the embodiments of the present invention are not limited to the following embodiments.
Fig. 1 schematically shows each execution step and flow of a rotating frame remote sensing target detection method based on a lightweight deep neural network according to the present embodiment. As shown in fig. 1, the method for detecting a rotating frame remote sensing target based on a lightweight deep neural network of the present embodiment specifically includes the following steps:
firstly, a remote sensing ship data set is obtained and preprocessed, and the data set is divided into a training set and a testing set according to a certain proportion, such as 8: 2. The data set of the present embodiment includes 1780 remote sensing ship images. The pretreatment process specifically comprises the following steps: firstly, data annotation is carried out on an image of a remote sensing ship data set by using a four-point method, and a label and an annotation file corresponding to the image are obtained. Then, the data set contains 1780 remote sensing ship images and their corresponding labels (the annotation file contains). And then, carrying out overlapped cutting on the remote sensing ship image, corresponding the cut image with the label, and converting the label format expressed by the four-point method into a six-element group format label. Then, the sorted data set is converted into a Database file in a flash-Mapped Database (LMDB) format required in a Convolutional neural network (Convolutional Architecture for Fast Feature Embedding, task) training process. And finally, performing target characteristic analysis on the data set used for training to refine the parameter design of the network model.
Fig. 2 is a schematic diagram of the data set labeling format of the rotating frame remote sensing target detection method based on the lightweight deep neural network according to the embodiment. As shown in fig. 2(a), the four-point labeling process is: the position of the target is marked using the four corner coordinates (p1, p2, p3, p4) of the minimum bounding rectangle of the target, and the image class label is marked according to the kind of ship. The points p1 and p2 are the two vertices on the side toward which the ship's bow points: p1 is the upper left point of the bow and p2 the upper right point; continuing in the clockwise direction, p3 is the lower right point and p4 the lower left point.
The process of the overlapped cutting comprises the following steps: the original oversized remote sensing ship image is cut into sub-images of suitable size, starting from the upper left corner, in a sliding-window manner, with an overlapping area of a certain size set between adjacent sub-images. Incomplete targets whose Intersection over Union (IoU) between the remaining target and the original target is smaller than a certain threshold are removed, i.e., their marking information is deleted directly; sub-images without targets are deleted directly; and the annotation files corresponding to the cut sub-images are processed at the same time.
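The overlapped cutting just described can be sketched as follows. This is a minimal illustration assuming a square window, a fixed overlap in pixels, and axis-aligned target boxes for the remaining-area test; the window size, overlap, 0.7 threshold, and function names are hypothetical, not taken from the patent:

```python
def sliding_windows(img_w, img_h, win, overlap):
    """Return (x0, y0) origins of sub-image windows, scanning from the
    upper-left corner with a fixed overlap between adjacent windows."""
    stride = win - overlap
    xs = list(range(0, max(img_w - win, 0) + 1, stride)) or [0]
    ys = list(range(0, max(img_h - win, 0) + 1, stride)) or [0]
    # Ensure the right and bottom borders are covered by a final window.
    if xs[-1] + win < img_w:
        xs.append(img_w - win)
    if ys[-1] + win < img_h:
        ys.append(img_h - win)
    return [(x, y) for y in ys for x in xs]

def keep_target(box, window, thresh=0.7):
    """Keep a target only if the fraction of its (axis-aligned) box area
    remaining inside the window is at least `thresh`; otherwise its
    annotation is dropped as an incomplete target."""
    bx0, by0, bx1, by1 = box
    wx0, wy0, wx1, wy1 = window
    iw = max(0, min(bx1, wx1) - max(bx0, wx0))
    ih = max(0, min(by1, wy1) - max(by0, wy0))
    area = (bx1 - bx0) * (by1 - by0)
    return area > 0 and iw * ih / area >= thresh
```

For a 1000x1000 image with a 512-pixel window and 128-pixel overlap this yields a 3x3 grid of windows, with the last row and column shifted to touch the image border.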
The cut and sorted annotation files are converted into labels in the six-tuple (cls, cx, cy, w, h, angle) format shown in fig. 2(b): the rotating bounding box of the target is encoded by the six-tuple (cls, cx, cy, w, h, angle), which determines the category information and position information of the target in the original image. Here cls represents the target category, cx and cy represent the coordinates of the center point of the target's real bounding box in the original image, w and h respectively represent the width and height of the real bounding box, and angle represents the included angle between the width direction w of the real bounding box and the positive direction of the x-axis, in the interval [0, 2π]. That is, as shown in fig. 2(a), the width w is the distance between p1 and p2, the height h is the distance between p1 and p4, and angle is the included angle between the vector from p1 to p2 and the positive x-axis. Specifically, in this embodiment, the label format shown in fig. 2(a) is converted into the label format shown in fig. 2(b) using the Pythagorean theorem and trigonometric function formulas.
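As a minimal sketch of this conversion (the function name is hypothetical, and cls is assumed to be carried separately), the four clockwise corner points can be turned into the five position parameters of the six-tuple using the Pythagorean theorem (`math.hypot`) and the arctangent:

```python
import math

def quad_to_rbox(p1, p2, p3, p4):
    """Convert four clockwise corners (p1 = bow upper-left, p2 = bow
    upper-right, p3 = lower-right, p4 = lower-left) into a rotated box
    (cx, cy, w, h, angle); angle is the angle of the vector p1->p2
    against the positive x-axis, normalized into [0, 2*pi)."""
    cx = (p1[0] + p2[0] + p3[0] + p4[0]) / 4.0
    cy = (p1[1] + p2[1] + p3[1] + p4[1]) / 4.0
    # Pythagorean theorem for the two side lengths
    w = math.hypot(p2[0] - p1[0], p2[1] - p1[1])
    h = math.hypot(p4[0] - p1[0], p4[1] - p1[1])
    # Trigonometric recovery of the rotation angle
    angle = math.atan2(p2[1] - p1[1], p2[0] - p1[0]) % (2 * math.pi)
    return cx, cy, w, h, angle
```

For an axis-aligned 4x2 box with p1 at the origin this returns (2.0, 1.0, 4.0, 2.0, 0.0), matching the definitions above.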
The data set and labels are converted into the LMDB-format database file required for Caffe training. LMDB (Lightning Memory-Mapped Database) is a very fast memory-mapped database that greatly reduces the time cost of disk I/O when the system accesses a large number of small files, which is why the images and labels are converted into the LMDB database format.
Fig. 3 is a schematic diagram illustrating the data set target characteristic analysis results of the rotating frame remote sensing target detection method based on the lightweight deep neural network according to the embodiment. Target characteristic analysis statistically analyzes the target category distribution in the data set and learns the target size distribution through clustering, averaging and other methods, where the size information includes the target angle distribution, target aspect-ratio distribution, maximum width, maximum height, average width, etc. As shown in histogram (a) of fig. 3, the number of targets per class in the data set is very unbalanced, which makes training the model difficult, so the class information is used to design the classification loss function. The angle information of the targets, shown in histogram (b) of fig. 3, is widely distributed over the whole angle range, while the height-width ratios of the targets, shown in histogram (c) of fig. 3, are mostly concentrated in a narrow interval; this characteristic information is used to design the rotated target preselection box (prior rotation box) parameters of the SSD detection algorithm so that the distribution of preselection boxes can cover the actual target boxes. Target characteristic analysis thus yields the target scale distribution of a given data set, so that the parameters of the rotating-frame-based single-stage target detection network model can be designed adaptively, improving target detection precision.
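The statistics above might be gathered as in the following sketch, which assumes labels already in the six-tuple format; the returned fields mirror the quantities named in the text (class distribution, maximum and average sizes, aspect ratio), and the function name is illustrative only:

```python
from collections import Counter
from statistics import mean

def analyze_targets(labels):
    """Summarize a list of (cls, cx, cy, w, h, angle) labels: per-class
    counts plus the simple size statistics used to design the rotated
    preselection boxes."""
    class_dist = Counter(lab[0] for lab in labels)
    widths = [lab[3] for lab in labels]
    heights = [lab[4] for lab in labels]
    aspects = [lab[4] / lab[3] for lab in labels]  # height/width ratio
    return {
        "class_distribution": dict(class_dist),
        "max_width": max(widths),
        "max_height": max(heights),
        "mean_width": mean(widths),
        "mean_aspect_ratio": mean(aspects),
    }
```

On real data the class counts would expose the imbalance of histogram (a), and the aspect-ratio statistics would feed the prior rotation box design.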
Secondly, a Caffe deep learning framework is used for building a single-stage target detection network model based on a rotating frame, and a loss function is designed for training the network model. The specific process comprises the following steps:
and the improved SSD-VGG-16 is adopted for target detection. Firstly, an improved VGG16 neural network is used as a backbone network to extract the image features after preprocessing, so as to obtain feature maps with different sizes and form a feature pyramid network model. Specifically, the improved VGG16 neural network is to change the full-join operation of FC6 and FC7 of the original VGG16 into a convolution operation, and the improved SSD is to change the original horizontal box operator into a rotating box operator, and specifically includes: extracting the characteristics of the rotating target frame, calculating the IOU of the rotating target frame, inhibiting the Non-Maximum value of the rotating target frame (NMS), and the like, forming a characteristic pyramid network model by utilizing a characteristic diagram obtained by a plurality of convolution layers, and simultaneously performing Softmax classification and position regression on the characteristic layers with different scales. Next, a rotation target preselection box is set with a feature point on the feature map as a center point using a rotation box-based SSD detection algorithm to predict a target category and regression position information. Specifically, the SSD generates a set of rotating target preselection frames with different aspect ratios, areas and angles in the different scale feature layers by using the prior rotation box with each feature point as a central point, as shown in fig. 4 (a). And judging whether a rotary target preselection box (prior rotation box) contains the target or not, and then obtaining the accurate position of the target through bounding box regression. 
Since the position information of a ship target has an angle component, and the angle distribution of targets over the entire data set is wide, various angles are set when laying rotated target preselection boxes (prior rotation boxes) at each feature point, so as to cover targets at every angle as far as possible, as shown in fig. 4(b), and so that the rotated target preselection boxes can better cover the real boxes. Fig. 4 schematically shows the design of the rotated default boxes of the rotating frame remote sensing target detection method based on the lightweight deep neural network according to the embodiment. The final target prediction outputs target candidate boxes with angle information and a category, each containing the 6 parameters (cls, cx, cy, w, h, angle). The multitask loss function of the rotation candidate region prediction network is divided into a classification loss function and a position loss function, as follows:
$$L(x,c,l,g)=\frac{1}{N}\Big(L_{conf}(x,c)+\alpha L_{loc}(x,l,g)\Big)$$
where $N$ is the number of matched preselection boxes and $\alpha$ weights the position loss.
wherein the classification loss function is a typical softmax loss, as follows:
$$L_{conf}(x,c)=-\sum_{i\in Pos}^{N} x_{ij}^{p}\log\big(\hat{c}_{i}^{p}\big)-\sum_{i\in Neg}\log\big(\hat{c}_{i}^{0}\big)$$
$$\hat{c}_{i}^{p}=\frac{\exp\big(c_{i}^{p}\big)}{\sum_{p}\exp\big(c_{i}^{p}\big)}$$
position loss function in traditional smoothL1The angular component is added to loss as follows:
Figure BDA0003332115610000102
Figure BDA0003332115610000103
on one hand, the embodiment reduces the cross interference between adjacent targets and the cross interference between the targets and the background by using the rotating frame instead of the horizontal frame as the preset frame. On the other hand, classification and position regression tasks are simultaneously carried out on a plurality of different feature graphs by constructing a feature pyramid network model, so that the identification of targets with different scales is realized.
Fig. 5 is a schematic view showing an overall structure of a target detection network model of the rotating frame remote sensing target detection method based on the lightweight deep neural network according to the embodiment. As shown in fig. 5, the target detection network model of the present embodiment is designed in an end-to-end manner, which improves the speed of the target detection process.
And then inputting the generated training set into a single-stage target detection network model based on a rotating frame, and performing iterative training on the network model until the network converges. The specific process comprises the following steps: and training a built network model by using the training set image and the label file obtained by the overlapped segmentation, and stopping iterative training when the total loss function of the network is converged.
And then, inputting the test sample into the trained network model, and verifying the model performance according to the ship image classification with the rotation angle information and the target detection result. The specific process comprises the following steps: and performing overlapped cutting on the test set image according to the input size of the network model to obtain sub-images with a certain size, inputting each sub-image into the trained network model to obtain a target detection result with rotation information, summarizing all results, merging, and eliminating the detection result of a redundant target frame by using a rotating frame NMS method to finally obtain the target detection result information of the original test image.
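The rotating-frame NMS used in the merging step relies on an IoU between rotated boxes. The sketch below computes it from first principles with Sutherland-Hodgman polygon clipping and the shoelace formula, then applies greedy suppression; the 0.3 threshold is a hypothetical default, and a deployed Caffe operator would implement this natively rather than in Python:

```python
import math

def rbox_corners(cx, cy, w, h, ang):
    """Counterclockwise corner points of a rotated (cx, cy, w, h, angle) box."""
    dx, dy = w / 2.0, h / 2.0
    c, s = math.cos(ang), math.sin(ang)
    return [(cx + c * x - s * y, cy + s * x + c * y)
            for x, y in ((-dx, -dy), (dx, -dy), (dx, dy), (-dx, dy))]

def poly_area(pts):
    """Shoelace formula: absolute area of a simple polygon (0 if < 3 points)."""
    n = len(pts)
    if n < 3:
        return 0.0
    return abs(sum(pts[i][0] * pts[(i + 1) % n][1] -
                   pts[(i + 1) % n][0] * pts[i][1] for i in range(n))) / 2.0

def clip_poly(subject, clip):
    """Sutherland-Hodgman: clip polygon `subject` by convex CCW polygon `clip`."""
    def inside(p, a, b):  # left of (or on) the directed edge a->b, with tolerance
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= -1e-9
    def cross_pt(p, q, a, b):  # intersection of segment p-q with line a-b
        d = (p[0] - q[0]) * (a[1] - b[1]) - (p[1] - q[1]) * (a[0] - b[0])
        t = ((p[0] - a[0]) * (a[1] - b[1]) - (p[1] - a[1]) * (a[0] - b[0])) / d
        return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))
    out = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        inp, out = out, []
        if not inp:
            break
        prev = inp[-1]
        for cur in inp:
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    out.append(cross_pt(prev, cur, a, b))
                out.append(cur)
            elif inside(prev, a, b):
                out.append(cross_pt(prev, cur, a, b))
            prev = cur
    return out

def rotated_iou(b1, b2):
    """IoU of two rotated boxes via polygon intersection."""
    p1, p2 = rbox_corners(*b1), rbox_corners(*b2)
    inter = poly_area(clip_poly(p1, p2))
    union = poly_area(p1) + poly_area(p2) - inter
    return inter / union if union > 0 else 0.0

def rotated_nms(boxes, scores, iou_thresh=0.3):
    """Greedy NMS over (cx, cy, w, h, angle) boxes; returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(rotated_iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep
```

Greedy suppression keeps the highest-scoring box and discards any lower-scoring box that overlaps it beyond the threshold, which is exactly how redundant detections of the same ship are merged after the sub-image results are gathered.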
Then, in the iterative training process, different training hyper-parameters affect the training result of the model; meanwhile, as training time lengthens, the detection precision of the model typically first rises and then falls, so multiple training results need to be evaluated and the model with the highest detection precision selected as the final model. Finally, the final model is compressed by structured pruning to form a lightweight model, reducing the storage space of the original model while preserving its precision and improving its running speed, so that the lightweight model can be deployed on embedded equipment with limited hardware resources. Fig. 6 schematically shows the structured pruning flow chart of the rotating frame remote sensing target detection method based on the lightweight deep neural network according to the embodiment. As shown in fig. 6, the steps of structured pruning are as follows:
based on the over-parameterized model obtained by training, pruning the large trained model according to a specific standard, namely readjusting the number of channels and the number of layers in the network model structure to obtain a simplified network structure;
carrying out sensitivity analysis on parameters of the network model to measure the importance degree of neurons in the neural network;
setting a pruning proportion according to network parameters, calculating a pruning threshold according to the pruning proportion, and pruning off neurons smaller than the threshold;
fine-tuning (fine-tune) the pruned model to make the model regain lost performance;
and post-processing the model obtained by fine-tuning, deleting the parts whose parameters are 0 during compression. Fine-tuning here refers to retraining the pruned model.
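The channel-selection step of this flow can be sketched minimally as below; a magnitude (L1-norm) criterion is assumed here purely for illustration (the patent's sensitivity analysis could supply a different importance measure), each output channel's weights are given as a flat list, and the function name is hypothetical:

```python
def prune_channels(filters, prune_ratio):
    """Rank each output channel by the L1 norm of its weights, derive the
    pruning threshold implied by `prune_ratio`, and return the indices of
    the channels to keep."""
    norms = [sum(abs(w) for w in f) for f in filters]
    n_prune = int(len(filters) * prune_ratio)
    if n_prune == 0:
        return list(range(len(filters)))
    threshold = sorted(norms)[n_prune - 1]
    # Channels at or below the threshold are cut; ties at the threshold may
    # prune slightly more than the requested ratio.
    return [i for i, n in enumerate(norms) if n > threshold]
```

In the flow of fig. 6 the kept channels would then be copied into a slimmer model, which is fine-tuned to regain the lost performance before the zero-parameter parts are deleted.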
In this embodiment, the trained and verified model is compressed by structured pruning, which lightens the target detection deep neural network model and greatly accelerates its inference speed. The scheme of this embodiment addresses the problems that existing remote sensing image ship detection models cannot finely classify different ships and are unfriendly to hardware deployment: ship targets in remote sensing images can be classified at fine granularity while being detected, and the single-stage network model is built with the Caffe framework to be hardware-friendly.
The above description is only one embodiment of the present invention, and is not intended to limit the present invention, and it is apparent to those skilled in the art that various modifications and variations can be made in the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A rotating frame remote sensing target detection method based on a lightweight deep neural network comprises the following steps:
acquiring a remote sensing target detection data set, preprocessing the data set, and dividing the data set into a training set and a test set;
building a single-stage target detection network model based on a rotating frame;
performing iterative training on the rotating frame-based single-stage target detection network model by using the training set to make the rotating frame-based single-stage target detection network model converge;
inputting the test set into the single-stage target detection network model based on the rotating frame, and verifying the performance of the single-stage target detection network model based on the rotating frame;
and carrying out structured pruning compression processing on the rotating frame-based single-stage target detection network model to lighten the rotating frame-based single-stage target detection network model.
2. The remote sensing target detection method of claim 1, wherein the preprocessing comprises:
carrying out data annotation on the images in the data set to obtain corresponding labels and annotation files;
carrying out overlapped cutting on the image, and making each cut sub-image correspond to its labels;
converting the format of the processed label into a six-element format label;
converting the format-converted labels and the annotated images into a database file in the Lightning Memory-Mapped Database (LMDB) format;
and carrying out target characteristic analysis on the training set.
3. The remote sensing target detection method of claim 2, wherein the data labeling process comprises: using the four-point method to label the coordinates of the four corner points (p1, p2, p3, p4) of the minimum bounding rectangle of each object in an image so as to determine the object's position, and labeling the image class label, wherein p1, p2, p3 and p4 respectively represent the upper-left, upper-right, lower-right and lower-left vertices of the bounding rectangle, arranged in clockwise order.
4. The remote sensing target detection method of claim 2, wherein the overlapped cutting process comprises: cutting sub-images out of the image from the upper left corner in a sliding-window manner, setting an overlapping area between adjacent sub-images, and removing incomplete targets for which the ratio of the residual target area to the original target area is less than a threshold value.
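A minimal sketch of the sliding-window cutting in claim 4. The window size, overlap, and 0.7 retention threshold below are illustrative choices, not values stated in the patent:

```python
def sliding_windows(img_w, img_h, win, overlap):
    """Yield top-left (x, y) crop origins covering the image with overlap.

    Windows start at the upper-left corner and slide with stride
    win - overlap; the last window in each direction is clamped so that
    the whole image is covered.
    """
    stride = win - overlap
    xs = list(range(0, max(img_w - win, 0) + 1, stride))
    if xs[-1] + win < img_w:
        xs.append(img_w - win)          # clamp final column to the image edge
    ys = list(range(0, max(img_h - win, 0) + 1, stride))
    if ys[-1] + win < img_h:
        ys.append(img_h - win)          # clamp final row to the image edge
    return [(x, y) for y in ys for x in xs]

def keep_target(orig_area, clipped_area, thresh=0.7):
    """Drop truncated targets whose remaining fraction falls below thresh."""
    return clipped_area / orig_area >= thresh
```

For a 1000x1000 image with 512-pixel windows and 128-pixel overlap this yields a 3x3 grid of crops, with the last row and column shifted inward so no pixels are lost.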
5. The remote sensing target detection method of claim 2, wherein the six-tuple format label conversion process comprises: encoding the rotating bounding box of each target in the image with a six-tuple (cls, cx, cy, w, h, angle), determining the category information and position information of the rotating bounding box in the original image, wherein cls represents the target category, cx and cy represent the coordinates of the center point of the target's real bounding box in the original image, w and h respectively represent the width and height of the real bounding box, and angle represents the included angle between the width direction w of the real bounding box and the positive x-axis direction, with a range in the [0, 2π] interval.
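The conversion from the four-point annotation of claim 3 to the six-tuple of claim 5 can be sketched as below. This assumes the mathematical angle convention (counter-clockwise from +x); the patent's exact convention in image coordinates (y pointing down) may differ:

```python
import math

def corners_to_sixtuple(cls, p1, p2, p3, p4):
    """Convert four clockwise corners (p1=top-left ... p4=bottom-left)
    of a rotated rectangle into the (cls, cx, cy, w, h, angle) label.

    cx, cy: center of the box; w: length of the p1->p2 edge (the box
    width); h: length of the p2->p3 edge; angle: direction of the width
    edge relative to the +x axis, folded into [0, 2*pi).
    """
    pts = [p1, p2, p3, p4]
    cx = sum(p[0] for p in pts) / 4.0
    cy = sum(p[1] for p in pts) / 4.0
    w = math.hypot(p2[0] - p1[0], p2[1] - p1[1])
    h = math.hypot(p3[0] - p2[0], p3[1] - p2[1])
    angle = math.atan2(p2[1] - p1[1], p2[0] - p1[0]) % (2 * math.pi)
    return (cls, cx, cy, w, h, angle)
```

An axis-aligned box maps to angle 0; rotating its width edge to point along +y maps to angle π/2.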
6. The remote sensing target detection method of claim 2, wherein the target characteristic analysis process comprises: carrying out statistical analysis on the class distribution of the targets, and obtaining the target size distribution by clustering and averaging, the sizes including the maximum width, the maximum height, the average width and the aspect ratio.
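A minimal sketch of the size-statistics part of claim 6 (the clustering step, e.g. k-means over (w, h) pairs as commonly done for anchor design, is omitted here; the statistics below are the averaging part):

```python
import numpy as np

def size_statistics(widths, heights):
    """Summarize target sizes: max/mean width and height plus mean
    aspect ratio, as used to inform the prior-box design."""
    w = np.asarray(widths, dtype=float)
    h = np.asarray(heights, dtype=float)
    return {
        "max_w": w.max(), "max_h": h.max(),
        "mean_w": w.mean(), "mean_h": h.mean(),
        "mean_aspect": (w / h).mean(),
    }
```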
7. The remote sensing target detection method according to claim 1, wherein the process of building a single-stage target detection network model based on a rotating frame comprises the following steps:
extracting features from the preprocessed images with a neural network serving as the backbone network to obtain feature maps of different sizes, which form a feature pyramid network model;
using an SSD detection algorithm based on rotating boxes, taking feature points on the feature maps as center points, and setting rotated target preselection boxes to predict target categories and regress position information;
and (3) building the rotating frame-based single-stage target detection network model with a convolutional neural network deep learning framework.
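The rotated prior-box (preselection box) setup of claim 7 can be sketched as follows. The specific scales, aspect ratios, and angle set are illustrative parameters, not values specified in the patent:

```python
import math

def rotated_anchors(fm_w, fm_h, stride, scales, ratios, angles):
    """Generate rotated prior boxes (cx, cy, w, h, angle) centered on
    each feature-map cell, in input-image coordinates.

    Each feature point (i, j) maps to an image-space center at
    ((i + 0.5) * stride, (j + 0.5) * stride); one prior box is emitted
    per scale/aspect-ratio/angle combination, mirroring the rotated-SSD
    prior-box setup described above.
    """
    anchors = []
    for j in range(fm_h):
        for i in range(fm_w):
            cx = (i + 0.5) * stride
            cy = (j + 0.5) * stride
            for s in scales:
                for r in ratios:
                    w = s * math.sqrt(r)   # width grows with aspect ratio
                    h = s / math.sqrt(r)   # height shrinks accordingly
                    for a in angles:
                        anchors.append((cx, cy, w, h, a))
    return anchors
```

The angle set would typically be chosen from the target characteristic analysis of claim 6 (e.g. a handful of orientations spread over [0, 2π]).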
8. The remote sensing target detection method of claim 1, wherein the process of iteratively training the network model using the training set comprises: training the rotating frame-based single-stage target detection network model with the images and annotation files in the training set obtained by overlapped cutting, and stopping the iterative training when the total loss function of the rotating frame-based single-stage target detection network model converges.
9. The remote sensing target detection method of claim 1, wherein the process of verifying the performance of the rotating-box-based single-stage target detection network model comprises:
carrying out overlapped cutting on the test-set images according to the input size to obtain sub-images, inputting the sub-images into the trained rotating frame-based single-stage target detection network model to obtain target detection results with rotation information, and merging all the results;
and eliminating redundant detection boxes with a rotated-box non-maximum suppression method to obtain the target detection result information of the original test image.
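The greedy suppression loop of claim 9 can be sketched as below. True rotated-box NMS computes the IoU of the rotated rectangles themselves (via polygon clipping); for brevity this sketch substitutes the IoU of their axis-aligned bounding boxes, which is a deliberate simplification:

```python
import math

def aabb(box):
    """Axis-aligned bounding box (x0, y0, x1, y1) of a rotated box
    (cx, cy, w, h, angle) -- a stand-in for the true rotated geometry."""
    cx, cy, w, h, a = box
    ex = (abs(w * math.cos(a)) + abs(h * math.sin(a))) / 2
    ey = (abs(w * math.sin(a)) + abs(h * math.cos(a))) / 2
    return cx - ex, cy - ey, cx + ex, cy + ey

def iou(b1, b2):
    """Intersection over union of two axis-aligned boxes."""
    x0 = max(b1[0], b2[0]); y0 = max(b1[1], b2[1])
    x1 = min(b1[2], b2[2]); y1 = min(b1[3], b2[3])
    inter = max(0.0, x1 - x0) * max(0.0, y1 - y0)
    a1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    a2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    return inter / (a1 + a2 - inter) if inter else 0.0

def rotated_nms(dets, thresh=0.5):
    """Greedy NMS over detections [(score, (cx, cy, w, h, angle)), ...]:
    keep the highest-scoring box, suppress overlapping ones, repeat."""
    dets = sorted(dets, key=lambda d: d[0], reverse=True)
    keep = []
    while dets:
        best = dets.pop(0)
        keep.append(best)
        dets = [d for d in dets if iou(aabb(best[1]), aabb(d[1])) < thresh]
    return keep
```

Merging the per-sub-image results before suppression requires offsetting each detection's center by its sub-image origin, so the NMS runs in original-image coordinates.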
10. The method for detecting the remote sensing target according to claim 1, wherein the step of performing structured pruning compression processing on the rotating frame-based single-stage target detection network model to reduce the weight of the rotating frame-based single-stage target detection network model comprises the following steps:
adjusting the number of channels and the number of layers of the single-stage target detection network model based on the rotating frame;
carrying out sensitivity analysis on parameters of the single-stage target detection network model based on the rotating frame;
pruning the rotating frame-based single-stage target detection network model;
fine-tuning the pruned rotating frame-based single-stage target detection network model;
and carrying out post-processing on the fine-tuned single-stage target detection network model based on the rotating frame.
CN202111283479.0A 2021-11-01 2021-11-01 Rotating frame remote sensing target detection method based on lightweight deep neural network Pending CN114005045A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111283479.0A CN114005045A (en) 2021-11-01 2021-11-01 Rotating frame remote sensing target detection method based on lightweight deep neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111283479.0A CN114005045A (en) 2021-11-01 2021-11-01 Rotating frame remote sensing target detection method based on lightweight deep neural network

Publications (1)

Publication Number Publication Date
CN114005045A true CN114005045A (en) 2022-02-01

Family

ID=79926176

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111283479.0A Pending CN114005045A (en) 2021-11-01 2021-11-01 Rotating frame remote sensing target detection method based on lightweight deep neural network

Country Status (1)

Country Link
CN (1) CN114005045A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114973014A (en) * 2022-05-27 2022-08-30 中国人民解放军战略支援部队信息工程大学 Airplane target fine-grained detection method and system based on multi-network cascade


Similar Documents

Publication Publication Date Title
CN103049763B (en) Context-constraint-based target identification method
Wang et al. High-voltage power transmission tower detection based on faster R-CNN and YOLO-V3
CN112287832A (en) High-resolution remote sensing image-based urban illegal building detection method
CN106295613A (en) A kind of unmanned plane target localization method and system
CN110956207B (en) Method for detecting full-element change of optical remote sensing image
CN114694038A (en) High-resolution remote sensing image classification method and system based on deep learning
CN115205264A (en) High-resolution remote sensing ship detection method based on improved YOLOv4
CN113591617B (en) Deep learning-based water surface small target detection and classification method
CN115861619A (en) Airborne LiDAR (light detection and ranging) urban point cloud semantic segmentation method and system of recursive residual double-attention kernel point convolution network
CN113920436A (en) Remote sensing image marine vessel recognition system and method based on improved YOLOv4 algorithm
CN109389593A (en) A kind of detection method, device, medium and the equipment of infrared image Small object
CN109829426A (en) Railway construction temporary building monitoring method and system based on high score remote sensing image
CN115965862A (en) SAR ship target detection method based on mask network fusion image characteristics
Zhang et al. Nearshore vessel detection based on Scene-mask R-CNN in remote sensing image
CN114005045A (en) Rotating frame remote sensing target detection method based on lightweight deep neural network
Jiang et al. Remote sensing object detection based on convolution and Swin transformer
CN110929726B (en) Railway contact network support number plate identification method and system
CN113326734A (en) Rotary target detection method based on YOLOv5
Yang et al. Remote sensing object localization with deep heterogeneous superpixel features
CN111583171A (en) Insulator defect detection method integrating foreground compact characteristic and multi-environment information
CN115909072A (en) Improved YOLOv4 algorithm-based impact point water column detection method
CN116503760A (en) Unmanned aerial vehicle cruising detection method based on self-adaptive edge feature semantic segmentation
CN115661694A (en) Intelligent detection method, system, storage medium and electronic equipment for light-weight main transformer focusing on key characteristics
Guili et al. A man-made object detection algorithm based on contour complexity evaluation
CN113569954A (en) Intelligent wild animal classification and identification method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination