CN115937672A - Remote sensing rotating target detection method based on deep neural network - Google Patents

Remote sensing rotating target detection method based on deep neural network

Info

Publication number
CN115937672A
Authority
CN
China
Prior art keywords
network
module
feature
remote sensing
deep neural
Prior art date
Legal status
Pending
Application number
CN202211468221.2A
Other languages
Chinese (zh)
Inventor
沈雨晨 (Shen Yuchen)
宋智豪 (Song Zhihao)
业巧林 (Ye Qiaolin)
Current Assignee
Nanjing Forestry University
Original Assignee
Nanjing Forestry University
Priority date
Filing date
Publication date
Application filed by Nanjing Forestry University
Priority to CN202211468221.2A
Publication of CN115937672A

Landscapes

  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to the field of image recognition, in particular to a remote sensing rotating target detection method based on a deep neural network. Aimed at the feature fusion part of deep neural networks, the method addresses the neglect of global information in classical networks: a concatenation-based fusion operation along the channel dimension of the feature maps improves the fusion effect, enriches the information the feature maps carry, and raises the accuracy of the network on remote sensing image target detection tasks. With the improved network modules, the method is tested on the DOTA data set, a globally used public remote sensing image data set; compared with other methods it achieves higher accuracy and improves the detection of both small and large targets, while hardly increasing the number of network parameters.

Description

Remote sensing rotating target detection method based on deep neural network
Technical Field
The invention relates to the field of image recognition, in particular to a remote sensing rotating target detection method based on a deep neural network.
Background
Target detection in remote sensing images is a major branch of computer vision and one of its most fundamental yet challenging research topics. It has broad application prospects, and accurate bounding-box identification plays an important role in many fields, such as forest disturbance monitoring, land resource management and urban environment assessment.
In recent years, with the rapid development of deep convolutional neural networks, target detection based on deep learning has advanced greatly. Detection methods can be roughly divided into two categories. The first, exemplified by Dynamic R-CNN, CSL and PP-PicoDet, explicitly optimizes the training process to focus on high-quality samples by fine-tuning the label assignment criterion (i.e., the IoU threshold) and the loss function. The second, exemplified by DAL, AABO and ATSS, automatically adjusts the anchor configuration through new hyper-parameter optimization methods to customize more suitable anchors for a given data set. Many networks can complete simple remote sensing target detection tasks well, but most methods focus on the feature extraction of the backbone and the processing of the classifier while ignoring the importance of the feature fusion part, which still has great room for improvement.
Target detection tasks based on remote sensing images mostly face the following unsolved problems: the detected targets are often small and densely packed; the rotation angle of a target box is arbitrary; when the aspect ratio of a target is large, no suitable corresponding detection anchor exists; and remote sensing images contain a large amount of instance-level noise. These problems directly or indirectly reduce the accuracy of detection networks and hinder the development of remote sensing image target detection.
In the feature map fusion part of a deep neural network, classical networks mostly focus on the fusion and extraction of local features and ignore the role of global information in detection. This causes small targets to be lost from the feature maps after multiple layers of down-sampling, which is unfavorable for remote sensing detection tasks, especially small-target detection in remote sensing images.
Disclosure of Invention
The invention aims to provide a remote sensing rotating target detection method based on a deep neural network, so as to solve the problems described in the background section.
In order to solve the technical problems, the invention provides the following technical scheme:
A remote sensing rotating target detection method based on a deep neural network, characterized by comprising the following steps:
S1, collecting a remote sensing image data set: acquiring data from a globally used public remote sensing image data set, and preprocessing the collected remote sensing image data set;
S2, reading the preprocessed image data and performing online data enhancement on the data set with a composite data enhancement method;
S3, extracting multi-layer abstract features from the original image through the backbone extraction network and inputting them into the improved feature pyramid module for processing;
S4, extracting the feature map of the processed image and processing the feature map through the global information processing module;
S5, obtaining the result of the global information processing module according to S4, and performing convolution processing on the feature map through the feature refining module;
S6, transmitting the feature maps of all layers into the detector of the rotating target, the detector being a deep learning module containing a fully connected network whose input is the feature maps and whose output is the center coordinates x and y of the detected target, the width w and height h of the detection box and the rotation angle θ; obtaining the network prediction result from the feature maps, and constructing the network loss function according to the comparison between the prediction result and the real sample labels (an illustrative sketch of such a detection head is given after these steps);
and S7, based on minimization of the loss function, performing back-propagation iterations with the momentum stochastic gradient descent algorithm and updating the weights of the trainable parameters in the network to train the deep neural network.
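For illustration only, the rotating-target detection head of step S6 can be sketched as a small fully connected network in PyTorch; the class name, layer widths and angle encoding below are assumptions, not the patented implementation:

```python
import math
import torch
import torch.nn as nn

class RotatedBoxHead(nn.Module):
    """Sketch of the S6 rotating-target head: a fully connected network
    mapping a pooled feature vector to (x, y, w, h, theta)."""
    def __init__(self, in_features: int = 256):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(in_features, 256), nn.ReLU(inplace=True),
            nn.Linear(256, 5),  # x, y, w, h, theta
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        out = self.fc(feats)
        xywh, theta = out[:, :4], out[:, 4:5]
        # assumed angle encoding: squash theta into (-pi/2, pi/2)
        theta = torch.tanh(theta) * (math.pi / 2)
        return torch.cat([xywh, theta], dim=1)

head = RotatedBoxHead()
print(head(torch.randn(2, 256)).shape)  # torch.Size([2, 5])
```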
Further, the method for preprocessing the acquired remote sensing image data set in the S1 includes the following steps:
S1.1, acquiring a remote sensing image data set;
S1.2, cutting the original image into uniform sizes by geometric transformation, the geometric transformations comprising cropping, scaling, rotation and flipping: the original image S_qian, of width W_qian and height H_qian, is cut into images of size 1024 × 1024, where S_hou(x, y) denotes the pixel at position (x, y) of a cropped image (a sketch of this tiling step follows the step list below);
and S1.3, taking the cut images as a set and recording the set as a training sample set.
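A minimal sketch of the S1.2 cutting step, assuming non-overlapping 1024 × 1024 crops with zero padding at the image border (stride and padding are not specified in the text):

```python
import numpy as np

def crop_to_tiles(image: np.ndarray, tile: int = 1024) -> list:
    """Cut an H x W x C remote sensing image into uniform tile x tile
    crops (S1.2); ragged edge tiles are zero-padded to full size."""
    h, w = image.shape[:2]
    tiles = []
    for top in range(0, h, tile):
        for left in range(0, w, tile):
            crop = image[top:top + tile, left:left + tile]
            if crop.shape[:2] != (tile, tile):
                padded = np.zeros((tile, tile) + image.shape[2:], image.dtype)
                padded[:crop.shape[0], :crop.shape[1]] = crop
                crop = padded
            tiles.append(crop)
    return tiles

training_set = crop_to_tiles(np.zeros((2048, 3000, 3), dtype=np.uint8))
print(len(training_set))  # 6 crops of 1024 x 1024
```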
Further, the method in S2 of reading the preprocessed image data and performing online data enhancement on the data set with a composite data enhancement method includes the following steps:
S2.1, obtaining the cropped training sample set according to S1.3;
and S2.2, taking a training sample and applying the rotation and flipping preprocessing operations to it.
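The rotation and flipping of S2.2 can be sketched as follows; random 90° rotations and axis flips are an assumed concrete choice, and the matching rotated-box labels (x, y, w, h, θ) would have to be transformed in the same way:

```python
import numpy as np

def augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Online data enhancement (S2.2): random rotation and flips."""
    image = np.rot90(image, k=int(rng.integers(4)))  # rotate 0/90/180/270 deg
    if rng.random() < 0.5:
        image = image[:, ::-1]                       # horizontal flip
    if rng.random() < 0.5:
        image = image[::-1, :]                       # vertical flip
    return np.ascontiguousarray(image)

sample = augment(np.zeros((1024, 1024, 3), dtype=np.uint8),
                 np.random.default_rng(0))
print(sample.shape)  # (1024, 1024, 3)
```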
Further, the method for inputting the extracted multilayer abstract features in the original image to the improved feature pyramid module for processing through the backbone extraction network in S3 includes the following steps:
S3.1, using a pre-trained residual network as the backbone to extract multi-layer abstract features from the original image, and constructing a feature pyramid module; a feature pyramid is a network structure that is generally narrow at the top and wide at the bottom, and the multi-layer abstract features extracted by the backbone grow larger layer by layer, matching this pyramid structure;
and S3.2, inputting the extracted multi-layer abstract features into the feature pyramid module for processing, where feature fusion is performed between two adjacent layers of the feature pyramid module.
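For reference, the adjacent-layer fusion of S3.2 follows the standard feature-pyramid pattern (lateral 1 × 1 projection plus an upsampled top-down pathway); the sketch below shows this classical step, on top of which the invention's improvements are applied:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FPNFuse(nn.Module):
    """Classical FPN fusion between two adjacent backbone levels (S3.2)."""
    def __init__(self, c_low: int, c_high: int, c_out: int = 256):
        super().__init__()
        self.lat_low = nn.Conv2d(c_low, c_out, 1)    # lateral 1x1 projections
        self.lat_high = nn.Conv2d(c_high, c_out, 1)

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        top_down = F.interpolate(self.lat_high(high), scale_factor=2,
                                 mode="nearest")     # upsample coarser level
        return self.lat_low(low) + top_down          # element-wise fusion

fuse = FPNFuse(c_low=512, c_high=1024)
p = fuse(torch.randn(1, 512, 64, 64), torch.randn(1, 1024, 32, 32))
print(p.shape)  # torch.Size([1, 256, 64, 64])
```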
Further, the method for extracting the feature map of the processed image in S4 and processing the feature map by the global information processing module includes the following steps:
S4.1, obtaining the feature pyramid module according to S3.1, and processing the image feature map through the global information processing module within the feature pyramid module;
S4.2, taking D convolution kernels of size H × W whose channel count matches that of the input image, and a feature map of size 1024 × 1024 with C channels as input;
S4.3, denoting a convolution kernel by F ∈ R^(H×W×C), the input by M ∈ R^(1024×1024×C) and the output feature map by O ∈ R^(R×T×D), the output feature map channel corresponding to the jth convolution kernel is obtained according to the formula

O_{:,:,j} = Σ_{n=1}^{C} M_{:,:,n} * F^{(j)}_{:,:,n}

where * denotes the two-dimensional convolution operator, M_{:,:,n} is the 1024 × 1024 feature map of the nth input channel, and F^{(j)}_{:,:,n} is the nth channel of the jth convolution kernel F^{(j)}.
S4.4, normalization processing O :,:j To obtain
Figure BDA0003957278820000033
Where μ j represents the batch normalized channel mean, σ j represents the batch normalized channel standard deviation, γ j represents the scaling factor, and β j represents the offset;
S4.5, according to the formula

F′^{(j)} = O^{(j)}_{1×k} + O^{(j)}_{k×1} + b_j

the outputs of the 1 × k and k × 1 convolution kernels are fused into a single output, where F′^{(j)} denotes the global processing output result, b_j the bias, O^{(j)}_{1×k} the output of the 1 × k convolution kernel, and O^{(j)}_{k×1} the output of the k × 1 convolution kernel.
The invention processes the feature map through the global information processing module and uses the combination of 1 × k and k × 1 strip-shaped symmetric convolutions in place of the traditional k × k square convolution kernel, which greatly reduces the network parameters while effectively enlarging the receptive field, extracting more contextual detail and providing a data reference for the subsequent feature refining module.
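A minimal PyTorch sketch of this strip-convolution idea, assuming a kernel length k = 13, same-size padding and per-branch batch normalization (all choices here are assumptions); the two branch outputs are summed as in the fusion formula of S4.5:

```python
import torch
import torch.nn as nn

class StripConvBlock(nn.Module):
    """Global information module sketch: parallel 1 x k and k x 1
    convolutions replace a k x k kernel; their batch-normalized outputs
    are summed as F'^(j) = O_1xk + O_kx1 + b_j."""
    def __init__(self, channels: int, k: int = 13):
        super().__init__()
        p = k // 2
        self.horizontal = nn.Conv2d(channels, channels, (1, k),
                                    padding=(0, p), bias=False)
        self.vertical = nn.Conv2d(channels, channels, (k, 1),
                                  padding=(p, 0), bias=False)
        self.bn_h = nn.BatchNorm2d(channels)
        self.bn_v = nn.BatchNorm2d(channels)
        self.bias = nn.Parameter(torch.zeros(channels))  # b_j

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.bn_h(self.horizontal(x)) + self.bn_v(self.vertical(x))
        return out + self.bias.view(1, -1, 1, 1)

block = StripConvBlock(256)
print(block(torch.randn(1, 256, 64, 64)).shape)  # spatial size preserved
```

With C input and C output channels, a k × k kernel costs k²·C² weights while the 1 × k plus k × 1 pair costs 2k·C², roughly a 6.5-fold reduction for k = 13 while preserving the receptive field along each axis.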
Further, the method in S5 of obtaining the result of the global information processing module according to S4 and performing convolution processing on the feature map through the feature refining module includes the following steps:
S5.1, obtaining the global processing output result F′^{(j)} according to S4.5;
S5.2, inputting the feature map corresponding to the global processing output result F′^{(j)} into the feature refining module;
S5.3, performing dimensionality reduction on the feature map with a 1 × 1 convolution;
S5.4, performing feature fusion with a 3 × 3 convolution;
S5.5, unifying the number of channels of the fused features through five successive convolutions and inputting the result into the detection head.
According to the invention, the feature map is processed a second time by the feature refining module: on the basis of the feature pyramid, the traditional element-wise feature addition is replaced by concatenation along the channel dimension, and 1 × 1 and 3 × 3 convolutions reduce the number of channels and fuse the features, so that the number of parameters is effectively reduced while more feature information is retained, providing a data reference for the subsequent prediction of object classes and detection boxes.
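A sketch of this refinement step under stated assumptions: two adjacent pyramid levels are concatenated along the channel dimension (instead of added), reduced with a 1 × 1 convolution, fused with a 3 × 3 convolution, and passed through five successive convolutions; channel widths are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureRefine(nn.Module):
    """Feature refining sketch: channel-wise concatenation of adjacent
    levels, 1x1 reduction, 3x3 fusion, then five successive convolutions
    to unify the channel count for the detection head."""
    def __init__(self, c_in: int = 256, c_out: int = 256):
        super().__init__()
        self.reduce = nn.Conv2d(2 * c_in, c_in, 1)       # 1x1 dim reduction
        self.fuse = nn.Conv2d(c_in, c_in, 3, padding=1)  # 3x3 feature fusion
        self.refine = nn.Sequential(*[
            nn.Conv2d(c_in, c_in if i < 4 else c_out, 3, padding=1)
            for i in range(5)                            # five convolutions
        ])

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        high = F.interpolate(high, size=low.shape[-2:], mode="nearest")
        x = torch.cat([low, high], dim=1)   # concatenation, not addition
        return self.refine(self.fuse(self.reduce(x)))

m = FeatureRefine()
print(m(torch.randn(1, 256, 64, 64), torch.randn(1, 256, 32, 32)).shape)
```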
Further, the method for constructing the network loss function according to the comparison result between the prediction result and the real sample label in S6 includes the following steps:
S6.1, obtaining the feature map information processed by the feature refining module and passed to the detector of the rotating target in S5.5; the detector outputs the center coordinates x and y of a detected target, the width w and height h of the detection box and the rotation angle θ, from which the network prediction result is obtained;
S6.2, comparing the prediction result with the original image and constructing the network loss function according to the comparison result:

L_log(Y, P) = −(1/N) Σ_q Σ_w Y_{q,w} log P_{q,w}

where N denotes the number of original images after the cutting process, P_{q,w} the predicted probability that the qth sample carries the wth label, Y the real sample labels recorded from the original image information, M the number of label values, and L_log(Y, P) the total data-set loss function.
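The constructed loss is the standard multi-class log loss; a direct numerical sketch of the formula above (Y one-hot over M labels):

```python
import numpy as np

def log_loss(Y: np.ndarray, P: np.ndarray) -> float:
    """L_log(Y, P) = -(1/N) * sum_q sum_w Y[q, w] * log(P[q, w]);
    Y is one-hot (N x M), P holds predicted probabilities (N x M)."""
    eps = 1e-12                       # guard against log(0)
    return float(-(Y * np.log(P + eps)).sum() / Y.shape[0])

Y = np.array([[1, 0, 0], [0, 1, 0]])              # two samples, three labels
P = np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1]])
print(round(log_loss(Y, P), 4))  # 0.2899
```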
Further, the method in S7 of training the deep neural network based on minimization of the loss function, through back-propagation iterations of the momentum stochastic gradient descent algorithm that update the weights of the trainable parameters in the network, includes the following steps:
S7.1, obtaining the data-set loss function L_log(Y, P) according to S6.2;
S7.2, according to the formula Min = min L_log(Y, P), obtaining the value that minimizes the loss function; based on this minimization, stochastic gradient descent with momentum is used as the optimizer, and back-propagation iteratively updates the weights of the trainable parameters in the network to train the deep neural network.
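The S7 update is ordinary back-propagation with momentum stochastic gradient descent; a minimal training-loop sketch with assumed hyperparameters (learning rate, momentum and weight decay are illustrative):

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 5, 3, padding=1)          # stand-in for the full network
criterion = nn.CrossEntropyLoss()              # log loss over class labels
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-4)

for step in range(10):                         # toy loop over random batches
    images = torch.randn(4, 3, 64, 64)
    labels = torch.randint(0, 5, (4, 64, 64))
    optimizer.zero_grad()
    loss = criterion(model(images), labels)    # compare prediction vs labels
    loss.backward()                            # back-propagate gradients
    optimizer.step()                           # update trainable weights
```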
A remote sensing rotating target detection system based on a deep neural network, characterized by comprising the following modules:
the data preprocessing and data enhancing module: the data preprocessing and data enhancing module is used for performing data enhancing processing on the original image, expanding a data set and realizing the preprocessing of the data set;
a trunk feature extraction network module: the trunk feature extraction network module is used for extracting multilayer abstract features from an original image, inputting multilayer feature maps in the trunk feature extraction network into the improved feature pyramid module and processing the multilayer feature maps;
the global information processing module: the global information processing module uses large convolution kernels to enlarge the receptive field of the network and enhance its performance, thereby capturing the global information of the feature map;
a feature refining module: the feature refining module merges adjacent feature layers along the channel dimension and, through five successive convolutions after merging, reduces the number of channels and parameters while strengthening the fusion of the feature maps and extracting more usable information;
a loss function back-propagation module: the feature maps of all layers are passed into the rotating detector to obtain the prediction result of the network, which is compared with the real sample labels to construct the network loss function; based on minimization of the loss function, back-propagation iterations of the momentum stochastic gradient descent algorithm update the weights of the trainable parameters in the network to train the deep neural network.
With the improved network modules, the method is tested on the DOTA data set, a globally used public remote sensing image data set; compared with other methods it achieves higher accuracy, notably alleviating the missed-detection and false-detection problems of traditional methods, improves the detection of both small and large targets, and does not greatly increase the network parameters.
Drawings
FIG. 1 is a schematic flow chart of the remote sensing rotating target detection method based on a deep neural network according to the present invention;
FIG. 2 is the overall network structure diagram of the remote sensing rotating target detection method based on a deep neural network according to the present invention;
FIG. 3 is a structure diagram of the global information processing module of the remote sensing rotating target detection system based on a deep neural network;
FIG. 4 is a structure diagram of the feature refining module of the remote sensing rotating target detection system based on a deep neural network;
FIG. 5 is the overall network diagram of the remote sensing rotating target detection method based on a deep neural network;
FIG. 6 is an algorithm effect diagram of the remote sensing rotating target detection method based on a deep neural network.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1-5, in an embodiment of the present invention, a remote sensing rotating target detection method based on a deep neural network comprises the following steps:
S1, collecting a remote sensing image data set: acquiring data from a globally used public remote sensing image data set, and preprocessing the collected remote sensing image data set;
S2, reading the preprocessed image data and performing online data enhancement on the data set with a composite data enhancement method;
S3, extracting multi-layer abstract features from the original image through the backbone extraction network and inputting them into the improved feature pyramid module for processing;
S4, extracting the feature map of the processed image and processing the feature map through the global information processing module;
S5, obtaining the result of the global information processing module according to S4, and performing convolution processing on the feature map through the feature refining module;
S6, transmitting the feature maps of all layers into the detector of the rotating target to obtain the network prediction result, and constructing the network loss function according to the comparison between the prediction result and the real sample labels;
And S7, based on minimization of the loss function, performing back-propagation iterations with the momentum stochastic gradient descent algorithm and updating the weights of the trainable parameters in the network to train the deep neural network.
The method for preprocessing the acquired remote sensing image data set in the S1 comprises the following steps:
S1.1, acquiring a remote sensing image data set;
S1.2, cutting the original image into uniform sizes by geometric transformation, the geometric transformations comprising cropping, scaling, rotation and flipping: the original image S_qian, of width W_qian and height H_qian, is cut into images of size 1024 × 1024, where S_hou(x, y) denotes the pixel at position (x, y) of a cropped image;
and S1.3, taking the cut images as a set and recording the set as a training sample set.
The method in S2 of reading the preprocessed image data and performing online data enhancement on the data set with a composite data enhancement method comprises the following steps:
S2.1, obtaining the cropped training sample set according to S1.3;
and S2.2, taking a training sample and applying the rotation and flipping preprocessing operations to it.
In the step S3, the method for inputting the extracted multilayer abstract features in the original image to the improved feature pyramid module for processing through the backbone extraction network includes the following steps:
S3.1, extracting multi-layer abstract features from the original image using a pre-trained residual network as the backbone network, and constructing a feature pyramid module;
and S3.2, inputting the extracted multi-layer abstract features into the feature pyramid module for processing, where feature fusion is performed between two adjacent layers of the feature pyramid module.
The method in S4 of extracting the feature map of the processed image and processing it through the global information processing module comprises the following steps:
S4.1, obtaining the feature pyramid module according to S3.1, and processing the image feature map through the global information processing module within the feature pyramid module;
S4.2, taking D convolution kernels of size H × W whose channel count matches that of the input image, and a feature map of size 1024 × 1024 with C channels as input;
S4.3, denoting a convolution kernel by F ∈ R^(H×W×C), the input by M ∈ R^(1024×1024×C) and the output feature map by O ∈ R^(R×T×D), the output feature map channel corresponding to the jth convolution kernel is obtained according to the formula

O_{:,:,j} = Σ_{n=1}^{C} M_{:,:,n} * F^{(j)}_{:,:,n}

where * denotes the two-dimensional convolution operator, M_{:,:,n} is the 1024 × 1024 feature map of the nth input channel, and F^{(j)}_{:,:,n} is the nth channel of the jth convolution kernel F^{(j)}.
S4.4, normalization processing O :,:j To obtain
Figure BDA0003957278820000072
Wherein μ j represents the batch normalized channel mean, σ j represents the batch normalized channel standard deviation, γ j represents the scaling factor, and β j represents the offset;
S4.5, according to the formula

F′^{(j)} = O^{(j)}_{1×k} + O^{(j)}_{k×1} + b_j

the outputs of the 1 × k and k × 1 convolution kernels are fused into a single output, where F′^{(j)} denotes the global processing output result, b_j the bias, O^{(j)}_{1×k} the output of the 1 × k convolution kernel, and O^{(j)}_{k×1} the output of the k × 1 convolution kernel.
In the step S5, the result of the global information processing module is obtained according to the step S4, and the method for performing convolution processing on the feature map by the feature refining module includes the following steps:
S5.1, obtaining the global processing output result F′^{(j)} according to S4.5;
S5.2, inputting the feature map corresponding to the global processing output result F′^{(j)} into the feature refining module;
S5.3, performing dimensionality reduction on the feature map with a 1 × 1 convolution;
S5.4, performing feature fusion with a 3 × 3 convolution;
and S5.5, unifying the number of channels of the fused features through five successive convolutions and inputting the result into the detection head.
The method for constructing the network loss function according to the comparison result of the prediction result and the real sample label in the S6 comprises the following steps:
S6.1, obtaining the feature map information processed by the feature refining module and passed to the detector of the rotating target in S5.5, and processing it to obtain the network prediction result;
S6.2, comparing the prediction result with the original image and constructing the network loss function according to the comparison result:

L_log(Y, P) = −(1/N) Σ_q Σ_w Y_{q,w} log P_{q,w}

where N denotes the number of original images after the cutting process, P_{q,w} the predicted probability that the qth sample carries the wth label, Y the real sample labels recorded from the original image information, M the number of label values, and L_log(Y, P) the total data-set loss function.
Based on minimization of the loss function, the method in S7 of training the deep neural network by back-propagation iterations of the momentum stochastic gradient descent algorithm that update the weights of the trainable parameters in the network comprises the following steps:
S7.1, obtaining the data-set loss function L_log(Y, P) according to S6.2;
S7.2, according to the formula Min = min L_log(Y, P), obtaining the value that minimizes the loss function; based on this minimization, stochastic gradient descent with momentum is used as the optimizer, and back-propagation iteratively updates the weights of the trainable parameters in the network to train the deep neural network.
A remote sensing rotating target detection system based on a deep neural network (as shown in fig. 2), characterized in that the system comprises the following modules:
the data preprocessing and data enhancing module: the data preprocessing and data enhancing module is used for performing data enhancement processing on the original image, and the data enhancement is to expand a limited data set by some methods, increase the number and diversity of training sets and improve the generalization capability of the model;
a trunk feature extraction network module: the trunk feature extraction network module is used for extracting multilayer abstract features from an original image, inputting multilayer feature maps in the trunk feature extraction network into the improved feature pyramid module and processing the multilayer feature maps;
global information processing module (as shown in fig. 3): the global information processing module uses a large convolution kernel, which enlarges the receptive field of the network and enhances its performance but would greatly increase the network parameters; the invention therefore uses the combination of 1 × k and k × 1 strip convolutions in place of the traditional k × k square convolution kernel, greatly reducing the network parameters while capturing the global information of the feature map;
feature refining module (as shown in fig. 4): the feature refining module merges adjacent feature layers along the channel dimension, which retains more useful information and is more interpretable than the traditional direct addition. After merging, five successive convolutions reduce the number of channels and parameters while strengthening the fusion of the feature maps and extracting more usable information;
loss function back-propagation module: the feature maps of all layers are passed into the rotating detector to obtain the prediction result of the network, which is compared with the real sample labels to construct the network loss function; the loss function is minimized by back-propagation iterations of the momentum stochastic gradient descent algorithm, updating the weights of the trainable parameters in the network to train the deep neural network, as shown in the overall network diagram of fig. 5.
In this embodiment, the remote sensing rotating target detection method based on the deep neural network preprocesses the acquired original images, uses a pre-trained residual network as the feature extraction network, applies global information processing and feature refinement to the extracted feature maps, fuses the features with 3 × 3 convolutions, unifies the channel count of the fused features by convolution, and feeds the result into the detection head network to predict the target class and detection box; fig. 6 shows the concrete prediction results.
The network improved with the proposed modules is tested on the DOTA data set, a globally used public remote sensing image data set, and achieves higher accuracy than other methods. In particular, it alleviates the missed-detection and false-detection problems of traditional methods and improves the detection of both small and large targets without greatly increasing the network parameters. The average precision of the proposed modules reaches 79.37%, an improvement of 6.28 percentage points over the 73.09% of the classical method without them, and it also surpasses other classical methods.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (9)

1. A remote sensing rotating target detection method based on a deep neural network, characterized by comprising the following steps:
S1, collecting a remote sensing image data set: acquiring data from a globally used public remote sensing image data set, and preprocessing the collected remote sensing image data set;
S2, reading the preprocessed image data and performing online data enhancement on the data set with a composite data enhancement method;
S3, extracting multi-layer abstract features from the original image through the backbone extraction network and inputting them into the improved feature pyramid module for processing;
S4, extracting the feature map of the processed image and processing the feature map through the global information processing module;
S5, obtaining the result of the global information processing module according to S4, and performing convolution processing on the feature map through the feature refining module;
S6, transmitting the feature maps of all layers into the detector of the rotating target to obtain the network prediction result, and constructing the network loss function according to the comparison between the prediction result and the real sample labels;
and S7, based on minimization of the loss function, performing back-propagation iterations with the momentum stochastic gradient descent algorithm and updating the weights of the trainable parameters in the network to train the deep neural network.
2. The method for detecting the remote sensing rotating target based on the deep neural network as claimed in claim 1, wherein the method for preprocessing the acquired remote sensing image data set in the step S1 comprises the following steps:
S1.1, acquiring a remote sensing image data set;
S1.2, cutting the original image into uniform sizes by geometric transformation, the geometric transformations comprising cropping, scaling, rotation and flipping: the original image S_qian, of width W_qian and height H_qian, is cut into images of size 1024 × 1024, where S_hou(x, y) denotes the pixel at position (x, y) of a cropped image;
and S1.3, taking the cut images as a set and recording the set as a training sample set.
3. The remote sensing rotating target detection method based on the deep neural network as claimed in claim 2, wherein the method for performing online data enhancement on the data set by reading the preprocessed image data and adopting a complex data enhancement method in the S2 comprises the following steps:
S2.1, obtaining the cropped training sample set according to S1.3;
and S2.2, taking a training sample and applying the rotation and flipping preprocessing operations to it.
4. The method for detecting the remote sensing rotating target based on the deep neural network as claimed in claim 3, wherein in the step S3, the method for inputting the extracted multilayer abstract features in the original image into the improved feature pyramid module for processing through the trunk extraction network comprises the following steps:
S3.1, extracting multi-layer abstract features from the original image using a pre-trained residual network as the backbone network, and constructing a feature pyramid module;
and S3.2, inputting the extracted multi-layer abstract features into the feature pyramid module for processing, where feature fusion is performed between two adjacent layers of the feature pyramid module.
5. The method for detecting the remote sensing rotating target based on the deep neural network as claimed in claim 4, wherein the method for extracting the feature map of the processed image in the S4 and processing the feature map through the global information processing module comprises the following steps:
S4.1, obtaining the feature pyramid module according to S3.1, and processing the image feature map through the global information processing module within the feature pyramid module;
S4.2, taking D convolution kernels of size H × W whose channel count matches that of the input image, and a feature map of size 1024 × 1024 with C channels as input;
S4.3, denoting a convolution kernel by F ∈ R^(H×W×C), the input by M ∈ R^(1024×1024×C) and the output feature map by O ∈ R^(R×T×D), the output feature map channel corresponding to the jth convolution kernel is obtained according to the formula

O_{:,:,j} = Σ_{n=1}^{C} M_{:,:,n} * F^{(j)}_{:,:,n}

where * denotes the two-dimensional convolution operator, M_{:,:,n} is the 1024 × 1024 feature map of the nth input channel, and F^{(j)}_{:,:,n} is the nth channel of the jth convolution kernel F^{(j)}.
S4.4, normalizing O_{:,:,j} to obtain

O′_{:,:,j} = γ_j · (O_{:,:,j} − μ_j) / σ_j + β_j

where μ_j denotes the batch-normalized channel mean, σ_j the batch-normalized channel standard deviation, γ_j the scaling factor and β_j the offset;
S4.5, according to the formula

F′^{(j)} = O^{(j)}_{1×k} + O^{(j)}_{k×1} + b_j

the outputs of the 1 × k and k × 1 convolution kernels are fused into a single output, where F′^{(j)} denotes the global processing output result, b_j the bias, O^{(j)}_{1×k} the output of the 1 × k convolution kernel, and O^{(j)}_{k×1} the output of the k × 1 convolution kernel.
6. The method for detecting the remote sensing rotating target based on the deep neural network as claimed in claim 5, wherein the result of the global information processing module is obtained in the step S5 according to the step S4, and the method for performing convolution processing on the feature map through the feature refining module comprises the following steps:
S5.1, obtaining the global processing output result F′^{(j)} according to S4.5;
S5.2, inputting the feature map corresponding to the global processing output result F′^{(j)} into the feature refining module;
S5.3, performing dimensionality reduction on the feature map with a 1 × 1 convolution;
S5.4, performing feature fusion with a 3 × 3 convolution;
and S5.5, unifying the number of channels of the fused features through five successive convolutions and inputting the result into the detection head.
7. The method for detecting the remote sensing rotating target based on the deep neural network as claimed in claim 6, wherein the method for constructing the network loss function according to the comparison result between the prediction result and the real sample label in the step S6 comprises the following steps:
S6.1, obtaining the feature map information processed by the feature refining module and passed to the rotating detector in S5.5, and processing it to obtain the network prediction result;
S6.2, comparing the prediction result with the original image and constructing the network loss function according to the comparison result:

L_log(Y, P) = −(1/N) Σ_q Σ_w Y_{q,w} log P_{q,w}

where N denotes the number of original images after the cutting process, P_{q,w} the predicted probability that the qth sample carries the wth label, Y the real sample labels recorded from the original image information, M the number of label values, and L_log(Y, P) the total data-set loss function.
8. The remote sensing rotating target detection method based on the deep neural network of claim 7, wherein the method in S7 of training the deep neural network based on minimization of the loss function, through back-propagation iterations of the momentum stochastic gradient descent algorithm that update the weights of the trainable parameters in the network, comprises the following steps:
S7.1, obtaining the data-set loss function L_log(Y, P) according to S6.2;
S7.2, according to the formula Min = min L_log(Y, P), obtaining the value that minimizes the loss function; based on this minimization, stochastic gradient descent with momentum is used as the optimizer, and back-propagation iteratively updates the weights of the trainable parameters in the network to train the deep neural network.
9. A remote sensing rotating target detection system based on a deep neural network, characterized by comprising the following modules:
the data preprocessing and data enhancing module: the data preprocessing and data enhancing module is used for performing data enhancement processing on the original image, expanding a data set and realizing the preprocessing of the data set;
a trunk feature extraction network module: the trunk feature extraction network module is used for extracting multilayer abstract features from an original image, inputting multilayer feature maps in the trunk feature extraction network into the improved feature pyramid module and processing the multilayer feature maps;
a global information processing module: the global information processing module uses large convolution kernels to enlarge the receptive field of the network and enhance its performance, thereby capturing the global information of the feature map;
a feature refining module: the feature refining module merges adjacent feature layers along the channel dimension and, through five successive convolutions after merging, reduces the number of channels and parameters while strengthening the fusion of the feature maps and extracting more usable information;
a loss function back-propagation module: the feature maps of all layers are passed into the rotating detector to obtain the prediction result of the network, which is compared with the real sample labels to construct the network loss function; based on minimization of the loss function, back-propagation iterations of the momentum stochastic gradient descent algorithm update the weights of the trainable parameters in the network to train the deep neural network.
CN202211468221.2A 2022-11-22 2022-11-22 Remote sensing rotating target detection method based on deep neural network Pending CN115937672A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211468221.2A CN115937672A (en) 2022-11-22 2022-11-22 Remote sensing rotating target detection method based on deep neural network


Publications (1)

Publication Number Publication Date
CN115937672A true CN115937672A (en) 2023-04-07

Family

ID=86556884




Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2020104006A4 (en) * 2020-12-10 2021-02-18 Naval Aviation University Radar target recognition method based on feature pyramid lightweight convolutional neural network
CN113111727A (en) * 2021-03-19 2021-07-13 西北工业大学 Method for detecting rotating target in remote sensing scene based on feature alignment
CN114519819A (en) * 2022-02-10 2022-05-20 西北工业大学 Remote sensing image target detection method based on global context awareness
CN115187786A (en) * 2022-07-21 2022-10-14 北京工业大学 Rotation-based CenterNet2 target detection method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YUCHEN SHEN et al.: "Learning to Reduce Information Bottleneck for Object Detection in Aerial Images", arXiv:2204.02033v1, 5 April 2022, pages 1-5 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116168033B (en) * 2023-04-25 2023-08-22 厦门福信光电集成有限公司 Wafer lattice dislocation image detection method and system based on deep learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination