CN111797697B - Angle high-resolution remote sensing image target detection method based on improved CenterNet - Google Patents

Angle high-resolution remote sensing image target detection method based on improved CenterNet

Info

Publication number
CN111797697B
Authority
CN
China
Prior art keywords
target
image
remote sensing
size
point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010521896.3A
Other languages
Chinese (zh)
Other versions
CN111797697A (en)
Inventor
王鑫
戴慧凤
石爱业
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hohai University HHU
Original Assignee
Hohai University HHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hohai University HHU filed Critical Hohai University HHU
Priority to CN202010521896.3A priority Critical patent/CN111797697B/en
Publication of CN111797697A publication Critical patent/CN111797697A/en
Application granted granted Critical
Publication of CN111797697B publication Critical patent/CN111797697B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/13Satellite images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an angle high-resolution remote sensing image target detection method based on improved CenterNet. Under the CenterNet framework, HRNet is adopted as the backbone network to obtain an improved CenterNet framework; a number of remote sensing target images are given as training samples and input into the improved CenterNet framework for training, yielding a remote sensing image target detection framework. The remote sensing image to be detected is cropped into several unit images of the same size, each unit image is input into the remote sensing image target detection framework for target detection, and the target detection frame of each unit image is determined; the unit images are then edge-stitched according to their target detection frames to determine the detection targets of the remote sensing image to be detected, improving the accuracy with which the corresponding targets are detected in the remote sensing image to be detected.

Description

Angle high-resolution remote sensing image target detection method based on improved CenterNet
Technical Field
The invention relates to the technical field of digital image processing, in particular to an angular high-resolution remote sensing image target detection method based on improved CenterNet.
Background
Remote sensing technology is one of the important marks of a country's scientific and technological level and comprehensive national strength, and is widely applied in both military and civil fields. Its essence is to extract more effective information from complex remote sensing images, and the high-resolution remote sensing image is an important object of analysis. With the rapid development of remote sensing technology, higher requirements are placed on the accuracy and speed of remote sensing image target detection. The on-satellite self-detection mode acquires the remote sensing image directly and processes it on board, which greatly improves processing speed but places high space demands on the computing unit. A remote sensing target detection algorithm is therefore needed that detects quickly, occupies little space, and does not sacrifice accuracy.
The remote sensing image target detection method of publication CN110490069A first applies an additive operator splitting algorithm to the remote sensing image, constructing a nonlinear scale space that converges stably at any step size. The image in each scale space is then screened with the response value of the Hessian matrix to detect feature points. The neighborhood of each feature point is divided, down-sampled with the current scale parameter as the sampling step, and the gray mean and the first-order horizontal and vertical gradients of each grid sampling point are computed. Binary comparison of the per-grid results yields a feature descriptor; finally, feature matching is performed with the Hamming distance as the similarity measure. The method works well for real-time detection of remote sensing images, but its detection accuracy is slightly poor and its transferability is mediocre.
The remote sensing image target detection method of publication CN110084093A first uses the multi-layer outputs of a convolutional neural network to extract high-level features of the remote sensing images in the training data set, labels arbitrary quadrilaterals with a four-point marking method, generates candidate boxes of multiple areas and aspect ratios on the high-level features, and screens the candidate boxes. Candidate regions screened from different layers of the network are then feature-fused, classification and localization errors are computed from the fusion result, and deep learning training with an optimization function is performed on the screened candidate regions to obtain a trained, optimized model. Finally, target discrimination and localization are carried out on the remote sensing image to be detected with this model. The method can detect small objects, high-aspect-ratio targets, and multi-class targets in remote sensing images, but its space requirements are high, making it unsuitable for on-satellite self-detection.
Another work proposes an aircraft target detection algorithm combining saliency maps with a deep belief network. It first extracts the salient targets in the image with a histogram-based contrast method; next, it locates candidate targets through connected-region localization; it then extracts color moments, Hu invariant moments, Tamura texture features, and edge direction histograms of the candidate targets. Finally, the normalized features are fed to the deep belief network to detect the targets. The method detects aircraft accurately, but the extraction process is cumbersome and not intelligent enough for processing large-scale data sets.
In summary, the limitations of existing high-resolution remote sensing image target detection methods are mainly that: (1) detection accuracy is poor and the process is not intelligent enough for large-scale data sets; (2) the detection process is too cumbersome and unsuitable for on-satellite self-detection. The traditional remote sensing image target detection schemes are therefore limited and prone to low detection accuracy.
Disclosure of Invention
Aiming at the problems, the invention provides an angular high-resolution remote sensing image target detection method based on an improved CenterNet.
In order to realize the purpose of the invention, the invention provides an angular high-resolution remote sensing image target detection method based on improved CenterNet, which comprises the following steps:
s10, adopting HRNet as a backbone network under the CenterNet framework to obtain an improved CenterNet framework;
s20, giving a plurality of remote sensing target images as training samples, inputting the training samples into an improved CenterNet frame, and training to obtain a remote sensing image target detection frame;
s30, cutting the remote sensing image to be detected into a plurality of unit images with the same size, respectively inputting each unit image into a remote sensing image target detection frame for target detection, and determining a target detection frame of each unit image;
and S40, performing edge splicing on each unit image according to the target detection frame of each unit image to determine the detection target of the remote sensing image to be detected.
In one embodiment, the unit images are respectively input into a remote sensing image target detection frame for target detection, and determining the target detection frame of the unit image comprises:
respectively inputting the unit images into the remote sensing image target detection framework; calculating the feature map of each unit image with HRNet; performing convolution operation on the feature map to calculate the thermodynamic diagram (heatmap) of the image; obtaining the center points of the unit image by searching the peaks of the thermodynamic diagram; obtaining the target center point, target size, and target orientation angle of the unit image by regression calculation; and determining the target detection frame according to the target center point, target size, and target orientation angle.
Specifically, the convolution operation is performed on the feature map, the thermodynamic diagram of the image is calculated, the central point of the unit image is obtained by searching the peak value of the thermodynamic diagram, the target central point, the target size and the target orientation angle of the unit image are obtained through regression calculation, and the target detection frame is determined according to the target central point, the target size and the target orientation angle, and the method comprises the following steps:
performing convolution operation on the feature map to calculate the thermodynamic diagram (heatmap) of the image

$\hat{Y} \in [0,1]^{\frac{W}{R} \times \frac{H}{R} \times C}$

wherein $R$ represents the size scaling, $C$ is the number of key-point classes, and $W$ and $H$ are the width and height of the image; the peak points of the thermodynamic diagram are the center points, and the width and height of the target are predicted at the peak-point position of each feature map;
training key points of a target through thermodynamic diagrams to determine central points of unit images;
and obtaining a target central point, a target size and a target orientation angle of the unit image through regression calculation, and determining a target detection frame according to the target central point, the target size and the target orientation angle.
Specifically, assume the target key point position on the target true value (ground truth) graph of the unit image is $p \in \mathbb{R}^2$. The original input image is continuously down-sampled through the HRNet network to obtain the corresponding feature map, on which the corresponding key point is

$\tilde{p} = \left\lfloor \frac{p}{R} \right\rfloor$

wherein $R$ represents the size scaling;

by the Gaussian kernel formula

$Y_{xyc} = \exp\left( -\frac{(x - \tilde{p}_x)^2 + (y - \tilde{p}_y)^2}{2\sigma_p^2} \right)$

the GT points on the truth graph are dispersed into a thermodynamic diagram

$Y \in [0,1]^{\frac{W}{R} \times \frac{H}{R} \times C}$

where $\sigma_p$ is the size-adaptive standard deviation of the target; if two or more Gaussians overlap within the target size, the one with the largest value is selected, and $\tilde{p}_x$ and $\tilde{p}_y$ are the horizontal and vertical coordinates of $\tilde{p}$;
the training objective function of the key points of the target thermodynamic diagram is set as follows:

$L_k = \frac{-1}{N} \sum_{xyc} \begin{cases} (1 - \hat{Y}_{xyc})^{\alpha} \log(\hat{Y}_{xyc}) & \text{if } Y_{xyc} = 1 \\ (1 - Y_{xyc})^{\beta} (\hat{Y}_{xyc})^{\alpha} \log(1 - \hat{Y}_{xyc}) & \text{otherwise} \end{cases}$

wherein $\alpha$ and $\beta$ are hyper-parameters, usually set to 2 and 4 in the experiments, $N$ is the number of key points in the image, and the factor $\frac{1}{N}$ mainly serves to normalize the loss;
a local offset is added to the prediction of the center point,

$\hat{O} \in \mathbb{R}^{\frac{W}{R} \times \frac{H}{R} \times 2}$

all classes $c$ share the same offset prediction, and this offset is trained with an L1 loss:

$L_{off} = \frac{1}{N} \sum_{p} \left| \hat{O}_{\tilde{p}} - \left( \frac{p}{R} - \tilde{p} \right) \right|$

it follows that the local offset makes predictions only at the key-point locations $\tilde{p}$;
when searching for target pixel points, the key points of $\hat{Y}$ are used to get all the centers; assume $(x_1^{(k)}, y_1^{(k)}, x_2^{(k)}, y_2^{(k)})$ is the bounding box of a target of class $k$, then its center position is

$p_k = \left( \frac{x_1^{(k)} + x_2^{(k)}}{2}, \frac{y_1^{(k)} + y_2^{(k)}}{2} \right)$

and the size regressed for each target is equal to

$s_k = \left( x_2^{(k)} - x_1^{(k)},\ y_2^{(k)} - y_1^{(k)} \right)$

for fast acquisition of the target box, a single size prediction $\hat{S} \in \mathbb{R}^{\frac{W}{R} \times \frac{H}{R} \times 2}$ is used for calculation, and an L1 loss is added at the center position:

$L_{size} = \frac{1}{N} \sum_{k=1}^{N} \left| \hat{S}_{p_k} - s_k \right|$
when searching for the angle orientation of the target, the key points of $\hat{Y}$ are likewise used for regression; assume $a_k$ is the orientation angle of the class-$k$ target box, then the orientation prediction is $\hat{A} \in \mathbb{R}^{\frac{W}{R} \times \frac{H}{R} \times 1}$, and an L1 loss is added at the center position:

$L_{angle} = \frac{1}{N} \sum_{k=1}^{N} \left| \hat{A}_{p_k} - a_k \right|$
the network target loss function of the whole target detection process consists of four parts:

$L_{det} = L_k + \lambda_{size} L_{size} + \lambda_{angle} L_{angle} + \lambda_{off} L_{off}$

in the experiments, $\lambda_{size} = 0.1$, $\lambda_{angle} = 0.1$, $\lambda_{off} = 1$; the whole network predicts $C + 5$ values at each position, namely the key-point class $c$, the target center point $(x, y)$, the size $(w, h)$ and the target angle, and all of these outputs share the same deep convolutional neural network backbone.
The peak point for each category on the thermodynamic diagram is extracted to determine the target center point, target size, and target orientation angle.
Specifically, extracting the peak point of each category on the thermodynamic diagram includes:
Each response point on the thermodynamic diagram is compared with its eight neighboring points: if its value is greater than or equal to the maximum of the eight neighbors, the point is kept; otherwise it is eliminated. The first 100 peak points meeting this requirement are retained, giving the peak points of each category on the thermodynamic diagram.
In the angle high-resolution remote sensing image target detection method based on the improved CenterNet described above, HRNet is adopted as the backbone network under the CenterNet framework to obtain an improved CenterNet framework. A number of remote sensing target images are given as training samples and input into the improved CenterNet framework for training, yielding a remote sensing image target detection framework. The remote sensing image to be detected is cropped into several unit images of the same size; each unit image is input into the remote sensing image target detection framework for target detection and its target detection frame is determined; the unit images are then edge-stitched according to their target detection frames to determine the detection targets of the remote sensing image to be detected, which improves the accuracy of detecting the corresponding targets in the remote sensing image to be detected.
Drawings
FIG. 1 is a flow diagram of an embodiment of an angular high-resolution remote sensing image target detection method based on improved CenterNet;
FIG. 2 is a schematic diagram of an image cropping and stitching process according to one embodiment;
FIG. 3 is a schematic diagram of an overall algorithm framework of an angular high-resolution remote sensing image target detection method based on the improved CenterNet according to an embodiment;
FIG. 4 is a schematic diagram of an experimental result of a confusion matrix for remote sensing image target identification according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
Today, remote sensing technology is developing rapidly, placing higher requirements on the accuracy and speed of remote sensing image target recognition. The on-satellite self-recognition mode acquires the remote sensing image directly and processes it on board, greatly improving processing speed, but it places high space demands on the computing unit; a remote sensing target recognition algorithm is therefore needed that recognizes quickly, occupies little space, and does not sacrifice accuracy. The invention provides an angular high-resolution remote sensing image target detection method based on an improved CenterNet, which performs angle recognition on the basis of CenterNet and replaces the backbone; experiments show that, when space efficiency and experimental accuracy are considered together, HRNet is the most suitable backbone. In one embodiment, referring to fig. 1, fig. 1 is a flow chart of the angular high-resolution remote sensing image target detection method based on the improved CenterNet, and the method comprises the following steps:
s10, adopting HRNet as a Backbone network (Backbone) under the CenterNet framework to obtain the improved CenterNet framework.
And S20, giving a plurality of remote sensing target images as training samples, and inputting the training samples into an improved CenterNet frame to train to obtain a remote sensing image target detection frame.
The steps give a plurality of remote sensing target images as training samples, and the training samples are input into an improved CenterNet framework to calculate a target Feature map (Feature map). Compared with the traditional CenterNet target detection framework which usually adopts ResNet, DLA, Hourglass and the like as a Backbone network (Backbone), the improved CenterNet target detection framework adopts lightweight HRNet as the Backbone network to calculate the feature map of the remote sensing target image, and compared with other Backbone networks, the HRNet has the advantages of parallel high-resolution network architecture and repeated multi-scale fusion mode, and can calculate the target feature map more quickly and accurately.
Specifically, a plurality of remote sensing target images are given as training samples and input into the improved CenterNet framework; then the feature map of each remote sensing target image is calculated with HRNet. HRNet performs repeated multi-scale fusion across parallel branches of different scales and resolutions: through down-sampling and up-sampling, the feature maps of the three scales are each resampled into the other scales and fused at every scale, finally yielding three representations fused at different levels. The high-resolution branch draws more accurate classification features from the low-resolution branch, while the low-resolution branch obtains more accurate position features from the high-resolution branch. Repeated fusion across layers produces multiple intermediate representations between a standard image and a noisy one; these serve as guide images for recovering the standard representation, and the learning capability of the convolutional neural network further refines the feature output. Finally, after each input image block passes through HRNet, several feature maps with lower resolution than the input image are output.
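As an illustration of the repeated multi-scale fusion idea described above, the following sketch resamples each branch to every other branch's resolution and sums them. It is a minimal, hypothetical rendering: real HRNet fusion also adapts channel counts with strided and 1×1 convolutions, which are omitted here.

```python
import torch
import torch.nn.functional as F

def fuse_multiscale(feats):
    """Fuse a list of feature maps ordered from high to low resolution:
    every branch receives the sum of all branches resampled to its own
    resolution. Channel counts are kept equal here for simplicity."""
    fused = []
    for i, target in enumerate(feats):
        acc = target.clone()
        for j, other in enumerate(feats):
            if j != i:
                acc = acc + F.interpolate(other, size=target.shape[-2:],
                                          mode="bilinear", align_corners=False)
        fused.append(acc)
    return fused

# Toy check with three scales of an 8-channel map.
feats = [torch.randn(1, 8, 128, 128),
         torch.randn(1, 8, 64, 64),
         torch.randn(1, 8, 32, 32)]
print([t.shape for t in fuse_multiscale(feats)])
```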
And S30, cutting the remote sensing image to be detected into a plurality of unit images with the same size, inputting each unit image into the remote sensing image target detection frame respectively for target detection, and determining the target detection frame of each unit image.
The above steps perform convolution operation on the feature map obtained by HRNet calculation, calculate a thermodynamic diagram (Heat map) of the image, then obtain a central point of the image by finding a peak value of the thermodynamic diagram, and obtain a target central point (x, y), a target size (w, h), and a target orientation angle by regression calculation, and based on the central point, the size, and the orientation angle, obtain a target detection frame, i.e. a target detection result.
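The following sketch illustrates the per-position outputs this step describes: C heatmap channels plus 2 offset, 2 size, and 1 angle channel regressed from the shared backbone feature map. The head widths and channel counts are assumptions for illustration, not the patent's exact configuration.

```python
import torch
import torch.nn as nn

class DetectionHeads(nn.Module):
    """C + 5 outputs per heatmap position: C class channels, a 2-channel
    center offset, a 2-channel size and a 1-channel orientation angle,
    all computed from the shared backbone feature map."""
    def __init__(self, in_ch=64, num_classes=15):
        super().__init__()
        def head(out_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(64, out_ch, 1))
        self.heatmap = head(num_classes)  # key-point class scores
        self.offset = head(2)             # sub-pixel center offset
        self.size = head(2)               # object size (w, h)
        self.angle = head(1)              # orientation angle

    def forward(self, feat):
        return (torch.sigmoid(self.heatmap(feat)), self.offset(feat),
                self.size(feat), self.angle(feat))

hm, off, wh, ang = DetectionHeads()(torch.randn(1, 64, 128, 128))
print(hm.shape, off.shape, wh.shape, ang.shape)
```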
And S40, performing edge splicing on each unit image according to the target detection frame of each unit image to determine the detection target of the remote sensing image to be detected.
Considering that remote sensing images acquired on a satellite are generally large in size and high in resolution, so that processing them directly is inefficient in both time and space, this embodiment proposes to divide (crop) the large-format remote sensing image to be detected, with overlap, into a number of smaller image blocks of the same size for processing. When a target lies on a cutting line it is split into two parts, making its position, size, or orientation difficult to fuse in the final detection; the stitching method for edge targets is therefore proposed to solve this problem and improve the detection accuracy of targets on the cutting lines.
In one example, as shown in FIG. 2, (a) is the original image. (b) shows the cropped image blocks: the airplane lying on the cutting line of the original image is split into two parts across the left and right blocks, and this cut airplane is taken as the edge target. (c) is the result of directly stitching the left and right blocks; the recognition results of the two blocks cannot be fused at all, and the two partial detections differ in size, scale, and direction, which seriously harms the recognition accuracy of the edge target. (d) is the result of the stitching strategy for edge targets proposed by the invention: overlapping cropping produces left, middle, and right image blocks, the edge target falls entirely inside the middle, overlap-cropped block, and overlapping stitching is likewise used when merging, so the strategy completely avoids the recognition error of edge targets caused by cropping and greatly improves their recognition accuracy. Note that this example shows only one edge of the image; in actual operation, the overlap cropping is applied to all four edges.
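A minimal sketch of the overlapped cropping just described: tiles are generated with a stride smaller than the tile size, so any target cut by one tile border lies wholly inside a neighbouring tile, and per-tile detections are shifted back to full-image coordinates before merging. The tile size, overlap, and helper names here are illustrative assumptions.

```python
def tile_coords(width, height, tile=512, overlap=128):
    """Generate (x0, y0, x1, y1) crop windows that step through the image
    with stride tile - overlap; extra tiles are appended so the right and
    bottom edges are always covered."""
    stride = tile - overlap
    xs = list(range(0, max(width - tile, 0) + 1, stride))
    ys = list(range(0, max(height - tile, 0) + 1, stride))
    if xs[-1] + tile < width:
        xs.append(width - tile)
    if ys[-1] + tile < height:
        ys.append(height - tile)
    return [(x, y, x + tile, y + tile) for y in ys for x in xs]

def to_global(det, origin):
    """Shift one per-tile detection (cx, cy, w, h, angle) back into the
    coordinate frame of the full image before merging."""
    cx, cy, w, h, a = det
    ox, oy = origin
    return (cx + ox, cy + oy, w, h, a)

print(tile_coords(1200, 900))
```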
In the angle high-resolution remote sensing image target detection method based on the improved CenterNet described above, HRNet is adopted as the backbone network under the CenterNet framework to obtain an improved CenterNet framework. A number of remote sensing target images are given as training samples and input into the improved CenterNet framework for training, yielding a remote sensing image target detection framework. The remote sensing image to be detected is cropped into several unit images of the same size; each unit image is input into the remote sensing image target detection framework for target detection and its target detection frame is determined; the unit images are then edge-stitched according to their target detection frames to determine the detection targets of the remote sensing image to be detected, which improves the accuracy of detecting the corresponding targets in the remote sensing image to be detected.
In one embodiment, the unit images are respectively input into a remote sensing image target detection frame for target detection, and determining the target detection frame of the unit image comprises:
respectively inputting the unit images into the remote sensing image target detection framework; calculating the feature map of each unit image with HRNet; performing convolution operation on the feature map to calculate the thermodynamic diagram (heatmap) of the image; obtaining the center points of the unit image by searching the peaks of the thermodynamic diagram; obtaining the target center point, target size, and target orientation angle of the unit image by regression calculation; and determining the target detection frame according to the target center point, target size, and target orientation angle.
As an embodiment, performing a convolution operation on the feature map, calculating a thermodynamic diagram of the image, obtaining a center point of the unit image by finding a peak of the thermodynamic diagram, and obtaining a target center point, a target size, and a target orientation angle of the unit image by a regression calculation, and determining the target detection frame according to the target center point, the target size, and the target orientation angle includes:
performing convolution operation on the feature map to calculate the thermodynamic diagram (heatmap) of the image

$\hat{Y} \in [0,1]^{\frac{W}{R} \times \frac{H}{R} \times C}$

wherein $R$ represents the size scaling, $C$ is the number of key-point classes, and $W$ and $H$ are the width and height of the image; the peak points of the thermodynamic diagram are the center points, and the width and height of the target are predicted at the peak-point position of each feature map;
training key points of a target through thermodynamic diagrams to determine central points of unit images;
and obtaining a target central point, a target size and a target orientation angle of the unit image through regression calculation, and determining a target detection frame according to the target central point, the target size and the target orientation angle.
As an embodiment, assume the target key point position on the target true value (ground truth) graph of the unit image is $p \in \mathbb{R}^2$. The original input image is continuously down-sampled through the HRNet network to obtain the corresponding feature map, on which the corresponding key point is

$\tilde{p} = \left\lfloor \frac{p}{R} \right\rfloor$

wherein $R$ represents the size scaling;

by the Gaussian kernel formula

$Y_{xyc} = \exp\left( -\frac{(x - \tilde{p}_x)^2 + (y - \tilde{p}_y)^2}{2\sigma_p^2} \right)$

the GT points on the truth graph are dispersed into a thermodynamic diagram

$Y \in [0,1]^{\frac{W}{R} \times \frac{H}{R} \times C}$

where $\sigma_p$ is the size-adaptive standard deviation of the target; if two or more Gaussians overlap within the target size, the one with the largest value is selected, and $\tilde{p}_x$ and $\tilde{p}_y$ are the horizontal and vertical coordinates of $\tilde{p}$;
the training objective function of the key points of the target thermodynamic diagram is set as follows:

$L_k = \frac{-1}{N} \sum_{xyc} \begin{cases} (1 - \hat{Y}_{xyc})^{\alpha} \log(\hat{Y}_{xyc}) & \text{if } Y_{xyc} = 1 \\ (1 - Y_{xyc})^{\beta} (\hat{Y}_{xyc})^{\alpha} \log(1 - \hat{Y}_{xyc}) & \text{otherwise} \end{cases}$

wherein $\alpha$ and $\beta$ are hyper-parameters, usually set to 2 and 4 in the experiments, $N$ is the number of key points in the image, and the factor $\frac{1}{N}$ mainly serves to normalize the loss;
a local offset is added to the prediction of the center point,

$\hat{O} \in \mathbb{R}^{\frac{W}{R} \times \frac{H}{R} \times 2}$

all classes $c$ share the same offset prediction, and this offset is trained with an L1 loss:

$L_{off} = \frac{1}{N} \sum_{p} \left| \hat{O}_{\tilde{p}} - \left( \frac{p}{R} - \tilde{p} \right) \right|$

it follows that the local offset makes predictions only at the key-point locations $\tilde{p}$;
when searching for target pixel points, the key points of $\hat{Y}$ are used to get all the centers; assume $(x_1^{(k)}, y_1^{(k)}, x_2^{(k)}, y_2^{(k)})$ is the bounding box of a target of class $k$, then its center position is

$p_k = \left( \frac{x_1^{(k)} + x_2^{(k)}}{2}, \frac{y_1^{(k)} + y_2^{(k)}}{2} \right)$

and the size regressed for each target is equal to

$s_k = \left( x_2^{(k)} - x_1^{(k)},\ y_2^{(k)} - y_1^{(k)} \right)$

for fast acquisition of the target box, a single size prediction $\hat{S} \in \mathbb{R}^{\frac{W}{R} \times \frac{H}{R} \times 2}$ is used for calculation, and an L1 loss is added at the center position:

$L_{size} = \frac{1}{N} \sum_{k=1}^{N} \left| \hat{S}_{p_k} - s_k \right|$
when searching for the angle orientation of the target, the key points of $\hat{Y}$ are likewise used for regression; assume $a_k$ is the orientation angle of the class-$k$ target box, then the orientation prediction is $\hat{A} \in \mathbb{R}^{\frac{W}{R} \times \frac{H}{R} \times 1}$, and an L1 loss is added at the center position:

$L_{angle} = \frac{1}{N} \sum_{k=1}^{N} \left| \hat{A}_{p_k} - a_k \right|$
the network target loss function of the whole target detection process consists of four parts:

$L_{det} = L_k + \lambda_{size} L_{size} + \lambda_{angle} L_{angle} + \lambda_{off} L_{off}$

in the experiments, $\lambda_{size} = 0.1$, $\lambda_{angle} = 0.1$, $\lambda_{off} = 1$; the whole network predicts $C + 5$ values at each position, namely the key-point class $c$, the target center point $(x, y)$, the size $(w, h)$ and the target angle, and all of these outputs share the same deep convolutional neural network backbone.
And extracting a peak value point of each category on the thermodynamic diagram to determine a target center point, a target size and a target orientation angle.
Specifically, extracting the peak point of each category on the thermodynamic diagram includes:
Each response point on the thermodynamic diagram is compared with its eight neighboring points: if its value is greater than or equal to the maximum of the eight neighbors, the point is kept; otherwise it is eliminated. The first 100 peak points meeting this requirement are retained, giving the peak points of each category on the thermodynamic diagram.
Suppose

$\hat{\mathcal{P}}_c = \{ (\hat{x}_i, \hat{y}_i) \}_{i=1}^{N}$

is the set of $N$ detected center points of category $c$. Each key point is given in the form of integer coordinates $(x_i, y_i)$, and $\hat{Y}_{x_i y_i c}$ is used as the detection confidence to generate a target box of the following form:

$\left( \hat{x}_i + \delta\hat{x}_i - \frac{\hat{w}_i}{2},\ \hat{y}_i + \delta\hat{y}_i - \frac{\hat{h}_i}{2},\ \hat{x}_i + \delta\hat{x}_i + \frac{\hat{w}_i}{2},\ \hat{y}_i + \delta\hat{y}_i + \frac{\hat{h}_i}{2},\ \hat{a}_i \right)$

wherein $(\delta\hat{x}_i, \delta\hat{y}_i) = \hat{O}_{\hat{x}_i, \hat{y}_i}$ is the offset prediction result, $(\hat{w}_i, \hat{h}_i) = \hat{S}_{\hat{x}_i, \hat{y}_i}$ is the size prediction result, and $\hat{a}_i = \hat{A}_{\hat{x}_i, \hat{y}_i}$ is the angle prediction result.
In one example, the convolution operation on the feature map obtained by HRNet, the calculation of the thermodynamic diagram (heatmap), the location of the center points at the heatmap peaks, and the regression of the target center point (x, y), target size (w, h), and target orientation angle that together yield the target detection frame, i.e. the target detection result, may be carried out as follows:
First, the feature map obtained by HRNet is convolved to calculate the thermodynamic map of the image

$\hat{Y} \in [0,1]^{\frac{W}{R} \times \frac{H}{R} \times C}$

where $R$ represents the size scaling, $C$ is the number of key-point classes (i.e. the number of channels of the output feature map), and $W$ and $H$ are the width and height of the image; the peak points of the thermodynamic diagram are the center points, and the width and height of the target are predicted at the peak-point position of each feature map;
Second, the key points of the target are trained through thermodynamic diagrams. Assume the position of a target key point on the target true value graph (ground truth) is $p \in \mathbb{R}^2$; the GT points contain all the effective information in the feature map and are the key nodes for feature extraction. The original input image is continuously down-sampled through the HRNet network to obtain the corresponding feature maps, on which the corresponding key point is

$\tilde{p} = \left\lfloor \frac{p}{R} \right\rfloor$

wherein $R$ represents the size scaling;

by the Gaussian kernel formula

$Y_{xyc} = \exp\left( -\frac{(x - \tilde{p}_x)^2 + (y - \tilde{p}_y)^2}{2\sigma_p^2} \right)$

the GT points on the truth graph are dispersed into a thermodynamic diagram

$Y \in [0,1]^{\frac{W}{R} \times \frac{H}{R} \times C}$

where $\sigma_p$ is the size-adaptive standard deviation of the target. If two or more Gaussians overlap within the target size, the one with the largest value is selected; $\tilde{p}_x$ and $\tilde{p}_y$ are the horizontal and vertical coordinates of $\tilde{p}$;
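A minimal sketch, under the definitions above, of dispersing a GT center onto one class channel of the heatmap with this Gaussian kernel; overlapping responses keep the element-wise maximum, matching the rule of selecting the largest value. The fixed sigma stands in for the size-adaptive σp.

```python
import numpy as np

def splat_gaussian(heatmap, center, sigma):
    """Disperse one GT center onto a single-class heatmap; overlapping
    responses keep the element-wise maximum."""
    h, w = heatmap.shape
    ys, xs = np.mgrid[0:h, 0:w]
    cx, cy = center
    g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
    np.maximum(heatmap, g, out=heatmap)
    return heatmap

hm = np.zeros((128, 128), dtype=np.float32)
splat_gaussian(hm, center=(40, 60), sigma=3.0)
splat_gaussian(hm, center=(43, 60), sigma=3.0)  # overlapping target: max kept
print(hm[60, 40], hm[60, 43])
```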
the training objective function of the key points of the target thermodynamic diagram is set as follows:

$L_k = \frac{-1}{N} \sum_{xyc} \begin{cases} (1 - \hat{Y}_{xyc})^{\alpha} \log(\hat{Y}_{xyc}) & \text{if } Y_{xyc} = 1 \\ (1 - Y_{xyc})^{\beta} (\hat{Y}_{xyc})^{\alpha} \log(1 - \hat{Y}_{xyc}) & \text{otherwise} \end{cases}$

wherein $\alpha$ and $\beta$ are hyper-parameters, usually set to 2 and 4 in the experiments, $N$ is the number of key points in the image, and the factor $\frac{1}{N}$ mainly serves to normalize the loss;
Thirdly, since the down-sampling of the image introduces a discretization bias at the GT points, a local offset is added to the prediction of the center point,

$\hat{O} \in \mathbb{R}^{\frac{W}{R} \times \frac{H}{R} \times 2}$

all classes $c$ share the same offset prediction, and this offset is trained with an L1 loss:

$L_{off} = \frac{1}{N} \sum_{p} \left| \hat{O}_{\tilde{p}} - \left( \frac{p}{R} - \tilde{p} \right) \right|$

it follows that the local offset makes predictions only at the key-point locations $\tilde{p}$; the operation is not used at other positions;
Fourthly, when searching for target pixel points, the key points of $\hat{Y}$ are used to obtain all the centers. Assume $(x_1^{(k)}, y_1^{(k)}, x_2^{(k)}, y_2^{(k)})$ is the bounding box of a target of class $k$, then its center position is

$p_k = \left( \frac{x_1^{(k)} + x_2^{(k)}}{2}, \frac{y_1^{(k)} + y_2^{(k)}}{2} \right)$

and the size regressed for each target is equal to

$s_k = \left( x_2^{(k)} - x_1^{(k)},\ y_2^{(k)} - y_1^{(k)} \right)$

to get the target box quickly, a single size prediction is used, i.e.

$\hat{S} \in \mathbb{R}^{\frac{W}{R} \times \frac{H}{R} \times 2}$

where $W$ and $H$ are the image width and height and $R$ represents the size scaling, and an L1 loss is added at the center position:

$L_{size} = \frac{1}{N} \sum_{k=1}^{N} \left| \hat{S}_{p_k} - s_k \right|$
Fifthly, the key points of $\hat{Y}$ are likewise used for regression when searching for the angle orientation of the target. Assume $a_k$ is the orientation angle of the class-$k$ target box; the orientation prediction is $\hat{A} \in \mathbb{R}^{\frac{W}{R} \times \frac{H}{R} \times 1}$, and similarly an L1 loss is added at the center position:

$L_{angle} = \frac{1}{N} \sum_{k=1}^{N} \left| \hat{A}_{p_k} - a_k \right|$
Sixthly, the network target loss function of the whole target detection process consists of four parts:

$L_{det} = L_k + \lambda_{size} L_{size} + \lambda_{angle} L_{angle} + \lambda_{off} L_{off}$

in the experiments, $\lambda_{size} = 0.1$, $\lambda_{angle} = 0.1$, $\lambda_{off} = 1$; the whole network predicts $C + 5$ values at each position, namely the key-point class $c$, the target center point $(x, y)$, the size $(w, h)$ and the target angle, and all of these outputs share the same deep convolutional neural network backbone.
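The size, angle, and offset terms are all L1 losses evaluated only at GT center positions, combined with the weights quoted above. A minimal sketch, assuming a mask tensor that is 1 at key-point pixels:

```python
import torch

def masked_l1(pred, target, mask):
    """L1 regression counted only at GT center positions, as used for the
    offset, size and angle heads; `mask` is 1 at key-point pixels."""
    n = mask.sum().clamp(min=1.0)
    return ((pred - target).abs() * mask).sum() / n

def total_loss(l_k, l_size, l_angle, l_off,
               lam_size=0.1, lam_angle=0.1, lam_off=1.0):
    # L_det = L_k + 0.1*L_size + 0.1*L_angle + 1*L_off (weights from the text)
    return l_k + lam_size * l_size + lam_angle * l_angle + lam_off * l_off
```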
Seventh, a peak point of each category on the thermodynamic diagram is extracted. To obtain these peak points, we compare all the response points on the thermodynamic diagram with the eight neighboring points, and if the value of the corresponding point is greater than or equal to the maximum value of the eight neighboring points, that corresponding point is retained, otherwise the corresponding point is eliminated. Finally, we will retain all the first 100 peak points that meet their requirements. Suppose that
Figure BDA00025323752900001113
Is the set of N center points of the detected category c, as shown in the following formula:
Figure BDA0002532375290000121
each keypoint being in the form of a shaped coordinate (x) i ,y i ) It is given.
Figure BDA0002532375290000122
As a measured detection confidence, a target block diagram of the following formula is generated:
Figure BDA0002532375290000123
wherein the content of the first and second substances,
Figure BDA0002532375290000124
in order to shift the result of the prediction,
Figure BDA0002532375290000125
is the result of the scale prediction and is,
Figure BDA0002532375290000126
is the prediction result of the angle.
The embodiment has the following beneficial effects:
the CenterNet framework has simple steps and strong sensitivity to the direction, can quickly identify the target of the high-resolution remote sensing image, and can accurately identify the direction of the target.
The HRNet network is light in weight, but the characteristic extraction effect is good, and the HRNet network is used in the on-satellite self-identification operation environment and can take the precision and space limitations into consideration.
The splicing strategy aiming at the edge target can effectively improve the accuracy of edge target identification and integrally realize the target identification of the end-to-end large-size remote sensing image.
In one embodiment, the overall algorithm framework of the angular high-resolution remote sensing image target detection method based on the improved centret can be referred to fig. 3, and includes:
step one, a plurality of remote sensing target images are given and used as training samples, and the training samples are input into an improved CenterNet framework to calculate a target Feature map (Feature map). Compared with the traditional CenterNet target detection framework which usually adopts ResNet, DLA, Hourglass and the like as a Backbone network (Backbone), the improved CenterNet target detection framework adopts light-weight HRNet as the Backbone network to calculate the feature map of the remote sensing target image, and compared with other Backbone networks, the HRNet has the advantages that the parallel high-resolution network architecture and the repeated multi-scale fusion mode can calculate the target feature map more quickly and accurately;
firstly, giving a plurality of remote sensing target images as training samples, and inputting the training samples into a CenterNet framework;
and secondly, calculating a characteristic map of each remote sensing target image by using the HRNet. The HRNet adopts repeated multi-scale fusion under the same scale and resolution, wherein the feature maps of three scales can be respectively changed into three feature maps of different scales through sampling and up-sampling modes, then the three feature maps are respectively fused on different scales, and finally three representations of different-level fusion are obtained. The high-resolution images use the low-resolution images to obtain more accurate classification features, and the low-resolution images can obtain more accurate position features through the high-resolution images. The repeated fusion of different layers can obtain a plurality of intermediate products between the standard image and the noisy image, the more products are used as guide images for obtaining the standard image, and the learning capability of the convolutional neural network is added to obtain more standard feature image output. Finally, after each input image block passes through HRNet, a plurality of feature maps with lower resolution than the input image are output.
Performing convolution operation on the feature map obtained by HRNet calculation, calculating a Heat map of the image, obtaining a central point of the image by searching a peak value of the Heat map, obtaining a target central point (x, y), a target size (w, h) and a target orientation angle by regression calculation, and obtaining a target detection frame, namely a target detection result, based on the central point, the size and the orientation angle;
First, the feature map obtained by HRNet is convolved to calculate the thermodynamic map of the image

$\hat{Y} \in [0,1]^{\frac{W}{R} \times \frac{H}{R} \times C}$

where $R$ represents the size scaling, $C$ is the number of key-point classes (i.e. the number of channels of the output feature map), and $W$ and $H$ are the width and height of the image; the peak points of the thermodynamic diagram are the center points, and the width and height of the target are predicted at the peak-point position of each feature map;
Second, the key points of the target are trained through thermodynamic diagrams. Assume the position of a target key point on the target true value graph (ground truth) is $p \in \mathbb{R}^2$; the GT points contain all the effective information in the feature map and are the key nodes for feature extraction. The original input image is continuously down-sampled through the HRNet network to obtain the corresponding feature maps, on which the corresponding key point is

$\tilde{p} = \left\lfloor \frac{p}{R} \right\rfloor$

wherein $R$ represents the size scaling;

by the Gaussian kernel formula

$Y_{xyc} = \exp\left( -\frac{(x - \tilde{p}_x)^2 + (y - \tilde{p}_y)^2}{2\sigma_p^2} \right)$

the GT points on the truth graph are dispersed into a thermodynamic diagram

$Y \in [0,1]^{\frac{W}{R} \times \frac{H}{R} \times C}$

where $\sigma_p$ is the size-adaptive standard deviation of the target. If two or more Gaussians overlap within the target size, the one with the largest value is selected; $\tilde{p}_x$ and $\tilde{p}_y$ are the horizontal and vertical coordinates of $\tilde{p}$;
the training objective function of the key points of the target thermodynamic diagram is set as follows:

$L_k = \frac{-1}{N} \sum_{xyc} \begin{cases} (1 - \hat{Y}_{xyc})^{\alpha} \log(\hat{Y}_{xyc}) & \text{if } Y_{xyc} = 1 \\ (1 - Y_{xyc})^{\beta} (\hat{Y}_{xyc})^{\alpha} \log(1 - \hat{Y}_{xyc}) & \text{otherwise} \end{cases}$

wherein $\alpha$ and $\beta$ are hyper-parameters, usually set to 2 and 4 in the experiments, $N$ is the number of key points in the image, and the factor $\frac{1}{N}$ mainly serves to normalize the loss;
Thirdly, since the down-sampling of the image introduces a discretization bias at the GT points, a local offset is added to the prediction of the center point,

$\hat{O} \in \mathbb{R}^{\frac{W}{R} \times \frac{H}{R} \times 2}$

all classes $c$ share the same offset prediction, and this offset is trained with an L1 loss:

$L_{off} = \frac{1}{N} \sum_{p} \left| \hat{O}_{\tilde{p}} - \left( \frac{p}{R} - \tilde{p} \right) \right|$

it follows that the local offset makes predictions only at the key-point locations $\tilde{p}$; the operation is not used at other positions;
Fourthly, when searching for target pixel points, the key points of $\hat{Y}$ are used to obtain all the centers. Assume $(x_1^{(k)}, y_1^{(k)}, x_2^{(k)}, y_2^{(k)})$ is the bounding box of a target of class $k$, then its center position is

$p_k = \left( \frac{x_1^{(k)} + x_2^{(k)}}{2}, \frac{y_1^{(k)} + y_2^{(k)}}{2} \right)$

and the size regressed for each target is equal to

$s_k = \left( x_2^{(k)} - x_1^{(k)},\ y_2^{(k)} - y_1^{(k)} \right)$

to get the target box quickly, a single size prediction is used, i.e.

$\hat{S} \in \mathbb{R}^{\frac{W}{R} \times \frac{H}{R} \times 2}$

where $W$ and $H$ are the image width and height and $R$ represents the size scaling, and an L1 loss is added at the center position:

$L_{size} = \frac{1}{N} \sum_{k=1}^{N} \left| \hat{S}_{p_k} - s_k \right|$
Fifthly, the key points of $\hat{Y}$ are likewise used for regression when searching for the angle orientation of the target. Assume $a_k$ is the orientation angle of the class-$k$ target box; the orientation prediction is $\hat{A} \in \mathbb{R}^{\frac{W}{R} \times \frac{H}{R} \times 1}$, and similarly an L1 loss is added at the center position:

$L_{angle} = \frac{1}{N} \sum_{k=1}^{N} \left| \hat{A}_{p_k} - a_k \right|$
Sixthly, the network target loss function of the whole target detection process consists of four parts:

$L_{det} = L_k + \lambda_{size} L_{size} + \lambda_{angle} L_{angle} + \lambda_{off} L_{off}$

in the experiments, $\lambda_{size} = 0.1$, $\lambda_{angle} = 0.1$, $\lambda_{off} = 1$; the whole network predicts $C + 5$ values at each position, namely the key-point class $c$, the target center point $(x, y)$, the size $(w, h)$ and the target angle, and all of these outputs share the same deep convolutional neural network backbone.
Seventh, a peak point of each category on the thermodynamic diagram is extracted. To obtain these peak points, we compare all the response points on the thermodynamic diagram with the eight neighboring points, and if the value of the corresponding point is greater than or equal to the maximum value of the eight neighboring points, that corresponding point is retained, otherwise the corresponding point is eliminated. Finally, we will retain all the first 100 peak points that meet their requirements. Suppose that
Figure BDA0002532375290000151
Is the set of N center points of the detected category c, as shown in the following formula:
Figure BDA0002532375290000152
each keypoint being in the form of a shaped coordinate (x) i ,y i ) It is given.
Figure BDA0002532375290000153
As a measured detection confidence, a target block diagram of the following formula is generated:
Figure BDA0002532375290000154
wherein the content of the first and second substances,
Figure BDA0002532375290000155
in order to shift the result of the prediction,
Figure BDA0002532375290000156
is the result of the scale prediction and is,
Figure BDA0002532375290000157
is the prediction result of the angle.
And step three, considering that remote sensing images acquired on the satellite are generally large in size and high in resolution, so that processing them directly is inefficient in both time and space, the invention proposes to divide (crop) the large-format remote sensing image to be detected, with overlap, into a number of small image blocks of the same size for processing. When a target lies on a cutting line it is split into two parts, making its position, size, or orientation difficult to fuse in the final detection; a stitching method for edge targets is therefore proposed to solve this problem and improve the detection accuracy of targets on the cutting lines.
As shown in fig. 2, (a) is the original image. (b) shows the cropped image blocks: the airplane lying on the cutting line of the original image is split into two parts across the left and right blocks, and this cut airplane is taken as the edge target. (c) is the result of directly stitching the left and right blocks; the recognition results of the two blocks cannot be fused at all, and the two partial detections differ in size, scale, and direction, which seriously harms the recognition accuracy of the edge target. (d) is the result of the stitching strategy for edge targets proposed by the invention: overlapping cropping produces left, middle, and right image blocks, the edge target falls entirely inside the middle, overlap-cropped block, and overlapping stitching is likewise used when merging, so the strategy completely avoids the recognition error of edge targets caused by cropping and greatly improves their recognition accuracy. It is worth noting that this example shows only one edge of the image; in actual practice, the overlap cropping is applied to all four edges.
To verify the proposed algorithm, the public DOTA data set is adopted. DOTA is an aerial image data set produced jointly by the group of Gui-Song Xia at the State Key Laboratory of remote sensing of Wuhan University and the group of Xiang Bai at the School of Electronic Information of Huazhong University of Science and Technology; it collects 2806 aerial images from different sensors and platforms and contains 188282 instances in 15 categories: plane, ship, storage tank, baseball diamond, tennis court, swimming pool, ground track field, harbor, bridge, large vehicle, small vehicle, helicopter, roundabout, soccer ball field, and basketball court. Of these, 14 are main classes; small vehicle and large vehicle are subclasses of vehicle. Unlike traditional labeling methods, the data set adopts angled target detection annotations, so the localization is more accurate.
The original data set was cropped into 512 × 512 images for training. Specifically, the data set is divided into a training set and a validation set; although the class distribution of DOTA is extremely unbalanced, both sets must contain all 15 classes, and the ratio of training set to validation set is about 3:1. The CenterNet target detection method is adopted and trained with HRNet as the backbone; a model of the position, size, and orientation of the target is then obtained by regression, the validation set is predicted with this model, and finally the target output is obtained through the stitching strategy for edge targets.
In one example, the running environment of the experiments was Ubuntu 14.04 with the PyTorch 1.2.0 framework, trained on a computer platform configured with an NVIDIA Tesla K40C. The improved algorithm adopts a cross-entropy loss function and the Adam optimizer; batch_size is set to 16, 100 epochs are trained in total, and the learning rate is 0.000001.
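A minimal sketch of that training configuration, with stand-ins for the detector and the data loader (the real pipeline feeds 512 × 512 DOTA crops to the improved CenterNet and minimises the L_det objective rather than this dummy loss):

```python
import torch
import torch.nn as nn

# Stand-ins for the detector and the 512x512 DOTA crops.
model = nn.Conv2d(3, 20, 3, padding=1)
loader = [(torch.randn(16, 3, 512, 512), torch.randn(16, 20, 512, 512))
          for _ in range(2)]

optimizer = torch.optim.Adam(model.parameters(), lr=0.000001)  # Adam, lr 1e-6
for epoch in range(100):                                       # 100 epochs
    for images, targets in loader:                             # batch_size 16
        optimizer.zero_grad()
        loss = (model(images) - targets).abs().mean()          # dummy objective
        loss.backward()
        optimizer.step()
```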
FIG. 4 shows the confusion matrix results of the remote sensing image target recognition experiment. The adopted method detects plane, ship, basketball court, storage tank, swimming pool, and similar targets well, but large targets such as ground track field, harbor, baseball diamond, tennis court, and soccer ball field are missed to a certain extent: these large sites resemble the background and are easily overlooked during neural network learning, producing false negatives. In addition, detection of the vehicle categories is still fairly accurate, but the similarity between large vehicle and small vehicle causes a certain degree of confusion between their results. It is worth pointing out that background is a category that accounts for the false detection data and uses false detections directly as output of the confusion matrix.
The remote sensing image target detection results cover the 15 classes of targets in the DOTA data set. Visual inspection of the recognition results shows that targets with distinctive characteristics, such as plane, ship, storage tank, baseball diamond, tennis court, swimming pool, large vehicle, small vehicle, helicopter, roundabout, and basketball court, are recognized well: both direction and size are appropriate, with only a limited degree of false detection. For fields that occupy a large proportion of the image but offer few distinguishing features, such as soccer ball field and ground track field, the effect is poorer, and the direction in particular shows a certain degree of deviation. For strip-shaped objects such as harbor and bridge, the angle problem is worse: because these objects are elongated, a small angular deviation at a point has a large influence on the overall accuracy of the category, so the predicted angles of harbor and bridge deviate considerably from the original objects. In general, the proposed algorithm performs well on the position, size, and direction of remote sensing image recognition, the direction in particular distinguishing it from traditional target detection.
In the experiments, the original backbones of CenterNet and HRNet_30 are selected as backbones for comparison and introduced into the improved CenterNet-based angular remote sensing image target recognition framework. Specifically, ResNet_50, DLA_34, and Hourglass were used as the original backbones in the comparison test. Because the invention is dedicated to on-satellite self-learning remote sensing image target detection, a lighter network is needed as the backbone, and the space utilization of the network is an important metric. The space efficiency of the various networks is shown in Table 1: HRNet_30 is much smaller than Hourglass and slightly smaller than the ResNet_50 network, but slightly larger than the DLA_34 network, in terms of both network parameters and network computation.
TABLE 1 Comparison of backbone space efficiency
Table 2 shows the accuracy comparison of the final experimental results of the different backbones. Note that the evaluation indexes AP and AR used here are both evaluated with angle taken into account. HRNet_30 is much more accurate than ResNet_50 and DLA_34 in AP, AR, AP50, and AP75, but slightly less accurate than Hourglass.
TABLE 2 Comparison of backbone experimental accuracy
Comparing the space efficiency and experimental accuracy of the backbones comprehensively: HRNet_30 occupies much less space than the Hourglass network at only a slight cost in accuracy; the DLA_34 network occupies less space than HRNet_30 but is far less accurate; and ResNet_50 does not match HRNet_30 in either space occupation or experimental accuracy. In conclusion, when space efficiency and experimental accuracy are considered together, the algorithm with HRNet as the backbone is best suited to self-learning on-satellite remote sensing image target recognition and is the optimal choice.
In addition, it is noted that both the algorithm provided by the invention and IENet address oriented target detection in remote sensing images, and both are evaluated on the DOTA data set. In terms of accuracy, the AP50 of the proposed algorithm is 0.5829 versus 0.5714 for IENet, so the proposed algorithm is slightly more accurate; in terms of network parameters, the backbone of the proposed algorithm has 25.42 MB of parameters versus 212 MB for IENet, so the proposed algorithm has a much lower space occupancy. Therefore, compared with IENet, the algorithm provided by the invention is better and more suitable for on-satellite self-recognition of remote sensing images.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of technical features contains no contradiction, it should be considered to fall within the scope of this specification.
It should be noted that the terms "first/second/third" in the embodiments of the present application merely distinguish similar objects and do not imply a specific ordering; where permissible, "first/second/third" may be interchanged so that the embodiments described herein can be implemented in an order other than that illustrated or described.
The terms "comprising" and "having" and any variations thereof in the embodiments of the present application are intended to cover non-exclusive inclusions. For example, a process, method, apparatus, product, or device that comprises a list of steps or modules is not limited to the listed steps or modules but may alternatively include other steps or modules not listed or inherent to such process, method, product, or device.
The above-mentioned embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (4)

1. An angular high-resolution remote sensing image target detection method based on improved CenterNet is characterized by comprising the following steps:
s10, adopting HRNet as a backbone network under the CenterNet framework to obtain an improved CenterNet framework;
s20, giving a plurality of remote sensing target images as training samples, inputting the training samples into an improved CenterNet frame, and training to obtain a remote sensing image target detection frame;
s30, cutting the remote sensing image to be detected into a plurality of unit images with the same size, inputting each unit image into the remote sensing image target detection frame respectively for target detection, and determining the target detection frame of each unit image, wherein the method comprises the following steps: respectively inputting the unit images into a remote sensing image target detection framework, calculating a feature graph of the unit images by using HRNet, performing convolution operation on the feature graph, calculating a thermodynamic diagram of the images, obtaining a central point of the unit images by searching a peak value of the thermodynamic diagram, obtaining a target central point, a target size and a target orientation angle of the unit images by regression calculation, and determining a target detection frame according to the target central point, the target size and the target orientation angle;
and S40, performing edge splicing on the unit images according to the target detection frame of each unit image to determine the detection targets of the remote sensing image to be detected.
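As a minimal editorial sketch of the S30/S40 flow in claim 1 (all helper names are hypothetical; the patent publishes no code), a large scene is cut into fixed-size unit images, each unit is detected independently, and the resulting boxes are shifted back into scene coordinates before merging:

```python
import numpy as np

def detect_scene(image: np.ndarray, detect_unit, tile: int = 512):
    """Cut a large remote sensing image into tile x tile unit images, run
    the detector on each unit, and shift every box back into scene
    coordinates. detect_unit(img) is assumed to return tuples of
    (cx, cy, w, h, angle, score, cls) in unit-image coordinates."""
    H, W = image.shape[:2]
    detections = []
    for y0 in range(0, H, tile):
        for x0 in range(0, W, tile):
            unit = image[y0:y0 + tile, x0:x0 + tile]
            for (cx, cy, w, h, ang, score, cls) in detect_unit(unit):
                detections.append((cx + x0, cy + y0, w, h, ang, score, cls))
    # Edge splicing: boxes from adjacent units describing the same target
    # would be merged here (e.g. by rotated NMS); omitted for brevity.
    return detections
```

In practice the unit images would typically overlap, so that a target cut by a tile border is seen whole in at least one unit; the edge-splicing step then merges or suppresses the duplicated boxes.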
2. The improved CenterNet-based angular high-resolution remote sensing image target detection method according to claim 1, wherein performing a convolution operation on the feature map, calculating the heatmap of the image, obtaining the center points of the unit image by searching for the peaks of the heatmap, obtaining the target center point, the target size and the target orientation angle of the unit image by regression calculation, and determining the target detection frame according to the target center point, the target size and the target orientation angle comprises the following steps:
performing a convolution operation on the feature map to calculate the heatmap of the image

$$\hat{Y} \in [0,1]^{\frac{W}{R} \times \frac{H}{R} \times C}$$

wherein R denotes the size scaling (output stride), C is the number of keypoint classes, and W and H are the width and height of the image; the peak points of the heatmap are the center points, and the width and height of the target are predicted at each peak position;
training the keypoints of the target through the heatmap to determine the center points of the unit image;
and obtaining a target central point, a target size and a target orientation angle of the unit image through regression calculation, and determining a target detection frame according to the target central point, the target size and the target orientation angle.
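A minimal sketch of the prediction heads implied by claim 2 (and by the C+5 output count in claim 3); the layer widths are illustrative assumptions, not the patent's exact configuration:

```python
import torch
import torch.nn as nn

class DetectionHeads(nn.Module):
    """From the backbone feature map, predict a C-channel heatmap plus
    2-channel offset, 2-channel size and 1-channel angle maps (C+5 per cell)."""
    def __init__(self, in_ch: int = 32, num_classes: int = 15):
        super().__init__()
        def head(out_ch: int) -> nn.Sequential:
            return nn.Sequential(
                nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(64, out_ch, 1))
        self.heatmap = head(num_classes)  # keypoint class scores
        self.offset = head(2)             # sub-pixel center offset
        self.size = head(2)               # target width and height
        self.angle = head(1)              # target orientation angle

    def forward(self, feat: torch.Tensor):
        return (torch.sigmoid(self.heatmap(feat)),  # peaks are center points
                self.offset(feat), self.size(feat), self.angle(feat))

hm, off, wh, ang = DetectionHeads()(torch.randn(1, 32, 128, 128))
print(hm.shape, off.shape, wh.shape, ang.shape)  # 15 + 2 + 2 + 1 channels
```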
3. The improved CenterNet-based angular high-resolution remote sensing image target detection method according to claim 2, wherein
assuming that the position of a target keypoint on the ground-truth map of a unit image is $p \in \mathbb{R}^2$, the original input image is successively down-sampled by the HRNet network to obtain the corresponding feature map, on which the corresponding keypoint is

$$\tilde{p} = \left\lfloor \frac{p}{R} \right\rfloor$$

wherein R denotes the size scaling;
the GT points on the ground-truth map are dispersed onto a heatmap $Y \in [0,1]^{\frac{W}{R} \times \frac{H}{R} \times C}$ by the Gaussian kernel formula

$$Y_{xyc} = \exp\left(-\frac{(x - \tilde{p}_x)^2 + (y - \tilde{p}_y)^2}{2\sigma_p^2}\right)$$

wherein $\sigma_p$ is a standard deviation adaptive to the target size, and $\tilde{p}_x$ and $\tilde{p}_y$ are the horizontal and vertical coordinates of the keypoint $\tilde{p}$; if the Gaussians of two targets of the same class overlap, the element-wise maximum is taken;
the training objective function of the keypoints of the target heatmap is set as:

$$L_k = \frac{-1}{N} \sum_{xyc} \begin{cases} \left(1 - \hat{Y}_{xyc}\right)^{\alpha} \log \hat{Y}_{xyc} & \text{if } Y_{xyc} = 1 \\ \left(1 - Y_{xyc}\right)^{\beta} \hat{Y}_{xyc}^{\alpha} \log\left(1 - \hat{Y}_{xyc}\right) & \text{otherwise} \end{cases}$$

wherein $\alpha$ and $\beta$ are hyper-parameters, set to 2 and 4 respectively in the experiments, N is the number of keypoints in the image, and the factor $\frac{1}{N}$ serves mainly to normalize the loss;
a local offset $\hat{O} \in \mathbb{R}^{\frac{W}{R} \times \frac{H}{R} \times 2}$ is added to the prediction of the center point; all classes c share the same offset prediction, and this offset is trained with an L1 loss:

$$L_{off} = \frac{1}{N} \sum_{p} \left| \hat{O}_{\tilde{p}} - \left(\frac{p}{R} - \tilde{p}\right) \right|$$

it follows that the local offset makes a prediction only at the keypoint locations $\tilde{p}$;
when searching for target pixel points, the keypoints $\hat{Y}$ are used to obtain all centers; assuming $\left(x_1^{(k)}, y_1^{(k)}, x_2^{(k)}, y_2^{(k)}\right)$ is the target box of class $c_k$ for target k, its center position is

$$p_k = \left(\frac{x_1^{(k)} + x_2^{(k)}}{2},\; \frac{y_1^{(k)} + y_2^{(k)}}{2}\right)$$

and the size regressed for each target is

$$s_k = \left(x_2^{(k)} - x_1^{(k)},\; y_2^{(k)} - y_1^{(k)}\right);$$

for fast acquisition of the target box, a single size prediction $\hat{S}$ is used for all classes, and an L1 loss is added at the center position:

$$L_{size} = \frac{1}{N} \sum_{k=1}^{N} \left| \hat{S}_{p_k} - s_k \right|;$$
when finding the angular orientation of the target, the keypoints $\hat{Y}$ are likewise used for regression; assuming $a_k$ is the orientation of the target box of class $c_k$ for target k, predicted as $\hat{A}_{p_k}$, an L1 loss is added at the center position:

$$L_{angle} = \frac{1}{N} \sum_{k=1}^{N} \left| \hat{A}_{p_k} - a_k \right|;$$
the network target loss function in the whole target detection process consists of four parts:
$$L_{det} = L_k + \lambda_{size} L_{size} + \lambda_{angle} L_{angle} + \lambda_{off} L_{off}$$
in the experiments, $\lambda_{size} = 0.1$, $\lambda_{angle} = 0.1$ and $\lambda_{off} = 1$; the whole network predicts C+5 values at each position, namely the C keypoint class scores, the target center point (x, y), the size (w, h) and the target angle, and all outputs share the backbone of the same deep convolutional neural network;
the peak point for each category on the thermodynamic diagram is extracted to determine the target center point, target size, and target orientation angle.
4. The improved CenterNet-based angular high-resolution remote sensing image target detection method according to claim 3, wherein extracting the peak points of each category on the heatmap comprises:
comparing each response point on the heatmap with its eight neighboring points; if the value of the response point is greater than or equal to the maximum of its eight neighbors, the response point is retained, otherwise it is discarded; and the top 100 peak points meeting this requirement are retained to obtain the peak points of each category on the heatmap.
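The eight-neighbour comparison of claim 4 is equivalent to keeping the responses that survive a 3x3 max-pooling; a minimal sketch, assuming a (B, C, H, W) heatmap tensor:

```python
import torch
import torch.nn.functional as F

def extract_peaks(heatmap: torch.Tensor, k: int = 100):
    """heatmap: (B, C, H, W). Keep responses equal to the max of their 3x3
    neighbourhood (i.e. >= all eight neighbours), then take the top k."""
    pooled = F.max_pool2d(heatmap, kernel_size=3, stride=1, padding=1)
    keep = (pooled == heatmap).float()          # eight-neighbour test
    scores, inds = torch.topk((heatmap * keep).flatten(1), k)
    C, H, W = heatmap.shape[1:]
    cls = torch.div(inds, H * W, rounding_mode="floor")
    ys = torch.div(inds % (H * W), W, rounding_mode="floor")
    xs = inds % W
    return scores, cls, ys, xs                  # per-category peak points

scores, cls, ys, xs = extract_peaks(torch.rand(2, 15, 128, 128))
```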
CN202010521896.3A 2020-06-10 2020-06-10 Angle high-resolution remote sensing image target detection method based on improved CenterNet Active CN111797697B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010521896.3A CN111797697B (en) 2020-06-10 2020-06-10 Angle high-resolution remote sensing image target detection method based on improved CenterNet

Publications (2)

Publication Number Publication Date
CN111797697A CN111797697A (en) 2020-10-20
CN111797697B true CN111797697B (en) 2022-08-05

Family

ID=72804728

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010521896.3A Active CN111797697B (en) 2020-06-10 2020-06-10 Angle high-resolution remote sensing image target detection method based on improved CenterNet

Country Status (1)

Country Link
CN (1) CN111797697B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111968145B (en) * 2020-10-23 2021-01-15 腾讯科技(深圳)有限公司 Box type structure identification method and device, electronic equipment and storage medium
CN112232432B (en) * 2020-10-26 2023-04-11 西安交通大学 Security check X-ray image target detection and identification method based on improved central point detection
CN112465854A (en) * 2020-12-17 2021-03-09 北京三川未维科技有限公司 Unmanned aerial vehicle tracking method based on anchor-free detection algorithm
CN112733624B (en) * 2020-12-26 2023-02-03 电子科技大学 People stream density detection method, system storage medium and terminal for indoor dense scene
CN112883840B (en) * 2021-02-02 2023-07-07 中国人民公安大学 Power transmission line extraction method based on key point detection
CN112884742B (en) * 2021-02-22 2023-08-11 山西讯龙科技有限公司 Multi-target real-time detection, identification and tracking method based on multi-algorithm fusion
CN112949730B (en) * 2021-03-11 2024-04-09 无锡禹空间智能科技有限公司 Method, device, storage medium and equipment for detecting target with few samples
CN112990198B (en) * 2021-03-22 2023-04-07 华南理工大学 Detection and identification method and system for water meter reading and storage medium
CN113159198A (en) * 2021-04-27 2021-07-23 上海芯物科技有限公司 Target detection method, device, equipment and storage medium
CN113420819B (en) * 2021-06-25 2022-12-06 西北工业大学 Lightweight underwater target detection method based on CenterNet
CN113673576A (en) * 2021-07-26 2021-11-19 浙江大华技术股份有限公司 Image detection method, terminal and computer readable storage medium thereof
CN113627288B (en) * 2021-07-27 2023-08-18 武汉大学 Intelligent information label acquisition method for massive images
CN113723511B (en) * 2021-08-31 2023-12-22 厦门大学 Target detection method based on remote sensing electromagnetic radiation and infrared image
CN113869246B (en) * 2021-09-30 2022-04-01 安徽大学 Wheat stripe rust germ summer spore microscopic image detection method based on improved CenterNet technology
CN114913428A (en) * 2022-04-26 2022-08-16 哈尔滨理工大学 Remote sensing image target detection system based on deep learning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2458123A1 (en) * 2003-03-13 2004-09-13 Synodon Inc. Remote sensing of gas leaks
CN111091105A (en) * 2019-12-23 2020-05-01 郑州轻工业大学 Remote sensing image target detection method based on new frame regression loss function

Also Published As

Publication number Publication date
CN111797697A (en) 2020-10-20

Similar Documents

Publication Publication Date Title
CN111797697B (en) Angle high-resolution remote sensing image target detection method based on improved CenterNet
CN113378686B (en) Two-stage remote sensing target detection method based on target center point estimation
CN108021890B (en) High-resolution remote sensing image port detection method based on PLSA and BOW
Duan et al. Automatic alignment of geographic features in contemporary vector data and historical maps
Shahi et al. Road condition assessment by OBIA and feature selection techniques using very high-resolution WorldView-2 imagery
CN113223068A (en) Multi-modal image registration method and system based on depth global features
CN110175524A (en) A kind of quick vehicle checking method of accurately taking photo by plane based on lightweight depth convolutional network
EP3680608A1 (en) Antenna downward inclination angle measurement method based on multi-scale detection algorithm
CN105069451A (en) License plate identifying and positioning method based on binocular camera
CN104574335A (en) Infrared and visible image fusion method based on saliency map and interest point convex hulls
CN113033315A (en) Rare earth mining high-resolution image identification and positioning method
Li et al. Ship detection of optical remote sensing image in multiple scenes
CN106845458A (en) A kind of rapid transit label detection method of the learning machine that transfinited based on core
Guo et al. Exploring GIS knowledge to improve building extraction and change detection from VHR imagery in urban areas
Su et al. Object detection in aerial images using a multiscale keypoint detection network
Tang et al. Fast multidirectional vehicle detection on aerial images using region based convolutional neural networks
Sun et al. Building outline extraction from aerial imagery and digital surface model with a frame field learning framework
CN110751077B (en) Optical remote sensing picture ship detection method based on component matching and distance constraint
CN112614121A (en) Multi-scale small-target equipment defect identification and monitoring method
CN116403071B (en) Method and device for detecting few-sample concrete defects based on feature reconstruction
CN114860974A (en) Remote sensing image retrieval positioning method
Farooq et al. Efficient object proposals extraction for target detection in VHR remote sensing images
Wang et al. Detection of remote sensing targets with angles via modified CenterNet
Vanegas et al. Detection of aligned objects for high resolution image understanding
CN116385477A (en) Tower image registration method based on image segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant