CN114842353A - Neural network remote sensing image target detection method based on self-adaptive target direction - Google Patents

Neural network remote sensing image target detection method based on self-adaptive target direction

Info

Publication number: CN114842353A (application CN202210484478.0A)
Authority: CN (China)
Prior art keywords: target, pixels, feature map, remote sensing, neural network
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN114842353B
Inventors: 董志鹏, 刘焱雄, 冯义楷, 王艳丽, 陈义兰
Current and original assignee: First Institute of Oceanography MNR
Application filed by First Institute of Oceanography MNR
Priority to CN202210484478.0A; published as CN114842353A, granted and published as CN114842353B

Classifications

    • G06V20/13 — Scenes; terrestrial scenes: satellite images
    • G06N3/045 — Neural network architectures: combinations of networks
    • G06N3/08 — Neural networks: learning methods
    • G06V10/764 — Image or video recognition using machine learning: classification
    • G06V10/82 — Image or video recognition using machine learning: neural networks
    • G06V2201/07 — Indexing scheme: target detection
    • Y02A90/10 — ICT supporting adaptation to climate change


Abstract

The invention relates to a neural network remote sensing image target detection method based on self-adaptive target direction, and belongs to the technical field of remote sensing image target recognition and information extraction. The invention first proposes a self-adaptive target-direction region regression method based on anchor points and five parameters, which can regress target regions of arbitrary orientation in high-resolution remote sensing images. Second, building on this regression idea, a convolutional neural network remote sensing image target detection method based on self-adaptive target direction is proposed; it regresses target regions in arbitrary directions, accurately classifies target categories, and yields accurate high-resolution remote sensing image target detection results. The method is simple, reliable, accurate and easy to implement, and can be widely applied to remote sensing image target recognition and information extraction.

Description

Neural network remote sensing image target detection method based on self-adaptive target direction
Technical Field
The invention relates to a neural network remote sensing image target detection method based on a self-adaptive target direction, and belongs to the technical field of remote sensing image target identification and information extraction.
Background
High-resolution remote sensing image target detection is a key technology for automatically extracting, analyzing and understanding image information in high-resolution earth observation systems, and plays an important role in military reconnaissance, ocean monitoring, precision strikes and other applications based on high-resolution remote sensing images. It refers to the process of accurately locating target regions of interest in an image and accurately classifying the target categories. Scholars at home and abroad have conducted extensive research on this problem; many methods follow the pipeline of extracting target candidate regions, obtaining candidate-region features, and classifying those features. In this pipeline, target candidate regions are first obtained with a sliding window, a selective-search algorithm, the EdgeBoxes algorithm, or similar; candidate-region features are then extracted with hand-crafted descriptors such as histograms of oriented gradients, local binary patterns, and the scale-invariant feature transform; finally, the features are fed as vectors into a traditional classifier, such as a support vector machine, AdaBoost, or a decision tree, to detect the targets in the image. This approach achieves good results on specific target detection tasks.
However, because remote sensing satellites image under complex and changeable conditions and produce large numbers of images every day, this approach struggles to scale to remote sensing target detection over large data volumes under varying conditions, and its robustness and generality are weak.
In recent years, deep learning has attracted extensive attention from researchers in different fields. Convolutional neural networks (CNNs) are among the most popular deep learning models: image features need not be designed by hand, and because each feature map shares the convolution-kernel parameters of its "local receptive fields", a CNN has fewer parameters than other network models. Moreover, the convolutional neural network can automatically extract and learn effective image features from massive data and labels through its specialized network structure, and, given sufficient training data, the model generalizes well and maintains robustness and universality under complex and variable conditions. Convolutional neural network models have therefore been widely used in digital image processing. Since the region-based CNN (R-CNN) detection architecture achieved better results than conventional detectors (built on hand-designed image features) in the 2014 PASCAL VOC detection challenge, convolutional-neural-network detection architectures have developed rapidly, for example Fast R-CNN, SSD (Single Shot MultiBox Detector) and the YOLO (You Only Look Once) architectures. However, all of these networks regress the target area with a horizontal box, which is difficult to apply effectively to remote sensing image targets with large aspect ratios and dense distribution, and many targets are missed.
To address these problems, the invention provides a convolutional neural network remote sensing image target detection method based on self-adaptive target direction.
Disclosure of Invention
The invention provides a convolutional neural network remote sensing image target detection method based on self-adaptive target direction, aimed at the problem that existing convolutional neural network target detection frameworks are difficult to apply effectively to targets with large aspect ratios and dense distribution.
The invention relates to a neural network remote sensing image target detection method based on a self-adaptive target direction, which comprises the following steps of:
s1, self-adaptive direction target region regression: expressing the target area by using five parameters, and realizing regression on the target area in any direction based on the anchor point;
expressing the target region by five parameters (x, y, w, h, θ), wherein (x, y) is the coordinate of the central point of the target region, w and h are the width and the height of the target region respectively, θ is the included angle between the x axis and the corner point with the minimum y value among the four corner points of the target region, and θ ∈ (0, π/2);
and (3) regressing the target area in any direction based on the anchor point and the five parameters, wherein the calculation formula is as follows:
[Equations (1)–(5), rendered as images in the original document, map the network outputs (x_0, x_1, x_2, x_3, x_4) at feature-map position (i, j) and the anchor dimensions (a_w, a_h) to the five regression values (O_x, O_y, O_w, O_h, O_θ).]
wherein: (O_x, O_y, O_w, O_h, O_θ) are the regressed five-parameter (x, y, w, h, θ) values of the target region; a_w and a_h are the width and height of the anchor point, respectively; (x_0, x_1, x_2, x_3, x_4) are the network output values of the convolutional neural network at position (i, j) of the feature map;
and calculating and obtaining the coordinates of four corner points of the target region based on the five parameters (x, y, w, h and theta) of the target region, wherein the calculation formula is as follows:
[Equation (6), rendered as an image in the original document, computes the coordinates of the four corner points from (x, y, w, h, θ).]
wherein: (x_P1, y_P1), (x_P2, y_P2), (x_P3, y_P3) and (x_P4, y_P4) are the coordinates of the four corner points P1, P2, P3 and P4 of the target region, respectively;
s2, convolutional neural network target detection of self-adaptive target direction: the target detection architecture of the convolutional neural network based on the self-adaptive target direction can realize target region regression in any direction and accurate classification of target categories; the training loss of the target detection architecture is calculated as follows:
Loss = L_coord + L_class + L_obj    (7)
[Equations (8)–(11), rendered as images in the original document; (8)–(10) define the coordinate loss L_coord, the class loss L_class and the confidence loss L_obj, and (11) defines the function used to process the raw network outputs.]
wherein: loss is the training Loss of the target detection architecture; l is coord 、L class And L obj Target coordinates, categories and confidence loss, respectively; m is the width or height of the feature map; n is the number of anchor points at each location of the feature map;
Figure BDA0003628701100000041
indicating whether the anchor point labeled k at the location of the feature map (i, j) is a positive sample, if so
Figure BDA0003628701100000042
Is 1, otherwise is 0; w is a ij And h ij Width and height of the target region for the true value corresponding to anchor point labeled k at the location of feature map (i, j);(x ij ,y ij ,w ij ,h ijij ) The five parameters of the target area are true values;
Figure BDA0003628701100000043
a predicted value of a network architecture of a target area five-parameter generated for an anchor point with a label of k; w is a a And h a Width and height of anchor point labeled k; r is the classification number of the network architecture;
Figure BDA0003628701100000044
generating predicted values of different categories of the target area for the network measurement architecture pair;
Figure BDA0003628701100000045
a confidence predictor targeting the target region generated based on the anchor point labeled k.
Preferably, in step S2, Darknet-53 is used in the target detection architecture to extract feature maps of the image, and the target region is trained and regressed on three feature maps of different scales.
Preferably, in step S2, the target region is trained and regressed based on three anchor points at each position of the feature map with the size of 13 × 13 pixels, where the sizes of the anchor points are 116 × 90 pixels, 156 × 198 pixels and 373 × 326 pixels, respectively;
upsampling the feature map with the size of 13 pixels multiplied by 13 pixels to 26 pixels multiplied by 26 pixels, and combining the upsampled feature map with the size of 26 pixels multiplied by 26 pixels in a network architecture to form a new feature map with the size of 26 pixels multiplied by 26 pixels;
training and regressing the target area based on three anchor points at each position of a new feature map with the size of 26 pixels multiplied by 26 pixels, wherein the sizes of the three anchor points are respectively 30 pixels multiplied by 61 pixels, 62 pixels multiplied by 45 pixels and 59 pixels multiplied by 119 pixels;
upsampling the new feature map with the size of 26 pixels multiplied by 26 pixels to 52 pixels multiplied by 52 pixels, and combining the upsampled feature map with the size of 52 pixels multiplied by 52 pixels in the network architecture to form a new feature map with the size of 52 pixels multiplied by 52 pixels;
the target area is trained and regressed at each position of the new feature map with dimensions of 52 pixels × 52 pixels based on three anchor points, which have dimensions of 10 pixels × 13 pixels, 16 pixels × 30 pixels and 33 pixels × 23 pixels, respectively.
Preferably, in the step S2, a multi-scale training concept is adopted to train the target detection architecture, and if the intersection ratio of one anchor point to the true-value target region is the largest of the intersection ratios of all anchor points to the true-value target region in the training process, the anchor point region is marked as a positive sample; the anchor points remaining that are not marked as positive samples are marked as negative samples.
Preferably, in the step S2, in the target detection architecture testing stage, all the x, y, θ, confidence level and category prediction values in the network are processed by using formula (11).
Preferably, in the step S2, the five parameters of the target region generated from each anchor point are obtained using equations (1)-(5); a generated target region is retained if its confidence is greater than the set threshold and removed otherwise; to reduce redundancy in the target detection result, a non-maximum suppression algorithm with an intersection-over-union threshold of 0.3 is applied to the retained target regions; and the target regions remaining after non-maximum suppression are the target detection result of the target detection framework.
The beneficial effects of the invention are: the neural network remote sensing image target detection method based on the self-adaptive target direction can effectively solve the problem that the existing convolutional neural network target detection framework adopts a horizontal frame to miss detection of the remote sensing image targets which are large in length-width ratio and densely distributed, and obtains an accurate high-resolution remote sensing image target detection result.
Drawings
FIG. 1 is a diagram of an object detection architecture of the present invention.
Fig. 2 is a schematic diagram of the present invention based on five-parameter representation of target areas.
FIG. 3 is a schematic diagram of the regression of the target region based on the anchor point according to the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
Example (b):
the technical scheme of the invention can adopt a computer software mode to support the automatic operation process. The technical scheme of the invention is explained in detail in the following by combining the drawings and the embodiment.
(1) Self-adaptive direction target area regression: and expressing the target area by using five parameters, and realizing regression on the target area in any direction based on the anchor point.
In the invention, the target region is represented by five parameters (x, y, w, h, θ), as shown in FIG. 2, wherein (x, y) is the coordinate of the central point of the target region, w and h are the width and height of the target region respectively, θ is the included angle between the x axis and the corner point with the minimum y value among the four corner points of the target region, and θ ∈ (0, π/2).
The target area in any direction is regressed based on the anchor point and the five parameters, as shown in fig. 3, the calculation formula is as follows:
[Equations (1)–(5), rendered as images in the original document, map the network outputs (x_0, x_1, x_2, x_3, x_4) at feature-map position (i, j) and the anchor dimensions (a_w, a_h) to the five regression values (O_x, O_y, O_w, O_h, O_θ).]
wherein: (O_x, O_y, O_w, O_h, O_θ) are the regressed five-parameter (x, y, w, h, θ) values of the target region; a_w and a_h are the width and height of the anchor point, respectively; and (x_0, x_1, x_2, x_3, x_4) are the network output values of the convolutional neural network at position (i, j) of the feature map.
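Equations (1)–(5) are reproduced only as images in this text. As an illustration, the following Python sketch fills in a plausible YOLOv3-style decoding consistent with the surrounding description (sigmoid offsets for the centre, exponential anchor scaling for width and height, and a sigmoid mapped onto (0, π/2) for θ). The exact published formulas may differ, and the function name is an assumption, not the patent's.

```python
import math

def decode_anchor_output(net_out, i, j, anchor_w, anchor_h):
    """Hypothetical decoding of the five network outputs (x0..x4) at
    feature-map cell (i, j) into (Ox, Oy, Ow, Oh, Otheta).

    Assumed YOLOv3-style form, not the patent's exact equations (1)-(5),
    which appear only as images in the source text.
    """
    x0, x1, x2, x3, x4 = net_out
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
    o_x = i + sigmoid(x0)                # centre x, in feature-map cells
    o_y = j + sigmoid(x1)                # centre y
    o_w = anchor_w * math.exp(x2)        # width, scaled from the anchor width a_w
    o_h = anchor_h * math.exp(x3)        # height, scaled from the anchor height a_h
    o_theta = sigmoid(x4) * math.pi / 2  # angle theta in (0, pi/2)
    return o_x, o_y, o_w, o_h, o_theta
```

Under this assumed form, all-zero network outputs decode to the cell centre with the anchor's own size and θ = π/4.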
Coordinates of four corner points of the target area can be calculated and obtained based on five parameters (x, y, w, h and theta) of the target area, and the calculation formula is as follows:
[Equation (6), rendered as an image in the original document, computes the coordinates of the four corner points from (x, y, w, h, θ).]
wherein: (x_P1, y_P1), (x_P2, y_P2), (x_P3, y_P3) and (x_P4, y_P4) are the coordinates of the four corner points P1, P2, P3 and P4 of the target region, respectively.
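Equation (6) is likewise an image in this text; the computation it describes is standard rotated-rectangle geometry. The sketch below (an illustration, with an assumed corner ordering that may differ from the patent's P1–P4 convention) rotates the axis-aligned corner offsets by θ about the centre (x, y):

```python
import math

def corners_from_five_params(x, y, w, h, theta):
    """Corner coordinates of a target region given the five parameters
    (x, y, w, h, theta): centre, width, height and rotation angle.

    Standard rotated-rectangle geometry; the patent's exact corner
    ordering P1..P4 (equation (6), an image in the source) may differ.
    """
    c, s = math.cos(theta), math.sin(theta)
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each axis-aligned offset by theta and translate to the centre.
    return [(x + dx * c - dy * s, y + dx * s + dy * c) for dx, dy in offsets]
```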
(2) And (3) convolutional neural network target detection of self-adaptive target direction: the target detection architecture of the convolutional neural network based on the self-adaptive target direction can realize target region regression in any direction and accurate classification of target categories.
The target detection architecture of the convolutional neural network based on self-adaptive target direction is shown in fig. 1. In the target detection framework, Darknet-53 is used to extract feature maps of the image, and the target region is trained and regressed on feature maps at three different scales. The target region is trained and regressed at each position of a feature map of size 13 pixels × 13 pixels based on three anchor points of size 116 pixels × 90 pixels, 156 pixels × 198 pixels and 373 pixels × 326 pixels, respectively.
The feature map with the size of 13 pixels × 13 pixels is up-sampled to 26 pixels × 26 pixels, and is combined with the feature map with the size of 26 pixels × 26 pixels in the network architecture to form a new feature map with the size of 26 pixels × 26 pixels. The target region is trained and regressed at each position of the new feature map with dimensions of 26 pixels × 26 pixels based on three anchor points, which have dimensions of 30 pixels × 61 pixels, 62 pixels × 45 pixels and 59 pixels × 119 pixels, respectively.
The new feature map with the size of 26 pixels × 26 pixels is up-sampled to 52 pixels × 52 pixels, and is combined with the feature map with the size of 52 pixels × 52 pixels in the network architecture to form a new feature map with the size of 52 pixels × 52 pixels. The target area is trained and regressed at each position of the new feature map with dimensions of 52 pixels × 52 pixels based on three anchor points, which have dimensions of 10 pixels × 13 pixels, 16 pixels × 30 pixels and 33 pixels × 23 pixels, respectively.
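The three scales and their anchor sizes described above can be collected in one place. The sketch below records them and counts the candidate regions the head evaluates per image (the dictionary and function names are ours; the sizes are those listed in the text, which coincide with the standard YOLOv3 anchor set):

```python
# Anchor sizes (width, height) in pixels at each feature-map scale,
# as listed in the description: coarse maps get large anchors.
ANCHORS_PER_SCALE = {
    13: [(116, 90), (156, 198), (373, 326)],  # 13 x 13 map: large targets
    26: [(30, 61), (62, 45), (59, 119)],      # 26 x 26 map (upsampled + merged)
    52: [(10, 13), (16, 30), (33, 23)],       # 52 x 52 map: small targets
}

def total_anchor_boxes():
    """Total candidate regions per image: three anchors at every
    position of each of the three feature maps."""
    return sum(size * size * len(anchors)
               for size, anchors in ANCHORS_PER_SCALE.items())
```

This gives 3 × (13² + 26² + 52²) = 10 647 candidate regions per image.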
Training the target detection architecture by adopting a multi-scale training thought, wherein if the intersection ratio of one anchor point to a true-value target area is the largest of the intersection ratios of all the anchor points to the true-value target area in the training process, the anchor point area is marked as a positive sample; the anchor points remaining that are not marked as positive samples are marked as negative samples. The training loss of the architecture of the present invention is calculated as follows:
Loss = L_coord + L_class + L_obj    (7)
[Equations (8)–(11), rendered as images in the original document; (8)–(10) define the coordinate loss L_coord, the class loss L_class and the confidence loss L_obj, and (11) defines the function used to process the raw network outputs.]
wherein: loss is the training Loss of the target detection architecture; l is coord 、L class And L obj Target coordinates, categories and confidence loss, respectively; m is the width or height of the feature map; n is the number of anchor points at each location of the feature map;
Figure BDA0003628701100000075
indicating whether the anchor point labeled k at the location of the feature map (i, j) is a positive sample, if so
Figure BDA0003628701100000076
Is 1, otherwise is 0; w is a ij And h ij Width and height of the target region for the true value corresponding to anchor point labeled k at the location of feature map (i, j); (x) ij ,y ij ,w ij ,h ijij ) The five parameters are true values of the target area;
Figure BDA0003628701100000077
a predicted value of a network architecture of a target area five-parameter generated for an anchor point with a label of k; w is a a And h a Width and height of anchor point labeled k; r is the classification number of the network architecture;
Figure BDA0003628701100000078
generating predicted values of different categories of the target area for the network measurement architecture pair;
Figure BDA0003628701100000079
a confidence predictor targeting the target region generated based on the anchor point labeled k.
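The positive-sample rule used during training (an anchor is positive exactly when its intersection-over-union with the true-value target region is the largest among all anchors) can be sketched as follows. For brevity this illustration uses axis-aligned IoU; the patent applies the rule to rotated target regions, whose IoU computation is more involved. Function names are ours.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(box_a) + area(box_b) - inter
    return inter / union if union > 0 else 0.0

def assign_positive(anchors, gt_box):
    """Return 1/0 labels: the single anchor with the highest IoU against
    the true-value region is positive, all remaining anchors negative."""
    ious = [iou(a, gt_box) for a in anchors]
    best = max(range(len(anchors)), key=lambda k: ious[k])
    return [1 if k == best else 0 for k in range(len(anchors))]
```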
In the target detection architecture test stage, all x, y, θ, confidence and class prediction values in the network are processed using formula (11). The five parameters of each target region generated from an anchor point are obtained using equations (1)-(5). A generated target region is retained if its confidence is greater than the set threshold and removed otherwise. To reduce redundancy in the target detection results, a non-maximum suppression algorithm with an intersection-over-union threshold of 0.3 is applied to the retained target regions. The target regions remaining after non-maximum suppression are the target detection result of the target detection framework.
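The test-time filtering just described (confidence thresholding followed by non-maximum suppression with an IoU threshold of 0.3) can be sketched as below. Axis-aligned IoU is used for brevity, whereas the patent suppresses rotated regions, and the confidence threshold of 0.5 is an assumed value, since the patent only refers to a "set threshold".

```python
def detect_postprocess(detections, iou_threshold=0.3, conf_threshold=0.5):
    """Greedy NMS over detections, each a dict with 'box' = (x1, y1, x2, y2)
    and 'conf'. Regions at or below conf_threshold are dropped; then the most
    confident region is kept and any remaining region overlapping it by
    more than iou_threshold is suppressed, repeatedly."""
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        union = area(a) + area(b) - inter
        return inter / union if union > 0 else 0.0

    pool = sorted((d for d in detections if d['conf'] > conf_threshold),
                  key=lambda d: d['conf'], reverse=True)
    kept = []
    while pool:
        best = pool.pop(0)
        kept.append(best)
        pool = [d for d in pool if iou(d['box'], best['box']) <= iou_threshold]
    return kept
```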
The method can be widely applied to remote sensing image target identification and information extraction occasions.
The specific examples described herein are merely illustrative of the spirit of the invention. Those skilled in the art may make various modifications, additions or similar substitutions to the specific embodiments described herein without departing from the spirit of the invention or exceeding the scope defined in the appended claims.

Claims (6)

1. A neural network remote sensing image target detection method based on a self-adaptive target direction is characterized by comprising the following steps:
s1, self-adaptive direction target region regression: expressing the target area by using five parameters, and realizing regression of the target area in any direction based on the anchor point;
expressing the target region by five parameters (x, y, w, h, θ), wherein (x, y) is the coordinate of the central point of the target region, w and h are the width and the height of the target region respectively, θ is the included angle between the x axis and the corner point with the minimum y value among the four corner points of the target region, and θ ∈ (0, π/2);
and (3) regressing the target area in any direction based on the anchor point and the five parameters, wherein the calculation formula is as follows:
[Equations (1)–(5), rendered as images in the original document, map the network outputs (x_0, x_1, x_2, x_3, x_4) at feature-map position (i, j) and the anchor dimensions (a_w, a_h) to the five regression values (O_x, O_y, O_w, O_h, O_θ).]
wherein: (O_x, O_y, O_w, O_h, O_θ) are the regressed five-parameter (x, y, w, h, θ) values of the target region; a_w and a_h are the width and height of the anchor point, respectively; (x_0, x_1, x_2, x_3, x_4) are the network output values of the convolutional neural network at position (i, j) of the feature map;
and calculating and obtaining the coordinates of four corner points of the target region based on the five parameters (x, y, w, h and theta) of the target region, wherein the calculation formula is as follows:
[Equation (6), rendered as an image in the original document, computes the coordinates of the four corner points from (x, y, w, h, θ).]
wherein: (x_P1, y_P1), (x_P2, y_P2), (x_P3, y_P3) and (x_P4, y_P4) are the coordinates of the four corner points P1, P2, P3 and P4 of the target region, respectively;
s2, convolutional neural network target detection of self-adaptive target direction: the target detection architecture of the convolutional neural network based on the self-adaptive target direction can realize target region regression in any direction and accurate classification of target categories; the training loss of the target detection architecture is calculated as follows:
[Equations (7)–(11), rendered as images in the original document, give the total training loss Loss = L_coord + L_class + L_obj and its component coordinate, class and confidence losses, together with the output-processing function (11).]
wherein: loss is the training Loss of the target detection architecture; l is coord 、L class And L obj Target coordinates, categories and confidence loss, respectively; m is the width or height of the feature map; n is the number of anchor points at each location of the feature map;
Figure FDA0003628701090000026
indicating whether the anchor point labeled k at the location of the feature map (i, j) is a positive sample, if so
Figure FDA0003628701090000027
Is 1, otherwise is 0; w is a ij And h ij Width and height of the target region for the true value corresponding to anchor point labeled k at the location of feature map (i, j); (x) ij ,y ij ,w ij ,h ijij ) The five parameters of the target area are true values;
Figure FDA0003628701090000028
a predicted value of a network architecture of a target area five-parameter generated for an anchor point with a label of k; w is a a And h a Width and height of anchor point labeled k; r is the classification number of the network architecture;
Figure FDA0003628701090000029
generating predicted values of different categories of the target area for the network measurement architecture pair;
Figure FDA00036287010900000210
a confidence predictor targeting the target region generated based on the anchor point labeled k.
2. The neural network remote sensing image target detection method based on self-adaptive target direction according to claim 1, wherein in step S2, Darknet-53 is used in the target detection architecture to extract feature maps of the image, and the target region is trained and regressed on three feature maps of different scales.
3. The method for target detection based on the adaptive target direction neural network remote sensing image of claim 2, wherein in step S2, the target region is trained and regressed based on three anchor points at each position of the feature map with the size of 13 × 13 pixels, the three anchor points having the sizes of 116 × 90 pixels, 156 × 198 pixels and 373 × 326 pixels, respectively;
upsampling the feature map with the size of 13 pixels multiplied by 13 pixels to 26 pixels multiplied by 26 pixels, and combining the upsampled feature map with the size of 26 pixels multiplied by 26 pixels in the network architecture to form a new feature map with the size of 26 pixels multiplied by 26 pixels;
training and regressing the target area based on three anchor points at each position of a new feature map with the size of 26 pixels multiplied by 26 pixels, wherein the sizes of the three anchor points are respectively 30 pixels multiplied by 61 pixels, 62 pixels multiplied by 45 pixels and 59 pixels multiplied by 119 pixels;
upsampling the new feature map with the size of 26 pixels multiplied by 26 pixels to 52 pixels multiplied by 52 pixels, and combining the upsampled feature map with the size of 52 pixels multiplied by 52 pixels in the network architecture to form a new feature map with the size of 52 pixels multiplied by 52 pixels;
the target area is trained and regressed at each position of the new feature map with dimensions of 52 pixels × 52 pixels based on three anchor points, which have dimensions of 10 pixels × 13 pixels, 16 pixels × 30 pixels and 33 pixels × 23 pixels, respectively.
4. The neural network remote sensing image target detection method based on self-adaptive target direction according to claim 3, wherein in the step S2, a multi-scale training concept is adopted to train the target detection architecture; in the training process, if the intersection ratio of an anchor point to the true-value target region is the largest among the intersection ratios of all anchor points to the true-value target region, the anchor point region is marked as a positive sample; the remaining anchor points not marked as positive samples are marked as negative samples.
5. The neural network remote sensing image target detection method based on self-adaptive target direction according to claim 1, wherein in step S2, in the testing stage of the target detection architecture, all x, y, θ, confidence and category prediction values in the network are processed using formula (11).
6. The neural network remote sensing image target detection method based on self-adaptive target direction according to claim 5, wherein in step S2, the five parameters of each anchor-based generated target region are obtained using formulas (1)–(5); a generated target region is retained if its confidence exceeds the set threshold and removed otherwise; to reduce redundancy in the target detection result, non-maximum suppression with an intersection-over-union threshold of 0.3 is applied to the retained target regions; the target regions remaining after non-maximum suppression constitute the target detection result of the target detection architecture.
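The post-processing stage of claim 6 can be sketched as confidence filtering followed by greedy non-maximum suppression. The confidence threshold of 0.5 is an assumption (the claim only says "the set threshold"), and the boxes are simplified to axis-aligned form, whereas the patent's regions are oriented:

```python
import numpy as np

CONF_THRESHOLD = 0.5   # assumed value; the claim leaves the threshold unspecified
NMS_IOU = 0.3          # intersection-over-union threshold stated in claim 6

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def detect(boxes, scores):
    """Confidence filtering followed by greedy non-maximum suppression."""
    keep = [i for i, s in enumerate(scores) if s > CONF_THRESHOLD]
    boxes = [boxes[i] for i in keep]
    scores = [scores[i] for i in keep]
    order = np.argsort(scores)[::-1]         # highest confidence first
    kept = []
    for idx in order:
        # Keep a box only if it overlaps no already-kept box above the threshold.
        if all(iou(boxes[idx], boxes[j]) <= NMS_IOU for j in kept):
            kept.append(idx)
    return [boxes[i] for i in kept]

boxes = [(0, 0, 100, 100), (10, 10, 110, 110), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.75]
print(len(detect(boxes, scores)))   # 2: the first two boxes overlap, one is suppressed
```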
CN202210484478.0A 2022-05-06 2022-05-06 Neural network remote sensing image target detection method based on self-adaptive target direction Active CN114842353B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210484478.0A CN114842353B (en) 2022-05-06 2022-05-06 Neural network remote sensing image target detection method based on self-adaptive target direction

Publications (2)

Publication Number Publication Date
CN114842353A true CN114842353A (en) 2022-08-02
CN114842353B CN114842353B (en) 2024-04-02

Family

ID=82567243

Country Status (1)

Country Link
CN (1) CN114842353B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110046530A (en) * 2019-03-15 2019-07-23 Kunshan Branch, Institute of Microelectronics, Chinese Academy of Sciences Barcode tilt correction method based on multi-task target detection
CN111046756A (en) * 2019-11-27 2020-04-21 武汉大学 Convolutional neural network detection method for high-resolution remote sensing image target scale features
WO2021139069A1 (en) * 2020-01-09 2021-07-15 南京信息工程大学 General target detection method for adaptive attention guidance mechanism

Non-Patent Citations (1)

Title
SUN ZICHAO; TAN XICHENG; HONG ZEHUA; DONG HUAPING; SHA ZONGYAO; ZHOU SONGTAO; YANG ZONGLIANG: "Remote sensing image target detection based on deep convolutional neural networks", Aerospace Shanghai, no. 05, 25 October 2018 (2018-10-25) *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant