CN111462050A

CN111462050A - Improved YOLOv3 minimum remote sensing image target detection method, device and storage medium

Info

Publication number: CN111462050A
Application number: CN202010172524.4A
Authority: CN
Inventors: 张孙杰; 陈磊; 肖寒臣
Original assignee: University of Shanghai for Science and Technology
Current assignee: University of Shanghai for Science and Technology
Priority date: 2020-03-12
Filing date: 2020-03-12
Publication date: 2020-07-28
Anticipated expiration: 2040-03-12
Also published as: CN111462050B

Abstract

The invention relates to the technical field of target detection, and in particular to an improved YOLOv3 method, device and storage medium for detecting extremely small targets in remote sensing images. Additional bottom-up and lateral connection paths are added to the FPN module to improve the performance of low-resolution features, so that a feature pyramid network running both top-down and bottom-up is constructed. The pyramid feature layers combined in the two directions are fused and applied to target detection in remote sensing images. A 1 × 1 convolution is adopted to reduce the dimensionality of the network model and improve the detection speed of the network. Finally, quantitative and qualitative comparative analysis against the advanced YOLOv3 network is performed on the VEDAI and NWPU VHR remote sensing vehicle data sets.

Description

Improved YOLOv3 minimum remote sensing image target detection method, device and storage medium

Technical Field

The invention relates to the technical field of target detection, and in particular to an improved YOLOv3 method, device and storage medium for detecting extremely small targets in remote sensing images.

Background

In recent years, target detection has become a research hotspot in computer vision and is widely applied in fields such as robot navigation, intelligent video surveillance, industrial inspection and aerospace. The task requires determining the category of each target in an image and giving its accurate position.

Over the past few years, researchers have found that designing a deep convolutional neural network can greatly improve the performance of image classification and target detection. A deep convolutional neural network can not only extract semantically rich high-level features of a target from an image, but also integrate feature extraction, feature selection and feature classification into one model that is optimized as a whole through end-to-end training, so that the separability of the features is enhanced.

Disclosure of Invention

Aiming at the defects of the prior art, the invention discloses an improved YOLOv3 method, device and storage medium for detecting extremely small targets in remote sensing images, which address the low detection rate, high false alarm rate and low detection speed currently encountered when detecting such targets.

The invention is realized by the following technical scheme:

In a first aspect, the invention discloses an improved YOLOv3 method for detecting extremely small targets in remote sensing images, which comprises the following steps:

S1, acquiring target data, and fusing the feature outputs of the convolution layers of the YOLOv3 network to form a pyramid feature layer;

S2, combining the feature outputs of the shallow convolution layers of YOLOv3 to form a pyramid feature layer;

S3, fusing the pyramid feature layers combined in the two directions;

S4, changing the down-sampling layer of the YOLOv3 network into 3 × 3 convolution;

S5, reducing the dimensionality of the network model by using a 1 × 1 convolution, and outputting data.

Furthermore, in S1, the feature output of the last convolution layer of the YOLOv3 network is fused with the feature output of the adjacent preceding convolution layer to form a top-down pyramid feature layer.

Furthermore, in S2, the feature output of a shallow convolution layer of YOLOv3 is combined with the feature output of the adjacent following convolution layer to form a bottom-up pyramid feature layer.

Furthermore, in S4, the first down-sampling layer of the YOLOv3 network is changed into two 3 × 3 convolution layers, so that the YOLOv3 network retains more position feature information of small targets in the initial stage.

Still further, the method adds additional bottom-up and lateral connection paths to the FPN module.
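The top-down and bottom-up paths described in this aspect can be sketched in a few lines of plain Python. This is a minimal illustration rather than the patented network: the helper names (`upsample2x`, `downsample2x`, `bidirectional_pyramid`), the use of element-wise addition as the fusion operator, nearest-neighbour upsampling and stride-2 subsampling are all assumptions made for the example, since the text does not specify them.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D grid (list of rows)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def downsample2x(fmap):
    """Stride-2 subsampling, a stand-in for a stride-2 convolution."""
    return [row[::2] for row in fmap[::2]]


def add(a, b):
    """Element-wise sum of two grids of equal size (assumed fusion operator)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def bidirectional_pyramid(levels):
    """levels: backbone maps ordered from shallow (high-res) to deep (low-res)."""
    n = len(levels)
    # Top-down path: start at the deepest map and merge into shallower ones.
    top_down = [None] * n
    top_down[-1] = levels[-1]
    for i in range(n - 2, -1, -1):
        top_down[i] = add(levels[i], upsample2x(top_down[i + 1]))
    # Bottom-up path: start at the shallowest map and merge into deeper ones.
    bottom_up = [None] * n
    bottom_up[0] = levels[0]
    for i in range(1, n):
        bottom_up[i] = add(levels[i], downsample2x(bottom_up[i - 1]))
    # Fuse the two pyramids level by level.
    return [add(t, b) for t, b in zip(top_down, bottom_up)]


def constant_map(size, value):
    return [[value] * size for _ in range(size)]


# Toy backbone outputs: 8x8 (shallow), 4x4, 2x2 (deep).
levels = [constant_map(8, 1.0), constant_map(4, 10.0), constant_map(2, 100.0)]
fused = bidirectional_pyramid(levels)
print([len(f) for f in fused])        # spatial size of each fused level
print([f[0][0] for f in fused])       # one value per fused level
```

With constant toy inputs, every fused level keeps its own resolution while carrying contributions from both the deeper and the shallower maps, which is the property the two-way pyramid is meant to provide.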

In a second aspect, the invention discloses an improved YOLOv3 device for detecting extremely small targets in remote sensing images, which comprises an FPN module, a memory, a processor and a computer program stored on the memory and operable on the processor, wherein when the computer program is executed by the processor, the improved YOLOv3 method for detecting extremely small targets in remote sensing images according to the first aspect is realized.

Furthermore, additional bottom-up and lateral connection paths are added to the FPN module.

In a third aspect, the invention discloses a storage medium having stored thereon a computer program which, when executed by a processor, implements the improved YOLOv3 method for detecting extremely small targets in remote sensing images according to the first aspect.

The invention has the beneficial effects that:

The invention improves the performance of low-resolution features by adding additional bottom-up and lateral connection paths to the FPN module, constructs a feature pyramid network running both top-down and bottom-up, fuses the pyramid feature layers combined in the two directions, and applies them to target detection in remote sensing images. The method mainly comprises the following steps. Firstly, the first down-sampling layer is changed into two 3 × 3 convolution layers, so that the early layers of the network retain more position information of small targets. Then, the three outputs of YOLOv3 at different scales are changed into a feature map output aimed at small targets. Finally, the high-level and low-level features of the two-way pyramid are fused to facilitate the detection of small targets.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a structure diagram of the improved YOLOv3 network according to an embodiment of the present invention;

FIG. 2 is a comparison of the detection results of the improved YOLOv3 and the original network (VEDAI) according to an embodiment of the present invention;

FIG. 3 is a comparison of the detection results of the improved YOLOv3 and the original network (NWPU VHR) according to an embodiment of the present invention;

FIG. 4 is a structure diagram of the conventional YOLOv3 network according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Example 1

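The embodiment discloses an improved YOLOv3 method for detecting extremely small targets in remote sensing images, which comprises the following steps:

S1, acquiring target data, and fusing the feature outputs of the convolution layers of the YOLOv3 network to form a pyramid feature layer;

S2, combining the feature outputs of the shallow convolution layers of YOLOv3 to form a pyramid feature layer;

S3, fusing the pyramid feature layers combined in the two directions;

S4, changing the down-sampling layer of the YOLOv3 network into 3 × 3 convolution;

S5, reducing the dimensionality of the network model by using a 1 × 1 convolution, and outputting data.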

In S1, the feature output of the last convolution layer of the YOLOv3 network is fused with the feature output of the adjacent preceding convolution layer to form a top-down pyramid feature layer.

In S2, the feature output of a shallow convolution layer of YOLOv3 is combined with the feature output of the adjacent following convolution layer to form a bottom-up pyramid feature layer.

In S4, the first down-sampling layer of the YOLOv3 network is changed into two 3 × 3 convolution layers, so that the YOLOv3 network retains more position feature information of small targets in the initial stage.

The method first fuses the feature output of the last convolution layer of the network with the feature output of the adjacent preceding convolution layer to form a top-down pyramid feature layer, and then combines the feature output of a shallow convolution layer with the feature output of the adjacent following convolution layer to form a bottom-up pyramid feature layer. The pyramid features combined in the two directions are fused, so that the global semantic information of the deep features is combined with the detailed local texture information of the shallow convolutions. The first down-sampling layer of the original network is changed into two 3 × 3 convolution layers, so that the network retains more position feature information of small targets in the initial stage. Finally, in order to improve the detection speed of the network, a 1 × 1 convolution is adopted to reduce the dimensionality of the network model, and the improved network is compared with the current YOLOv3 network on the VEDAI and NWPU VHR remote sensing data sets.

Example 2

The embodiment discloses an improved YOLOv3 device for detecting extremely small targets in remote sensing images, which comprises an FPN module, a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein when the computer program is executed by the processor, the improved YOLOv3 method for detecting extremely small targets in remote sensing images described in the above embodiment is realized.

Additional bottom-up and lateral connection paths are added to the FPN module.

The embodiment also discloses a storage medium on which a computer program is stored, wherein when the computer program is executed by a processor, the improved YOLOv3 method for detecting extremely small targets in remote sensing images described in the above embodiment is realized.

Example 3

In order to solve this problem, a pyramid feature method that represents the target feature layers at multiple scales has been proposed. The feature pyramid network (FPN) is one of the representative model architectures for generating pyramid feature representations for target detection. It builds the feature pyramid by combining two adjacent feature layers of the backbone network model through top-down and lateral connections, producing features that are both high-resolution and semantically strong. YOLOv3 borrows the idea of this feature pyramid model, as shown in FIG. 4. It is based on the Darknet53 backbone network model, has 75 convolution layers in total, and performs down-sampling 5 times using 5 residual blocks. The last 3 down-sampled feature maps are used for prediction at three scales: the deep feature maps, which provide strong semantic information, are used to predict large targets, while the shallow feature maps are used to predict small targets. However, the position information of small targets is not well combined with the semantic information of the deep layers.
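The feature map sizes implied by five successive down-samplings can be worked out directly. The sketch below assumes each down-sampling halves the spatial side length (stride 2), as in the standard YOLOv3 backbone; the function name `darknet_feature_sizes` is introduced here only for illustration.

```python
def darknet_feature_sizes(input_size, num_downsamples=5):
    """Side length of the feature map after each stride-2 down-sampling."""
    sizes = []
    size = input_size
    for _ in range(num_downsamples):
        size //= 2  # each down-sampling halves the side length
        sizes.append(size)
    return sizes


for input_size in (416, 512):
    sizes = darknet_feature_sizes(input_size)
    # The last three maps correspond to overall strides 8, 16 and 32,
    # the three scales at which standard YOLOv3 makes predictions.
    print(input_size, sizes, "prediction scales:", sizes[-3:])
```

For the two input sizes used in the experiments, the finest prediction grid is 52 × 52 (416 input) or 64 × 64 (512 input), so each grid cell covers an 8 × 8 pixel patch of the input.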

The invention improves the performance of low-resolution features by adding additional bottom-up and lateral connection paths to the FPN module, constructs a feature pyramid network running both top-down and bottom-up, fuses the pyramid feature layers combined in the two directions, and applies them to target detection in remote sensing images. The method mainly comprises the following steps. Firstly, the first down-sampling layer is changed into two 3 × 3 convolution layers, so that the early layers of the network retain more position information of small targets. Then, the three outputs of YOLOv3 at different scales are changed into a feature map output aimed at small targets. Next, the high-level and low-level features of the two-way pyramid are fused to facilitate the detection of small targets. In addition, 1 × 1 convolutions are adopted to reduce the dimensionality of the network model and improve the detection speed of the network. Finally, quantitative and qualitative comparative analysis against the most advanced YOLOv3 network is performed on the VEDAI and NWPU VHR remote sensing vehicle data sets.
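Why a 1 × 1 convolution reduces model size can be seen from a weight count. The channel numbers below (512, 128) are hypothetical and chosen only for the arithmetic; the text does not state the channel widths used in the improved network, and `conv_weights` is a helper introduced for this sketch.

```python
def conv_weights(kernel, c_in, c_out):
    """Number of weights in a kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


# Hypothetical channel widths, not taken from the patent.
c_in, c_mid, c_out = 512, 128, 512

# A 3x3 convolution applied directly to the wide feature map.
direct = conv_weights(3, c_in, c_out)

# A 1x1 convolution first squeezes the channels, then the 3x3 runs on fewer.
reduced = conv_weights(1, c_in, c_mid) + conv_weights(3, c_mid, c_out)

print(direct, reduced, direct / reduced)
```

Under these assumed widths the squeezed variant needs 655,360 weights instead of 2,359,296, a 3.6-fold reduction, which illustrates how the 1 × 1 layers can lower the computation per feature map position.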

Example 4

In order to make the detection of objects more challenging, the images of both data sets are scaled or cropped to 512 pixel × 512 pixel and 416 pixel × 416 pixel. The VEDAI data set contains nine classes of vehicle objects, namely "car, truck, pickup, trailer, camping car, boat, van, other and airplane". Each image contains 5.5 vehicle objects on average, accounting for about 0.7% of the total pixels of the image.

The target size in this data set is between 11.5 pixel × 11.5 pixel and 24.1 pixel × 24.1 pixel. The NWPU VHR-10 data set is a public 10-class geospatial object detection data set of aerial images, comprising the 10 categories "airplane, ship, storage tank, baseball diamond, tennis court, basketball court, ground track field, harbor, bridge and vehicle". Since most target categories in this data set are large, such as "tennis court, baseball diamond, ground track field", "vehicle" is selected as the single detection category. Each image contains 11 targets on average, about twice as many as in the VEDAI data set. The target size in this data set ranges from 19 pixel × 19 pixel upward [the remainder of this passage, including the upper bound of the range, is illegible in the source].
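The VEDAI figures quoted above can be cross-checked with simple arithmetic. The sketch assumes that the 0.7% pixel share refers to all 5.5 objects of an image taken together and that targets are roughly square; both are interpretations rather than statements from the text, and `mean_target_side` is a name introduced for this example.

```python
import math


def mean_target_side(image_side, objects_per_image, pixel_fraction):
    """Side of a square target if the objects jointly cover pixel_fraction."""
    total_pixels = image_side * image_side
    pixels_per_object = total_pixels * pixel_fraction / objects_per_image
    return math.sqrt(pixels_per_object)


side_512 = mean_target_side(512, 5.5, 0.007)
side_416 = mean_target_side(416, 5.5, 0.007)
print(round(side_512, 1), round(side_416, 1))
```

Under these assumptions the average target side comes out at roughly 18 pixels for 512 × 512 images and roughly 15 pixels for 416 × 416 images, which falls inside the 11.5 to 24.1 pixel range quoted for VEDAI.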

To verify the recognition performance of the improved network, it is compared with the YOLOv3 network model. The experiments use images of two sizes, 512 pixel × 512 pixel and 416 pixel × 416 pixel, as input. Training and testing are carried out on a workstation with a 1080 GPU, a 2.10 GHz CPU and 16 GB of memory, using the PyTorch deep learning framework, and the data set is augmented and expanded by image rotation, cropping and similar operations.
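When images are rotated or cropped for augmentation, the bounding-box labels have to be transformed in the same way. The following sketch shows the label side of the two operations for axis-aligned boxes given as `(x1, y1, x2, y2)`. The function names, the restriction to 90-degree rotations and the clipping policy are assumptions made for illustration; the text does not describe how its augmentation was implemented.

```python
def rotate90_clockwise(box, image_height):
    """Map a box through a 90-degree clockwise rotation of the image.

    A point (x, y) moves to (image_height - y, x); the rotated image has
    the original height as its width.
    """
    x1, y1, x2, y2 = box
    return (image_height - y2, x1, image_height - y1, x2)


def crop_box(box, crop_x, crop_y, crop_w, crop_h):
    """Shift a box into crop coordinates and clip it to the crop window.

    Returns None when the box lies entirely outside the crop.
    """
    x1, y1, x2, y2 = box
    nx1 = max(x1 - crop_x, 0)
    ny1 = max(y1 - crop_y, 0)
    nx2 = min(x2 - crop_x, crop_w)
    ny2 = min(y2 - crop_y, crop_h)
    if nx1 >= nx2 or ny1 >= ny2:
        return None
    return (nx1, ny1, nx2, ny2)


box = (10, 20, 30, 50)                       # in a 100 x 80 (W x H) image
print(rotate90_clockwise(box, image_height=80))
print(crop_box(box, crop_x=5, crop_y=15, crop_w=20, crop_h=20))
print(crop_box(box, crop_x=40, crop_y=0, crop_w=20, crop_h=20))
```

For very small targets, a practical pipeline would probably also discard boxes that are clipped to only a few pixels, since such labels carry little usable signal.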

Example 5

The results of this example are analyzed as follows. Table 1 shows the detection results of the model on the VEDAI data set. On images with a resolution of 416 pixel × 416 pixel, the detection accuracy and speed are 47.7% and 42.3 frames/s respectively; compared with the original YOLOv3 model, the detection accuracy is improved by 5% and the detection speed is almost unchanged. On images with a resolution of 512 pixel × 512 pixel, the detection accuracy and speed are 66.8% and 38.4 frames/s respectively; compared with the original YOLOv3 model, the detection accuracy is improved by 12.8% and the detection speed is almost unchanged.

Table 1. Comparison of detection performance on the VEDAI data set


Experiment 2:

Table 2 shows the detection results of the model on the single-class NWPU VHR data set. On images with a resolution of 416 pixel × 416 pixel, the detection accuracy and speed are 75.5% and 19.2 frames/s respectively; compared with the original YOLOv3 model, the detection accuracy is improved by 1.1% and the detection speed is almost unchanged. On images with a resolution of 512 pixel × 512 pixel, the detection accuracy and speed are 88.3% and 40.0 frames/s respectively; compared with the original YOLOv3 model, the detection accuracy is improved by 3.2% and the detection speed is almost unchanged.

Table 2. Comparison of detection performance on the NWPU VHR data set


FIG. 2 and FIG. 3 show detection results for the 512 pixel × 512 pixel input size. The first row shows the original images, the second row shows the detection results of the original network, and the third row shows the detection results of the improved YOLOv3. Reading each column from top to bottom, the experiments in group (a) show that the improved network can resolve missed detections of the original network, the experiments in group (b) show that it can resolve false detections of the original network, and the experiments in group (c) show that it positions the prediction boxes more accurately. The improved network thus raises the recognition rate of small targets in remote sensing images.
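Missed detections, false detections and localization quality are commonly quantified with the intersection over union (IoU) between predicted and ground-truth boxes. The sketch below is a generic illustration, not the evaluation code behind the figures: the 0.5 threshold, the greedy matching and the name `count_errors` are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(ix2 - ix1, 0) * max(iy2 - iy1, 0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_errors(predictions, ground_truths, threshold=0.5):
    """Greedily match predictions to ground truth.

    Returns (hits, false_detections, missed_detections).
    """
    unmatched = list(ground_truths)
    hits = 0
    false_detections = 0
    for pred in predictions:
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) >= threshold:
            unmatched.remove(best)   # each ground-truth box is matched once
            hits += 1
        else:
            false_detections += 1
    return hits, false_detections, len(unmatched)


ground_truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
predictions = [(1, 1, 11, 11), (50, 50, 60, 60)]
print(count_errors(predictions, ground_truths))
```

In this toy case one prediction overlaps a ground-truth box well enough to count as a hit, one prediction matches nothing (a false detection), and one ground-truth box is left unmatched (a missed detection).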

Example 6

In this embodiment, YOLOv3 uses the 8-times down-sampled feature map output by the backbone network to detect small targets. When the size of a target in a remote sensing image is smaller than 8 pixel × 8 pixel, the features of the target are almost absent from the output feature map. In order for the network to obtain more position information of small targets, as shown in FIG. 3, the first down-sampling layer of the network is changed into two 3 × 3 convolution layers to retain more position feature information of small targets.
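The 8-pixel threshold follows from how much of a feature map a target occupies after down-sampling. The short sketch below divides the target side by the overall stride; `footprint_in_cells` is a name introduced here, and treating the footprint as a simple ratio is a simplification that ignores receptive-field overlap.

```python
def footprint_in_cells(target_side_px, stride):
    """Approximate side of a target, in feature-map cells, at a given stride."""
    return target_side_px / stride


for stride in (8, 16, 32):
    print(stride, footprint_in_cells(8, stride))

# Smallest VEDAI target size quoted in the text, on the stride-8 map.
print(footprint_in_cells(11.5, 8))
```

A target of 8 pixel × 8 pixel occupies a single cell on the stride-8 map and only a fraction of a cell on the deeper maps, so anything smaller leaves almost no dedicated response, which is why preserving early spatial detail matters for such targets.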

In conclusion, the method addresses the low detection rate, high false alarm rate and low detection speed that the prior art exhibits for extremely small targets in remote sensing images.

The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

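1. An improved YOLOv3 method for detecting extremely small targets in remote sensing images, characterized by comprising the following steps:

S1, acquiring target data, and fusing the feature outputs of the convolution layers of the YOLOv3 network to form a pyramid feature layer;

S2, combining the feature outputs of the shallow convolution layers of YOLOv3 to form a pyramid feature layer;

S3, fusing the pyramid feature layers combined in the two directions;

S4, changing the down-sampling layer of the YOLOv3 network into 3 × 3 convolution;

S5, reducing the dimensionality of the network model by using a 1 × 1 convolution, and outputting data.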

2. The improved YOLOv3 method for detecting extremely small targets in remote sensing images according to claim 1, wherein in S1, the feature output of the last convolution layer of the YOLOv3 network is fused with the feature output of the adjacent preceding convolution layer to form a top-down pyramid feature layer.

3. The improved YOLOv3 method for detecting extremely small targets in remote sensing images according to claim 1, wherein in S2, the feature output of a shallow convolution layer of YOLOv3 is combined with the feature output of the adjacent following convolution layer to form a bottom-up pyramid feature layer.

4. The improved YOLOv3 method for detecting extremely small targets in remote sensing images according to claim 1, wherein in S4, the first down-sampling layer of the YOLOv3 network is changed into two 3 × 3 convolution layers, so that the YOLOv3 network retains more position feature information of small targets in the initial stage.

5. The improved YOLOv3 method for detecting extremely small targets in remote sensing images according to claim 1, wherein the method adds additional bottom-up and lateral connection paths to the FPN module.

6. An improved YOLOv3 device for detecting extremely small targets in remote sensing images, comprising an FPN module, a memory, a processor and a computer program stored on the memory and operable on the processor, wherein when the computer program is executed by the processor, the improved YOLOv3 method for detecting extremely small targets in remote sensing images according to any one of claims 1 to 5 is realized.

7. The improved YOLOv3 device for detecting extremely small targets in remote sensing images according to claim 6, wherein additional bottom-up and lateral connection paths are added to said FPN module.

8. A storage medium having stored thereon a computer program which, when executed by a processor, implements the improved YOLOv3 method for detecting extremely small targets in remote sensing images according to any one of claims 1 to 5.