CN115240089A - Vehicle detection method of aerial remote sensing image - Google Patents

Vehicle detection method of aerial remote sensing image

Info

Publication number
CN115240089A
CN115240089A (application CN202210792617.6A)
Authority
CN
China
Prior art keywords: image, target, vehicle, data set, remote sensing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210792617.6A
Other languages
Chinese (zh)
Inventor
刘晶红
朱圣杰
王波
左羽佳
Current Assignee
Changchun Institute of Optics Fine Mechanics and Physics of CAS
Original Assignee
Changchun Institute of Optics Fine Mechanics and Physics of CAS
Priority date
Filing date
Publication date
Application filed by Changchun Institute of Optics Fine Mechanics and Physics of CAS filed Critical Changchun Institute of Optics Fine Mechanics and Physics of CAS
Priority to CN202210792617.6A priority Critical patent/CN115240089A/en
Publication of CN115240089A publication Critical patent/CN115240089A/en
Pending legal-status Critical Current

Classifications

    • G06V 20/17: Scenes; terrestrial scenes taken from planes or by drones
    • G06N 3/08: Computing arrangements based on biological models; neural networks; learning methods
    • G06V 10/26: Image preprocessing; segmentation of patterns in the image field; detection of occlusion
    • G06V 10/762 and 10/763: Recognition using machine learning; clustering, including non-hierarchical techniques
    • G06V 10/774: Generating sets of training patterns, e.g. bagging or boosting
    • G06V 10/82: Image or video recognition using neural networks
    • G06V 20/60: Scene-specific elements; type of objects
    • G06V 2201/08: Detecting or categorising vehicles

Abstract

The invention discloses a vehicle detection method for aerial remote sensing images, comprising the following steps: step 1, obtaining optical remote sensing image input data; step 2, performing adaptive-scale cropping on the images in the input data according to the actual pixel size of the vehicle targets in the training data set, to obtain an image data set to be trained; step 3, calculating fixed reference frames from the labeled detection boxes in the image data set to be trained, substituting them into the model, and training to obtain the model parameters and thus a trained model; step 4, adaptively cropping the images to be detected one by one according to their associated imaging data; step 5, searching for vehicle targets in the cropped images to be detected using the trained model; and step 6, stitching the recognition results of step 5 and removing duplicate targets in the overlapping image regions, obtaining the vehicle detection result. The method can rapidly extract vehicle targets against large-field-of-view urban backgrounds, accurately and quickly acquire the number and positions of the vehicle targets, and effectively reduce the false alarm rate.

Description

Vehicle detection method of aerial remote sensing image
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a vehicle detection method of an aerial remote sensing image.
Background
Aerial remote sensing usually uses an airplane or a balloon as the working platform, with flight heights from several hundred metres to tens of kilometres. It offers a large imaging scale, high resolution and accurate geometric correction, making it one of the most important remote sensing means. In aerial remote sensing images, vehicle detection is an indispensable technology for civil and military monitoring, such as traffic management and city planning. However, manual interpretation of vehicles yields low data utilization and poor information timeliness, and is easily affected by the interpreter's physical condition, mental state and subjective judgment. An efficient and accurate automatic detection technique for visible-light remote sensing images is therefore particularly important for reducing human-capital cost through computer vision.
Compared with ordinary images, aerial remote sensing images have a unique viewing angle. The main difficulties of this task are:
1. Large field of view: aerial remote sensing images are captured by high-altitude, high-resolution imaging equipment, so the resulting images typically have a large field of view and high resolution; simply down-sampling them to the input size required by most algorithms is therefore inappropriate.
2. Wide scale range: because acquisition height and sensor parameters vary, targets of the same class appear at inconsistent scales. In aerial imaging, the objects of interest also tend to be very small and densely packed.
3. Special viewing angle: aerial images are top-down views, so ground targets are rotation-invariant and their orientation angle can take any value. On the other hand, targets do not overlap each other.
4. Complex background: urban remote sensing images contain many objects whose appearance resembles vehicle targets, and aerial images are easily degraded by weather such as cloud and fog, so the influence of complex meteorological conditions must be taken into account.
Many existing aerial remote sensing image processing methods use advanced general-purpose detection models such as Faster R-CNN, Deformable R-CNN and YOLOv4. Considering the model input sizes, Faster R-CNN requires the short edge of the input image to be 600 pixels, and YOLO resizes the input image to 608 × 608 pixels; neither framework can directly accept the typical size of an aerial remote sensing image (ITCVD remote sensing dataset: 5616 × 3744 pixels; DOTA remote sensing dataset: about 4000 × 4000 pixels).
To meet the requirements of these standard architectures, directly downscaling the image is not feasible, since small-pixel objects would simply be lost. To address this, existing algorithms usually first segment the original image. The YOLT model uses a "sliding window" approach to segment the image, with a 15% overlap to ensure that every region is analysed. However, the pixel size of a target depends on shooting height and camera parameters; when the original image is cut into tiles of fixed size, the target pixel size still spans a large dynamic range, which degrades detection capability.
In addition, some studies employ long short-term memory networks (LSTM) and spatial memory networks (SMN) to enhance target features. The AC-FCN model, for example, indicates that relations between target objects help improve detection capability. Training such methods is often complex, and as simple feed-forward networks they remain prone to losing feature information. The FA-SSD model improves the extraction of context information for small targets by using higher-level abstract features. Although these methods work well on some small targets, they lack real-time performance and their detection efficiency is low, making them unsuitable for aerial imagery.
In summary, existing vehicle detection methods for aerial remote sensing images have the following disadvantages:
1. Existing target recognition techniques for aerial remote sensing images segment each image with a fixed pixel size and then detect on the segments. Such segmentation makes it difficult to unify the target pixel size, which varies over a wide range and degrades recognition accuracy.
2. Aerial remote sensing images are usually fed directly to a general-purpose detector, which makes it difficult to achieve both high detection accuracy and high detection efficiency, so such detectors do not transfer well to actual engineering projects. Because targets in remote sensing images span a large scale interval, the multi-scale fixed reference frames adopted by a general-purpose detector cannot match the target scales well, and the model's detection capability suffers.
3. Existing vehicle detection data sets for aerial remote sensing are built mainly from aerospace satellite imagery such as Google Earth; they lack images shot by actual aircraft, and the data sets carry no flight or shooting metadata.
4. For the common cloud and fog phenomena in aerial remote sensing images, two kinds of solution exist. One improves image quality with a cloud/fog removal algorithm: traditional defogging algorithms include the Dark Channel Prior (DCP), Maximum Contrast (MC) and Color Attenuation Prior (CAP) methods, and deep-learning models that defog images directly have also been proposed, but these greatly reduce detection efficiency and make subsequent detection tasks difficult. The other sends cloud-occluded pictures directly into network training and constrains the features through the objective function to raise the target recognition rate. Existing cloud/fog removal algorithms often produce halo effects or severe colour distortion, giving poor detection results. Moreover, existing data sets contain no pictures taken under cloudy or foggy weather, so, lacking data support, the designed models struggle with such complex meteorological environments.
Disclosure of Invention
The present invention aims to provide a vehicle detection method for aerial remote sensing images that overcomes the defects of the prior art.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
a vehicle detection method of an aerial remote sensing image is characterized by comprising the following steps:
step 1, obtaining optical remote sensing image input data;
step 2, performing adaptive scale cropping on the image in the input data according to the actual pixel size of the vehicle target in the training data set to obtain an image data set to be trained;
step 3, calculating a fixed reference frame based on the marked detection frame in the image data set to be trained, substituting the fixed reference frame into the model and training to obtain model parameters to obtain a trained model;
step 4, performing self-adaptive cutting on the image to be detected one by one according to the relevant data of the image to be detected;
step 5, searching the vehicle target in the cut image to be detected by using the trained model;
and 6, splicing the identification results obtained in the step 5, removing the target of the repeated overlapped image part, and obtaining a vehicle detection result.
As a preferable mode, in the step 2, if the imaging-related parameters of the training data set are known, the size of the image to be segmented is calculated by multiplying the theoretical pixel size of the vehicle target by a scaling coefficient k; otherwise, the length and width of the labeled target pixels in each picture of the training data set are counted, their root mean square is computed, and the median is multiplied by the magnification ratio k to obtain the crop side length for that picture. After all pictures in the training set are segmented, square images of different sizes are obtained, and all segmented images are rescaled by interpolation to the model's input size to obtain the image data set to be trained.
Preferably, the training data set image-related parameters include at least one of camera imaging focal length, aerial height and pixel size parameters.
In a preferable mode, in the step 2, the overlap between crops is set to N times the target pixel size in the training data set image, where N is greater than 1.
Preferably, in step 3, the sizes of the vehicle targets in the image data set to be trained processed in step 2 are clustered, and typical values of a plurality of target sizes are obtained and substituted into the model as a fixed reference frame.
As a preferable mode, in the step 3, the k-means++ method is adopted to cluster the sizes of the vehicle targets in the image data set to be trained, which is processed in the step 2.
As a preferable mode, in the step 4, the theoretical pixel size of the vehicle target is calculated through the camera imaging focal length, the aerial photography height and the pixel size parameter for shooting the image to be detected, and the size of the image to be segmented is calculated by multiplying the estimated value by the proportionality coefficient k.
As a preferable mode, in the step 5, a convolutional neural network is adopted in combination with a global attention mechanism to search for a vehicle object in the clipped image to be detected.
As a preferable mode, in the step 3, a data enhancement method is adopted to improve robustness during model training.
As a preferable mode, in the step 6, removing duplicate targets in the overlapping image parts comprises: finding the optimal target bounding box using a non-maximum suppression method, eliminating redundant bounding boxes, and outputting the final result.
Compared with the prior art, the invention has the following beneficial effects. The method combines a convolutional neural network with an attention mechanism, so that more environmental information is fused into the feature map and the association between the target and its surroundings is strengthened, assisting global saliency detection and target confirmation for vehicle targets in large-field-of-view urban optical remote sensing images. During model training, the robustness of the network is strengthened through self-adversarial training, and model performance is enhanced by expanding the data set with affine transformations and cloud simulation. The number of fixed reference frames is reduced and their calculation is optimized according to the characteristics of the actual application scenario, improving both detection efficiency and detection accuracy. The image to be detected is sliced according to the adaptive segmentation size, unifying the target size, and the overlap area is sized appropriately, enabling rapid detection. Finally, the detected targets are stitched and recombined according to the original cropping parameters; the optimal target bounding box is found by non-maximum suppression, redundant bounding boxes are eliminated, and the detection result is output. The method has simple parameter settings and low computational complexity; it can rapidly extract vehicle targets against a large-field-of-view urban background, accurately and quickly acquire their number and positions, and effectively reduce the false alarm rate, providing a feasible approach to vehicle target detection in aerospace optical remote sensing images.
Drawings
FIG. 1 is a schematic diagram of a detection process of the vehicle detection method of the aerial remote sensing image provided by the invention.
FIG. 2 is a schematic diagram of an application environment of the present invention.
Fig. 3 is a schematic structural diagram of a detection model according to the present invention.
Fig. 4 is a schematic diagram of adaptive segmentation effect.
FIG. 5 is a schematic diagram of a rotation angle detection frame in the detection model.
Fig. 6 is a schematic diagram of a cloud simulation principle according to the present invention.
Fig. 7 is a schematic diagram of the convolution structure of the GR module.
Detailed Description
Embodiments of the present invention will be described in further detail below with reference to the accompanying drawings.
Existing target recognition techniques for aerial remote sensing images segment each picture with a fixed pixel size and then detect on the segments. Such segmentation makes it difficult to unify the target pixel size, which varies over a wide range and degrades recognition accuracy. Against this defect, the invention proposes an adaptive image segmentation method based on aircraft and camera parameters, which confines the size of the target object to a small range by dynamically adjusting the crop size. This plays an important role in improving both the speed and the accuracy of model detection.
Secondly, aerial remote sensing images are usually fed directly to a general-purpose detector, making it difficult to achieve both high detection accuracy and high detection efficiency, so such detectors do not transfer well to actual engineering projects. Because targets in remote sensing images span a large scale range, the multi-scale fixed reference frames adopted by a general-purpose detector cannot match the target scales well, and the model's detection capability suffers. The invention therefore combines the adaptive image segmentation method to unify the scales of same-class targets, and the model design adopts more targeted fixed reference frames to improve detection accuracy. The proposed model uses the CSPDarknet53 network as the backbone for vehicle detection, optimizes the feature fusion structure to improve small-target detection, and removes structures of the general-purpose detector that are redundant in practice, improving detection efficiency.
The image sources of existing vehicle detection data sets for aerial remote sensing are mainly aerospace satellite imagery such as Google Earth; images shot by actual aircraft are lacking, and the data sets carry no flight or shooting metadata. To this end, the invention proposes an aerial remote sensing image data set (RO-ARS) with rotated detection boxes. Each picture in the data set is annotated with the flight height and camera focal length, and the detection-box annotations comprise the vehicle's length, width, centre position and rotation angle. To improve the realism of the data set, it includes many urban-background images taken under complex meteorological environments.
In addition, for the common cloud and fog phenomena in aerial remote sensing images, two kinds of solution exist. One improves image quality with a cloud/fog removal algorithm: traditional defogging algorithms include the Dark Channel Prior (DCP), Maximum Contrast (MC) and Color Attenuation Prior (CAP) methods, and models that defog images directly using deep learning have also been proposed, but these greatly reduce detection efficiency and make subsequent detection tasks difficult. The other sends cloud-occluded pictures directly into network training and constrains the features through the objective function to raise the target recognition rate. Existing cloud/fog removal algorithms often produce halo effects or severe colour distortion, giving poor detection results. Moreover, existing data sets contain no pictures taken under cloudy or foggy weather, so, lacking data support, the designed models struggle with such complex meteorological environments. To enrich the cloud and fog imagery of the proposed data set, the invention performs cloud simulation on it using Perlin noise and fractal Brownian motion, following the Retinex-based principle of optical imaging under cloud and fog.
The invention is a complete method for rapid vehicle detection in aerial remote sensing images under complex backgrounds. Aimed at the high resolution and large field of view of large-field urban aerial images, and at the low detection efficiency and poor reliability of traditional methods in aerial optical remote sensing target detection, the invention proposes a single-scale fast convolutional neural network vehicle detection method.
The detection process of the invention is shown in figure 1 and comprises the following steps:
step 1, obtaining large-view-field optical remote sensing image input data under an urban background;
step 2, carrying out self-adaptive scale cutting on the image in the input data according to the actual pixel size of the vehicle target in the training data set to obtain an image data set to be trained;
step 3, calculating a fixed reference frame based on the marked detection frame in the image data set to be trained, substituting the fixed reference frame into the model and training to obtain model parameters, and obtaining a trained model;
step 4, carrying out self-adaptive cutting on the images to be detected one by one according to the relevant data of the images to be detected;
step 5, searching the vehicle target in the cut image to be detected by using the trained model;
and 6, splicing the identification results obtained in the step 5, removing the target of the repeated overlapped image part, and obtaining a vehicle detection result.
The large-field-of-view optical remote sensing image under an urban background obtained in step 1 has a large field angle; the applicable scenario of the invention is shown in fig. 2. Vehicle targets occupy few pixels and a small proportion of the remote sensing image, their feature size is small, the urban background is complex, and a large amount of similar-target interference exists. In addition, the targets are fully rotation-invariant, and their distribution is unbalanced and locally dense.
In step 2, image slices are extracted according to the pixel size of the vehicle targets in the training data set to obtain the image data set to be trained. For a data set that provides the camera imaging focal length, aerial photography height and pixel-size parameters, the size of the image to be segmented can be calculated by multiplying the theoretical pixel size of a vehicle target by a proportionality coefficient k, where k is matched to the model and defaults to 15. For a training data set lacking such data, the invention designs the following processing rule: count the length and width of the labeled target pixels in each picture of the training set, compute their root mean square, and multiply the median by the magnification ratio k to obtain the crop side length for that picture. After all pictures in the training set are segmented, square images of different sizes are obtained; all segmented images are then rescaled by interpolation to the model input size (608 × 608 pixels), yielding the data set to be trained.
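The per-picture statistics branch of this rule can be sketched as follows; the function name and array layout are illustrative, not from the patent, and k = 15 is the stated default:

```python
import numpy as np

def crop_side_length(box_sizes, k=15):
    """Side length of the square crop for one training picture.

    box_sizes: (n, 2) array of labeled target (width, height) in pixels.
    Rule sketched from the text: per target, take the root mean square of
    width and height, then multiply the median over all targets by the
    magnification ratio k.
    """
    sizes = np.asarray(box_sizes, dtype=float)
    rms = np.sqrt((sizes ** 2).mean(axis=1))  # per-target RMS of (w, h)
    return int(round(np.median(rms) * k))

# Vehicles labeled around 20 x 40 px give a crop side of roughly 470 px,
# which would then be rescaled to the 608 x 608 model input.
side = crop_side_length([[20, 40], [22, 38], [18, 44]])
```
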
The self-adaptive segmentation size obtained by calculation adopted by the invention can enable target pixels to be unified as much as possible, is convenient for target detection and identification, and is also beneficial to simplifying a model structure, improving the detection efficiency of a network and reducing the problems of false detection and missed detection, as shown in figure 3.
In step 2, to ensure that no target is split apart during cropping, the overlap area is set to N times the target pixel size in the training data set image, guaranteeing the detection effect at the edges of segmented images; N is greater than 1, e.g. N = 1.5.
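A minimal tiling sketch of this overlapped cropping, under the assumption of square crops and axis-aligned tiling (function and parameter names are illustrative):

```python
def tile_coords(img_w, img_h, crop, target_px, n=1.5):
    """Top-left corners of square crops of side `crop` covering the image.

    overlap = n * target_px with n > 1 (e.g. 1.5), so any vehicle cut by
    one crop boundary still appears whole in a neighbouring crop.
    Assumes img_w, img_h >= crop.
    """
    overlap = int(n * target_px)
    stride = crop - overlap
    assert stride > 0, "crop must be larger than the overlap"
    xs = list(range(0, max(img_w - crop, 0) + 1, stride))
    if xs[-1] + crop < img_w:          # final crop flush with right edge
        xs.append(img_w - crop)
    ys = list(range(0, max(img_h - crop, 0) + 1, stride))
    if ys[-1] + crop < img_h:          # final crop flush with bottom edge
        ys.append(img_h - crop)
    return [(x, y) for y in ys for x in xs]
```

For a 1000 × 1000 image, 400-pixel crops and 40-pixel targets, this yields a 3 × 3 grid of crops whose union covers the whole image.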
In step 3, the k-means++ method is used to cluster the sizes of the vehicle targets in the image data set to be trained produced in step 2, and typical values of 3 target sizes are obtained and substituted into the model as fixed reference frames. Training the proposed method with the data set to be trained yields a convolutional neural network model capable of recognizing vehicle targets.
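The patent names only the k-means++ method; the sketch below is a generic Euclidean k-means with k-means++ seeding on (width, height) pairs (a common detector variant clusters on 1 − IoU instead). All names are illustrative:

```python
import numpy as np

def kmeans_pp_anchors(wh, k=3, iters=50, seed=0):
    """Cluster labeled (w, h) box sizes into k anchor sizes, sorted by area."""
    rng = np.random.default_rng(seed)
    wh = np.asarray(wh, dtype=float)
    # k-means++ seeding: first centre uniform, the rest D^2-weighted
    centres = [wh[rng.integers(len(wh))]]
    for _ in range(k - 1):
        d2 = np.min([((wh - c) ** 2).sum(1) for c in centres], axis=0)
        centres.append(wh[rng.choice(len(wh), p=d2 / d2.sum())])
    centres = np.array(centres)
    for _ in range(iters):                      # Lloyd iterations
        labels = np.argmin(((wh[:, None] - centres[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centres[j] = wh[labels == j].mean(0)
    return centres[np.argsort(centres.prod(1))]
```
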
In the step 4, the theoretical pixel size of the vehicle target is calculated through the camera imaging focal length, the aerial photography height and the pixel size parameters for shooting the image to be detected, and the size of the image to be segmented is calculated by multiplying the estimated value by a proportionality coefficient k.
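The theoretical pixel size follows from the ground sample distance GSD = H · pitch / f. A sketch, where the 4.5 m vehicle length is an assumed illustrative value, not a figure from the patent:

```python
def target_pixel_size(focal_mm, height_m, pitch_um, vehicle_m=4.5):
    """Theoretical pixel extent of a vehicle in a nadir aerial image.

    GSD = H * pitch / f (metres per pixel); a vehicle of physical length
    vehicle_m then spans vehicle_m / GSD pixels.
    """
    gsd = height_m * (pitch_um * 1e-6) / (focal_mm * 1e-3)
    return vehicle_m / gsd

# f = 100 mm, H = 2000 m, 5 um pixels -> GSD = 0.1 m/px -> a car spans ~45 px;
# with k = 15 this would give a crop side of about 675 px.
px = target_pixel_size(100, 2000, 5)
```
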
The SSRD-Net network in step 5 extracts the coordinate position, length, width and rotation angle of each vehicle target. To enhance the robustness and anti-interference of the detected target region, a convolutional neural network combined with a global attention mechanism performs global saliency detection of vehicle targets in the large-field-of-view urban optical remote sensing image. The model structure, shown in FIG. 3, comprises the following steps:
Step 5-1: use the CSPDarknet53 convolutional backbone to extract image features, obtaining feature maps S_72×72, S_38×38 and S_19×19 at different convolutional depths.
Step 5-2: feed S_72×72, S_38×38 and S_19×19 into the neck of the network, where feature maps of different depths are fused and identified; expand the receptive field of the network with Spatial Pyramid Pooling (SPP), and use a GR module built on a modified multi-head self-attention mechanism to further extract feature-map information, yielding the network's output feature map S_n. The structure of the GR module is shown in fig. 7. The network removes detection branches that do not match the target sizes of this task environment, reducing model complexity and improving detection efficiency.
Step 5-3: input the feature map S_n into the head of the model, and normalize the outputs to YOLO format through two convolution modules to obtain the final network output.
During the training of step 3, according to actual aerial detection conditions, the invention designs several data enhancement methods to improve network robustness, as follows:
Step 3-1: to address the scarcity of aerial data set images, an affine transformation algorithm is adopted:

(u, v, 1)ᵀ = M · (x, y, 1)ᵀ, where M is a 3 × 3 matrix combining rotation, scaling and translation,

where (x, y) are the original pixel coordinates and (u, v) the coordinates after the affine transformation. This increases the number of images, and the rotation component reduces the model's sensitivity to vehicle angle, mitigating the uneven distribution of target rotation angles found in data sets such as ITCVD.
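A minimal numpy sketch of this affine mapping applied to pixel coordinates (in practice the same matrix would be handed to an image-warping routine such as OpenCV's warpAffine; the function names here are illustrative):

```python
import numpy as np

def affine_matrix(angle_deg, scale=1.0, tx=0.0, ty=0.0):
    """3x3 homogeneous matrix combining rotation, isotropic scale and translation."""
    a = np.deg2rad(angle_deg)
    c, s = scale * np.cos(a), scale * np.sin(a)
    return np.array([[c, -s, tx],
                     [s,  c, ty],
                     [0.0, 0.0, 1.0]])

def transform_points(M, pts):
    """Map (n, 2) pixel coordinates (x, y) -> (u, v) through M."""
    pts = np.asarray(pts, dtype=float)
    homo = np.hstack([pts, np.ones((len(pts), 1))])
    return (homo @ M.T)[:, :2]
```
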
Step 9-2: aiming at the problem of single meteorological condition in the existing data set, as shown in fig. 6, according to the Retinex cloud imaging theory:
I(x)=J(x)t(x)+A(1-t(x))
where I is the observed intensity, J is the scene radiance, A is the global atmospheric light, and t is the medium transmission describing the fraction of light that reaches the camera without being scattered. The simulation adds atmospheric light A onto the scene radiance J in the proportion governed by t. Perlin noise and fractal Brownian motion are used to simulate the cloud field, enhancing the model's ability to detect targets under cloudy conditions.
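A toy sketch of this compositing. The patent names Perlin noise and fractal Brownian motion; the noise below is a cheap multi-octave value noise that only illustrates the fBm idea, and the density parameter is an assumption:

```python
import numpy as np

def fbm_noise(size, octaves=4, seed=0):
    """fBm-style noise in [0, 1): sum of nearest-neighbour-upsampled
    random grids with halving amplitude per octave (not true Perlin)."""
    rng = np.random.default_rng(seed)
    out = np.zeros((size, size))
    amp, total = 1.0, 0.0
    for o in range(octaves):
        n = 2 ** (o + 2)
        coarse = rng.random((n, n))
        reps = size // n + 1
        layer = np.kron(coarse, np.ones((reps, reps)))[:size, :size]
        out += amp * layer
        total += amp
        amp *= 0.5
    return out / total

def add_cloud(J, A=1.0, density=0.6, seed=0):
    """Composite cloud onto a square image J in [0, 1] via
    I = J * t + A * (1 - t), with transmission t from the noise field."""
    t = 1.0 - density * fbm_noise(J.shape[0], seed=seed)
    return J * t + A * (1.0 - t)
```

Note that a pixel whose radiance already equals A is unchanged, matching the physical model: cloud pulls every pixel toward the atmospheric light.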
Step 6, stitching the identified target results into an original image, and removing repeated targets, comprising the following steps:
step 6-1: converting the positions of target information detected by the slice images from the same image into an original image to obtain a large number of target candidate frames, wherein the scaling coefficients and the overlapping areas are consistent with those in the step 3;
step 6-2: after the target information is converted, a large number of candidate boxes are generated at the same target position, the candidate boxes may overlap with each other, a Non-Maximum Suppression (NMS) method is used to find the optimal target bounding box, eliminate redundant bounding boxes, and output the final result.
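Steps 6-1 and 6-2 can be sketched as follows for axis-aligned boxes (a simplification — the patent uses rotated detection frames, whose overlap computation is more involved; the `[cx, cy, w, h, score]` layout and function names are assumptions):

```python
import numpy as np

def slice_to_original(boxes, x_off, y_off, scale):
    """Map [cx, cy, w, h, score] boxes from a slice back to original-image coords."""
    out = boxes.copy()
    out[:, 0] = boxes[:, 0] * scale + x_off
    out[:, 1] = boxes[:, 1] * scale + y_off
    out[:, 2:4] = boxes[:, 2:4] * scale
    return out

def nms(boxes, iou_thresh=0.5):
    """Greedy non-maximum suppression on axis-aligned [cx, cy, w, h, score] boxes."""
    x1 = boxes[:, 0] - boxes[:, 2] / 2
    y1 = boxes[:, 1] - boxes[:, 3] / 2
    x2 = boxes[:, 0] + boxes[:, 2] / 2
    y2 = boxes[:, 1] + boxes[:, 3] / 2
    areas = (x2 - x1) * (y2 - y1)
    order = boxes[:, 4].argsort()[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # intersection of the best box with every remaining box
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thresh]  # drop overlapping duplicates
    return keep
```

Duplicates arise mainly in the overlap bands between adjacent slices, which is why the overlap width in step 3 is chosen larger than one target size.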
The result of the identified target in step 6 is shown in FIG. 5 and comprises 5 parts: the target center coordinate in the x direction b_x, the center coordinate in the y direction b_y, the long-side length L of the detection frame, the short-side length S of the detection frame, and the rotation angle θ of the detection frame, where the rotation angle is defined as the included angle between the long side and the positive direction of the y axis. To normalize the model output, the center point P is represented by the relative coordinates [σ(t_x), σ(t_y), t_l, t_s] of the corresponding grid, calculated as follows:

b_x = σ(t_x) + i·c_x
b_y = σ(t_y) + j·c_y
L = p_l · e^(t_l)
S = p_s · e^(t_s)

where (c_x, c_y) is the size of the divided grid, (i, j) are the position coordinates of the grid containing the detection frame within the whole image, and (p_l, p_s) are the preset long and short sides of the fixed reference frame, as shown in FIG. 5.
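The decoding of the network offsets into absolute box parameters can be sketched as follows (assuming the standard YOLO convention of sigmoid-bounded center offsets and exponential side scaling from the fixed reference frame; names are illustrative):

```python
import math

def decode_box(t_x, t_y, t_l, t_s, theta, i, j, c_x, c_y, p_l, p_s):
    """Decode network offsets to absolute box parameters (b_x, b_y, L, S, theta)."""
    sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
    b_x = sigmoid(t_x) + i * c_x   # center x: bounded offset within grid (i, j)
    b_y = sigmoid(t_y) + j * c_y   # center y
    L = p_l * math.exp(t_l)        # long side, scaled from the fixed reference frame
    S = p_s * math.exp(t_s)        # short side
    return b_x, b_y, L, S, theta
```

The sigmoid keeps the predicted center inside its grid cell, while the exponential lets the reference-frame sides shrink or grow smoothly.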
The invention designs a complete vehicle target detection technique for aerial remote sensing images. The method combines adaptive image segmentation with the aircraft and camera parameters so that target sizes become uniform; this segmentation method lowers model complexity and accelerates detection. In the model design, fixed reference frames are set according to actual target sizes, and redundant detection structures of general-purpose detectors are removed during feature fusion, improving detection speed. To address the lack of complex meteorological environments such as cloud and fog in the training data set, a cloud-and-fog simulation program is designed, enhancing the detection accuracy of the model in complex meteorological environments.
Simulation shows that the method is practical and effective and achieves the expected results, with higher detection efficiency and detection precision than conventional detection methods, and stronger robustness to complex meteorological environments.
While the embodiments of the present invention have been described in connection with the drawings, the present invention is not limited to the above-described embodiments, which are intended to be illustrative rather than restrictive, and many modifications may be made by one skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A vehicle detection method of an aerial remote sensing image is characterized by comprising the following steps:
step 1, obtaining optical remote sensing image input data;
step 2, carrying out self-adaptive scale cutting on the image in the input data according to the actual pixel size of the vehicle target in the training data set to obtain an image data set to be trained;
step 3, calculating a fixed reference frame based on the marked detection frame in the image data set to be trained, substituting the fixed reference frame into the model and training to obtain model parameters, and obtaining a trained model;
step 4, performing self-adaptive cutting on the image to be detected one by one according to the relevant data of the image to be detected;
step 5, searching the vehicle target in the cut image to be detected by using the trained model;
and 6, splicing the identification results obtained in the step 5, removing the target of the repeated overlapped image part, and obtaining a vehicle detection result.
2. The vehicle detection method of the aerial remote sensing image according to claim 1, wherein in the step 2, if the relevant imaging parameters of the training data set are known, the size of the image to be segmented is calculated by multiplying the theoretical pixel value of the vehicle target by a proportionality coefficient k; otherwise, the lengths and widths of the labeled target pixels in each picture of the training data set are counted, the root mean square is calculated, and the median is multiplied by the amplification ratio k to obtain the crop pixel side length for the corresponding picture; after all pictures in the training set are segmented, square images of different sizes are obtained, and all segmented image sizes are unified to the input size of the model by an interpolation scaling method to obtain the image data set to be trained.
3. The method of claim 2, wherein the training data set image-related parameters include at least one of camera imaging focal length, aerial height, and pixel size parameters.
4. The method for detecting the vehicle according to any one of claims 1 to 3, wherein in the step 2, when cropping, the overlapping area is set to be N times of a target pixel in the training data set image, wherein N is greater than 1.
5. The vehicle detection method of the aerial remote sensing image according to claim 1, wherein in the step 3, the sizes of the vehicle targets in the image data set to be trained processed in the step 2 are clustered, and typical values of a plurality of target sizes are obtained and used as fixed reference frames to be substituted into the model.
6. The vehicle detection method of the aerial remote sensing image according to claim 5, wherein in the step 3, a k-means + + method is adopted to cluster the sizes of the vehicle targets in the image data set to be trained, which is processed in the step 2.
7. The vehicle detection method according to claim 1, wherein in step 4, the theoretical pixel size of the vehicle target is calculated by parameters of camera imaging focal length, aerial photography height and pixel size for shooting the image to be detected, and the estimated value is multiplied by a proportionality coefficient k to calculate the size of the image to be segmented.
8. The vehicle detection method of the aerial remote sensing image according to claim 1, wherein in the step 5, a vehicle target in the cut image to be detected is searched in a mode of combining a convolutional neural network with a global attention mechanism.
9. The vehicle detection method of aerial remote sensing images according to claim 1, wherein in the step 3, a data enhancement method is adopted to improve robustness during model training.
10. The vehicle detection method according to claim 1, wherein the step 6 of removing the object of the repeated overlapping image portion includes: and finding the optimal target boundary box by adopting a non-maximum value inhibition method, eliminating redundant boundary boxes and outputting a final result.
CN202210792617.6A 2022-07-07 2022-07-07 Vehicle detection method of aerial remote sensing image Pending CN115240089A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210792617.6A CN115240089A (en) 2022-07-07 2022-07-07 Vehicle detection method of aerial remote sensing image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210792617.6A CN115240089A (en) 2022-07-07 2022-07-07 Vehicle detection method of aerial remote sensing image

Publications (1)

Publication Number Publication Date
CN115240089A 2022-10-25

Family

ID=83671227

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210792617.6A Pending CN115240089A (en) 2022-07-07 2022-07-07 Vehicle detection method of aerial remote sensing image

Country Status (1)

Country Link
CN (1) CN115240089A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116994116A (en) * 2023-08-04 2023-11-03 北京泰策科技有限公司 Target detection method and system based on self-attention model and yolov5
CN116994116B (en) * 2023-08-04 2024-04-16 北京泰策科技有限公司 Target detection method and system based on self-attention model and yolov5
CN116894775A (en) * 2023-09-11 2023-10-17 中铁大桥局集团第二工程有限公司 Bolt image preprocessing method based on camera motion model recovery and super-resolution
CN116894775B (en) * 2023-09-11 2024-01-23 中铁大桥局集团第二工程有限公司 Bolt image preprocessing method based on camera motion model recovery and super-resolution
CN117292337A (en) * 2023-11-24 2023-12-26 中国科学院空天信息创新研究院 Remote sensing image target detection method


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination