CN115880588A - Two-stage unmanned aerial vehicle detection method combined with time domain


Info

Publication number
CN115880588A
Authority
CN
China
Prior art keywords
unmanned aerial vehicle, image, stage, vehicle detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111070360.5A
Other languages
Chinese (zh)
Inventor
杨小伟
杨雪
王松波
杨鹤猛
臧博琦
杨晓斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Tianjin Electric Power Co Chengxi Power Supply Branch
State Grid Corp of China SGCC
State Grid Tianjin Electric Power Co Ltd
Original Assignee
State Grid Tianjin Electric Power Co Chengxi Power Supply Branch
State Grid Corp of China SGCC
State Grid Tianjin Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2021-09-13
Publication date: 2023-03-31
Application filed by State Grid Tianjin Electric Power Co Chengxi Power Supply Branch, State Grid Corp of China SGCC, State Grid Tianjin Electric Power Co Ltd filed Critical State Grid Tianjin Electric Power Co Chengxi Power Supply Branch
Priority to CN202111070360.5A
Publication of CN115880588A
Legal status: Pending

Classifications

    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a two-stage unmanned aerial vehicle detection method combined with a time domain, which comprises the following steps: step 1: completing the Drone data set and image preprocessing steps; step 2: dividing the data set and constructing an unmanned aerial vehicle detection algorithm model; step 3: training the constructed unmanned aerial vehicle detection algorithm model; step 4: evaluating the constructed unmanned aerial vehicle detection algorithm model and performing unmanned aerial vehicle detection. In the first stage, candidate areas are extracted using image segmentation: before an image enters the first-stage feature extraction network, it is cut into low-resolution images with overlapping areas. In the second-stage detection, optical flow gradients are calculated over several consecutive frames to obtain new candidate areas, and a feature map containing time-domain information is extracted in combination with the first-stage candidate areas. This reduces the missed and false detections of the first stage and improves the accuracy of unmanned aerial vehicle detection.

Description

Two-stage unmanned aerial vehicle detection method combined with time domain
Technical Field
The invention relates to the technical field of object detection and identification, in particular to a two-stage unmanned aerial vehicle detection method combining a time domain.
Background
Unmanned aerial vehicles have been widely used in agriculture, wildfire fighting, photography, security and other industries. With the large-scale application of unmanned aerial vehicles and the rapid development of computer vision technology, object detection and tracking based on video captured by unmanned aerial vehicles have matured. While unmanned aerial vehicles bring convenience, they also raise problems such as endangering public safety, snooping on personal privacy and leaking information about sensitive areas. Detection of drones has therefore become particularly important in certain scenarios.
Methods for unmanned aerial vehicle detection can be divided into signal processing methods and visual processing methods. With signal processing methods, researchers with signal-processing expertise must first do a great deal of preprocessing work, and because the communication bands and modulation schemes of different unmanned aerial vehicles vary, the time and labor cost of signal-based detection is too high. With visual processing methods, a drone occupies very few pixels in a video frame or image, so target detection algorithms based on prior boxes are prone to missed detections. Moreover, other small objects may appear in the background; when the drone is far away, such objects look similar to a drone and easily cause false detections. In addition, when a high-resolution image is fed to a detection model, it must first be scaled to a fixed size, which further reduces the drone's pixel footprint in the image and increases the difficulty of detection.
Currently, it has also been proposed to detect moving drones by subtracting background images, identify candidates with a deep-learning classifier, and refine the detections with Kalman filtering. However, this approach relies on a large number of parameters and thresholds, and its dependence on background subtraction for moving-object detection leads to many false detections.
In summary, how to accurately detect unmanned aerial vehicles despite the false and missed detections caused by complex backgrounds and the small pixel size of the drone in the image is a hot topic in this research field.
Disclosure of Invention
The invention aims to provide a two-stage unmanned aerial vehicle detection method combined with a time domain, which can effectively ensure that an unmanned aerial vehicle in an image can be accurately detected under a complex background and reduce the possibility of false detection and missing detection.
In order to realize the purpose of the invention, the technical scheme provided by the invention is as follows:
a two-stage unmanned aerial vehicle detection method combined with a time domain comprises the following steps:
step 1: completing the Drone data set and image preprocessing steps;
step 2: dividing a data set and constructing an unmanned aerial vehicle detection algorithm model;
in the step 2, constructing the unmanned aerial vehicle detection algorithm model comprises a first stage and a second stage; in the first stage, image pixels are classified and overlapping block areas are produced, a modified Resnet34 residual network is used as the feature extraction network to obtain a feature map, the output feature map is pooled with a spatial pyramid to obtain a feature map containing abstract semantic information and local information, candidate areas possibly containing unmanned aerial vehicles are obtained, and a channel attention network and a pixel attention network are then used;
in the second stage, the motion of the unmanned aerial vehicle is tracked across consecutive video frames using optical flow gradients to obtain new candidate areas in the image, which are combined with the output of the first stage to obtain all candidate areas where an unmanned aerial vehicle may exist; N frames before and after the current frame are tracked, the video is segmented into fixed-size video blocks, features containing time-domain information are extracted from the video blocks with an I3D network, spatial pyramid pooling is applied, and a channel attention network and a pixel attention network are used;
step 3: training the constructed unmanned aerial vehicle detection algorithm model;
step 4: evaluating the constructed unmanned aerial vehicle detection algorithm model and performing unmanned aerial vehicle detection.
In the step 1, a camera is used to obtain a video stream containing the unmanned aerial vehicle, the video stream is segmented according to time sequence, the images are labeled according to the unmanned aerial vehicle detection task, the Drone data set is completed, and the data set is preprocessed.
In step 1, the image preprocessing comprises the following steps: image segmentation, image scaling, image enhancement and normalization;
the image segmentation is used for segmenting the high-resolution image into the low-resolution image so as to reduce the loss of unmanned aerial vehicle pixels during image scaling; the image scaling is used for scaling the segmented image without changing the length-width ratio of the unmanned aerial vehicle so as to match the input scale and the dimension of the model; the image enhancement is used for expanding scenes and comprises luminosity change and geometric change of images, the luminosity change refers to the change of pixels of the images and mainly refers to saturation change, contrast change and brightness change, the geometric change refers to the change of the scales of the images and mainly refers to horizontal or vertical overturning and image rotation of the images, and the normalization operation is used for normalizing image data from 0-255 to 0-1. The interpolation method of image scaling is nearest neighbor interpolation, bilinear linear interpolation and bicubic interpolation.
In step 2, the modified Resnet34 means that all four feature extraction blocks are used to extract features; the last three extracted feature maps are then rescaled with an upsampling operation, the rescaled feature maps are concatenated, and a 1x1 convolution changes the dimension of the concatenated feature map.
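One possible PyTorch reading of this modification, using torchvision's resnet34 stages; concatenating all four stage outputs and projecting to 2048 channels are assumptions chosen to match the 9x56x56x2048 feature map quoted in the embodiment:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet34

class ModifiedResNet34(nn.Module):
    """Keep all four residual stages, upsample the last three stage outputs
    back to the first stage's 56x56 resolution, concatenate, and use a 1x1
    convolution to set the channel dimension."""
    def __init__(self, out_channels=2048):
        super().__init__()
        r = resnet34(weights=None)
        self.stem = nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool)
        self.stages = nn.ModuleList([r.layer1, r.layer2, r.layer3, r.layer4])
        # 64 + 128 + 256 + 512 channels after concatenation
        self.fuse = nn.Conv2d(64 + 128 + 256 + 512, out_channels, kernel_size=1)

    def forward(self, x):               # x: (N, 3, 224, 224)
        x = self.stem(x)                # (N, 64, 56, 56)
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        size = feats[0].shape[-2:]      # 56x56 target resolution
        up = [feats[0]] + [F.interpolate(f, size=size, mode='bilinear',
                                         align_corners=False)
                           for f in feats[1:]]
        return self.fuse(torch.cat(up, dim=1))   # (N, 2048, 56, 56)
```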
The channel attention network assigns more weight to informative feature maps and suppresses less informative features; concretely, an attention vector is multiplied channel-wise with the feature map.
The pixel attention network generates an attention matrix pixel by pixel, assigning more weight to spatial positions corresponding to the unmanned aerial vehicle and less weight to non-drone areas; to suppress background information, element-wise multiplication and addition of the pixel attention mask are performed in sequence on all convolution feature channels.
In step 3, the new candidate area is obtained by calculating the optical flow gradient over 3 consecutive video frames: a keypoint detection algorithm detects keypoints in the 3 frames, the forward and backward optical flows of these keypoints are calculated, and the resulting motion boundary is expanded to give the new candidate area.
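As an illustration, this step can be sketched with OpenCV's pyramidal Lucas-Kanade tracker; the detector choice, motion threshold and expansion factor below are assumptions, not values fixed by the invention:

```python
import cv2
import numpy as np

def motion_candidate(prev_gray, cur_gray, next_gray, expand=1.5):
    """Detect keypoints in the current frame, track them forward and
    backward with pyramidal Lucas-Kanade optical flow, keep the points
    that actually move, and return their bounding box enlarged by
    `expand` as a new candidate region (x, y, w, h), or None."""
    pts = cv2.goodFeaturesToTrack(cur_gray, maxCorners=200,
                                  qualityLevel=0.01, minDistance=7)
    if pts is None:
        return None
    fwd, st_f, _ = cv2.calcOpticalFlowPyrLK(cur_gray, next_gray, pts, None)
    bwd, st_b, _ = cv2.calcOpticalFlowPyrLK(cur_gray, prev_gray, pts, None)
    ok = (st_f.ravel() == 1) & (st_b.ravel() == 1)   # tracked both ways
    motion = np.linalg.norm((fwd - pts).reshape(-1, 2), axis=1)[ok]
    moving = pts.reshape(-1, 2)[ok][motion > 1.0]    # assumed 1-pixel threshold
    if len(moving) == 0:
        return None
    x0, y0 = moving.min(axis=0)
    x1, y1 = moving.max(axis=0)
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    w, h = (x1 - x0) * expand, (y1 - y0) * expand    # enlarge the motion boundary
    return int(cx - w / 2), int(cy - h / 2), int(w), int(h)
```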
The training and evaluation of the unmanned aerial vehicle detection model adopts a cross-validation method: the training data set is divided into n groups; each time, one group is selected as the test set and the remaining groups as the training set; the model is trained on the training sets in turn, and the data in the test set are used to evaluate the trained unmanned aerial vehicle detection model.
Compared with the prior art, the invention has the beneficial effects that:
the method combines the time domain to detect the unmanned aerial vehicle in two stages, extracts the candidate region by using image segmentation in the first stage, cuts the image into the low-resolution image of the overlapped region before the image enters the feature extraction network in the first stage, and avoids the pixel loss of the unmanned aerial vehicle; local information loss caused by network deepening is avoided by modifying the Resnet34, and pyramid pooling of four different pooling cores is used at the rear end of the output feature diagram to obtain multi-scale features; a channel attention network and a pixel attention network are used for better extracting candidate areas in the image; in the second stage of detection, the optical flow gradient is calculated by using images of continuous multiple frames to obtain a new candidate region, an I3D feature extraction network is used for extracting a feature map containing a time domain by combining the candidate region in the first stage, and then an attention network is used, so that the missing detection and the false detection of the first stage of detection are reduced, and the accuracy of unmanned aerial vehicle detection is improved.
Drawings
FIG. 1 is a flow chart of an embodiment of the method of the present invention.
FIG. 2 is a flow chart of data preprocessing and image enhancement in an embodiment of the method of the present invention.
Fig. 3 is a flowchart of constructing the unmanned aerial vehicle detection algorithm model in an embodiment of the method of the present invention.
Fig. 4 is a diagram of a channel attention network structure used in an embodiment of the method of the present invention.
FIG. 5 is a diagram of a pixel attention network used in an embodiment of the method of the present invention.
Fig. 6 is a network structure diagram of a first-stage feature extraction network according to an embodiment of the method of the present invention.
FIG. 7 is a flowchart illustrating a second stage of extracting time domain features according to an embodiment of the present invention.
Detailed Description
It should be noted that, in the present application, the embodiments and features of the embodiments may be combined with each other without conflict.
The invention is described in further detail below with reference to the figures and specific examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well; and the terms "comprise" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, elements, modules, components and/or combinations thereof, unless the context clearly indicates otherwise.
The invention is further described with reference to the following examples and figures.
The invention is implemented on the PyTorch deep learning framework. Hardware configuration: Intel(R) Core(TM) i7-9700KF 8-core CPU @ 3.60 GHz, 16 GB memory, NVIDIA GeForce RTX 2070 SUPER with 8 GB video memory. Software configuration: Ubuntu 16.04, Python 3.6.
The embodiment of the two-stage unmanned aerial vehicle detection method combined with the time domain, as shown in fig. 1, comprises the following steps:
step 1: drone data set and image pre-processing
As shown in the flow of fig. 2 (a), a camera is used to obtain a video stream containing an unmanned aerial vehicle, the video stream is segmented according to time sequence, the unmanned aerial vehicle in the current-frame image is labeled according to the detection task, and the Drone data set is completed. During model training, the image enhancement algorithms and preprocessing methods in fig. 2 (b) are applied to the Drone data set.
(1) The video sequence contains an unmanned aerial vehicle. The video is segmented according to time sequence, the current-frame images are named in time order, the unmanned aerial vehicle is labeled with the Labelme annotation software according to the detection task, and the production of the Drone data set is completed.
(2) The current-frame image is first cut, and then the aspect-ratio-preserving scaling and normalization preprocessing operations are applied. During model training, random image rotation, random image flipping (horizontal and vertical), random image cropping, and random changes of image contrast, brightness and saturation are used to improve the robustness of the model, as sketched below.
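These random augmentations map naturally onto torchvision transforms; in this sketch the jitter and rotation ranges are illustrative assumptions:

```python
from torchvision import transforms

# Photometric and geometric augmentations applied only at training time;
# the jitter/rotation ranges below are illustrative, not values from the patent.
train_augment = transforms.Compose([
    transforms.ToPILImage(),                 # assumes numpy/tensor input
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.RandomRotation(degrees=15),
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),
    transforms.ToTensor(),                   # also rescales 0-255 pixels to 0-1
])
```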
Step 2: data set division and unmanned aerial vehicle detection algorithm model construction
The Drone data set is divided into a training set and a test set, with a split ratio of 0.8 (i.e., 80% of the data for training and 20% for testing).
The construction of the unmanned aerial vehicle detection model is divided into two parts. The first part classifies image pixels to obtain candidate areas that may contain an unmanned aerial vehicle. In the second stage, the motion trajectory of the unmanned aerial vehicle is used to derive a motion boundary and thus new candidate areas; the newly appearing candidate areas are merged, and I3D extracts time-domain features from the video blocks containing the candidate-area trajectories. Spatial pyramid pooling, a channel attention network and a pixel attention network are added at the back end of both detection stages, completing the construction of the unmanned aerial vehicle detection algorithm model.
(1) As shown in fig. 3, assume the image dimension of the current frame is 1x1920x1080x3. In the first-stage detection, the segmentation from the image preprocessing cuts it into 9 blocks, i.e. the input dimension of the image becomes 9x640x480x3, which avoids the loss of unmanned aerial vehicle pixels caused by direct scaling. Before the input data pass through the feature extraction network, they are scaled by an interpolation-based method to the dimension 9x224x224x3. After the first-stage feature extraction network shown in fig. 6, the dimension of the output feature map is 9x56x56x2048. The channel attention vector is multiplied with the channels of the output feature map to obtain the feature map processed by channel attention; each layer of the feature map is added to the pixel attention matrix and then multiplied to obtain the feature map processed by the pixel attention network. A softmax layer then produces the prediction mask, and the Smooth L1 loss function calculates the per-pixel loss between the prediction mask and the real mask. A shape-level sketch of this forward pass follows.
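A minimal sketch of the first-stage forward pass, with the sub-networks passed in as placeholders; the add-then-multiply combination is implemented as one literal reading of the description and is an assumption:

```python
import torch.nn.functional as F

def first_stage_forward(tiles, backbone, spp, channel_att, pixel_att, classifier):
    """Shape walk-through of the first detection stage. `backbone`, `spp`,
    `channel_att`, `pixel_att` and `classifier` stand for the networks
    described in this embodiment; tiles: (9, 3, 224, 224) image blocks."""
    feat = backbone(tiles)                  # (9, 2048, 56, 56)
    feat = spp(feat)                        # multi-scale pyramid-pooled features
    ca = channel_att(feat)                  # (9, 2048) channel attention vector
    feat = feat * ca[:, :, None, None]      # multiply attention onto the channels
    pa, aux_mask = pixel_att(feat)          # (9, 1, 56, 56) attention matrix + mask
    feat = (feat + pa) * pa                 # add, then multiply (assumed reading)
    logits = classifier(feat)               # (9, num_classes, 56, 56)
    pred_mask = F.softmax(logits, dim=1)    # per-pixel prediction mask
    return pred_mask, aux_mask

# Training compares prediction and ground-truth masks pixel by pixel:
# loss = F.smooth_l1_loss(pred_mask, true_mask)
```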
(2) After the candidate areas of the first stage are obtained, features containing time-domain information are extracted with the second-stage feature extraction network shown in fig. 7. The input video block containing the time domain consists of the consecutive images of the 3 frames before and after the current frame, used to predict the trajectory of the candidate area. The trajectory of the candidate area over the 7 frames is determined with optical flow gradients: specifically, keypoints of the adjacent frames are extracted and the forward and backward optical flows of the current frame are calculated to obtain a motion boundary. Because this motion boundary may not cover the whole unmanned aerial vehicle, its range is enlarged to obtain the new candidate area. To avoid the loss of unmanned aerial vehicle pixels, the 7-frame video stream is segmented into blocks of dimension 9x7x640x640x1, avoiding the pixel loss caused by direct scaling. Before the input data pass through the feature extraction network, they are scaled by an interpolation-based method to the dimension 9x7x224x224x3. The I3D feature extraction network, which is fast and occupies little memory, extracts the features containing time-domain information; the dimension of the output feature map is 9x7x14x14x480. The average over the second (temporal) dimension then gives a two-dimensional feature map containing the time domain, of dimension 9x14x14x480. An upsampling operation with an upsampling factor of 4 produces a feature map of output dimension 9x56x56x480, and a convolution with a 3x3 kernel performs feature fusion on the upsampled feature map to give an output dimension of 9x56x56x1024. After the spatial pyramid pooling layer and the attention networks, a softmax layer produces the prediction mask, and the Smooth L1 loss function calculates the per-pixel loss between the prediction mask and the real mask. A sketch of this post-I3D processing follows.
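The processing after the I3D backbone can be sketched directly from the shape walk-through above; the I3D network itself is not shown, and the NCTHW tensor layout is an assumption:

```python
import torch.nn as nn
import torch.nn.functional as F

class TemporalHead(nn.Module):
    """Post-I3D processing: average over the temporal dimension, upsample
    by a factor of 4, and fuse with a 3x3 convolution."""
    def __init__(self):
        super().__init__()
        # 3x3 convolution that fuses the upsampled temporal features
        self.fuse = nn.Conv2d(480, 1024, kernel_size=3, padding=1)

    def forward(self, i3d_features):          # (9, 480, 7, 14, 14), NCTHW
        x = i3d_features.mean(dim=2)          # average over the 7 frames -> (9, 480, 14, 14)
        x = F.interpolate(x, scale_factor=4,  # upsampling factor 4 -> (9, 480, 56, 56)
                          mode='bilinear', align_corners=False)
        return self.fuse(x)                   # (9, 1024, 56, 56)
```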
(3) In both the first and second stages of model construction, a spatial pyramid with four different pooling kernels is used, together with a channel attention network and a pixel attention network.
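The four-kernel spatial pyramid can be sketched in the style of PSPNet; the bin sizes (1, 2, 3, 6) are a common choice and an assumption here, since the patent only states that four pooling kernels are used:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    """Pool to four grid sizes, project each branch, upsample back to the
    input resolution and concatenate with the input feature map."""
    def __init__(self, in_ch, bins=(1, 2, 3, 6)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(nn.AdaptiveAvgPool2d(b),
                          nn.Conv2d(in_ch, in_ch // len(bins), kernel_size=1))
            for b in bins)
        self.project = nn.Conv2d(in_ch * 2, in_ch, kernel_size=1)

    def forward(self, x):
        size = x.shape[-2:]
        outs = [x] + [F.interpolate(br(x), size=size, mode='bilinear',
                                    align_corners=False) for br in self.branches]
        return self.project(torch.cat(outs, dim=1))
```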
The channel attention network is structured as shown in fig. 4: a MaxPooling layer with a 56x56 pooling window pools the feature map to an output of 9x2048; a fully connected layer with 256 output dimensions, Batch Normalization and a Leaky ReLU activation give a 9x256 feature map; a fully connected layer with 512 output dimensions and a Sigmoid activation then produce the channel attention vector.
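A sketch of this channel attention network; note that for the attention vector to multiply a C-channel feature map the final layer must output C values, so the output dimension is left as a parameter here (the embodiment quotes 512):

```python
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Global max pooling, a 256-d fully connected layer with BatchNorm
    and Leaky ReLU, a second fully connected layer, and a Sigmoid."""
    def __init__(self, in_ch=2048, hidden=256, out_dim=2048):
        super().__init__()
        self.pool = nn.AdaptiveMaxPool2d(1)   # pooling window covering the full 56x56 map
        self.fc = nn.Sequential(
            nn.Linear(in_ch, hidden),
            nn.BatchNorm1d(hidden),
            nn.LeakyReLU(inplace=True),
            nn.Linear(hidden, out_dim),
            nn.Sigmoid(),
        )

    def forward(self, x):                     # x: (N, C, H, W)
        v = self.pool(x).flatten(1)           # (N, C)
        return self.fc(v)                     # channel attention vector
```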
The structure of the pixel attention network is shown in fig. 5. Conv denotes a standard 2D convolution; the Dropout rate used is 0.15; the Upsample layer expands the feature map to the specified scale by padding and then applies a convolution with a 1x1 kernel. Through the steps in fig. 5, the pixel attention network produces two outputs: a pixel attention matrix, whose last dimension is 1, and a prediction mask used to supervise the pixel attention network, whose last dimension is the number of categories.
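Since fig. 5 is not reproduced here, the following sketch only fixes the stated elements (standard 2D convolutions, Dropout rate 0.15, a 1-channel attention output and a class-channel mask output); the trunk depth and channel widths are assumptions:

```python
import torch.nn as nn

class PixelAttention(nn.Module):
    """Small convolutional trunk that branches into a 1-channel pixel
    attention matrix and a num_classes-channel prediction mask used to
    supervise the attention branch."""
    def __init__(self, in_ch=2048, mid=256, num_classes=2):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(in_ch, mid, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Dropout2d(p=0.15),             # stated Dropout rate
            nn.Conv2d(mid, mid, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.attention = nn.Sequential(nn.Conv2d(mid, 1, 1), nn.Sigmoid())
        self.mask = nn.Conv2d(mid, num_classes, 1)   # supervises the branch

    def forward(self, x):
        t = self.trunk(x)
        return self.attention(t), self.mask(t)
```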
Step 3: training and evaluating the constructed unmanned aerial vehicle detection algorithm model, and carrying out unmanned aerial vehicle detection with the constructed model.
In model training, Smooth L1 is used as the loss function. For each input image, the forward loss is computed, and an Adam optimizer computes the back-propagated gradients and updates the parameters of the neurons in the network layers. To ensure training stability, the learning rate is set within the range [0.0001, 0.01], and the training batch size is set according to the memory of the server. The training and evaluation of the unmanned aerial vehicle detection model adopts a cross-validation method: the training data set is divided into n groups; each time, one group is selected as the test set and the remaining groups as the training set; the model is trained on the training sets in turn, and the data in the test set are used to evaluate the trained unmanned aerial vehicle detection model, as sketched below.
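A compact sketch of this training-plus-cross-validation loop; the fold count, epoch count, learning rate and the dataset/model constructors are placeholder assumptions:

```python
import numpy as np
import torch
import torch.nn.functional as F
from sklearn.model_selection import KFold

def cross_validate(dataset, build_model, n_splits=5, epochs=20):
    """n-fold cross-validation over the Drone training set. `dataset` is
    assumed to yield (image_tensor, mask_tensor) pairs and `build_model`
    to return a freshly initialized detection model; both are placeholders."""
    folds = KFold(n_splits=n_splits, shuffle=True)
    for fold, (train_idx, test_idx) in enumerate(
            folds.split(np.arange(len(dataset)))):
        model = build_model()
        # learning rate chosen inside the stated [0.0001, 0.01] range
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        for _ in range(epochs):
            for i in train_idx:
                img, true_mask = dataset[i]
                loss = F.smooth_l1_loss(model(img.unsqueeze(0)),
                                        true_mask.unsqueeze(0))
                opt.zero_grad()
                loss.backward()   # back-propagate the gradients
                opt.step()        # Adam updates the network parameters
        with torch.no_grad():     # evaluate on the held-out group
            test_loss = sum(
                F.smooth_l1_loss(model(dataset[i][0].unsqueeze(0)),
                                 dataset[i][1].unsqueeze(0)).item()
                for i in test_idx) / len(test_idx)
        print(f"fold {fold}: mean test loss {test_loss:.4f}")
```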
It should be noted that the technical means not described in detail in the present application adopt known techniques.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, many modifications and adaptations can be made without departing from the principle of the present invention, and such modifications and adaptations should also be considered as the scope of the present invention.

Claims (8)

1. A two-stage unmanned aerial vehicle detection method combined with a time domain is characterized by comprising the following steps:
step 1: completing the Drone data set and image preprocessing steps;
step 2: dividing a data set and constructing an unmanned aerial vehicle detection algorithm model;
in the step 2, constructing the unmanned aerial vehicle detection algorithm model comprises a first stage and a second stage; in the first stage, image pixels are classified and overlapping block areas are produced, a modified Resnet34 residual network is used as the feature extraction network to obtain a feature map, the output feature map is pooled with a spatial pyramid to obtain a feature map containing abstract semantic information and local information, candidate areas possibly containing unmanned aerial vehicles are obtained, and a channel attention network and a pixel attention network are then used;
in the second stage, the motion of the unmanned aerial vehicle is tracked across consecutive video frames using optical flow gradients to obtain new candidate areas in the image, which are combined with the output of the first stage to obtain all candidate areas where an unmanned aerial vehicle may exist; N frames before and after the current frame are tracked, the video is segmented into fixed-size video blocks, features containing time-domain information are extracted from the video blocks with an I3D network, spatial pyramid pooling is applied, and a channel attention network and a pixel attention network are used;
step 3: training the constructed unmanned aerial vehicle detection algorithm model;
step 4: evaluating the constructed unmanned aerial vehicle detection algorithm model and performing unmanned aerial vehicle detection.
2. The method according to claim 1, wherein in step 1, a video stream containing the UAV is obtained by a camera, the video stream is segmented according to a time sequence, an image is labeled according to a task detected by the UAV, a Drone data set is completed, and the UAV data set is preprocessed.
3. The time-domain integrated two-stage UAV detection method according to claim 1, wherein in step 1, said image preprocessing comprises: image segmentation, image scaling, image enhancement and normalization;
the image segmentation is used to cut the high-resolution image into low-resolution images so as to reduce the loss of unmanned aerial vehicle pixels during image scaling; the image scaling resizes the segmented images without changing the aspect ratio of the unmanned aerial vehicle so as to match the input scale and dimensions of the model; the image enhancement is used to expand the range of scenes and comprises photometric and geometric changes of the image, wherein photometric changes alter the image pixels, mainly saturation, contrast and brightness, and geometric changes alter the image layout, mainly horizontal or vertical flipping and image rotation; and the normalization operation rescales image data from 0-255 to 0-1.
4. The time domain combined two-stage unmanned aerial vehicle detection method according to claim 1, wherein in step 2, the modified Resnet34 means that all four feature extraction blocks are used to extract features, the last three extracted feature maps are rescaled with an upsampling operation, the rescaled feature maps are spliced, and a 1x1 convolution changes the dimension of the spliced feature map.
5. The time-domain integrated two-stage UAV detection method according to claim 1, wherein said channel attention network is used to give more weight to the feature map and suppress the less informative features, and is specifically implemented by multiplying the attention vector by the channel of the feature map.
6. The method of claim 1, wherein the pixel attention network is configured to generate an attention matrix pixel by pixel, assign more weight to spatial positions corresponding to the unmanned aerial vehicle and less weight to non-drone areas, and perform element-wise multiplication and addition of the pixel attention mask on all convolution feature channels in sequence to suppress background information.
7. The method of claim 1, wherein in step 3, the new candidate area is obtained by calculating the optical flow gradient over 3 consecutive video frames: a keypoint detection algorithm detects keypoints in the 3 frames, the forward and backward optical flows of these keypoints are calculated, and the resulting motion boundary is enlarged to obtain the new candidate area.
8. The time domain integrated two-stage UAV detection method according to claim 1, wherein the training and evaluation of the UAV detection model adopts a cross-validation method: the training data set is divided into n groups; each time, one group is selected as the test set and the remaining groups as the training set; the model is trained on the training sets in turn, and the data in the test set are used to evaluate the trained unmanned aerial vehicle detection model.
CN202111070360.5A 2021-09-13 2021-09-13 Two-stage unmanned aerial vehicle detection method combined with time domain Pending CN115880588A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111070360.5A CN115880588A (en) 2021-09-13 2021-09-13 Two-stage unmanned aerial vehicle detection method combined with time domain


Publications (1)

Publication Number Publication Date
CN115880588A (en) 2023-03-31

Family

ID=85762351

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111070360.5A Pending CN115880588A (en) 2021-09-13 2021-09-13 Two-stage unmanned aerial vehicle detection method combined with time domain

Country Status (1)

Country Link
CN (1) CN115880588A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860400A (en) * 2020-07-28 2020-10-30 平安科技(深圳)有限公司 Face enhancement recognition method, device, equipment and storage medium
WO2021139171A1 (en) * 2020-07-28 2021-07-15 平安科技(深圳)有限公司 Facial enhancement based recognition method, apparatus and device, and storage medium
CN111814753A (en) * 2020-08-18 2020-10-23 深延科技(北京)有限公司 Target detection method and device under foggy weather condition
CN113283356A (en) * 2021-05-31 2021-08-20 上海应用技术大学 Multi-level attention scale perception crowd counting method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MUHAMMAD WASEEM ASHRAF et al.: "Dogfight: Detecting Drones from Drones Videos", arXiv, pages 1-10 *
LI Daoji; GUO Haitao; LU Jun; ZHAO Chuan; LIN Yuzhun; YU Donghang: "Multi-attention fused U-shaped network method for ground-object classification in remote sensing images" (遥感影像地物分类多注意力融和U型网络法), Acta Geodaetica et Cartographica Sinica (《测绘学报》), vol. 49, no. 08, pages 1051-1064 *

Similar Documents

Publication Publication Date Title
US20200250436A1 (en) Video object segmentation by reference-guided mask propagation
CN108647585B (en) Traffic identifier detection method based on multi-scale circulation attention network
CN112132156B (en) Image saliency target detection method and system based on multi-depth feature fusion
JP6243725B2 (en) Method for detecting bicycle driver in real time using synthetic training data, storage medium and computer
CN111598030A (en) Method and system for detecting and segmenting vehicle in aerial image
Hsu et al. Adaptive fusion of multi-scale YOLO for pedestrian detection
CN110781744A (en) Small-scale pedestrian detection method based on multi-level feature fusion
CN111681273A (en) Image segmentation method and device, electronic equipment and readable storage medium
CN115131797B (en) Scene text detection method based on feature enhancement pyramid network
CN111462140B (en) Real-time image instance segmentation method based on block stitching
US20220156483A1 (en) Efficient three-dimensional object detection from point clouds
CN111696110A (en) Scene segmentation method and system
CN112906794A (en) Target detection method, device, storage medium and terminal
CN114495029A (en) Traffic target detection method and system based on improved YOLOv4
CN113591719B (en) Natural scene arbitrary shape text detection method, device and training method
Ma et al. Fusioncount: Efficient crowd counting via multiscale feature fusion
CN110008900A (en) A kind of visible remote sensing image candidate target extracting method by region to target
CN114332133A (en) New coronary pneumonia CT image infected area segmentation method and system based on improved CE-Net
Li et al. Gated auxiliary edge detection task for road extraction with weight-balanced loss
CN116129291A (en) Unmanned aerial vehicle animal husbandry-oriented image target recognition method and device
Yun et al. Part-level convolutional neural networks for pedestrian detection using saliency and boundary box alignment
CN114332473A (en) Object detection method, object detection device, computer equipment, storage medium and program product
CN114359361A (en) Depth estimation method, depth estimation device, electronic equipment and computer-readable storage medium
Zhang et al. Small target detection based on squared cross entropy and dense feature pyramid networks
Wang et al. SCNet: Scale-aware coupling-structure network for efficient video object detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination