CN111460968A - Video-based UAV identification and tracking method and device - Google Patents
- Publication number
- CN111460968A (application number CN202010231230.4A)
- Authority
- CN
- China
- Prior art keywords
- video
- frame
- unmanned aerial vehicle
- tracking
- Prior art date: 2020-03-27
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
- G06V20/42—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Multimedia (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Image Analysis (AREA)
Abstract
The invention provides a video-based UAV identification and tracking method and device. The method includes: manually annotating the collected data set image by image to obtain annotated UAV samples covering multiple models and different sizes; training a YOLOv3-based network on this data set to obtain a trained deep learning object detection model; applying Retinex image enhancement to improve the image quality of the UAV video to be detected and recognizing every frame of the video with the trained detection model; and tracking the UAV in the video rapidly with the Sort algorithm. The invention can identify and track UAVs in video with high robustness and high precision, can enhance unclear UAV images, and is applicable to a variety of complex scenes.
Description
Technical Field
The invention relates to the field of UAV identification and tracking, and in particular to a video-based UAV identification and tracking method and device.
Background Art
Video-based moving-target detection and tracking already has a solid research foundation in science, technology, and engineering applications, and relatively mature solutions exist in intelligent transportation, intelligent surveillance, and artificial intelligence research. Modern UAVs play an increasingly important role and have attracted attention from all sides. As demands for intelligent systems grow, UAVs are naturally favored by many industries: drone filming at concerts, drone delivery by SF Express, drone photography for outdoor adventures, and so on, showing that UAVs are already well integrated into daily life and bring people much convenience. In recent years, real-time monitoring of UAVs has shown great military and civilian value and has drawn strong attention from academia and industry. As a typical video-based moving-target detection and tracking problem, applying existing technology to video surveillance of UAV moving targets, so as to achieve real-time detection and tracking of UAV targets, offers significant economic and social benefits in areas such as military alerting and public security.
Because small UAV targets are small in size, fly at variable speeds, and operate in complex environments, methods such as radar detection and passive localization alone are easily disturbed by clutter from other signals and tend to produce false alarms. The result obtained may cover only a few pixels and provides only the position of the UAV target; such methods cannot monitor the UAV's flight area and flight intent with high precision, nor provide accurate target localization for subsequent jamming and interception, so satisfactory results are hard to obtain. UAV identification and tracking methods based on optical image processing have appeared in recent years, but their performance remains unsatisfactory.
A search found Chinese invention patent application CN201911268966.2 (publication CN110706266A), which discloses a YOLOv3-based aerial target tracking method comprising: generating a model file; capturing video in real time and creating two threads, one for YOLOv3 target detection and one for KCF target tracking; performing target detection in the YOLOv3 thread; sending the target position information of step S03 to the KCF tracking thread while executing steps S07 and S11; starting the KCF tracking thread and checking whether it has finished initialization; manually setting the detection box; completing KCF parameter initialization; performing detection in the KCF tracking thread; taking the detection box with the largest response value as the target; updating the position parameters; and obtaining the final target position information. Although that patent uses YOLOv3, its tracking speed still needs improvement.
Summary of the Invention
In view of the above problems in the prior art, the present invention proposes a video-based UAV identification and tracking method and device that greatly improve the real-time performance of tracking.
To solve the above technical problems, the present invention is realized through the following technical solutions:
According to a first aspect of the present invention, a video-based UAV identification and tracking method is provided, comprising:
S11: obtaining annotated UAV image samples of multiple models and different sizes as a data set;
S12: training on the data set with a YOLOv3 network to obtain a trained deep learning object detection model;
S13: applying the Retinex image enhancement method to improve the image quality of the input video, and recognizing every frame of the input UAV video with the trained YOLOv3 deep learning object detection model to obtain the target UAV detection box of each frame, in preparation for the subsequent tracking task;
S14: based on the recognition result of S13, tracking the UAV in the video rapidly with the Sort algorithm.
The invention adopts improvements based on the YOLOv3 network and the Sort tracking algorithm, which increase tracking speed while still guaranteeing good accuracy.
Preferably, S11 specifically comprises:
collecting a large number of images containing UAVs, covering various UAV models with multiple images per model, resizing the UAV images to a uniform size, and annotating the UAV in each image one by one.
Preferably, in S12, the YOLOv3 network is trained on the data set and the network hyperparameters are adjusted until a deep learning object detection model is obtained whose gradient descent is stable, whose loss function has fallen to the expected value, and whose degree of fit meets the requirements.
Preferably, an attention mechanism is added to the Darknet-53 backbone of the YOLOv3 network to quickly extract the important features of the data and improve recognition. An attention mechanism focuses computation on important information and saves system resources; the plain max pooling or average pooling used in common convolutional neural networks is too crude and can leave key information unrecognized, so an attention mechanism alleviates this problem and improves model accuracy.
Preferably, in the YOLOv3 network, the loss function adopts the GIoU function as the measure of detection and localization performance:
$$\mathrm{GIoU} = \mathrm{IoU} - \frac{\left|C \setminus (A \cup B)\right|}{|C|}$$

In the above formula, A denotes the predicted box, B the ground-truth box, and C the area of the smallest enclosing region containing both A and B; the numerator of the second term is the area of C that covers neither A nor B. The GIoU value ranges from -1 to 1 and better reflects the relationship between the predicted box and the ground-truth box; IoU is the IoU loss value of the YOLOv3 network. The invention replaces the IoU loss with GIoU, which better captures the relationship between the predicted box and the ground-truth box and helps improve the recognition accuracy of the network.
Preferably, S13 further comprises:
converting the images of the input video into color-constancy images, which preserves high image fidelity and compresses the dynamic range of the image while enhancing color and maintaining color constancy, thereby improving the robustness of the subsequent recognition network.
The constancy image r(x, y) is

$$r(x,y) = \sum_{k=1}^{K} w_k \left\{ \log S(x,y) - \log\left[ F_k(x,y) * S(x,y) \right] \right\}$$

In the above formula, K is the number of Gaussian center-surround functions (k = 1, 2, 3), w_k is the weight corresponding to the k-th scale, S(x, y) is the observed image, F_k(x, y) is the k-th center-surround function, and * denotes convolution.
Preferably, in S14, tracking the UAV in the video rapidly with the Sort algorithm comprises:
in each frame, taking the detected UAV detection boxes as the reference while using a Kalman filter to predict the UAV tracking boxes; computing the IoU between all detection boxes of the current frame and all tracking boxes predicted by the Kalman filter; obtaining the optimal matching pairs of detection boxes and tracking boxes through the Hungarian algorithm; taking the matched detection boxes as the tracking result of the current frame; updating the Kalman tracker with the currently detected target positions; and then matching the predicted boxes of the next frame against the detection boxes of the next frame;
repeating the above process to achieve continuous tracking of the UAV.
According to a second aspect of the present invention, a video-based UAV identification and tracking device is provided, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, can perform the video-based UAV identification and tracking method described above.
Compared with the prior art, the present invention has the following beneficial effects:
The video-based UAV identification and tracking method provided by the present invention trains a network model on a large data set, uses deep learning for UAV identification and tracking, improves the existing network, and enhances the images, yielding identification and tracking results that are more accurate and more robust.
Aimed at the situation where current target tracking cannot balance real-time performance and accuracy, the video-based UAV identification and tracking method provided by the present invention offers very fast tracking while maintaining high tracking accuracy, and is applicable to practical target tracking tasks.
Brief Description of the Drawings
Other features, objects, and advantages of the present invention will become more apparent upon reading the detailed description of non-limiting embodiments with reference to the following drawings:
Fig. 1 is a flowchart of a video-based UAV identification and tracking method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the modified Darknet-53 network structure based on the YOLOv3 network according to an embodiment of the present invention;
Fig. 3 is a block flow diagram of a video-based UAV identification and tracking method according to an embodiment of the present invention.
Detailed Description of Embodiments
The embodiments of the present invention are described in detail below. This embodiment is implemented on the premise of the technical solution of the present invention and provides a detailed implementation and a specific operating process. It should be noted that those of ordinary skill in the art can make several variations and improvements without departing from the concept of the present invention, all of which fall within the protection scope of the present invention.
Fig. 1 shows a flowchart of the video-based UAV identification and tracking method according to an embodiment of the present invention.
Referring to Fig. 1, the video-based UAV identification and tracking method of this embodiment comprises the following steps:
S11: obtaining annotated UAV image samples of multiple models and different sizes.
A large number of images containing UAVs are collected, covering various UAV models with multiple images per model; the UAV in each image is annotated one by one to obtain annotated UAV image samples as the training data set.
S12: training on the data set obtained in S11 with the YOLOv3 network to obtain a trained deep learning object detection model.
S13: applying the Retinex image enhancement method to improve the image quality of the UAV video to be detected, and recognizing every frame of the video with the deep learning object detection model to obtain the target UAV detection box of each frame, in preparation for the subsequent tracking task.
S14: based on the target UAV detection boxes obtained in S13, tracking the UAV in the video rapidly with the Sort algorithm.
This embodiment trains the network model on a large data set and uses deep learning for UAV identification and tracking, which improves the accuracy and robustness of UAV identification and tracking; when the UAV image is unclear, image enhancement can be applied, making the method suitable for various complex scenes. Tracking speed is increased while good accuracy is maintained.
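For illustration only, the overall inference flow of S12-S14 can be sketched in Python as follows. The names `enhance_retinex`, `detector`, and `tracker` are hypothetical stand-ins for the Retinex step, the trained YOLOv3 model, and the Sort tracker described in this embodiment, not identifiers from the patent:

```python
import cv2  # OpenCV, used here for video I/O and drawing


def run(video_path, detector, tracker, enhance_retinex):
    """Detect-then-track loop: enhance each frame (S13), detect UAV boxes
    with the trained model (S12), and associate them into tracks (S14)."""
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame = enhance_retinex(frame)       # Retinex image enhancement (S13)
        boxes = detector.detect(frame)       # per-frame UAV detection boxes
        tracks = tracker.update(boxes)       # associate boxes with track IDs (S14)
        for track_id, (x1, y1, x2, y2) in tracks:
            cv2.rectangle(frame, (int(x1), int(y1)),
                          (int(x2), int(y2)), (0, 255, 0), 2)
    cap.release()
```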
In a preferred embodiment, S11 may use 2664 UAV images of identical size as the training data set, covering essentially all kinds of UAVs in different states and against various backgrounds. Of course, this number of images is merely illustrative; other embodiments may use a different number of UAV images, not limited to 2664.
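The patent does not specify the annotation format. As an assumption, YOLO-style training commonly uses one label file per image with normalized box coordinates; a conversion from pixel coordinates might look like:

```python
# Hypothetical YOLO-format label: "class x_center y_center width height",
# all coordinates normalized to [0, 1]; class 0 is assumed to mean "drone".
def to_yolo_label(x1, y1, x2, y2, img_w, img_h, cls=0):
    xc = (x1 + x2) / 2 / img_w
    yc = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{cls} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

# A 60x40-pixel drone box at (200, 120) in a 960x540 image:
print(to_yolo_label(200, 120, 260, 160, 960, 540))
# -> "0 0.239583 0.259259 0.062500 0.074074"
```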
In another preferred embodiment, S12 trains the YOLOv3 network on the data set obtained in S11 and adjusts the network hyperparameters until a deep learning model is obtained whose gradient descent is stable, whose loss function has fallen to the expected value, and whose degree of fit meets the requirements. This embodiment applies the YOLOv3 network, commonly used in image domains such as vehicle detection, to UAV identification and tracking. To achieve better results, the following improvements are made on top of the original YOLOv3 network:
1) An attention mechanism is added to the Darknet-53 backbone of the YOLOv3 network, which quickly extracts the important features of the data and improves recognition. An attention mechanism focuses computation on important information and saves system resources; the plain max pooling or average pooling used in common convolutional neural networks is too crude and can leave key information unrecognized, so an attention mechanism alleviates this problem and improves model accuracy.
2) The loss function is improved from IoU (Intersection over Union) to GIoU (Generalized Intersection over Union), which better reflects the relationship between the predicted box and the ground-truth box and compensates for the shortcomings of IoU.
In the YOLOv3 network, IoU is used as the measure of object detection and localization performance:

$$\mathrm{IoU} = \frac{|A \cap B|}{|A \cup B|}$$

In the above formula, A denotes the predicted box and B the ground-truth box; the numerator is the area of their intersection and the denominator the area of their union. However, if the predicted box and the ground-truth box do not intersect, IoU is zero and cannot be optimized; moreover, equal IoU values do not imply equally good detections. GIoU improves on these problems:
$$\mathrm{GIoU} = \mathrm{IoU} - \frac{\left|C \setminus (A \cup B)\right|}{|C|}$$

In the above formula, C denotes the area of the smallest enclosing region containing both A and B, and the numerator of the second term is the area of C that covers neither A nor B. Since IoU ranges from 0 to 1 while GIoU ranges from -1 to 1, GIoU better reflects the relationship between the predicted box and the ground-truth box, which helps improve the recognition accuracy of the network.
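As an illustration only (this code is not from the patent), a minimal Python sketch of IoU and GIoU for axis-aligned boxes in [x1, y1, x2, y2] format:

```python
def iou_giou(a, b):
    """IoU and GIoU for two axis-aligned boxes [x1, y1, x2, y2]."""
    # Intersection area |A ∩ B|.
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    iou = inter / union

    # Smallest enclosing box C of A and B.
    area_c = ((max(a[2], b[2]) - min(a[0], b[0]))
              * (max(a[3], b[3]) - min(a[1], b[1])))

    # GIoU = IoU - |C \ (A ∪ B)| / |C|, which lies in (-1, 1].
    return iou, iou - (area_c - union) / area_c

# Disjoint boxes: IoU is 0 regardless of separation, but GIoU still
# decreases with distance and therefore still provides a training signal.
print(iou_giou([0, 0, 2, 2], [3, 3, 5, 5]))  # (0.0, -0.68)
```

During training, the corresponding loss is typically taken as 1 - GIoU.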
As shown in Fig. 2, the modified Darknet-53 network structure based on the YOLOv3 network comprises 52 convolutional layers and 23 residual units. The network is fully convolutional and makes extensive use of residual skip connections; to reduce the negative gradient effects caused by pooling, pooling layers are abandoned and downsampling is performed five times with stride-2 convolutions. In this five-stage downsampling process, each convolutional layer is followed by a residual unit and an attention mechanism. For example, an input of 416x416 yields an output of 13x13 (416/2^5 = 13), which performs the tensor size transformation. These improvements markedly increase the accuracy and robustness of UAV identification and tracking.
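The patent does not name a specific attention module. As one common possibility, a squeeze-and-excitation style channel-attention block placed after a residual unit could be sketched in PyTorch as:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention (an assumed choice;
    the patent only states that an attention mechanism is added)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)   # squeeze: one value per channel
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                     # excitation: weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                          # reweight the feature maps

# Usage inside the backbone, e.g.: y = ChannelAttention(256)(residual_out)
```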
In another preferred embodiment, S13 converts the images of the UAV video into color-constancy images, which preserves high image fidelity and compresses the dynamic range of the image while enhancing color and maintaining color constancy. Specifically, the constancy image r(x, y) is

$$r(x,y) = \sum_{k=1}^{K} w_k \left\{ \log S(x,y) - \log\left[ F_k(x,y) * S(x,y) \right] \right\}$$

In the above formula, K = 3 is the number of scales, and the enhancement is applied to each of the three RGB channels; w_k is the weight corresponding to the k-th scale, each equal to 1/3; the three scales are 15, 101, and 301; S(x, y) is the observed image; F_k(x, y) is the k-th center-surround function; and * denotes convolution.
In another preferred embodiment, S14 uses the Sort algorithm to track the UAV in the video rapidly, which may be implemented as follows:
The video is passed through the trained YOLOv3 deep learning object detection model, which detects every frame of the input UAV video and yields the target UAV detection boxes of each frame. In each frame, the detected UAV detection boxes serve as the reference while a Kalman filter predicts the UAV tracking boxes; the IoU between all detection boxes of the current frame and all Kalman-predicted tracking boxes is computed; the Hungarian algorithm yields the optimal matching pairs of detection boxes and tracking boxes; the matched detection boxes are taken as the tracking result of the current frame; the Kalman tracker is updated with the currently detected target positions; and the predicted boxes of the next frame are then matched against the detection boxes of the next frame. This achieves continuous tracking of the target.
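As an illustrative sketch of the association step only (the Kalman prediction and update are abstracted away; `linear_sum_assignment` is SciPy's Hungarian-algorithm solver):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment  # Hungarian algorithm

def associate(detections, predictions, iou_threshold=0.3):
    """Match current-frame detection boxes to Kalman-predicted tracking boxes.

    Both arguments are lists of [x1, y1, x2, y2] boxes; returns index pairs
    (detection_index, track_index) whose IoU clears the threshold.
    """
    iou = np.zeros((len(detections), len(predictions)))
    for d, det in enumerate(detections):
        for t, trk in enumerate(predictions):
            iw = max(0.0, min(det[2], trk[2]) - max(det[0], trk[0]))
            ih = max(0.0, min(det[3], trk[3]) - max(det[1], trk[1]))
            inter = iw * ih
            union = ((det[2] - det[0]) * (det[3] - det[1])
                     + (trk[2] - trk[0]) * (trk[3] - trk[1]) - inter)
            iou[d, t] = inter / union if union > 0 else 0.0
    # linear_sum_assignment minimizes total cost, so negate IoU to maximize it.
    det_idx, trk_idx = linear_sum_assignment(-iou)
    keep = iou[det_idx, trk_idx] >= iou_threshold
    return det_idx[keep], trk_idx[keep]
```

Matched detections update their Kalman trackers; unmatched detections would start new tracks, as in the standard Sort pipeline.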
The video-based UAV identification and tracking method of the above embodiments trains a network model on a large data set, uses deep learning for UAV identification and tracking, improves the existing network, and enhances the images, yielding identification and tracking results that are more accurate and more robust.
Another embodiment of the present invention further provides a video-based UAV identification and tracking device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, can perform the video-based UAV identification and tracking method of the above embodiments.
Optionally, the memory is used to store the program. The memory may include volatile memory, for example random-access memory (RAM) such as static random-access memory (SRAM) or double data rate synchronous dynamic random access memory (DDR SDRAM); the memory may also include non-volatile memory, for example flash memory. The memory is used to store computer programs (such as application programs and functional modules implementing the above method), computer instructions, and the like, which may be stored in partitions across one or more memories and may be called by the processor.
The processor is configured to execute the computer program stored in the memory to implement the steps of the methods in the above embodiments. For details, refer to the relevant descriptions in the foregoing method embodiments.
The processor and the memory may be independent structures or an integrated structure. When they are independent structures, the memory and the processor may be coupled through a bus.
Embodiments of the present application further provide a computer-readable storage medium storing computer-executable instructions; when at least one processor of a user device executes these instructions, the user device performs the various possible methods described above.
Only preferred embodiments of the present invention are disclosed herein. This specification selects and specifically describes these embodiments to better explain the principles and practical application of the present invention, not to limit it. Any modifications and changes made by those skilled in the art within the scope of the specification shall fall within the protection scope of the present invention. Each of the preferred features described above may be used alone in any embodiment or, provided they do not conflict, in any combination.
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010231230.4A (granted as CN111460968B) | 2020-03-27 | 2020-03-27 | Video-based drone identification and tracking method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111460968A true CN111460968A (en) | 2020-07-28 |
CN111460968B CN111460968B (en) | 2024-02-06 |
Family
ID=71680515
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010231230.4A (granted as CN111460968B, active) | Video-based drone identification and tracking method and device | 2020-03-27 | 2020-03-27 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111460968B (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Title |
---|---|---|---|
CN101706964A (en) * | 2009-08-27 | 2010-05-12 | 北京交通大学 | Color constancy calculating method and system based on derivative structure of image |
CN101674490A (en) * | 2009-09-23 | 2010-03-17 | 电子科技大学 | A color constancy method for color images based on retinal vision mechanism |
WO2018133666A1 (en) * | 2017-01-17 | 2018-07-26 | 腾讯科技(深圳)有限公司 | Method and apparatus for tracking video target |
US20190362190A1 (en) * | 2018-05-28 | 2019-11-28 | Samsung Electronics Co., Ltd. | Method and system for dnn based imaging |
CN110070561A (en) * | 2019-03-13 | 2019-07-30 | 合肥师范学院 | A kind of image enhancement of blurred vision condition and method for tracking target and system |
CN110516556A (en) * | 2019-07-31 | 2019-11-29 | 平安科技(深圳)有限公司 | Multi-target tracking detection method, device and storage medium based on Darkflow-DeepSort |
CN110706266A (en) * | 2019-12-11 | 2020-01-17 | 北京中星时代科技有限公司 | Aerial target tracking method based on YOLOv3 |
Non-Patent Citations (1)
Title |
---|
徐义鎏 (XU Yiliu), 贺鹏 (HE Peng): "YOLOv3 vehicle detection algorithm with an improved loss function" (改进损失函数的Yolov3车型检测算法), no. 12, pages 4-7 *
Cited By (20)
Publication number | Priority date | Publication date | Title |
---|---|---|---|
CN111931654A (en) * | 2020-08-11 | 2020-11-13 | 精英数智科技股份有限公司 | Intelligent monitoring method, system and device for personnel tracking |
CN112184767A (en) * | 2020-09-22 | 2021-01-05 | 深研人工智能技术(深圳)有限公司 | Method, device, equipment and storage medium for tracking moving object track |
CN112132032A (en) * | 2020-09-23 | 2020-12-25 | 平安国际智慧城市科技股份有限公司 | Traffic sign detection method and device, electronic equipment and storage medium |
CN112288770A (en) * | 2020-09-25 | 2021-01-29 | 航天科工深圳(集团)有限公司 | Video real-time multi-target detection and tracking method and device based on deep learning |
CN112348057A (en) * | 2020-10-20 | 2021-02-09 | 歌尔股份有限公司 | Target identification method and device based on YOLO network |
CN112419368A (en) * | 2020-12-03 | 2021-02-26 | 腾讯科技(深圳)有限公司 | Method, device and equipment for tracking track of moving target and storage medium |
CN112465854A (en) * | 2020-12-17 | 2021-03-09 | 北京三川未维科技有限公司 | Unmanned aerial vehicle tracking method based on anchor-free detection algorithm |
CN113139419B (en) * | 2020-12-28 | 2024-05-31 | 西安天和防务技术股份有限公司 | Unmanned aerial vehicle detection method and device |
CN113139419A (en) * | 2020-12-28 | 2021-07-20 | 西安天和防务技术股份有限公司 | Unmanned aerial vehicle detection method and device |
CN112734794A (en) * | 2021-01-14 | 2021-04-30 | 北京航空航天大学 | Moving target tracking and positioning method based on deep learning |
CN112734794B (en) * | 2021-01-14 | 2022-12-23 | 北京航空航天大学 | A moving target tracking and localization method based on deep learning |
CN112819858A (en) * | 2021-01-29 | 2021-05-18 | 北京博雅慧视智能技术研究院有限公司 | Target tracking method, device and equipment based on video enhancement and storage medium |
CN112819858B (en) * | 2021-01-29 | 2024-03-22 | 北京博雅慧视智能技术研究院有限公司 | Target tracking method, device, equipment and storage medium based on video enhancement |
CN112906523A (en) * | 2021-02-04 | 2021-06-04 | 上海航天控制技术研究所 | Hardware accelerated deep learning target machine type identification method |
CN112884811A (en) * | 2021-03-18 | 2021-06-01 | 中国人民解放军国防科技大学 | Photoelectric detection tracking method and system for unmanned aerial vehicle cluster |
CN114140501A (en) * | 2022-01-30 | 2022-03-04 | 南昌工程学院 | Target tracking method and device and readable storage medium |
WO2025038580A1 (en) * | 2023-08-14 | 2025-02-20 | Epirus, Inc. | Stacking color and motion signals to detect tiny objects |
CN117455955A (en) * | 2023-12-14 | 2024-01-26 | 武汉纺织大学 | Pedestrian multi-target tracking method based on unmanned aerial vehicle visual angle |
CN117455955B (en) * | 2023-12-14 | 2024-03-08 | 武汉纺织大学 | Pedestrian multi-target tracking method based on unmanned aerial vehicle visual angle |
CN120014560A (en) * | 2025-04-18 | 2025-05-16 | 民航成都电子技术有限责任公司 | Aircraft identification method, device, medium and electronic equipment based on panoramic video |
Also Published As
Publication number | Publication date |
---|---|
CN111460968B (en) | 2024-02-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111460968B (en) | Video-based drone identification and tracking method and device | |
CN112418117B (en) | Small target detection method based on unmanned aerial vehicle image | |
CN111898651B (en) | A tree detection method based on Tiny YOLOV3 algorithm | |
WO2021147325A1 (en) | Object detection method and apparatus, and storage medium | |
CN106845430A (en) | Pedestrian detection and tracking based on acceleration region convolutional neural networks | |
Cepni et al. | Vehicle detection using different deep learning algorithms from image sequence | |
CN110555420B (en) | Fusion model network and method based on pedestrian regional feature extraction and re-identification | |
CN106897738A (en) | A kind of pedestrian detection method based on semi-supervised learning | |
CN114241511B (en) | Weak supervision pedestrian detection method, system, medium, equipment and processing terminal | |
CN113807399A (en) | Neural network training method, neural network detection method and neural network detection device | |
CN110781744A (en) | A small-scale pedestrian detection method based on multi-level feature fusion | |
CN109919223B (en) | Target detection method and device based on deep neural network | |
CN110969648A (en) | 3D target tracking method and system based on point cloud sequence data | |
CN110096979B (en) | Model construction method, crowd density estimation method, device, equipment and medium | |
Ataş | Performance evaluation of jaccard-dice coefficient on building segmentation from high resolution satellite images | |
Castellano et al. | Density-based clustering with fully-convolutional networks for crowd flow detection from drones | |
CN113033356B (en) | A scale-adaptive long-term correlation target tracking method | |
CN117809230A (en) | Water flow velocity identification method based on image identification and related products | |
CN118351435A (en) | A method and device for detecting target in UAV remote sensing images based on lightweight model LTE-Det | |
Wu et al. | Research on asphalt pavement disease detection based on improved YOLOv5s | |
CN109934147B (en) | Target detection method, system and device based on deep neural network | |
Zhang et al. | Visual image and radio signal fusion identification based on convolutional neural networks | |
CN102156879A (en) | Human target matching method based on weighted terrestrial motion distance | |
Zhu et al. | Real-time traffic sign detection based on YOLOv2 | |
May et al. | Polo–point-based, multi-class animal detection |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |