CN117079196A - Unmanned aerial vehicle identification method based on deep learning and target motion trail - Google Patents

Unmanned aerial vehicle identification method based on deep learning and target motion trail

Info

Publication number
CN117079196A
Authority
CN
China
Prior art keywords
target
unmanned aerial
frame
aerial vehicle
detection
Prior art date
Legal status
Granted
Application number
CN202311332415.4A
Other languages
Chinese (zh)
Other versions
CN117079196B (en)
Inventor
刘志俭
乔纯捷
明德祥
Current Assignee
Changsha Beidou Industrial Safety Technology Research Institute Co ltd
Original Assignee
Changsha Beidou Industrial Safety Technology Research Institute Co ltd
Priority date
Filing date
Publication date
Application filed by Changsha Beidou Industrial Safety Technology Research Institute Co ltd
Priority to CN202311332415.4A
Publication of CN117079196A
Application granted
Publication of CN117079196B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07: Target detection
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems


Abstract

The application discloses an unmanned aerial vehicle identification method based on deep learning and the target motion track. The method comprises: capturing video data of the unmanned aerial vehicle with a camera or other visual sensor; preliminarily identifying potential unmanned aerial vehicle candidate targets with a YoloV3 target detection network; analyzing the candidate frames acquired by the target detection network and, for each candidate frame, calculating the track parameters generated by its series of target detection frames before the current time point, the track parameters including speed, direction and acceleration; incorporating the candidate frames and the corresponding track parameters into a target re-detection network; and acquiring the finally identified unmanned aerial vehicle position based on the output result of the target re-detection network. By fully considering the potential unmanned aerial vehicle targets and the corresponding track parameters, the unmanned aerial vehicle can be identified more accurately.

Description

Unmanned aerial vehicle identification method based on deep learning and target motion trail
Technical Field
The application relates to the technical field of unmanned aerial vehicle recognition, in particular to an unmanned aerial vehicle recognition method based on deep learning and target motion trail.
Background
Aerial unmanned aerial vehicle recognition has long been a difficult and popular problem in computer vision. Small objects in images often lack sufficient visual feature information relative to conventionally sized objects, and are therefore hard to distinguish from the background and from small objects of similar appearance but different classes. For example, when identifying an unmanned aerial vehicle, objects such as birds and kites are easily misjudged as unmanned aerial vehicles by computer vision algorithms that rely on deep learning alone. In general, because the target is far from the camera, the camera works outdoors, and weather and environmental conditions vary, it is very difficult to help the computer judge the unmanned aerial vehicle accurately merely by improving the visual appearance feature information of the unmanned aerial vehicle in the image.
Disclosure of Invention
In view of the above, the application provides an unmanned aerial vehicle recognition method based on deep learning and the target motion track, aiming at (1) optimizing the loss function of the deep learning network for small targets, so that the network parameters are optimized to recognize the unmanned aerial vehicle better; (2) obtaining the motion track of the unmanned aerial vehicle with deep learning and target tracking techniques, analyzing the motion track, and extracting features of the unmanned aerial vehicle that distinguish it from other identified small targets, thereby improving the identification accuracy of the unmanned aerial vehicle; and (3) combining the visual appearance and the motion track features of the moving target to comprehensively judge whether the target is an unmanned aerial vehicle, improving the accuracy and robustness of unmanned aerial vehicle recognition.
The unmanned aerial vehicle identification method based on deep learning and target motion trail provided by the application comprises the following steps:
s1: acquiring video data of a flying target shot by a camera or other visual sensors;
s2: preliminarily identifying potential unmanned aerial vehicle candidate targets according to the YoloV3 target detection network;
s3: analyzing candidate frames acquired by a target detection network, calculating track parameters generated by a series of target detection frames of each candidate frame before a certain time point, wherein the track parameters comprise speed and direction, and carrying out preliminary judgment on the candidate frames;
s4: incorporating the candidate frames and the corresponding track parameters into a target re-detection network;
s5: and acquiring the finally identified unmanned aerial vehicle position based on the output result of the target re-detection network.
As a further improvement of the application:
further, the step S1 of acquiring video data of the flying object captured by the camera or other visual sensor includes:
the imagery may be real-time video captured by a camera or a pre-stored video file. When the images are acquired, the resolution, the frame rate and other information of the images are acquired at the same time, so that the subsequent identification work is facilitated.
Further, in S2, initially identifying a potential candidate target of the unmanned aerial vehicle according to the YoloV3 target detection network, including:
and (3) performing object detection on each frame of the video acquired in the S1 by using a YoloV3 object detection network, wherein each frame of video can obtain detection frames of all detection objects.
Further, the method comprises: judging and analyzing the center of the unmanned aerial vehicle position according to the target motion track, where the loss function of the center position can be constructed as follows:
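The formula itself appears only as an image in the original publication and is not reproduced in this text. A plausible reconstruction, assuming an L1-type penalty on the center coordinates (consistent with the reference to the L1 paradigm in the Advantageous Effects) and a single sum over the grids, is:

$$
L_{\mathrm{center}} = \lambda_{xy}\sum_{i=1}^{S}\mathbf{1}_{i}^{\mathrm{obj}}\left(\lvert x_i-\hat{x}_i\rvert+\lvert y_i-\hat{y}_i\rvert\right)
$$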
wherein S represents the number of grids defined in YoloV3; 1_i^obj takes the value 1 if the detection frame contains the unmanned aerial vehicle and 0 otherwise; x_i, y_i respectively represent the horizontal and vertical coordinates of the center position of a detection frame truly containing the unmanned aerial vehicle; x̂_i, ŷ_i represent the center coordinates of the detection frame predicted by the network; and λ_xy represents the weight of this loss.
In combination with other loss functions, the total loss function of YoloV3 is:
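The total loss is likewise given only as an image in the original. A sketch of a plausible form, assuming the standard YoloV3-style width/height and objectness terms with the center-position term above added (the exact formula is not reproduced here, and the assignment of the embodiment's index weights 5 and 1 to λ_wh and λ_noobj is an assumption), is:

$$
L = \lambda_{wh}\sum_{i=1}^{S}\mathbf{1}_{i}^{\mathrm{obj}}\left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^{2}+\left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^{2}\right]+L_{\mathrm{center}}+\sum_{i=1}^{S}\mathbf{1}_{i}^{\mathrm{obj}}\left(C_i-\hat{C}_i\right)^{2}+\lambda_{\mathrm{noobj}}\sum_{i=1}^{S}\mathbf{1}_{i}^{\mathrm{noobj}}\left(C_i-\hat{C}_i\right)^{2}
$$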
wherein S represents the number of grids defined in YoloV3; 1_i^obj takes the value 1 if the detection frame contains the unmanned aerial vehicle and 0 otherwise; 1_i^noobj takes the value 1 if the detection frame contains no unmanned aerial vehicle and 0 otherwise; w_i, h_i respectively represent the width and height of a detection frame truly containing the unmanned aerial vehicle; ŵ_i, ĥ_i represent the width and height of the detection frame predicted by the network; C_i represents the probability of an unmanned aerial vehicle being in the frame; and λ_wh, λ_noobj represent the different index weights.
Further, in step S3, the candidate frames acquired by the target detection network are analyzed, the track parameters generated by the series of target detection frames of each candidate frame before the current time point are calculated, the track parameters including speed and direction, and the candidate frames are preliminarily judged, including:
calculating the corresponding track parameters for the candidate frames of each target acquired by the target detection network in step S2, the track parameters being calculated in the following way:
S31: obtaining the horizontal and vertical coordinates on the image of the target in the current frame and of the same target in the previous two frames, read from the target recognition results;
s32, calculating the speed and direction of the target in the current frame based on the speed and direction information of the same target in the previous two frames:
wherein v_n represents the speed of the target from frame n-1 to frame n, t_n represents the time from frame n-1 to frame n, and the direction of movement of the target from frame n-1 to frame n refers to the components of the movement velocity along the x-axis and the y-axis as the target moves from its position in frame n-1 to its position in frame n, i.e. d_1 and d_2.
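The speed and direction formulas themselves appear only as images in the original; a plausible reconstruction consistent with the definitions above, writing (x_n, y_n) for the center coordinates of the target in frame n, is:

$$
v_n=\frac{\sqrt{(x_n-x_{n-1})^{2}+(y_n-y_{n-1})^{2}}}{t_n},\qquad d_1=\frac{x_n-x_{n-1}}{t_n},\qquad d_2=\frac{y_n-y_{n-1}}{t_n}
$$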
S33: on the basis of obtaining the track parameter of the current frame, the track parameter of the current frame is further corrected by using the track parameter obtained by calculating the previous frame:
wherein v̂_n represents the corrected speed of the target from frame n-1 to frame n, d̂_n represents the corrected direction of the target from frame n-1 to frame n, and β represents the correction coefficient. The corrected track parameters integrate the historical track parameters of the target, so that the track parameters of the whole target are more stable.
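A plausible form of the correction formulas above, assuming an exponential-smoothing style update with correction coefficient β (the exact expression is not reproduced in the original text), is:

$$
\hat{v}_n=\beta\,\hat{v}_{n-1}+(1-\beta)\,v_n,\qquad \hat{d}_n=\beta\,\hat{d}_{n-1}+(1-\beta)\,d_n
$$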
S34: carrying out preliminary discrimination on different candidate frame motion tracks according to statistics corresponding to the corrected track parameter sequence and the corrected track parameter first-order difference sequence:
the corrected track parameter sequence is as follows:
the first-order differential sequence of the corrected track parameters is as follows:
the statistics are the corrected track parameter sequenceStandard deviation of corrected track parameter first-order differential sequence,/>,/>,/>And->The method comprises the steps of carrying out a first treatment on the surface of the Judging whether the target in the candidate frame is a disordered flight target according to the statistic; the judgment formula is as follows:
wherein I(·) is a function judging whether the expression inside it holds, equal to 1 if it holds and 0 otherwise; R equal to 1 means that the target is a disordered flying target, and R equal to 0 means that the target needs to be further discriminated.
Further, the step S4 of incorporating the candidate frame and the corresponding track parameter into the target re-detection network includes:
and (3) aiming at the preliminary judgment result in the step (S3), carrying out target re-detection on the unmanned aerial vehicle by adopting a target re-detection network, wherein the target re-detection network consists of a feature extraction layer and a classification layer.
S41: and acquiring the detected central position and length and width information of the target candidate frame, cutting out a target image independently, and sending the target image into a feature extraction layer to extract features.
S42: after the feature vector output by the feature extraction layer is obtained, the input feature classification layer obtains the probability of judging that the target is the unmanned aerial vehicle only through visual features, and calculates the cross entropy:
wherein 1^obj takes the value 1 if the detection frame contains the unmanned aerial vehicle and 0 otherwise; 1^noobj takes the value 1 if the detection frame contains no unmanned aerial vehicle and 0 otherwise; and r is the probability, output by the feature classification layer, that the target in the detection frame is an unmanned aerial vehicle;
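The cross-entropy expression itself is given as an image in the original; a standard binary cross-entropy consistent with the definitions above would read as below, and the analogous expression for the fusion classification layer in S43 replaces r with f:

$$
CE_r=-\left[\mathbf{1}^{\mathrm{obj}}\log r+\mathbf{1}^{\mathrm{noobj}}\log\left(1-r\right)\right]
$$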
s43: the feature vector output by the feature extraction layer and the track parameter of the current candidate frame under the current frame are fused, the fusion classification layer is input to obtain the probability that the target is the unmanned aerial vehicle, and the cross entropy is calculated:
wherein 1^obj takes the value 1 if the detection frame contains the unmanned aerial vehicle and 0 otherwise; 1^noobj takes the value 1 if the detection frame contains no unmanned aerial vehicle and 0 otherwise; and f is the probability, output by the fusion classification layer, that the target in the detection frame is an unmanned aerial vehicle;
s44: the cross entropy results of the comprehensive feature classification layer and the fusion classification layer are used as loss functions of the network to train the whole network:
wherein η is a weight parameter.
Further, the step S5 of obtaining the finally identified unmanned aerial vehicle position based on the output result of the target re-detection network includes:
after training the target re-detection network based on the loss function designed in S4, calculating the comprehensive probability of the output of the target re-detection network feature classification layer and the fusion classification layer:
taking a detection frame with the comprehensive probability larger than 0.5 as a finally screened unmanned aerial vehicle detection result, and outputting parameters of the detection frame:
wherein x_f, y_f represent the horizontal and vertical coordinates of the center of the detection frame finally determined to be an unmanned aerial vehicle target, and w_f, h_f represent the width and height of the detection frame finally determined to be an unmanned aerial vehicle target.
Advantageous effects
The application provides an unmanned aerial vehicle identification method based on deep learning and the target movement track. A YoloV3 target recognition network with an improved loss function is first applied to the video data, so that video data of a flying target shot by a camera or a visual sensor can be processed rapidly and candidate targets that may be unmanned aerial vehicles are identified, providing a favorable foundation for the subsequent accurate identification of unmanned aerial vehicles. The improved loss function focuses on identifying the center position of the candidate target, and because it incorporates the L1 paradigm it can still provide a sufficient loss value for network optimization when the predicted position differs only slightly from the actual position, thereby providing more accurate position information for extracting the track parameters. Meanwhile, for the bounding box of each candidate target, the motion speed and motion direction of the candidate target are obtained through analysis across video frames, forming the track parameters of the target. After the track parameters of each candidate frame are obtained, a time-series analysis method is further used to analyze the track parameter sequences and their first-order difference sequences, identifying, for the different candidate frames, the proportion of points of the track parameter sequence lying outside a certain standard-deviation range; a track with a higher proportion indicates disordered motion. An unmanned aerial vehicle is constrained by control instructions, and its flight is usually orderly and follows a certain mechanical regularity, so if the motion of a target shows disordered track parameters, targets such as birds can be directly eliminated from the identified targets, which reduces the number of targets the subsequent target re-detection network has to identify and improves the running speed of the method. To prevent erroneous elimination, the track parameter sequence and its first-order difference sequence are judged jointly, so that both the variation of the track parameters themselves and their rate of change are taken into account, and a more stable result is obtained after comprehensive judgment. Finally, the feature layer of the target re-detection network is used to extract candidate frame features, and whether the target is an unmanned aerial vehicle is further judged from both the purely visual classification result and the classification result combined with the corresponding track parameters, yielding a more accurate unmanned aerial vehicle identification result. The method comprehensively considers the visual characteristics of the unmanned aerial vehicle image and the characteristics of the target motion track, and therefore has higher accuracy and robustness. In practical application, the method can effectively avoid misjudgment caused by relying on visual characteristics or motion track characteristics alone, and can identify the unmanned aerial vehicle more accurately.
Drawings
Fig. 1 is a flow chart of an unmanned aerial vehicle recognition method based on deep learning and a target motion track according to an embodiment of the application;
fig. 2 is a schematic diagram of a target re-detection network according to an embodiment of the application.
Detailed Description
The application is further described below with reference to the accompanying drawings, without limiting the application in any way, and any alterations or substitutions based on the teachings of the application are intended to fall within the scope of the application.
In order to achieve the above object, the present application provides an unmanned aerial vehicle recognition method based on deep learning and a target motion trajectory, comprising the steps of:
s1: video data of a flying object captured by a camera or other visual sensor is acquired.
The step S1 of acquiring video data of a flying object shot by a camera or other visual sensors comprises the following steps:
the imagery may be real-time video captured by a camera or a pre-stored video file. When the images are acquired, the resolution, the frame rate and other information of the images are acquired at the same time, so that the subsequent identification work is facilitated. The embodiment uses a pre-stored video file to identify the unmanned aerial vehicle.
S2: and initially identifying potential unmanned aerial vehicle candidate targets according to the YoloV3 target detection network.
In the step S2, the preliminary identification of the potential candidate target of the unmanned aerial vehicle according to the YoloV3 target detection network includes:
and (3) performing object detection on each frame of the video acquired in the S1 by using a YoloV3 object detection network, wherein each frame of video can obtain detection frames of all detection objects.
Further, acquiring the center of the unmanned aerial vehicle position is a precondition for analyzing the motion track; according to the characteristics of the target motion track, the center of the unmanned aerial vehicle position needs to be analyzed with emphasis, and the loss function of the center position can be constructed as follows:
wherein S represents the number of grids defined in YoloV3, 2704 in this embodiment; 1_i^obj takes the value 1 if the detection frame contains the unmanned aerial vehicle and 0 otherwise; x_i, y_i respectively represent the horizontal and vertical coordinates of the center position of a detection frame truly containing the unmanned aerial vehicle; x̂_i, ŷ_i represent the center coordinates of the detection frame predicted by the network; and λ_xy represents the weight of this loss, taken as 1 in this embodiment.
In combination with other loss functions, the total loss function of YoloV3 is:
wherein S represents the number of grids defined in YoloV3, 2704 in this embodiment; 1_i^obj takes the value 1 if the detection frame contains the unmanned aerial vehicle and 0 otherwise; 1_i^noobj takes the value 1 if the detection frame contains no unmanned aerial vehicle and 0 otherwise; w_i, h_i respectively represent the width and height of a detection frame truly containing the unmanned aerial vehicle; ŵ_i, ĥ_i represent the width and height of the detection frame predicted by the network; C_i represents the probability of an unmanned aerial vehicle being in the frame; and λ_wh, λ_noobj represent the different index weights, 5 and 1 respectively in this embodiment.
The number of grids determines the minimum target size that the target detection network can screen. Considering the size at which the unmanned aerial vehicle appears in the video, a larger number of grids is adopted to divide the camera's shooting range more finely, so that the small targets appearing in the video are captured more accurately. When setting the different weights of the loss function, the center coordinates of small targets are considered to play an important role in locating them, so a loss term measuring whether the center position is accurate is added on top of the loss function of traditional target detection.
S3: analyzing candidate frames acquired by the target detection network, calculating track parameters generated by a series of target detection frames of each candidate frame before the time point, wherein the track parameters comprise speed and direction, and carrying out preliminary judgment on the candidate frames.
In step S3, the candidate frames acquired by the target detection network in S2 are analyzed, and the track parameters generated by the series of target detection frames of each candidate frame before the current time point are calculated, the track parameters comprising speed and direction, including:
S31: obtaining the horizontal and vertical coordinates on the image of the target in the current frame and of the same target in the previous two frames, read from the target recognition results;
s32, calculating the speed and direction of the target in the current frame based on the speed and direction information of the same target in the previous two frames:
wherein v_n represents the speed of the target from frame n-1 to frame n, t_n represents the time from frame n-1 to frame n, and d_n represents the direction of motion of the target from frame n-1 to frame n;
s33: on the basis of obtaining the track parameter of the current frame, the track parameter of the current frame is further corrected by using the track parameter obtained by calculating the previous frame:
wherein v̂_n represents the corrected speed of the target from frame n-1 to frame n, d̂_n represents the corrected direction of the target from frame n-1 to frame n, and β represents the correction coefficient, taken as 0.9 in this embodiment. The corrected track parameters integrate the historical track parameters of the target, so that the track parameters of the whole target are more stable.
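The following sketch illustrates S31 to S33 under the assumptions stated above (speed as displacement over elapsed time, direction as the velocity components, correction as exponential smoothing with coefficient 0.9); it is an illustrative reading of the embodiment, not the patented implementation.

```python
import math


def track_parameters(prev_pos, cur_pos, dt, prev_smoothed=None, alpha=0.9):
    """Speed and direction of the same target between frames n-1 and n (S31-S33).

    prev_pos, cur_pos: (x, y) image coordinates of the target center in frames n-1 and n.
    dt: time elapsed between the two frames.
    prev_smoothed: previously corrected (speed, (d1, d2)) pair, or None for the first estimate.
    alpha: assumed correction coefficient (0.9 in this embodiment).
    """
    dx = cur_pos[0] - prev_pos[0]
    dy = cur_pos[1] - prev_pos[1]
    speed = math.hypot(dx, dy) / dt      # raw speed v_n
    direction = (dx / dt, dy / dt)       # raw direction components (d1, d2)
    if prev_smoothed is None:
        return speed, direction
    prev_speed, prev_dir = prev_smoothed
    # Correct the current parameters with the previous frame's already-corrected
    # parameters (exponential-smoothing form; the exact formula is an assumption).
    speed = alpha * prev_speed + (1.0 - alpha) * speed
    direction = (alpha * prev_dir[0] + (1.0 - alpha) * direction[0],
                 alpha * prev_dir[1] + (1.0 - alpha) * direction[1])
    return speed, direction
```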
S34: carrying out preliminary discrimination on different candidate frame motion tracks according to statistics corresponding to the corrected track parameter sequence and the corrected track parameter first-order difference sequence:
the corrected track parameter sequence is as follows:
the corresponding first order differential sequence is:
the statistic is the standard deviation of the corrected track parameter sequence and the corrected track parameter first-order difference sequence,/>,/>,/>And->The method comprises the steps of carrying out a first treatment on the surface of the Judging whether the target in the candidate frame is a disordered flight target according to the statistic; the judgment formula is as follows:
wherein I(·) is a function judging whether the expression inside it holds, equal to 1 if it holds and 0 otherwise; R equal to 1 means that the target is a disordered flying target, and R equal to 0 means that the target needs to be further discriminated.
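A minimal sketch of this preliminary discrimination is given below, following the description in the Advantageous Effects of counting the proportion of points outside a standard-deviation band; the band width `k` and the ratio threshold `max_outlier_ratio` are assumed values, since the exact judgment formula is not reproduced in this text.

```python
import numpy as np


def is_disordered(track_params, k=2.0, max_outlier_ratio=0.3):
    """Preliminary discrimination of a disordered flight target (S34).

    track_params: the corrected track parameter sequence (e.g. speeds) of one
    candidate frame. Returns 1 for a disordered (non-drone-like) track and 0
    when the target needs to be passed on to the re-detection network.
    """
    seq = np.asarray(track_params, dtype=float)
    diff = np.diff(seq)  # first-order difference sequence

    def outlier_ratio(x):
        if x.size == 0:
            return 0.0
        mu, sigma = x.mean(), x.std()
        if sigma == 0.0:
            return 0.0
        return float(np.mean(np.abs(x - mu) > k * sigma))

    # Flag the track when too many samples of the parameter sequence or of its
    # first-order difference fall outside the k-sigma band around the mean.
    disordered = (outlier_ratio(seq) > max_outlier_ratio
                  or outlier_ratio(diff) > max_outlier_ratio)
    return 1 if disordered else 0
```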
According to this scheme, the calculated motion track is used to screen flying targets, which can improve the accuracy and efficiency of target identification and tracking. The movement of an unmanned aerial vehicle is typically non-maneuvering and follows a certain mechanical regularity, so its motion characteristics help distinguish it from non-drone targets. By analyzing the motion track, key information such as the motion state, speed and direction of the flying target can be determined, which assists the judgment of whether the flying target is an unmanned aerial vehicle.
S4: and incorporating the candidate frames and the corresponding track parameters into the target re-detection network.
And in the step S4, the candidate frame in the step S3 and the corresponding track parameter are incorporated into a target re-detection network, and the method comprises the following steps:
the target re-detection network is composed of a feature extraction layer and a classification layer, and the target re-detection of the unmanned aerial vehicle is completed jointly.
S41: and extracting the center position and the length and width information provided by the detected target candidate frame, cutting out a target image independently, and sending the target image into a feature extraction layer to extract features.
S42: after the feature vector output by the feature extraction layer is obtained, the input feature classification layer obtains the probability of judging that the target is the unmanned aerial vehicle only through visual features, and calculates the cross entropy:
wherein 1^obj takes the value 1 if the detection frame contains the unmanned aerial vehicle and 0 otherwise; 1^noobj takes the value 1 if the detection frame contains no unmanned aerial vehicle and 0 otherwise; and r is the probability, output by the feature classification layer, that the target in the detection frame is an unmanned aerial vehicle. The feature extraction layer can use pretrained models such as VGG-16 or VGG-19, and the feature classification layer consists of 2 fully connected layers.
S43: the feature vector output by the feature extraction layer and the track parameter of the current candidate frame under the current frame are fused, the fusion classification layer is input to obtain the probability that the target is the unmanned aerial vehicle, and the cross entropy is calculated:
wherein 1^obj takes the value 1 if the detection frame contains the unmanned aerial vehicle and 0 otherwise; 1^noobj takes the value 1 if the detection frame contains no unmanned aerial vehicle and 0 otherwise; and f is the probability, output by the fusion classification layer, that the target in the detection frame is an unmanned aerial vehicle. The fusion classification layer consists of 2 fully connected layers.
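A sketch of such a re-detection network in PyTorch is shown below, assuming a pretrained VGG-16 backbone as the feature extraction layer, a two-layer feature classification head working on visual features alone, and a two-layer fusion classification head fed with the visual feature vector concatenated with the track parameters; the hidden size of 256 and the track-parameter dimension of 3 are assumptions of this illustration, not values taken from the patent.

```python
import torch
import torch.nn as nn
from torchvision import models


class ReDetectionNet(nn.Module):
    """Illustrative target re-detection network (S41-S43)."""

    def __init__(self, traj_dim: int = 3):
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT)
        self.features = vgg.features                # pretrained convolutional feature extractor
        self.pool = nn.AdaptiveAvgPool2d((7, 7))
        feat_dim = 512 * 7 * 7
        self.feat_classifier = nn.Sequential(        # visual-only head, outputs probability r
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Sigmoid())
        self.fusion_classifier = nn.Sequential(      # visual + track-parameter head, outputs probability f
            nn.Linear(feat_dim + traj_dim, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Sigmoid())

    def forward(self, crop: torch.Tensor, traj: torch.Tensor):
        # crop: cropped candidate image, shape (B, 3, H, W); traj: track parameters, shape (B, traj_dim)
        x = self.pool(self.features(crop)).flatten(1)
        r = self.feat_classifier(x)
        f = self.fusion_classifier(torch.cat([x, traj], dim=1))
        return r, f
```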
S44: the cross entropy results of the comprehensive feature classification layer and the fusion classification layer are used as loss functions of the network to train the whole network:
wherein η is the weight parameter, taken as 0.1 in this embodiment.
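The training loss itself appears only as an image in the original; a plausible form, assuming a weighted sum of the two binary cross-entropies of S42 and S43 with weight η, is:

$$
L_{re}=CE_r+\eta\,CE_f,\qquad CE_r=-\left[\mathbf{1}^{\mathrm{obj}}\log r+\mathbf{1}^{\mathrm{noobj}}\log(1-r)\right],\qquad CE_f=-\left[\mathbf{1}^{\mathrm{obj}}\log f+\mathbf{1}^{\mathrm{noobj}}\log(1-f)\right]
$$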
For the targets that cannot be distinguished in step S3, the scheme further uses a deep learning network and judges the targets to be identified in combination with their motion tracks. After the motion-track-related information is incorporated into the deep learning network, more characteristics of the target can be captured, so that a more accurate recognition result is obtained.
Further, the step S5 of obtaining the finally identified unmanned aerial vehicle position based on the output result of the target re-detection network includes:
after training the target re-detection network based on the loss function designed in S4, calculating the comprehensive probability of the output of the target re-detection network feature classification layer and the fusion classification layer:
taking a detection frame with the comprehensive probability larger than 0.5 as a finally screened unmanned aerial vehicle detection result, and outputting parameters of the detection frame:
wherein x_f, y_f represent the horizontal and vertical coordinates of the center of the detection frame finally determined to be an unmanned aerial vehicle target, and w_f, h_f represent the width and height of the detection frame finally determined to be an unmanned aerial vehicle target.
The words "preferred," "further," and "preferred" as used herein mean serving as an example, instance, or illustration. Any aspect or design described herein as "preferred" is not necessarily to be construed as advantageous over other aspects or designs. Rather, use of the word "preferred" is intended to present concepts in a concrete fashion. The term "or" as used in this disclosure is intended to mean an inclusive "or" rather than an exclusive "or". That is, unless specified otherwise or clear from the context, "X uses A or B" is intended to naturally include any of the permutations. That is, if X uses A, X uses B, or X uses both A and B, then "X uses A or B" is satisfied in any of the foregoing examples.
Moreover, although the disclosure has been shown and described with respect to one or more implementations, equivalent alterations and modifications will occur to others skilled in the art based upon a reading and understanding of this specification and the annexed drawings. The present disclosure includes all such modifications and alterations and is limited only by the scope of the following claims. In particular regard to the various functions performed by the above described components (e.g., elements, etc.), the terms used to describe such components are intended to correspond, unless otherwise indicated, to any component which performs the specified function of the described component (e.g., that is functionally equivalent), even though not structurally equivalent to the disclosed structure which performs the function in the herein illustrated exemplary implementations of the disclosure. Furthermore, while a particular feature of the disclosure may have been disclosed with respect to only one of several implementations, such feature may be combined with one or more other features of the other implementations as may be desired and advantageous for a given or particular application. Moreover, to the extent that the terms "includes," "has," "contains," or variants thereof are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term "comprising."
The functional units in the embodiment of the application may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or as software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product. The above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like. The above-mentioned devices or systems may perform the methods in the corresponding method embodiments.
In summary, the foregoing embodiment is one implementation of the present application, but implementations of the present application are not limited to this embodiment; any other changes, modifications, substitutions, combinations and simplifications made without departing from the spirit and principles of the present application shall be regarded as equivalent replacements and are included within the protection scope of the present application.

Claims (7)

1. The unmanned aerial vehicle identification method based on deep learning and target motion trail is characterized by comprising the following steps of:
s1: acquiring video data of a flying target shot by a camera;
s2: preliminarily identifying potential unmanned aerial vehicle candidate targets according to the YoloV3 target detection network;
s3: analyzing candidate frames acquired by a target detection network, calculating track parameters generated by a series of target detection frames of each candidate frame before a certain time point, wherein the track parameters comprise speed and direction, and carrying out preliminary judgment on the candidate frames;
s4: incorporating the candidate boxes into the target re-detection network together with their corresponding trajectory parameters:
s41: obtaining the detected central position and length and width information of the target candidate frame, cutting out a target image independently, and sending the target image into a feature extraction layer to extract features;
s42: after obtaining the feature vector output by the feature extraction layer, the input feature classification layer obtains the probability of judging the target as an unmanned aerial vehicle only through visual features, and calculates cross entropy;
s43: fusing the feature vector output by the feature extraction layer with the track parameter of the current candidate frame under the current frame, inputting the track parameter into the fusion classification layer to obtain the probability that the target is the unmanned aerial vehicle, and calculating the cross entropy;
s44: the cross entropy results of the comprehensive feature classification layer and the fusion classification layer are used as loss functions of the network to train the whole network;
s5: and acquiring the finally identified unmanned aerial vehicle position based on the output result of the target re-detection network.
2. The unmanned aerial vehicle recognition method based on the deep learning and the target motion trail according to claim 1, wherein the video data in the step S1 is a real-time video file captured by a camera and/or a pre-stored video file, and the resolution and the frame rate information of the video file are acquired at the same time.
3. The unmanned aerial vehicle recognition method based on the deep learning and the target motion trajectory according to claim 1, wherein the step S2 is:
performing target detection on each frame of the video acquired in the S1 by utilizing a YoloV3 target detection network to obtain detection frames of all detection targets;
further comprises: judging and analyzing the center of the position of the unmanned aerial vehicle according to the target motion trail, and constructing a loss function of the center position as follows:
wherein S represents the number of grids defined in YoloV3; 1_i^obj takes the value 1 if the detection frame contains the unmanned aerial vehicle and 0 otherwise; x_i, y_i respectively represent the horizontal and vertical coordinates of the center position of a detection frame truly containing the unmanned aerial vehicle; x̂_i, ŷ_i represent the center coordinates of the detection frame predicted by the network; and λ_xy represents the weight of this loss.
4. The unmanned aerial vehicle recognition method based on deep learning and target motion trajectories according to claim 3, wherein in step S2, other loss functions are combined, and the total loss function of YoloV3 is:
wherein S represents the number of grids defined in YoloV3; 1_i^obj takes the value 1 if the detection frame contains the unmanned aerial vehicle and 0 otherwise; 1_i^noobj takes the value 1 if the detection frame contains no unmanned aerial vehicle and 0 otherwise; w_i, h_i respectively represent the width and height of a detection frame truly containing the unmanned aerial vehicle; ŵ_i, ĥ_i represent the width and height of the detection frame predicted by the network; C_i represents the probability of an unmanned aerial vehicle being in the frame; and λ_wh, λ_noobj represent the different index weights.
5. The unmanned aerial vehicle recognition method based on the deep learning and the target motion trajectory according to claim 1, wherein the step S3 comprises:
calculating the corresponding track parameters for the candidate frames of each target acquired by the target detection network in the step S2, the track parameters being calculated by the steps of:
S31: obtaining the horizontal and vertical coordinates on the image of the target in the current frame and of the same target in the previous two frames, read from the target recognition results;
s32, calculating the speed and direction of the target in the current frame based on the speed and direction information of the same target in the previous two frames:
wherein v_n represents the speed of the target from frame n-1 to frame n, t_n represents the time from frame n-1 to frame n, and d_n represents the direction of motion of the target from frame n-1 to frame n;
s33: on the basis of obtaining the track parameter of the current frame, the track parameter of the current frame is further corrected by using the track parameter obtained by calculating the previous frame:
wherein v̂_n represents the corrected speed of the target from frame n-1 to frame n, d̂_n represents the corrected direction of the target from frame n-1 to frame n, and β represents the correction coefficient;
s34: carrying out preliminary discrimination on different candidate frame motion tracks according to statistics corresponding to the corrected track parameter sequence and the corrected track parameter first-order difference sequence:
the corrected track parameter sequence is as follows:
the first-order differential sequence of the corrected track parameters is as follows:
the statistic is the standard deviation of the corrected track parameter sequence and the corrected track parameter first-order difference sequence,/>,/>,/>And->The method comprises the steps of carrying out a first treatment on the surface of the Judging whether the target in the candidate frame is a disordered flight target according to the statistic; the judgment formula is as follows:
wherein I(·) is a function judging whether the expression inside it holds, equal to 1 if it holds and 0 otherwise; R equal to 1 means that the target is a disordered flying target, and R equal to 0 means that the target needs to be further discriminated.
6. The unmanned aerial vehicle recognition method based on the deep learning and the target motion trajectory according to claim 1, wherein the step S4 comprises:
aiming at the preliminary judgment result in the step S3, a target re-detection network is adopted to carry out target re-detection on the unmanned aerial vehicle, wherein the target re-detection network consists of a feature extraction layer and a classification layer;
s41: obtaining the detected central position and length and width information of the target candidate frame, cutting out a target image independently, and sending the target image into a feature extraction layer to extract features;
s42: after the feature vector output by the feature extraction layer is obtained, the input feature classification layer obtains the probability of judging that the target is the unmanned aerial vehicle only through visual features, and calculates the cross entropy:
wherein 1^obj takes the value 1 if the detection frame contains the unmanned aerial vehicle and 0 otherwise; 1^noobj takes the value 1 if the detection frame contains no unmanned aerial vehicle and 0 otherwise; and r is the probability, output by the feature classification layer, that the target in the detection frame is an unmanned aerial vehicle;
s43: the feature vector output by the feature extraction layer and the track parameter of the current candidate frame under the current frame are fused, the fusion classification layer is input to obtain the probability that the target is the unmanned aerial vehicle, and the cross entropy is calculated:
wherein 1^obj takes the value 1 if the detection frame contains the unmanned aerial vehicle and 0 otherwise; 1^noobj takes the value 1 if the detection frame contains no unmanned aerial vehicle and 0 otherwise; and f is the probability, output by the fusion classification layer, that the target in the detection frame is an unmanned aerial vehicle;
s44: the cross entropy results of the comprehensive feature classification layer and the fusion classification layer are used as loss functions of the network to train the whole network:
wherein η is a weight parameter.
7. The unmanned aerial vehicle recognition method based on the deep learning and the target motion trajectory according to claim 6, wherein the step S5 comprises:
after training the target re-detection network based on the loss function designed in S4, calculating the comprehensive probability of the output of the target re-detection network feature classification layer and the fusion classification layer:
taking a detection frame with the comprehensive probability larger than 0.5 as a finally screened unmanned aerial vehicle detection result, and outputting parameters of the detection frame:
wherein x_f, y_f represent the horizontal and vertical coordinates of the center of the detection frame finally determined to be an unmanned aerial vehicle target, and w_f, h_f represent the width and height of the detection frame finally determined to be an unmanned aerial vehicle target.
CN202311332415.4A 2023-10-16 2023-10-16 Unmanned aerial vehicle identification method based on deep learning and target motion trail Active CN117079196B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311332415.4A CN117079196B (en) 2023-10-16 2023-10-16 Unmanned aerial vehicle identification method based on deep learning and target motion trail


Publications (2)

Publication Number Publication Date
CN117079196A true CN117079196A (en) 2023-11-17
CN117079196B CN117079196B (en) 2023-12-29

Family

ID=88717604

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311332415.4A Active CN117079196B (en) 2023-10-16 2023-10-16 Unmanned aerial vehicle identification method based on deep learning and target motion trail

Country Status (1)

Country Link
CN (1) CN117079196B (en)

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107862705A (en) * 2017-11-21 2018-03-30 重庆邮电大学 A kind of unmanned plane small target detecting method based on motion feature and deep learning feature
CN110660082A (en) * 2019-09-25 2020-01-07 西南交通大学 Target tracking method based on graph convolution and trajectory convolution network learning
CN110751099A (en) * 2019-10-22 2020-02-04 东南大学 Unmanned aerial vehicle aerial video track high-precision extraction method based on deep learning
CN112884811A (en) * 2021-03-18 2021-06-01 中国人民解放军国防科技大学 Photoelectric detection tracking method and system for unmanned aerial vehicle cluster
CN113298053A (en) * 2021-07-26 2021-08-24 季华实验室 Multi-target unmanned aerial vehicle tracking identification method and device, electronic equipment and storage medium
CN113592912A (en) * 2021-08-05 2021-11-02 南京航空航天大学 High-integration-level long-distance ship-borne unmanned aerial vehicle return motion trajectory real-time detection and tracking method and device
CN114255407A (en) * 2021-12-13 2022-03-29 中国电子科技集团公司第三十八研究所 High-resolution-based anti-unmanned aerial vehicle multi-target identification and tracking video detection method
CN114581516A (en) * 2022-01-30 2022-06-03 天津大学 Monocular vision-based multi-unmanned aerial vehicle intelligent identification and relative positioning method
US20220277541A1 (en) * 2021-02-26 2022-09-01 Boe Technology Group Co., Ltd. Method and apparatus of training object detection network and object detection method and apparatus
US20220332415A1 (en) * 2021-04-20 2022-10-20 Guangdong University Of Technology Landing tracking control method and system based on lightweight twin network and unmanned aerial vehicle
CN115457417A (en) * 2022-09-23 2022-12-09 北京理工大学重庆创新中心 Long-time inverse unmanned aerial vehicle visual tracking method based on space-time attention knowledge
CN115578416A (en) * 2022-10-12 2023-01-06 山东大学 Unmanned aerial vehicle target tracking method, system, medium and electronic equipment
CN116109950A (en) * 2023-01-03 2023-05-12 中国船舶集团有限公司第七〇九研究所 Low-airspace anti-unmanned aerial vehicle visual detection, identification and tracking method
CN116343078A (en) * 2023-02-13 2023-06-27 中国人民解放军陆军工程大学 Target tracking method, system and equipment based on video SAR
CN116403139A (en) * 2023-03-24 2023-07-07 国网江苏省电力有限公司电力科学研究院 Visual tracking and positioning method based on target detection


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
SILING FENG ET AL: "Aerial Object Detection by UAV Based on Improved YOLOV3", 2022 5TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE (PRAI) *
XIN WU ET AL: "Deep Learning for UAV-based Object Detection and Tracking: A Survey", ARXIV *
李秋珍; 熊饶饶; 王汝鹏; 祁迪: "Research on a real-time UAV recognition method based on the SSD algorithm", Ship Electronic Engineering, no. 05
杨星鑫; 吕泽均: "Research on UAV trajectory recognition technology based on LSTM", Modern Computer, no. 05
梁栋; 高赛; 孙涵; 刘宁钟: "UAV target detection from a moving camera combining kernelized correlation filters and deep learning", Acta Aeronautica et Astronautica Sinica, no. 09

Also Published As

Publication number Publication date
CN117079196B (en) 2023-12-29


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant