CN117036989A - Miniature unmanned aerial vehicle target recognition and tracking control method based on computer vision - Google Patents

Miniature unmanned aerial vehicle target recognition and tracking control method based on computer vision Download PDF

Info

Publication number
CN117036989A
Authority
CN
China
Prior art keywords
target
unmanned aerial
aerial vehicle
tracking
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310603409.1A
Other languages
Chinese (zh)
Inventor
邓恒
冯尚斌
顾爽
王怡菲
乐祥立
刘石
全权
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Beihang University
Original Assignee
Beijing University of Technology
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology, Beihang University filed Critical Beijing University of Technology
Priority to CN202310603409.1A priority Critical patent/CN117036989A/en
Publication of CN117036989A publication Critical patent/CN117036989A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/10 Terrestrial scenes
    • G06V 20/17 Terrestrial scenes taken from planes or by drones
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/277 Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/60 Analysis of geometric attributes
    • G06T 7/62 Analysis of geometric attributes of area, perimeter, diameter or volume
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30241 Trajectory

Abstract

The invention provides a computer vision-based miniature unmanned aerial vehicle target identification and tracking control method, which comprises: acquiring a video stream collected by an unmanned aerial vehicle in real time; extracting video frame images from the video stream, and processing each video frame image to obtain an input image; inputting the input image into a pre-trained YOLOv5 neural network model to obtain target bounding box information in the input image, wherein the target bounding box information comprises a target type and a confidence level; matching the target in the current video frame image with the tracking target in the previous video frame image according to the target bounding box information and a tracking target determined by the user in advance, and determining tracking track update data of the unmanned aerial vehicle; and performing tracking control on the unmanned aerial vehicle according to the tracking track update data, so that the unmanned aerial vehicle can accurately and efficiently identify and track the selected target.

Description

Miniature unmanned aerial vehicle target recognition and tracking control method based on computer vision
Technical Field
The invention belongs to the technical field of unmanned aerial vehicles, and particularly relates to a miniature unmanned aerial vehicle target identification and tracking control method based on computer vision.
Background
An unmanned aerial vehicle is a powered, controllable aircraft that can carry multiple kinds of mission equipment, perform multiple tasks and be reused. Unmanned aerial vehicles can be controlled by wireless remote control devices and their own onboard control systems, and include unmanned helicopters, unmanned fixed-wing aircraft, unmanned parawing aircraft, multi-rotor unmanned aerial vehicles and the like. An unmanned aerial vehicle can carry a camera device and be used for aerial photography, mapping, reconnaissance and the like.
Target tracking is an important research direction in computer vision; its goal is to accurately determine information such as the position and motion trajectory of a target of interest in a video sequence. Applying target tracking technology to unmanned aerial vehicles helps improve their level of intelligence. In practical tracking applications, the target area of interest is often affected by environmental factors, for example in complex environments where the GPS signal fails or communication is denied, so that the algorithm results become inaccurate, the target cannot be tracked stably, and the target is finally lost.
Disclosure of Invention
Aiming at the problems in the prior art, the object of the present invention is to provide a computer vision-based miniature unmanned aerial vehicle target identification and tracking control method, which enables the unmanned aerial vehicle to identify and track the selected target accurately and efficiently.
In order to solve the technical problems, the specific technical scheme is as follows:
in one aspect, provided herein is a computer vision-based micro unmanned aerial vehicle target recognition and tracking control method, the method comprising:
acquiring a video stream acquired by an unmanned aerial vehicle in real time;
extracting video frame images in the video stream, and processing each video frame image to obtain an input image;
inputting the input image into a pre-trained YOLOv5 neural network model to obtain target bounding box information in the input image, wherein the target bounding box information comprises a target type and a confidence level;
according to the target boundary box information and the tracking target determined by the user in advance, matching the target in the current video frame image with the tracking target in the previous video frame image, and determining tracking track updating data of the unmanned aerial vehicle;
and carrying out tracking control on the unmanned aerial vehicle according to the tracking track updating data of the unmanned aerial vehicle.
Further, acquiring the video stream acquired by the unmanned aerial vehicle in real time comprises:
establishing a data communication protocol with the unmanned aerial vehicle;
acquiring a plurality of data packets sent from the unmanned aerial vehicle according to the data communication protocol;
And extracting video frames from each data packet, and reconstructing the video frames to obtain the video stream of the unmanned aerial vehicle.
Further, the pre-trained YOLOv5 neural network model is obtained through training by the following steps:
acquiring a training data set with labels, wherein the types of objects in the training data set are consistent with the types of tracked objects of the unmanned aerial vehicle;
and training the initial Yolov5 neural network model according to the training data set and a preset loss function to obtain a training convergence Yolov5 neural network model.
Further, the YOLOv5 neural network model comprises a main network, a detection head network, a prediction layer and an anchor frame;
the backbone network is of a multi-level structure and is used for extracting multi-level image characteristics of an input image;
the detection head network is used for extracting target parameter information according to the multi-level image characteristics, and the target parameter information at least comprises target positions, categories and confidence information;
the prediction layer generates target bounding boxes and confidences according to the output of the detection head network, and ranks the targets according to confidence so as to generate feature maps of different scales;
The anchor frame is used for carrying out target prediction on each feature map, and screening and merging prediction results by utilizing non-maximum value inhibition so as to obtain target boundary frame information.
Further, a characteristic pyramid network and a path aggregation network are arranged in the main network;
the feature pyramid network is connected with the multi-level structure of the backbone network, and the image features of different levels are fused through transverse connection and up-sampling operation;
the path aggregation network is connected with the feature pyramid networks of different levels in a cascading way to fuse the feature information of the shallow layer and the deep layer.
Further, the matching of the target in the current video frame image and the tracking target in the previous video frame image according to the target bounding box information and the tracking target determined by the user in advance, and determining the tracking track update data of the unmanned aerial vehicle comprise:
processing the target boundary box information by using a Kalman filtering algorithm to obtain a motion state variable of a target in each video frame image, wherein the state variable comprises a center coordinate and a speed;
and matching the target in the current video frame with the tracking target in the previous video frame by using the Hungarian algorithm according to the motion state variables of the targets in consecutive video frame images, and determining the tracking track update data of the unmanned aerial vehicle.
Further, according to the tracking track update data of the unmanned aerial vehicle, tracking control is performed on the unmanned aerial vehicle, including:
according to the tracking track updating data of the unmanned aerial vehicle, determining a tracking object of the unmanned aerial vehicle;
determining whether the target frame area of the tracking object exceeds a first threshold value according to the boundary frame information corresponding to the tracking object;
if the target frame area exceeds a first threshold value, calculating the motion control quantity of the unmanned aerial vehicle;
controlling the unmanned aerial vehicle to carry out tracking control according to the motion control quantity of the unmanned aerial vehicle, and judging whether a target frame of the tracked object is close to the edge of an input image or not in real time;
if the target frame of the tracking object is not close to the edge of the input image, adjusting the pitching angle of the unmanned aerial vehicle according to the target frame area of the tracking object;
if the target frame of the tracking object is close to the edge of the input image, the horizontal control is kept;
and if the area of the target frame does not exceed the first threshold, adjusting the video acquisition angle of the unmanned aerial vehicle so as to redetermine the tracking object.
Further, adjusting a pitch angle of the unmanned aerial vehicle according to a target frame area of the tracked object, including:
Judging whether the ratio of the target frame area of the tracking object to the input image area exceeds a preset ratio;
if yes, controlling the unmanned aerial vehicle to fly according to a preset inclination angle;
if not, controlling the unmanned aerial vehicle to keep flying in the original state.
In another aspect, there is provided herein a micro unmanned aerial vehicle target recognition and tracking control device based on computer vision, the device comprising:
the video stream acquisition module is used for acquiring video streams acquired by the unmanned aerial vehicle in real time;
the processing module is used for extracting video frame images in the video stream and processing each video frame image to obtain an input image;
the feature extraction module is used for inputting the input image into a pre-trained YOLOv5 neural network model so as to obtain target boundary box information in the input image, wherein the target boundary box information comprises a target type and a confidence level;
the matching module is used for matching the target in the current video frame image with the tracking target in the previous video frame image according to the target boundary box information and the tracking target determined by the user in advance, and determining tracking track updating data of the unmanned aerial vehicle;
And the control module is used for carrying out tracking control on the unmanned aerial vehicle according to the tracking track updating data of the unmanned aerial vehicle.
Finally, there is also provided herein a drone control system, the system comprising:
unmanned plane;
the control terminal is in communication connection with the unmanned aerial vehicle and is used for controlling the unmanned aerial vehicle to fly according to the method.
By adopting the above technical scheme, a computer vision-based miniature unmanned aerial vehicle target identification and tracking control method is disclosed: a video stream collected by the unmanned aerial vehicle in real time is acquired; video frame images are extracted from the video stream, and each video frame image is processed to obtain an input image; the input image is input into a pre-trained YOLOv5 neural network model to obtain target bounding box information in the input image, wherein the target bounding box information comprises a target type and a confidence level; the target in the current video frame image is matched with the tracking target in the previous video frame image according to the target bounding box information and the tracking target determined by the user in advance, and tracking track update data of the unmanned aerial vehicle are determined; and tracking control is performed on the unmanned aerial vehicle according to the tracking track update data, so that the unmanned aerial vehicle can accurately and efficiently identify and track the selected target.
The foregoing and other objects, features and advantages will be apparent from the following more particular description of preferred embodiments, as illustrated in the accompanying drawings.
Drawings
In order to more clearly illustrate the embodiments herein or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments herein and that other drawings may be obtained according to these drawings without inventive effort to a person skilled in the art.
Fig. 1 illustrates a frame diagram of a drone control system provided by embodiments herein;
fig. 2 illustrates a schematic step diagram of a method for identifying and tracking a target of a micro unmanned aerial vehicle based on computer vision provided in an embodiment herein;
FIG. 3 shows an F1-confidence curve in embodiments herein;
FIG. 4 illustrates a state diagram of top plane drone target tracking in embodiments herein;
FIG. 5 is a diagram illustrating a mapping relationship between a camera coordinate system and a physical coordinate system in an embodiment herein;
FIG. 6 illustrates a specific example workflow diagram provided by embodiments herein;
FIG. 7 illustrates image recognition and multi-target tracking in embodiments herein;
FIG. 8 illustrates a continuous tracking and trace control flow diagram for processing successive frames to achieve a target in embodiments herein;
fig. 9 illustrates the overall control framework in embodiments herein;
FIG. 10 illustrates the results obtained in a specific example in an embodiment herein;
fig. 11 illustrates a schematic structural diagram of a micro unmanned aerial vehicle target recognition and tracking control device based on computer vision provided in an embodiment herein;
fig. 12 shows a schematic structural diagram of a computer device provided in embodiments herein.
Description of the drawings:
101. unmanned aerial vehicle; 102. control terminal; 1101. a video stream acquisition module; 1102. a processing module; 1103. a feature extraction module; 1104. a matching module; 1105. a control module.
Detailed Description
The following description of the embodiments of the present disclosure will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the disclosure. All other embodiments, based on the embodiments herein, which a person of ordinary skill in the art would obtain without undue burden, are within the scope of protection herein.
It should be noted that the terms "first," "second," and the like in the description and claims herein and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, apparatus, article, or device that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or device.
Unmanned aerial vehicle tracking technology based on computer vision is a hotspot of current research; its main goal is to accurately determine information such as the position and motion trajectory of a target of interest in a video sequence, and applying target tracking technology to unmanned aerial vehicles helps improve their level of intelligence. In practical tracking applications, the target area of interest is often affected by environmental factors, for example in complex environments where the GPS signal fails or communication is denied, so that the algorithm results become inaccurate, the target cannot be tracked stably, and the target is finally lost.
In order to solve the foregoing problem, the embodiments of the present disclosure provide an unmanned aerial vehicle control system, as shown in fig. 1. The system includes an unmanned aerial vehicle 101 and a control terminal 102, where the unmanned aerial vehicle 101 is in wireless communication connection with the control terminal 102. A camera or other acquisition device for acquiring pictures or video is provided on the unmanned aerial vehicle 101; further, the acquisition direction of the acquisition device is the forward direction of the unmanned aerial vehicle's nose. The unmanned aerial vehicle 101 sends the acquired picture or video data to the control terminal 102. The control terminal 102 is operated by the unmanned aerial vehicle's operator and may be, for example, a control handle, an unmanned aerial vehicle platform, or a back-end control platform; the control platform may be a background server, that is, the unmanned aerial vehicle 101 directly uploads the acquired data to the server, and the server analyzes the acquired data and generates flight control commands for the unmanned aerial vehicle so as to achieve tracking flight of the tracked object. Specifically, the control terminal 102 may perform the following steps: acquiring a video stream collected by the unmanned aerial vehicle in real time; extracting video frame images from the video stream, and processing each video frame image to obtain an input image; inputting the input image into a pre-trained YOLOv5 neural network model to obtain target bounding box information in the input image, wherein the target bounding box information comprises a target type and a confidence level; matching the target in the current video frame image with the tracking target in the previous video frame image according to the target bounding box information and the tracking target determined by the user in advance, and determining tracking track update data of the unmanned aerial vehicle; and performing tracking control on the unmanned aerial vehicle according to the tracking track update data. By using computer vision technology and the unmanned aerial vehicle platform, the problems of intelligent target detection and recognition, autonomous target tracking control, automatic follow shooting and image feedback under complex environments where the GPS signal fails or communication is denied are solved, so that the micro unmanned aerial vehicle can accurately and efficiently identify, track and strike the selected target. The method requires no additional sensing equipment, is simple and convenient, has a small computational load, is easy to maintain, and has low manufacturing cost and good practicability.
Based on the system provided above, this embodiment provides a computer vision-based miniature unmanned aerial vehicle target identification and tracking control method, which can improve the accuracy and efficiency with which the unmanned aerial vehicle identifies and tracks the selected target. Fig. 2 is a schematic diagram of the steps of the method provided in the embodiments herein. The present specification provides the method's operation steps as described in the examples or flowcharts, but more or fewer operation steps may be included based on conventional or non-inventive labor. The order of steps recited in the embodiments is merely one of many possible execution orders and does not represent the only order of execution. When an actual system or apparatus product is executed, the steps may be executed sequentially or in parallel according to the method shown in the embodiments or the drawings. As shown in fig. 2, the method may include:
s201: acquiring a video stream acquired by an unmanned aerial vehicle in real time;
s202: extracting video frame images in the video stream, and processing each video frame image to obtain an input image;
s203: inputting the input image into a pre-trained YOLOv5 neural network model to obtain target bounding box information in the input image, wherein the target bounding box information comprises a target type and a confidence level;
S204: according to the target boundary box information and the tracking target determined by the user in advance, matching the target in the current video frame image with the tracking target in the previous video frame image, and determining tracking track updating data of the unmanned aerial vehicle;
s205: and carrying out tracking control on the unmanned aerial vehicle according to the tracking track updating data of the unmanned aerial vehicle.
It may be understood that the execution subject of the embodiments of the present disclosure may be the aforementioned control terminal. The control terminal stores a pre-trained YOLOv5 neural network model and is configured to perform feature extraction on the video stream received from the unmanned aerial vehicle, then perform target matching on the extracted features to determine the tracked object and trajectory of the unmanned aerial vehicle, generate a control instruction for the unmanned aerial vehicle, and send the control instruction to the unmanned aerial vehicle so as to implement tracking control of the unmanned aerial vehicle.
In this embodiment of the present disclosure, the unmanned aerial vehicle may be a mini multi-rotor unmanned aerial vehicle.
In the field of unmanned aerial vehicle automatic tracking targets, a target detection technology is an indispensable ring. The target detection can enable the unmanned aerial vehicle to realize autonomous flight by identifying and tracking the target object, so that the application effect of the unmanned aerial vehicle in the fields of military, civil and the like is enhanced. YOLOv5, as an efficient and accurate target detection algorithm based on deep learning, has been widely used in the scene of unmanned aerial vehicle automatic tracking targets. In this process, the drone processes and recognizes images or videos by capturing them and transmitting them to the computer terminal. In the identification process, YOLOv5 can rapidly locate and classify the target in the image, so that the unmanned aerial vehicle is guided to accurately track the target.
In this embodiment of the present disclosure, obtaining a video stream collected in real time by a drone includes:
establishing a data communication protocol with the unmanned aerial vehicle;
acquiring a plurality of data packets sent from the unmanned aerial vehicle according to the data communication protocol;
and extracting video frames from each data packet, and reconstructing the video frames to obtain the video stream of the unmanned aerial vehicle.
The control terminal connects to the unmanned aerial vehicle's built-in WiFi so that the connection with the micro unmanned aerial vehicle is established, and the real-time video stream is acquired using the UDP data transmission protocol. The micro unmanned aerial vehicle divides the real-time video stream into a plurality of data packets and transmits them over the WiFi network, using the UDP protocol, to the device connected to it; the device extracts the video frames from each data packet, reconstructs the video stream, and finally returns a generator object. Each time the generator is iterated, it acquires the latest video frame from the Tello. Finally, the video stream data of each frame is obtained by continuously iterating the generator object in a loop for further processing or display.
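As a concrete illustration of this UDP video pipeline, the following is a minimal sketch using the tellopy and av libraries that are named later in this document; the connection and decoding calls follow those libraries' published examples and are assumptions for illustration, not the patent's own code.

```python
# Minimal sketch (assumption, not the patent's code) of pulling frames from a Tello
# over UDP with the tellopy and av libraries.
import av
import tellopy


def tello_frames():
    """Generator yielding the latest decoded video frame as a BGR numpy array."""
    drone = tellopy.Tello()
    drone.connect()
    drone.wait_for_connection(60.0)                 # wait until the UDP link is up
    container = av.open(drone.get_video_stream())   # wrap the raw H.264 stream
    for frame in container.decode(video=0):         # reassemble packets into frames
        yield frame.to_ndarray(format='bgr24')      # hand each frame to OpenCV-style code


# Usage: iterate the generator in a loop to obtain per-frame data.
# for img in tello_frames():
#     process(img)
```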
It should be noted that the obtained video frames are preprocessed so that the resulting input image meets the input requirements of the subsequent model; the video frames need to be adjusted to parameters adapted to the model, such as the stride, the input picture size and the deep learning framework.
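A minimal preprocessing sketch is given below, assuming a YOLOv5-style detector that expects a fixed square RGB input in CHW order; a plain resize is used here for brevity, whereas an actual pipeline may use letterbox padding to preserve the aspect ratio.

```python
# Preprocessing sketch (an assumption, not the patent's code): resize each video
# frame to the model input size, convert BGR to RGB, normalize to [0, 1] and
# reorder to CHW as YOLOv5-style detectors typically expect.
import cv2
import numpy as np


def preprocess(frame: np.ndarray, input_size: int = 640) -> np.ndarray:
    img = cv2.resize(frame, (input_size, input_size))   # adapt to the model input size
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)           # decoded frames are BGR
    img = img.astype(np.float32) / 255.0                 # normalize pixel values
    return np.transpose(img, (2, 0, 1))                  # HWC -> CHW for the network
```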
In the embodiment of the present specification, the pre-trained YOLOv5 neural network model is obtained by training the following steps:
acquiring a training data set with labels, wherein the types of objects in the training data set are consistent with the types of tracked objects of the unmanned aerial vehicle;
and training the initial Yolov5 neural network model according to the training data set and a preset loss function to obtain a training convergence Yolov5 neural network model.
It may be understood that the object types in the training dataset are kept consistent with the tracked-object types of the unmanned aerial vehicle to improve the prediction accuracy and reliability of the trained model. For example, if the tracked object type is a person, the object type in the training dataset is also a person; if the tracked object type is an animal, the object type in the training set is also an animal; if the tracked object type is a vehicle, such as an automobile, electric vehicle or bicycle, the object type in the training set is also a vehicle.
On the basis of constructing a target detection model with the YOLOv5 neural network architecture, a specific data set is adopted to construct a pedestrian target detection model with multi-feature fusion. For example, 803 pieces of pedestrian data are provided, in which the target features in the training images are highly diversified (body types of pedestrians, etc.), and the images and the corresponding annotation-box information are included as the training data set. Compared with the coco128 data set of YOLOv5, the number of samples and the specificity to the recognition target are increased, thereby improving the accuracy and stability of recognition in the subsequent process.
The specific training process is as follows:
1. Collect different types of pedestrian data sets and annotate each image to obtain a corresponding tag file; all the tag files form a tag file set. Each tag file contains the category and the target-box coordinates of the target in the unmanned aerial vehicle inspection image.
2. Process the tag file set, convert each tag file into a text file, and normalize the target-box coordinates in the text file to obtain normalized box coordinates; all the normalized box coordinates form a normalized coordinate set.
3. Train the model using the training set and optimize the model parameters. The trained model is evaluated using the validation set, and model metrics such as precision (Precision), recall (Recall), mAP and the F1 score are calculated, thereby obtaining a prediction model. The final model's F1 score reached 0.93 at a confidence threshold of 0.365, as shown in fig. 3.
Note: the F1 score (F1-score) is a metric for classification problems. Machine learning competitions on multi-class problems often use the F1-score as the final evaluation method. It is the harmonic mean of precision and recall, with a maximum value of 1 and a minimum value of 0.
For a given class, it combines the Precision and Recall indicators; the value of the F1-Score ranges from 0 to 1, with 1 being best and 0 being worst.
The specific formula is as follows:
F1 = 2 · Precision · Recall / (Precision + Recall)
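For context, the sketch below shows how such a training run is commonly launched with the public ultralytics/yolov5 repository (train.py and its flags come from that repository); the dataset path, YAML contents and hyperparameters are illustrative assumptions rather than values taken from this document.

```python
import subprocess

# Hypothetical dataset description for a single "person" class.
DATASET_YAML = """\
path: ../datasets/pedestrian   # assumed dataset root
train: images/train
val: images/val
nc: 1
names: ['person']
"""

with open("pedestrian.yaml", "w") as f:
    f.write(DATASET_YAML)

# Assumes this script is run from a clone of the ultralytics/yolov5 repository.
subprocess.run(
    ["python", "train.py",
     "--img", "640",               # input picture size
     "--batch", "16",
     "--epochs", "100",
     "--data", "pedestrian.yaml",
     "--weights", "yolov5s.pt"],   # start from pretrained weights
    check=True,
)
```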
in the embodiment of the specification, the YOLOv5 neural network model comprises a backbone network, a detection head network, a prediction layer and an anchor frame;
the backbone network is of a multi-level structure and is used for extracting multi-level image characteristics of an input image;
the detection head network is used for extracting target parameter information according to the multi-level image characteristics, and the target parameter information at least comprises target positions, categories and confidence information;
the prediction layer generates target bounding boxes and confidences according to the output of the detection head network, and ranks the targets according to confidence so as to generate feature maps of different scales;
the anchor frame is used for carrying out target prediction on each feature map, and screening and merging prediction results by utilizing non-maximum value inhibition so as to obtain target boundary frame information.
In a further embodiment, the backbone network is provided with a feature pyramid network and a path aggregation network;
the feature pyramid network is connected with the multi-level structure of the backbone network, and the image features of different levels are fused through transverse connection and up-sampling operation;
the path aggregation network is connected with the feature pyramid networks of different levels in a cascading way to fuse the feature information of the shallow layer and the deep layer.
It will be appreciated that the YOLOv5 neural network model uses a number of network structures, including anchor boxes (anchors), backbone (backbone) networks, and head of detection (head) networks for target detection, and the workflow of the YOLOv5 neural network model is as follows:
1. first, the input image is subjected to preprocessing steps, such as resizing, normalization, etc., to accommodate the input requirements of the network.
2. The input image may go through a CSPDarknet53 network, which is the backbone network in YOLOv5, where the CSPDarknet53 is made up of a series of convolution layers, residual blocks, and downsampling layers, gradually extracting the low-to-high-level features of the image, including details and semantic information of the image.
3. At different levels of CSPDarknet53, a Feature Pyramid Network (FPN) was introduced to obtain a multi-scale feature pyramid. The FPN fuses the features of different levels through transverse connection and up-sampling operation to acquire rich semantic information. This enables the network to handle targets of different scales simultaneously, improving the effect of target detection.
Yolov5 also introduced a path aggregation network (PANet) to further refine the feature pyramid network. The PANet is connected with feature pyramids of different levels in a cascading way, and feature information of a shallow layer and feature information of a deep layer are fused, so that a small-size target can be better detected. This improves the perceptibility of the target detection model to various target scales.
5. Based on the "backbone" network architecture, YOLOv5 also adds a detection head and a prediction layer. The detection head is responsible for extracting the position, category and confidence information of the target from the feature pyramid. The prediction layer generates a bounding box and a confidence score of the target according to the output of the detection head.
6. The output of the model is subjected to post-processing steps, such as non-maximum suppression (NMS), to filter and merge overlapping bounding boxes, and rank the targets according to confidence. A series of feature maps of different scales are thus obtained.
7. Next, for each feature map, target prediction is performed using an anchor frame. And obtaining the position and class probability of the target by carrying out class and boundary frame regression prediction on each anchor frame. Under the original yolov5 structure, the following parameters are added for each image frame appearing in the detection result:
Target id number:
id ≥ 0
The id is assigned in the order in which targets enter the video stream.
Center point coordinates:
C = (Cx, Cy)
where Cx and Cy are respectively the abscissa and ordinate of the center point of the identified target's bounding box. This provides a calibrated reference for Kalman-filter tracking.
8. Finally, combining and screening the prediction results of different scales, removing overlapped bounding boxes by using non-maximum suppression (NMS), and sequencing targets according to the confidence level.
YOLOv5 can efficiently detect the position and type of an object in an image. By combining the anchor boxes, the backbone network and the head network structure, an efficient target detection capability is achieved that is suitable for scenes of different scales and complexity.
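The following sketch, offered as an assumption rather than the patent's own code, loads a trained YOLOv5 model through torch.hub and derives the extra per-box fields described above, namely the center-point coordinates C = (Cx, Cy); the persistent id in the actual method is assigned by the tracker in order of appearance, so the index used here is only a placeholder.

```python
import torch

# Hypothetical path to the weights produced by the training step above.
model = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt")


def detect_with_centers(frame):
    """Run detection on one RGB frame and return boxes with center points."""
    results = model(frame)                        # forward pass + non-maximum suppression
    boxes = results.xyxy[0].cpu().numpy()         # rows: x1, y1, x2, y2, confidence, class
    detections = []
    for idx, (x1, y1, x2, y2, conf, cls) in enumerate(boxes):
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0     # center point C = (Cx, Cy)
        detections.append({
            "id": idx,                # placeholder; the persistent id comes from the tracker
            "cls": int(cls),
            "conf": float(conf),
            "center": (cx, cy),
            "box": (float(x1), float(y1), float(x2), float(y2)),
        })
    return detections
```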
In this embodiment of the present disclosure, the determining, according to the target bounding box information and the tracking target determined by the user in advance, the tracking track update data of the unmanned aerial vehicle by matching the target in the current video frame image with the tracking target in the previous video frame image includes:
processing the target boundary box information by using a Kalman filtering algorithm to obtain a motion state variable of a target in each video frame image, wherein the state variable comprises a center coordinate and a speed;
and matching the target in the current video frame with the tracking target in the previous video frame by using the Hungarian algorithm according to the motion state variables of the targets in consecutive video frame images, and determining the tracking track update data of the unmanned aerial vehicle.
It can be understood that the YOLOv5 detection algorithm uses a neural network model to process the input image and generate a detection result containing information such as the target bounding box, class, and confidence. These detection results are typically used to identify the target object in the image. In order to achieve tracking of the target object, the motion state variable of the target object can be predicted and updated through a Kalman filtering algorithm, and the target object can be matched through a Hungary algorithm to determine a subsequent tracking track, wherein the tracking track can be a tracking path determined according to the tracking object, and the tracking path represents the control process of the unmanned aerial vehicle, such as change of a pitch angle, change of a height and the like.
In this specification, the kalman filter algorithm itself does not directly use image data, but uses target bounding box information provided by YOLOv5 as a measurement input. Since the Kalman filtering algorithm is a mathematical filtering algorithm based on a state space model, the Kalman filtering algorithm is mainly used for predicting and updating state variables, and does not directly process image data. The target detection and Kalman filtering algorithm is used cooperatively, and tracking and prediction of the target are realized by transmitting the information of the target boundary box.
The Kalman filtering algorithm uses target boundary box information provided by YOLOv5 as measurement input, predicts and updates the position of a target by combining a dynamic model of the system, and can provide smooth and accurate target position estimation by fusing detection results and estimation of the dynamic model.
Illustratively, the following describes a Kalman filter based position estimation:
1. Construct the state variables, the process model and the observation model of the system.
State variables:
x_k = [v_k^T  z_k  b_a^T]^T
where v_k is the three-dimensional velocity to be estimated, z_k is the altitude of the aircraft along the z-axis, and b_a is the three-dimensional acceleration bias.
Process model:
x_k = A x_{k-1} + u_{k-1} + w_k
where A is the system transition matrix, u_{k-1} is the control input built from the accelerometer readings a_x, a_y, a_z, and w_k is the system noise, which characterizes the uncertainty of the system model; w_k is assumed to be zero-mean Gaussian white noise whose covariance matrix Q_k is diagonal.
Observation model:
z_k = H x_k + v_k
where the observation consists of the horizontal velocity obtained from the visual information in step three and the altitude measured by the altitude sensor, H is the observation matrix, and v_k is the observation noise, which characterizes the uncertainty of the measurement; v_k is assumed to be zero-mean Gaussian white noise whose covariance matrix R_k is diagonal.
2. Filter initialization
The initial state value is:
x_0 = [v_c^T  d_sonar·cosθ·cosφ  0_{3×1}^T]^T
where v_c = [v_x v_y v_z]^T is the initial visual velocity, the initial altitude value is given by the altitude sensor (d_sonar is the altitude sensor reading), and the initial acceleration bias is set to zero.
The initial value P_0 of the state-estimation error covariance is a diagonal matrix.
Let k = 0, \hat{x}_{0|0} = x_0, P_{0|0} = P_0.
3. State one-step prediction
\hat{x}_{k|k-1} = A \hat{x}_{k-1|k-1} + u_{k-1}
4. Error covariance one-step prediction
P_{k|k-1} = A P_{k-1|k-1} A^T + Q_{k-1}
5. Kalman filter gain update
K_k = P_{k|k-1} H^T (H P_{k|k-1} H^T + R_k)^{-1}
6. State update correction
\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k (z_k - H \hat{x}_{k|k-1})
7. Error covariance update correction
P_{k|k} = (I_7 - K_k H) P_{k|k-1}
8. Set k = k + 1 and return to step 3.
Thus, the central coordinates of the object and the state variables such as speed are obtained.
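A compact numpy sketch of the prediction and update recursion in steps 3 to 8 above is shown below; the concrete A, H, Q and R matrices are left to the caller, and the dimensions follow the 7-dimensional state described in this section.

```python
import numpy as np


class KalmanFilter:
    """Minimal linear Kalman filter matching steps 3-8 above."""

    def __init__(self, A, H, Q, R, x0, P0):
        self.A, self.H, self.Q, self.R = A, H, Q, R
        self.x, self.P = x0, P0

    def predict(self, u):
        self.x = self.A @ self.x + u                          # state one-step prediction
        self.P = self.A @ self.P @ self.A.T + self.Q          # covariance one-step prediction
        return self.x

    def update(self, z):
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)              # Kalman gain update
        self.x = self.x + K @ (z - self.H @ self.x)           # state update correction
        self.P = (np.eye(len(self.x)) - K @ self.H) @ self.P  # covariance update correction
        return self.x
```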
Then the Hungarian algorithm is used: it can associate the tracked targets with the detection results in the target detection task, matching the detections in the current frame with the tracked targets in the previous frame according to the principle of maximum-weight matching, thereby achieving continuous tracking and trajectory maintenance of the target.
After the target has been tracked and matched by the Kalman filter and the Hungarian algorithm, the center point and track information of the target are obtained; this information can be used as the input of the visual servo for controlling the motion of the robot or unmanned aerial vehicle.
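The association step can be illustrated with scipy's linear_sum_assignment, a standard implementation of the Hungarian algorithm; the distance-based cost and the gating threshold below are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_tracks(track_centers, det_centers, max_dist=80.0):
    """Associate predicted track centers with detected centers.

    track_centers, det_centers: arrays of shape (N, 2) and (M, 2) in pixels.
    Returns a list of (track_index, detection_index) pairs within max_dist.
    """
    cost = np.linalg.norm(track_centers[:, None, :] - det_centers[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)      # minimum-cost assignment
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < max_dist]
```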
In this embodiment of the present disclosure, performing tracking control on the unmanned aerial vehicle according to tracking track update data of the unmanned aerial vehicle includes:
according to the tracking track updating data of the unmanned aerial vehicle, determining a tracking object of the unmanned aerial vehicle;
determining whether the target frame area of the tracking object exceeds a first threshold value according to the boundary frame information corresponding to the tracking object;
if the target frame area exceeds the first threshold, the motion control quantity of the unmanned aerial vehicle is calculated, where the motion control quantity is a flight control instruction for the unmanned aerial vehicle, for example an instruction for the unmanned aerial vehicle to keep flying in its current state;
the unmanned aerial vehicle is controlled for tracking according to the motion control quantity, and whether the target frame of the tracked object is close to the edge of the input image is judged in real time. Judging whether the target frame is close to the edge of the image is equivalent to judging whether the unmanned aerial vehicle is close to the tracked object: because the camera's field of view enlarges as the distance increases, when the distance and acquisition angle between the tracked object and the unmanned aerial vehicle are appropriate, the target frame of the tracked object also sits in an appropriate position in the input image; whether the target frame of the tracked object is close to the edge region of the input image is therefore used as the criterion;
If the target frame of the tracked object is not close to the edge of the input image, adjusting the pitching angle of the unmanned aerial vehicle according to the target frame area of the tracked object so as to adjust the distance and the acquisition angle between the unmanned aerial vehicle and the tracked object;
if the target frame of the tracking object is close to the edge of the input image, the horizontal control is kept;
and if the area of the target frame does not exceed the first threshold, adjusting the video acquisition angle of the unmanned aerial vehicle so as to redetermine the tracking object.
Further, adjusting a pitch angle of the unmanned aerial vehicle according to a target frame area of the tracked object, including:
judging whether the ratio of the target frame area of the tracking object to the input image area exceeds a preset ratio;
if yes, controlling the unmanned aerial vehicle to fly according to a preset inclination angle;
if not, controlling the unmanned aerial vehicle to keep flying in the original state.
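The branching described above can be summarized in the following sketch; the ratio threshold and the returned command labels are illustrative assumptions, with the actual preset ratio and tilt angle left as configuration values.

```python
def adjust_pitch(box_area, image_area, preset_ratio=0.15):
    """Decide the pitch behaviour from the target-box to image area ratio."""
    if box_area / image_area > preset_ratio:      # ratio exceeds the preset ratio
        return "fly_at_preset_tilt_angle"         # fly according to the preset tilt angle
    return "keep_original_flight_state"           # otherwise keep flying in the original state
```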
In order to realize accurate control of the unmanned aerial vehicle, the association between the three-dimensional camera coordinate system and the two-dimensional image coordinate system of the unmanned aerial vehicle is also required to be established, so that the unmanned aerial vehicle can be controlled based on the acquired images and corresponding control instructions, and the association between the three-dimensional camera coordinate system and the two-dimensional image coordinate system of the unmanned aerial vehicle is established through the following steps:
1. Firstly, the following visual servo model of the multi-rotor unmanned aerial vehicle is established:
1) Multi-rotor unmanned aerial vehicle flight control rigid body model
For simplicity, when modeling the multi-rotor unmanned aerial vehicle it is assumed that the multi-rotor is a rigid body, that it is subject only to gravity and propeller thrust, and that its mass and moment of inertia are constant; the geometric center of the multi-rotor coincides with its center of gravity. The Euler-angle form of the multi-rotor flight control rigid-body model can then be written as:
\dot{p} = v
\dot{v} = g e_3 - (f/m) R e_3
\dot{Θ} = W ω
J \dot{ω} = -ω × (J ω) + G_a + τ
where e_3 is the basis vector of the ground coordinate system, m is the mass of the unmanned aerial vehicle, p denotes the center-of-mass coordinates of the multi-rotor, v denotes the velocity of the multi-rotor, f denotes the magnitude of the total propeller thrust, g is the acceleration of gravity, G_a denotes the gyroscopic moment, τ denotes the moment generated by the propellers about the body axes, J denotes the moment of inertia of the multi-rotor, ω denotes the angular velocity of the body, Θ denotes the attitude angle, W is the transformation matrix from the body angular velocity to the attitude-rate, and R denotes the rotation matrix from the body coordinate system to the world coordinate system.
In the multi-rotor unmanned aerial vehicle flight control model, both the earth coordinate system and the body coordinate system are involved. On the one hand, it is desirable to express the position and velocity of the multi-rotor in the earth coordinate system, which helps the flight operator determine the flight position and speed and is consistent with GPS measurements; on the other hand, the thrust and moments are expressed very intuitively in the body coordinate system, and the sensor measurements are also expressed in the body frame. Having both coordinate systems facilitates calculation and application in different scenarios. A notable characteristic of the multi-rotor flight control rigid-body model is that the thrust direction always coincides with the negative direction of the body axis O_b z_b.
Expressing the multi-rotor rigid-body flight control mathematically facilitates the design, coding and implementation of multi-rotor control algorithms.
2) Visual imaging model
As shown in fig. 4, which shows a state diagram of unmanned aerial vehicle target tracking with a level-looking camera, the gray block is the target, O_m is the center of gravity of the multi-rotor unmanned aerial vehicle, O_c is the center of the camera, d is the distance between the two, and η is the field-of-view angle. The imaging position of a space point p_e on the image plane can be approximated by the pinhole imaging model, i.e. the projection position p_c of the point p_e on the image plane is the intersection of the line through the optical center O and the space point p_e with the image plane. The relationship between the coordinates (x_e, y_e, z_e)^T of the point p_e in the world coordinate system and the pixel coordinates (u, v)^T of the projected point p_c can therefore be described as:
z_e [u v 1]^T = K [R t] [x_e y_e z_e 1]^T,   K = [[α_x, 0, u_0], [0, α_y, v_0], [0, 0, 1]]
where α_x = f/dx is the scale factor on the u-axis and α_y = f/dy is the scale factor on the v-axis (f is the focal length, and dx and dy are the pixel sizes in the u-axis and v-axis directions, respectively); (u_0, v_0), the intersection of the camera optical axis with the image plane, is called the principal point. α_x, α_y, u_0 and v_0 are related only to the camera itself and are called the camera intrinsic parameters. R and t, the rotation matrix and translation vector between the camera coordinate system and the world coordinate system, are called the camera extrinsic parameters.
The visual imaging principle clarifies the imaging mechanism of the camera carried by the aircraft; the mapping of the three-dimensional data acquired by the onboard camera onto the two-dimensional image plane is the basis for establishing the visual servo model.
3) Visual servo model
Visual servoing refers to controlling the motion of a robot using computer vision information. All vision-based servo schemes aim to reduce an error e(t) between the current and the desired image features.
The relation between the rate of change of the feature error and the camera velocity is:
\dot{e} = L_e v_c
where L_e = L_s, L_s ∈ R^{k×6}, is called the image Jacobian (interaction) matrix, and v_c = (ω, v) denotes the instantaneous angular and linear velocity of the camera. Taking \dot{e} as the input, if the image Jacobian matrix is known, the output v_c can be obtained. For a point feature, the image Jacobian matrix is:
L_s = [ x·y      -(1+x^2)   y     -1/z_e    0       x/z_e
        1+y^2    -x·y       -x     0       -1/z_e   y/z_e ]
where the 3-D point coordinate in the camera coordinate system is (x_e, y_e, z_e)^T and the corresponding coordinate on the 2-D image plane is p = (x, y). The goal of visual servoing is to reduce the difference between the current image coordinates of the target and the desired image coordinates; its significance is that, through the image Jacobian matrix, the robot can track the target more accurately, and it is the core module of the multi-rotor visual servo model.
4) Multi-rotor vision servo model
Based on the rigid body model and the visual imaging principle of the multi-rotor unmanned aerial vehicle, the following multi-rotor visual servo model is designed by utilizing a visual servo method.
In the longitudinal channel, define the state x = [e_y v_y v_z θ]^T and the input u = [f ω_x]^T; a linear state-space model of the form
\dot{x}(t) = A x(t) + B u(t)
y(t) = [1 0 0 0] x(t)
can then be obtained, where e_y is the tracking error of the longitudinal channel expressed in the camera frame after vertical decomposition, v_y and v_z are the velocity components in the y and z directions, θ is the attitude angle, f is the thrust, and ω_x is the body angular-velocity input about the x-axis.
In the lateral channel, the multi-rotor lateral model is transformed into the camera coordinate system:
x = [e_x v_x v_z ψ]^T
u = [ω_y ω_z]^T
y(t) = [1 0 0 0] x(t)
where e_x is the tracking error of the lateral channel expressed in the camera frame after horizontal decomposition, v_x and v_z are the velocity components in the x and z directions, ψ is the yaw angle, ω_y and ω_z are the body angular-velocity inputs about the y and z axes, and x(t) denotes the time-dependent state.
First, the multi-rotor nonlinear flight control rigid-body model is established as the basis of multi-rotor control; then, taking the pinhole model as an example, the mathematical expression of visual imaging is introduced; finally, the concept and formulation of visual servoing are introduced, the image Jacobian matrix is derived, and the association between the three-dimensional camera coordinate system and the two-dimensional image coordinate system is established.
Fig. 5 is a schematic diagram of the mapping relationship between the camera coordinate system and the physical coordinate system in the embodiments herein. O represents the optical center of the camera and O_I represents the origin of the image coordinate system. (u, v) are the coordinate axes of the image coordinate system and (X_c, Y_c, Z_c) are the coordinate axes of the camera coordinate system. p_e(x_e, y_e, z_e) represents the coordinates of a three-dimensional point in the camera coordinate system, and p_c(u, v) represents the projection of the point p_e(x_e, y_e, z_e) onto the image.
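The pinhole mapping of Fig. 5 can be illustrated with the short function below, which projects a 3-D point expressed in the camera coordinate system to pixel coordinates using the intrinsic parameters α_x, α_y, u_0 and v_0; the numeric defaults are placeholders, not calibration values from this document.

```python
import numpy as np


def project(point_cam, alpha_x=920.0, alpha_y=920.0, u0=480.0, v0=360.0):
    """Project a 3-D point (camera frame) to pixel coordinates with a pinhole model."""
    x_e, y_e, z_e = point_cam
    u = alpha_x * x_e / z_e + u0      # scale by f/dx and shift to the principal point
    v = alpha_y * y_e / z_e + v0      # scale by f/dy and shift to the principal point
    return np.array([u, v])
```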
Illustratively, on the basis of the model establishment, the method provided by the embodiment of the present specification may include the following steps:
1. Install the libraries required for the program to run, including OpenCV, av, tellopy, etc.
2. Control the Tello unmanned aerial vehicle using the tellopy library, and process the picture data in the video stream using OpenCV.
3. Connect to the unmanned aerial vehicle and acquire the video stream; the connection is attempted up to 3 times until it succeeds.
4. The drone takes off and subscribes to the relevant flight log data.
5. A frame of data is captured and its dimensions (h, w), i.e. the height and width, are obtained.
6. From the ratio of the target's width and height to the image size, determine whether the target is large enough to follow. If the target is large enough, the yaw error is converted into the angle the drone needs to rotate using the yaw-angle coefficient and the servo center point, and the altitude error is converted into the height the drone needs to change using the altitude-control coefficient and the servo center point.
7. And judging whether the object needs to advance or retreat according to the distance between the object and the image boundary. If the target is too far, flying the drone forward to approach the target; if the targets are too close, the drone is flown backwards to maintain distance.
8. And judging whether the unmanned aerial vehicle needs to ascend or descend according to the proportion of the target area and the image size, adjusting the pitching angle, and finally realizing the tracking of the target.
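Steps 5 to 8 can be condensed into the following control-loop sketch; the gain names, thresholds and the returned command triple are illustrative assumptions, and mapping the returned values onto the drone's yaw, vertical and forward/backward commands (for example tellopy's clockwise/up/forward calls) is left to the caller.

```python
def follow_commands(box, frame_w, frame_h,
                    yaw_gain=0.3, height_gain=0.3,
                    min_ratio=0.05, near_ratio=0.25):
    """Turn one tracked bounding box into (yaw, height, forward) command values."""
    x1, y1, x2, y2 = box
    w_ratio = (x2 - x1) / frame_w
    if w_ratio < min_ratio:
        return None                                   # target too small to follow reliably
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    yaw = yaw_gain * (cx - frame_w / 2.0)             # step 6: rotate toward the servo center point
    height = height_gain * (frame_h / 2.0 - cy)       # step 6: climb/descend toward the center
    forward = 20 if w_ratio < near_ratio else -20     # step 7: approach if far, back off if close
    return yaw, height, forward
```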
The embodiments of the specification provide a computer-vision-based unmanned aerial vehicle target recognition and tracking control method that requires no additional sensing equipment and has the advantages of a small computational load and high algorithmic robustness. The method is simple and convenient, easy to maintain, low in manufacturing cost and has good practicability.
The embodiments of the specification provide a computer-vision-based micro unmanned aerial vehicle target recognition and tracking control method, and a real experiment was carried out using a DJI Tello unmanned aerial vehicle; fig. 6 shows the specific workflow diagram of the experiment.
First, visual detection by the unmanned aerial vehicle is performed; the flow is as follows:
1. Download the code from the following open-source repository link: https://github. On the basis of constructing a target detection model with the YOLOv5 neural network architecture, the data set is used to train it according to the detailed process in step one.
2. The YOLOv5 model is called in Python to perform target detection. Turn on the Tello's side switch, connect to the Tello's built-in WiFi, and wait for the yellow light to flash to ensure a successful connection with the micro unmanned aerial vehicle. The real-time video stream is acquired using the UDP data transmission protocol and updated iteratively. Each time the generator is iterated, it obtains the latest video frame from the Tello. Finally, the video stream data of each frame is acquired by looping through the generator object.
3. Each frame of image acquired by the Tello camera must be preprocessed to facilitate model detection. Preprocessing includes resizing, normalizing, scaling, and cropping the image to meet the input requirements of the model.
4. Target detection is performed using the YOLOv5 model. Because the center-point coordinates and the id of the target that the Tello is to strike need to be determined in the subsequent strike process, the identified targets must be numbered and assigned ids, and a GUI interaction interface is written so that the user can conveniently select the target for the unmanned aerial vehicle to track. The center-point coordinates are obtained as follows: the YOLOv5 open-source code provides the class, location (upper-left corner, lower-right corner, width, height) and confidence of each target object. The center-point coordinates can be obtained from the position information by simple algebra and are marked on the interface, as shown in Fig. 7, which is a schematic diagram of image recognition and multi-target tracking in the embodiment herein.
5. The idea of assigning id numbers is as follows: a number is obtained during the multi-object tracking process, and id numbers are assigned in the order in which objects are identified. An id does not change as long as the target remains within the field of view and can continually be matched; if an object is re-identified after being lost, or a new object appears, a new id is assigned. The id values increase sequentially as new targets are identified and are never repeated.
During operation, the computer outputs detection boxes containing the following information: the category, confidence, and center point of each object, and the id values of the different objects. The id number of the target to be tracked and struck is entered in the control panel to select the target. The target bounding box, class, confidence and related results are used to identify the target object in the image. These detection results are then passed as inputs to the Kalman filter algorithm.
6. Next, the position of the tracking target in the current frame is predicted by the Kalman filtering algorithm described in step three, producing predicted target position information. This predicted position information consists of state variables such as the target's center coordinates and velocity. The Kalman filtering algorithm predicts and updates the target position according to the target's motion model and the observation data. Two groups of target position information are then formed: one is the predicted target position output by the Kalman filtering algorithm, and the other is the target position detected in the current frame output by the target detection algorithm. These are used as inputs to the Hungarian algorithm, which matches the detection results in the current frame with the tracking targets in the previous frame. The concrete implementation flow of the Hungarian algorithm in the method is as follows:
1. Initialize the trackers: during initialization, parameters such as the number of trackers, the maximum number of tracking frames, and the distance threshold are defined, together with the state information of each tracker (such as position and number of skipped frames).
2. Compute a matching matrix for the current target detection results: calculate the distance between each detected target and all current trackers, pass the distance matrix to the Hungarian algorithm, and solve for the matching between targets and the trackers of minimum total distance.
3. Update the tracker states according to the matching result: if a target is not matched to any tracker, it is a new target and a new tracker must be created; if a tracker is not matched to any target, the target has left the field of view and the tracker is deleted; if a matched pair's distance exceeds the distance threshold, the match is also discarded, since the distance is too large for a valid match.
4. Depending on the number of unmatched objects, new trackers are created or these objects are ignored.
7. The target is tracked and matched through the Kalman filter and Hungarian algorithm to obtain the target's position and trajectory information, which serves as the input quantity for visual servoing. A minimal matching sketch is given after this list.
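As a sketch of the prediction-detection matching described above (illustrative only: SciPy's linear_sum_assignment is used as the Hungarian solver, and the function and variable names are introduced here rather than taken from the original code), the association between Kalman-predicted tracker centers and currently detected centers can be written as:

import numpy as np
from scipy.optimize import linear_sum_assignment

def match_detections(predicted, detections, dist_threshold=50.0):
    """Associate Kalman-predicted tracker centers with detected centers.

    predicted and detections are arrays of shape (M, 2) and (N, 2) holding
    (cx, cy) pixel coordinates; returns (matches, unmatched_trackers,
    unmatched_detections)."""
    if len(predicted) == 0 or len(detections) == 0:
        return [], list(range(len(predicted))), list(range(len(detections)))
    # Euclidean distance between every tracker prediction and every detection.
    dist = np.linalg.norm(predicted[:, None, :] - detections[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(dist)   # minimum-cost assignment
    matches, used_t, used_d = [], set(), set()
    for r, c in zip(rows, cols):
        if dist[r, c] <= dist_threshold:       # discard matches that are too far apart
            matches.append((r, c))
            used_t.add(r)
            used_d.add(c)
    unmatched_trackers = [i for i in range(len(predicted)) if i not in used_t]
    unmatched_detections = [j for j in range(len(detections)) if j not in used_d]
    return matches, unmatched_trackers, unmatched_detections

# Example: two trackers and two detections.
pred = np.array([[100.0, 120.0], [400.0, 300.0]])
det = np.array([[405.0, 310.0], [102.0, 118.0]])
print(match_detections(pred, det))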
Successive frames are processed to achieve continuous tracking and tracking control of the target, as shown in Figs. 8 and 9; the flow is as follows:
1. A frame of the video stream is acquired, together with the height and width of the image (a frame-acquisition sketch is given after this list).
2. The target object in the frame image is detected using the YOLOv5 model, and the abscissa and ordinate of the center coordinates of the target object, and the width and height of the target object are obtained through the above-described series of operations.
3. Instantiate the object and assign the passed-in Tello object to a variable for communication with and control of the unmanned aerial vehicle. Initialize the variables: the control coefficient of the yaw angle is set to 0.001; the control coefficient of the thrust value is set to -0.005; the control value of the pitch angle is set to 0.45; the ratio of the minimum area of the target detection result to the total image area is set to 0; the ratio of the maximum area of the target detection result to the total image area is set to 1; the proportion of the steering gear center position on the X axis is set to 0.4; the proportion of the steering gear center position on the Y axis is set to 0.5; the range ratio for checking whether the target is located near the image boundary is set to 0.1; the threshold for judging whether the target is located near the center of the Y axis is set to 30; the adjustment factor of the thrust value for downward flight is set to 0.5; the adjustment factor of the thrust value for upward flight is set to 0.5.
4. Judge whether motion control of the unmanned aerial vehicle is required according to the size of the target object. If the area of the target object exceeds the threshold, motion control is performed; otherwise, no control is applied.
5. If control is needed, the control amounts of yaw movement and lifting movement of the unmanned aerial vehicle are calculated.
6. Judge whether the target object is in the edge area of the picture; if so, set the pitch angle of the unmanned aerial vehicle to 0, i.e. keep it level. Otherwise, adjust the pitch angle of the unmanned aerial vehicle according to the area of the target object, so that the drone approaches or moves away from the target. Specifically, if the area of the target object is smaller than the threshold, the pitch angle is set to the previously initialized value; otherwise, the pitch angle is set to 0.
7. If target tracking fails, simple motion control is applied so the drone re-searches for the target. Specifically, the unmanned aerial vehicle is rotated to the left by a certain angle while its height and pitch angle are kept unchanged.
8. Repeat the above steps to realize continuous tracking and tracking control of the target.
9. Finally, tracking by the Tello unmanned aerial vehicle is achieved through the above steps. The implementation results are shown in Fig. 10.
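As a companion to step 1 of this flow, the following sketch shows one way to pull per-frame images from the Tello video stream. It is assumption-laden: it presumes the tellopy drone object exposes connect(), wait_for_connection() and get_video_stream() as in that library's published examples, and it decodes the H.264 stream with PyAV; it is a sketch, not the implementation used in the experiment.

import av
import cv2
import tellopy

def frame_generator(drone):
    """Yield BGR frames decoded from the Tello video stream."""
    container = av.open(drone.get_video_stream())   # assumed tellopy API
    for frame in container.decode(video=0):
        # Convert each decoded frame to an OpenCV-compatible ndarray.
        yield frame.to_ndarray(format='bgr24')

def main():
    drone = tellopy.Tello()
    drone.connect()
    drone.wait_for_connection(60.0)     # wait for the WiFi link to come up
    try:
        for image in frame_generator(drone):
            h, w = image.shape[:2]      # step 1: the frame plus its height and width
            cv2.imshow('tello', image)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
    finally:
        drone.quit()
        cv2.destroyAllWindows()

if __name__ == '__main__':
    main()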
On the basis of the method provided above, the embodiment of the present disclosure further provides a micro unmanned aerial vehicle target recognition and tracking control device based on computer vision, as shown in fig. 11, where the device includes:
the video stream acquisition module 1101 is configured to acquire a video stream acquired by the unmanned aerial vehicle in real time;
a processing module 1102, configured to extract video frame images in the video stream, and process each video frame image to obtain an input image;
the feature extraction module 1103 is configured to input the input image into a pre-trained YOLOv5 neural network model, so as to obtain target bounding box information in the input image, where the target bounding box information includes a target type and a confidence level;
the matching module 1104 is configured to match a target in a current video frame image with a tracking target in a previous video frame image according to the target bounding box information and a tracking target determined in advance by a user, and determine tracking track update data of the unmanned aerial vehicle;
and the control module 1105 is configured to perform tracking control on the unmanned aerial vehicle according to the tracking track update data of the unmanned aerial vehicle.
The beneficial effects obtained by the device are consistent with those obtained by the method, and are not repeated here.
The present embodiment provides a computer device, the internal structure of which can be shown in Fig. 12. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by the processor to implement the computer-vision-based miniature unmanned aerial vehicle target recognition and tracking control method.
It will be appreciated by those skilled in the art that the structure shown in FIG. 12 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program stored on a non-transitory computer-readable storage medium which, when executed, may comprise the steps of the method embodiments described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory can include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM is available in many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processor referred to in the embodiments provided in the present application may be a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, or the like, but is not limited thereto.
It should also be understood that in embodiments herein, the term "and/or" merely describes an association relationship between associated objects, indicating that three relationships may exist. For example, "A and/or B" may mean: A exists alone, both A and B exist, or B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be embodied in electronic hardware, in computer software, or in a combination of the two, and that the elements and steps of the examples have been generally described in terms of function in the foregoing description to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided herein, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices, or elements, or may be an electrical, mechanical, or other form of connection.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the elements may be selected according to actual needs to achieve the objectives of the embodiments herein.
Specific examples are set forth herein to illustrate the principles and embodiments herein and are merely illustrative of the methods herein and their core ideas; also, as will be apparent to those of ordinary skill in the art in light of the teachings herein, many variations are possible in the specific embodiments and in the scope of use, and nothing in this specification should be construed as a limitation on the invention.

Claims (10)

1. The miniature unmanned aerial vehicle target recognition and tracking control method based on computer vision is characterized by comprising the following steps of:
acquiring a video stream acquired by an unmanned aerial vehicle in real time;
extracting video frame images in the video stream, and processing each video frame image to obtain an input image;
inputting the input image into a pre-trained YOLOv5 neural network model to obtain target bounding box information in the input image, wherein the target bounding box information comprises a target type and a confidence level;
according to the target boundary box information and the tracking target determined by the user in advance, matching the target in the current video frame image with the tracking target in the previous video frame image, and determining tracking track updating data of the unmanned aerial vehicle;
and carrying out tracking control on the unmanned aerial vehicle according to the tracking track updating data of the unmanned aerial vehicle.
2. The method of claim 1, wherein acquiring the video stream acquired in real time by the drone comprises:
establishing a data communication protocol with the unmanned aerial vehicle;
acquiring a plurality of data packets sent from the unmanned aerial vehicle according to the data communication protocol;
and extracting video frames from each data packet, and reconstructing the video frames to obtain the video stream of the unmanned aerial vehicle.
3. The method of claim 1, wherein the pre-trained YOLOv5 neural network model is trained by:
acquiring a training data set with labels, wherein the types of objects in the training data set are consistent with the types of tracked objects of the unmanned aerial vehicle;
and training the initial Yolov5 neural network model according to the training data set and a preset loss function to obtain a training convergence Yolov5 neural network model.
4. The method of claim 1, wherein the YOLOv5 neural network model comprises a backbone network, a detection head network, a prediction layer, and an anchor frame;
the backbone network is of a multi-level structure and is used for extracting multi-level image characteristics of an input image;
the detection head network is used for extracting target parameter information according to the multi-level image characteristics, and the target parameter information at least comprises target positions, categories and confidence information;
the prediction layer generates a target boundary box and confidence according to the output of the detection head network, and performs target sequencing according to the confidence so as to generate feature graphs with different scales;
the anchor frame is used for carrying out target prediction on each feature map, and screening and merging prediction results by utilizing non-maximum value inhibition so as to obtain target boundary frame information.
5. The method of claim 4, wherein a feature pyramid network and a path aggregation network are provided in the backbone network;
the feature pyramid network is connected with the multi-level structure of the backbone network, and the image features of different levels are fused through transverse connection and up-sampling operation;
the path aggregation network is connected with the feature pyramid networks of different levels in a cascading way to fuse the feature information of the shallow layer and the deep layer.
6. The method according to claim 1, wherein the matching the target in the current video frame image with the tracking target in the previous video frame image according to the target bounding box information and the tracking target determined in advance by the user, and determining the tracking track updating data of the unmanned aerial vehicle, comprises:
processing the target boundary box information by using a Kalman filtering algorithm to obtain a motion state variable of a target in each video frame image, wherein the state variable comprises a center coordinate and a speed;
and matching the target in the current video frame with the tracking target in the previous video frame by using a Hungary algorithm according to the motion state variable of the target in the continuous video frame images, and determining the tracking track updating data of the unmanned aerial vehicle.
7. The method of claim 4, wherein tracking control of the drone based on the tracking trajectory update data of the drone comprises:
according to the tracking track updating data of the unmanned aerial vehicle, determining a tracking object of the unmanned aerial vehicle;
determining whether the target frame area of the tracking object exceeds a first threshold value according to the boundary frame information corresponding to the tracking object;
if the target frame area exceeds a first threshold value, calculating the motion control quantity of the unmanned aerial vehicle;
controlling the unmanned aerial vehicle to carry out tracking control according to the motion control quantity of the unmanned aerial vehicle, and judging whether a target frame of the tracked object is close to the edge of an input image or not in real time;
if the target frame of the tracking object is close to the edge of the input image, the horizontal control is kept;
if the target frame of the tracking object is not close to the edge of the input image, adjusting the pitching angle of the unmanned aerial vehicle according to the target frame area of the tracking object;
and if the area of the target frame does not exceed the first threshold, adjusting the video acquisition angle of the unmanned aerial vehicle so as to redetermine the tracking object.
8. The method of claim 7, wherein adjusting the pitch angle of the drone based on the target frame area of the tracked object comprises:
Judging whether the ratio of the target frame area of the tracking object to the input image area exceeds a preset ratio;
if yes, controlling the unmanned aerial vehicle to fly according to a preset inclination angle;
if not, controlling the unmanned aerial vehicle to keep flying in the original state.
9. A miniature unmanned aerial vehicle target recognition and tracking control device based on computer vision, the device comprising:
the video stream acquisition module is used for acquiring video streams acquired by the unmanned aerial vehicle in real time;
the processing module is used for extracting video frame images in the video stream and processing each video frame image to obtain an input image;
the feature extraction module is used for inputting the input image into a pre-trained YOLOv5 neural network model so as to obtain target boundary box information in the input image, wherein the target boundary box information comprises a target type and a confidence level;
the matching module is used for matching the target in the current video frame image with the tracking target in the previous video frame image according to the target boundary box information and the tracking target determined by the user in advance, and determining tracking track updating data of the unmanned aerial vehicle;
and the control module is used for carrying out tracking control on the unmanned aerial vehicle according to the tracking track updating data of the unmanned aerial vehicle.
10. A drone control system, the system comprising:
unmanned plane;
a control terminal in communication with the drone for controlling the drone to fly according to the method of any one of claims 1 to 8.
CN202310603409.1A 2023-05-25 2023-05-25 Miniature unmanned aerial vehicle target recognition and tracking control method based on computer vision Pending CN117036989A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310603409.1A CN117036989A (en) 2023-05-25 2023-05-25 Miniature unmanned aerial vehicle target recognition and tracking control method based on computer vision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310603409.1A CN117036989A (en) 2023-05-25 2023-05-25 Miniature unmanned aerial vehicle target recognition and tracking control method based on computer vision

Publications (1)

Publication Number Publication Date
CN117036989A true CN117036989A (en) 2023-11-10

Family

ID=88621461

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310603409.1A Pending CN117036989A (en) 2023-05-25 2023-05-25 Miniature unmanned aerial vehicle target recognition and tracking control method based on computer vision

Country Status (1)

Country Link
CN (1) CN117036989A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117788302A (en) * 2024-02-26 2024-03-29 山东全维地信科技有限公司 Mapping graphic processing system



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination