CN117036989A - Miniature unmanned aerial vehicle target recognition and tracking control method based on computer vision - Google Patents

Miniature unmanned aerial vehicle target recognition and tracking control method based on computer vision Download PDF

Info

Publication number
CN117036989A
Authority
CN
China
Prior art keywords
target
unmanned aerial
aerial vehicle
tracking
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310603409.1A
Other languages
Chinese (zh)
Inventor
邓恒
冯尚斌
顾爽
王怡菲
乐祥立
刘石
全权
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Beihang University
Original Assignee
Beijing University of Technology
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology, Beihang University filed Critical Beijing University of Technology
Priority to CN202310603409.1A priority Critical patent/CN117036989A/en
Publication of CN117036989A publication Critical patent/CN117036989A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/10 Terrestrial scenes
    • G06V 20/17 Terrestrial scenes taken from planes or by drones
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/277 Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/60 Analysis of geometric attributes
    • G06T 7/62 Analysis of geometric attributes of area, perimeter, diameter or volume
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30241 Trajectory

Abstract

The invention provides a computer vision-based miniature unmanned aerial vehicle target identification and tracking control method, which comprises: acquiring a video stream collected by an unmanned aerial vehicle in real time; extracting video frame images from the video stream, and processing each video frame image to obtain an input image; inputting the input image into a pre-trained YOLOv5 neural network model to obtain target bounding box information in the input image, wherein the target bounding box information comprises a target type and a confidence level; matching the target in the current video frame image with the tracking target in the previous video frame image according to the target bounding box information and a tracking target determined by the user in advance, and determining tracking track update data of the unmanned aerial vehicle; and performing tracking control on the unmanned aerial vehicle according to the tracking track update data, so that the unmanned aerial vehicle can accurately and efficiently identify and track the selected target.

Description

Miniature unmanned aerial vehicle target recognition and tracking control method based on computer vision
Technical Field
The invention belongs to the technical field of unmanned aerial vehicles, and particularly relates to a miniature unmanned aerial vehicle target identification and tracking control method based on computer vision.
Background
An unmanned aerial vehicle is a powered, controllable aircraft that can carry multiple kinds of mission equipment, perform multiple tasks and be reused. Unmanned aerial vehicles can be controlled by wireless remote control devices and their own onboard control systems, and include unmanned helicopters, unmanned fixed-wing aircraft, unmanned parawing aircraft, multi-rotor unmanned aerial vehicles and the like. An unmanned aerial vehicle can carry a camera device and be used for aerial photography, mapping, reconnaissance and the like.
Target tracking is an important research direction in computer vision; its goal is to accurately determine information such as the position and motion trajectory of a target of interest in a video sequence. Applying target tracking technology to unmanned aerial vehicles helps improve their level of intelligence. In practical tracking applications, the target area of interest is often affected by environmental factors, for example in complex environments where the GPS signal fails or communication is denied, so that the algorithm results become inaccurate, the target cannot be tracked stably, and the target is finally lost.
Disclosure of Invention
Aiming at the problems in the prior art, the object of the present invention is to provide a computer vision-based miniature unmanned aerial vehicle target identification and tracking control method, which enables the unmanned aerial vehicle to identify and track the selected target accurately and efficiently.
In order to solve the technical problems, the specific technical scheme is as follows:
in one aspect, provided herein is a computer vision-based micro unmanned aerial vehicle target recognition and tracking control method, the method comprising:
acquiring a video stream acquired by an unmanned aerial vehicle in real time;
extracting video frame images in the video stream, and processing each video frame image to obtain an input image;
inputting the input image into a pre-trained YOLOv5 neural network model to obtain target bounding box information in the input image, wherein the target bounding box information comprises a target type and a confidence level;
according to the target boundary box information and the tracking target determined by the user in advance, matching the target in the current video frame image with the tracking target in the previous video frame image, and determining tracking track updating data of the unmanned aerial vehicle;
and carrying out tracking control on the unmanned aerial vehicle according to the tracking track updating data of the unmanned aerial vehicle.
Further, acquiring the video stream acquired by the unmanned aerial vehicle in real time comprises:
establishing a data communication protocol with the unmanned aerial vehicle;
acquiring a plurality of data packets sent from the unmanned aerial vehicle according to the data communication protocol;
And extracting video frames from each data packet, and reconstructing the video frames to obtain the video stream of the unmanned aerial vehicle.
Further, the pre-trained YOLOv5 neural network model is obtained through training by the following steps:
acquiring a training data set with labels, wherein the types of objects in the training data set are consistent with the types of tracked objects of the unmanned aerial vehicle;
and training the initial Yolov5 neural network model according to the training data set and a preset loss function to obtain a training convergence Yolov5 neural network model.
Further, the YOLOv5 neural network model comprises a main network, a detection head network, a prediction layer and an anchor frame;
the backbone network is of a multi-level structure and is used for extracting multi-level image characteristics of an input image;
the detection head network is used for extracting target parameter information according to the multi-level image characteristics, and the target parameter information at least comprises target positions, categories and confidence information;
the prediction layer generates target bounding boxes and confidences according to the output of the detection head network, and ranks the targets according to confidence so as to generate feature maps of different scales;
The anchor frame is used for carrying out target prediction on each feature map, and screening and merging prediction results by utilizing non-maximum value inhibition so as to obtain target boundary frame information.
Further, a characteristic pyramid network and a path aggregation network are arranged in the main network;
the feature pyramid network is connected with the multi-level structure of the backbone network, and the image features of different levels are fused through transverse connection and up-sampling operation;
the path aggregation network is connected with the feature pyramid networks of different levels in a cascading way to fuse the feature information of the shallow layer and the deep layer.
Further, the matching of the target in the current video frame image and the tracking target in the previous video frame image according to the target bounding box information and the tracking target determined by the user in advance, and determining the tracking track update data of the unmanned aerial vehicle comprise:
processing the target boundary box information by using a Kalman filtering algorithm to obtain a motion state variable of a target in each video frame image, wherein the state variable comprises a center coordinate and a speed;
and matching the target in the current video frame with the tracking target in the previous video frame by using the Hungarian algorithm according to the motion state variables of the targets in consecutive video frame images, and determining the tracking track update data of the unmanned aerial vehicle.
Further, according to the tracking track update data of the unmanned aerial vehicle, tracking control is performed on the unmanned aerial vehicle, including:
according to the tracking track updating data of the unmanned aerial vehicle, determining a tracking object of the unmanned aerial vehicle;
determining whether the target frame area of the tracking object exceeds a first threshold value according to the boundary frame information corresponding to the tracking object;
if the target frame area exceeds a first threshold value, calculating the motion control quantity of the unmanned aerial vehicle;
controlling the unmanned aerial vehicle to carry out tracking control according to the motion control quantity of the unmanned aerial vehicle, and judging whether a target frame of the tracked object is close to the edge of an input image or not in real time;
if the target frame of the tracking object is not close to the edge of the input image, adjusting the pitching angle of the unmanned aerial vehicle according to the target frame area of the tracking object;
if the target frame of the tracking object is close to the edge of the input image, the horizontal control is kept;
and if the area of the target frame does not exceed the first threshold, adjusting the video acquisition angle of the unmanned aerial vehicle so as to redetermine the tracking object.
Further, adjusting a pitch angle of the unmanned aerial vehicle according to a target frame area of the tracked object, including:
Judging whether the ratio of the target frame area of the tracking object to the input image area exceeds a preset ratio;
if yes, controlling the unmanned aerial vehicle to fly according to a preset inclination angle;
if not, controlling the unmanned aerial vehicle to keep flying in the original state.
In another aspect, there is provided herein a micro unmanned aerial vehicle target recognition and tracking control device based on computer vision, the device comprising:
the video stream acquisition module is used for acquiring video streams acquired by the unmanned aerial vehicle in real time;
the processing module is used for extracting video frame images in the video stream and processing each video frame image to obtain an input image;
the feature extraction module is used for inputting the input image into a pre-trained YOLOv5 neural network model so as to obtain target boundary box information in the input image, wherein the target boundary box information comprises a target type and a confidence level;
the matching module is used for matching the target in the current video frame image with the tracking target in the previous video frame image according to the target boundary box information and the tracking target determined by the user in advance, and determining tracking track updating data of the unmanned aerial vehicle;
And the control module is used for carrying out tracking control on the unmanned aerial vehicle according to the tracking track updating data of the unmanned aerial vehicle.
Finally, there is also provided herein a drone control system, the system comprising:
unmanned plane;
the control terminal is in communication connection with the unmanned aerial vehicle and is used for controlling the unmanned aerial vehicle to fly according to the method.
By adopting the above technical scheme, a computer vision-based miniature unmanned aerial vehicle target identification and tracking control method is disclosed: a video stream collected by the unmanned aerial vehicle in real time is acquired; video frame images are extracted from the video stream, and each video frame image is processed to obtain an input image; the input image is input into a pre-trained YOLOv5 neural network model to obtain target bounding box information in the input image, wherein the target bounding box information comprises a target type and a confidence level; the target in the current video frame image is matched with the tracking target in the previous video frame image according to the target bounding box information and the tracking target determined by the user in advance, and tracking track update data of the unmanned aerial vehicle are determined; and tracking control is performed on the unmanned aerial vehicle according to the tracking track update data, so that the unmanned aerial vehicle can accurately and efficiently identify and track the selected target.
The foregoing and other objects, features and advantages will be apparent from the following more particular description of preferred embodiments, as illustrated in the accompanying drawings.
Drawings
In order to more clearly illustrate the embodiments herein or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments herein and that other drawings may be obtained according to these drawings without inventive effort to a person skilled in the art.
Fig. 1 illustrates a frame diagram of a drone control system provided by embodiments herein;
fig. 2 illustrates a schematic step diagram of a method for identifying and tracking a target of a micro unmanned aerial vehicle based on computer vision provided in an embodiment herein;
FIG. 3 shows an F1-confidence curve in embodiments herein;
FIG. 4 illustrates a state diagram of top plane drone target tracking in embodiments herein;
FIG. 5 is a diagram illustrating a mapping relationship between a camera coordinate system and a physical coordinate system in an embodiment herein;
FIG. 6 illustrates a specific example workflow diagram provided by embodiments herein;
FIG. 7 illustrates image recognition and multi-target tracking in embodiments herein;
FIG. 8 illustrates a continuous tracking and trace control flow diagram for processing successive frames to achieve a target in embodiments herein;
fig. 9 illustrates the overall control framework in embodiments herein;
FIG. 10 illustrates the results obtained in a specific example in an embodiment herein;
fig. 11 illustrates a schematic structural diagram of a micro unmanned aerial vehicle target recognition and tracking control device based on computer vision provided in an embodiment herein;
fig. 12 shows a schematic structural diagram of a computer device provided in embodiments herein.
Description of the drawings:
101. unmanned aerial vehicle; 102. control terminal; 1101. a video stream acquisition module; 1102. a processing module; 1103. a feature extraction module; 1104. a matching module; 1105. a control module.
Detailed Description
The following description of the embodiments of the present disclosure will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the disclosure. All other embodiments, based on the embodiments herein, which a person of ordinary skill in the art would obtain without undue burden, are within the scope of protection herein.
It should be noted that the terms "first," "second," and the like in the description and claims herein and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, apparatus, article, or device that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or device.
Unmanned aerial vehicle tracking technology based on computer vision is a hotspot of current research; its main goal is to accurately determine information such as the position and motion trajectory of a target of interest in a video sequence, and applying target tracking technology to unmanned aerial vehicles helps improve their level of intelligence. In practical tracking applications, the target area of interest is often affected by environmental factors, for example in complex environments where the GPS signal fails or communication is denied, so that the algorithm results become inaccurate, the target cannot be tracked stably, and the target is finally lost.
In order to solve the foregoing problem, the embodiments of the present disclosure provide an unmanned aerial vehicle control system, as shown in fig. 1. The system includes an unmanned aerial vehicle 101 and a control terminal 102, where the unmanned aerial vehicle 101 is in wireless communication connection with the control terminal 102. A camera or other acquisition device for acquiring pictures or video is provided on the unmanned aerial vehicle 101; further, the acquisition direction of the acquisition device is the forward direction of the unmanned aerial vehicle's nose. The unmanned aerial vehicle 101 sends the acquired picture or video data to the control terminal 102. The control terminal 102 is operated by the unmanned aerial vehicle's operator and may be, for example, a control handle, an unmanned aerial vehicle platform, or a back-end control platform; the control platform may be a background server, that is, the unmanned aerial vehicle 101 directly uploads the acquired data to the server, and the server analyzes the acquired data and generates flight control commands for the unmanned aerial vehicle so as to achieve tracking flight of the tracked object. Specifically, the control terminal 102 may perform the following steps: acquiring a video stream collected by the unmanned aerial vehicle in real time; extracting video frame images from the video stream, and processing each video frame image to obtain an input image; inputting the input image into a pre-trained YOLOv5 neural network model to obtain target bounding box information in the input image, wherein the target bounding box information comprises a target type and a confidence level; matching the target in the current video frame image with the tracking target in the previous video frame image according to the target bounding box information and the tracking target determined by the user in advance, and determining tracking track update data of the unmanned aerial vehicle; and performing tracking control on the unmanned aerial vehicle according to the tracking track update data. By using computer vision technology and the unmanned aerial vehicle platform, the problems of intelligent target detection and recognition, autonomous target tracking control, automatic follow shooting and image feedback under complex environments where the GPS signal fails or communication is denied are solved, so that the micro unmanned aerial vehicle can accurately and efficiently identify, track and strike the selected target. The method requires no additional sensing equipment, is simple and convenient, has a small computational load, is easy to maintain, and has low manufacturing cost and good practicability.
Based on the system provided above, this embodiment provides a computer vision-based miniature unmanned aerial vehicle target identification and tracking control method, which can improve the accuracy and efficiency with which the unmanned aerial vehicle identifies and tracks the selected target. Fig. 2 is a schematic diagram of the steps of the method provided in the embodiments herein. The present specification provides the method's operation steps as described in the examples or flowcharts, but more or fewer operation steps may be included based on conventional or non-inventive labor. The order of steps recited in the embodiments is merely one of many possible execution orders and does not represent the only order of execution. When an actual system or apparatus product is executed, the steps may be executed sequentially or in parallel according to the method shown in the embodiments or the drawings. As shown in fig. 2, the method may include:
s201: acquiring a video stream acquired by an unmanned aerial vehicle in real time;
s202: extracting video frame images in the video stream, and processing each video frame image to obtain an input image;
s203: inputting the input image into a pre-trained YOLOv5 neural network model to obtain target bounding box information in the input image, wherein the target bounding box information comprises a target type and a confidence level;
S204: according to the target boundary box information and the tracking target determined by the user in advance, matching the target in the current video frame image with the tracking target in the previous video frame image, and determining tracking track updating data of the unmanned aerial vehicle;
s205: and carrying out tracking control on the unmanned aerial vehicle according to the tracking track updating data of the unmanned aerial vehicle.
It may be understood that the execution subject of the embodiments of the present disclosure may be the aforementioned control terminal. The control terminal stores a pre-trained YOLOv5 neural network model and is configured to perform feature extraction on the video stream received from the unmanned aerial vehicle, then perform target matching on the extracted features to determine the tracked object and trajectory of the unmanned aerial vehicle, generate a control instruction for the unmanned aerial vehicle, and send the control instruction to the unmanned aerial vehicle so as to implement tracking control of the unmanned aerial vehicle.
In this embodiment of the present disclosure, the unmanned aerial vehicle may be a mini multi-rotor unmanned aerial vehicle.
In the field of unmanned aerial vehicle automatic tracking targets, a target detection technology is an indispensable ring. The target detection can enable the unmanned aerial vehicle to realize autonomous flight by identifying and tracking the target object, so that the application effect of the unmanned aerial vehicle in the fields of military, civil and the like is enhanced. YOLOv5, as an efficient and accurate target detection algorithm based on deep learning, has been widely used in the scene of unmanned aerial vehicle automatic tracking targets. In this process, the drone processes and recognizes images or videos by capturing them and transmitting them to the computer terminal. In the identification process, YOLOv5 can rapidly locate and classify the target in the image, so that the unmanned aerial vehicle is guided to accurately track the target.
In this embodiment of the present disclosure, obtaining a video stream collected in real time by a drone includes:
establishing a data communication protocol with the unmanned aerial vehicle;
acquiring a plurality of data packets sent from the unmanned aerial vehicle according to the data communication protocol;
and extracting video frames from each data packet, and reconstructing the video frames to obtain the video stream of the unmanned aerial vehicle.
The control terminal connects to the unmanned aerial vehicle's built-in WiFi so that the connection with the micro unmanned aerial vehicle is established, and the real-time video stream is acquired using the UDP data transmission protocol. The micro unmanned aerial vehicle divides the real-time video stream into a plurality of data packets and transmits them over the WiFi network, using the UDP protocol, to the device connected to it; the device extracts the video frames from each data packet, reconstructs the video stream, and finally returns a generator object. Each time the generator is iterated, it acquires the latest video frame from the Tello. Finally, the video stream data of each frame is obtained by continuously iterating the generator object in a loop for further processing or display.
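As a concrete illustration of this UDP video pipeline, the following is a minimal sketch using the tellopy and av libraries that are named later in this document; the connection and decoding calls follow those libraries' published examples and are assumptions for illustration, not the patent's own code.

```python
# Minimal sketch (assumption, not the patent's code) of pulling frames from a Tello
# over UDP with the tellopy and av libraries.
import av
import tellopy


def tello_frames():
    """Generator yielding the latest decoded video frame as a BGR numpy array."""
    drone = tellopy.Tello()
    drone.connect()
    drone.wait_for_connection(60.0)                 # wait until the UDP link is up
    container = av.open(drone.get_video_stream())   # wrap the raw H.264 stream
    for frame in container.decode(video=0):         # reassemble packets into frames
        yield frame.to_ndarray(format='bgr24')      # hand each frame to OpenCV-style code


# Usage: iterate the generator in a loop to obtain per-frame data.
# for img in tello_frames():
#     process(img)
```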
It should be noted that the obtained video frames are preprocessed so that the resulting input image meets the input requirements of the subsequent model; the video frames need to be adjusted to parameters adapted to the model, such as the stride, the input picture size and the deep learning framework.
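A minimal preprocessing sketch is given below, assuming a YOLOv5-style detector that expects a fixed square RGB input in CHW order; a plain resize is used here for brevity, whereas an actual pipeline may use letterbox padding to preserve the aspect ratio.

```python
# Preprocessing sketch (an assumption, not the patent's code): resize each video
# frame to the model input size, convert BGR to RGB, normalize to [0, 1] and
# reorder to CHW as YOLOv5-style detectors typically expect.
import cv2
import numpy as np


def preprocess(frame: np.ndarray, input_size: int = 640) -> np.ndarray:
    img = cv2.resize(frame, (input_size, input_size))   # adapt to the model input size
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)           # decoded frames are BGR
    img = img.astype(np.float32) / 255.0                 # normalize pixel values
    return np.transpose(img, (2, 0, 1))                  # HWC -> CHW for the network
```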
In the embodiment of the present specification, the pre-trained YOLOv5 neural network model is obtained by training the following steps:
acquiring a training data set with labels, wherein the types of objects in the training data set are consistent with the types of tracked objects of the unmanned aerial vehicle;
and training the initial Yolov5 neural network model according to the training data set and a preset loss function to obtain a training convergence Yolov5 neural network model.
It may be understood that the object types in the training dataset are kept consistent with the tracked-object types of the unmanned aerial vehicle to improve the prediction accuracy and reliability of the trained model. For example, if the tracked object type is a person, the object type in the training dataset is also a person; if the tracked object type is an animal, the object type in the training set is also an animal; if the tracked object type is a vehicle, such as an automobile, electric vehicle or bicycle, the object type in the training set is also a vehicle.
On the basis of constructing a target detection model with the YOLOv5 neural network architecture, a specific data set is adopted to construct a pedestrian target detection model with multi-feature fusion. For example, 803 pieces of pedestrian data are provided, in which the target features in the training images are highly diversified (body types of pedestrians, etc.), and the images and the corresponding annotation-box information are included as the training data set. Compared with the coco128 data set of YOLOv5, the number of samples and the specificity to the recognition target are increased, thereby improving the accuracy and stability of recognition in the subsequent process.
The specific training process is as follows:
1. Collect different types of pedestrian data sets and annotate each image to obtain a corresponding tag file; all the tag files form a tag file set. Each tag file contains the category and the target-box coordinates of the target in the unmanned aerial vehicle inspection image.
2. Process the tag file set, convert each tag file into a text file, and normalize the target-box coordinates in the text file to obtain normalized box coordinates; all the normalized box coordinates form a normalized coordinate set.
3. Train the model using the training set and optimize the model parameters. The trained model is evaluated using the validation set, and model metrics such as precision (Precision), recall (Recall), mAP and the F1 score are calculated, thereby obtaining a prediction model. The final model's F1 score reached 0.93 at a confidence threshold of 0.365, as shown in fig. 3.
Note: the F1 score (F1-score) is a metric for classification problems. Machine learning competitions on multi-class problems often use the F1-score as the final evaluation method. It is the harmonic mean of precision and recall, with a maximum value of 1 and a minimum value of 0.
For a given class, it combines the Precision and Recall indicators; the value of the F1-Score ranges from 0 to 1, with 1 being best and 0 being worst.
The specific formula is as follows:
F1 = 2 · Precision · Recall / (Precision + Recall)
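For context, the sketch below shows how such a training run is commonly launched with the public ultralytics/yolov5 repository (train.py and its flags come from that repository); the dataset path, YAML contents and hyperparameters are illustrative assumptions rather than values taken from this document.

```python
import subprocess

# Hypothetical dataset description for a single "person" class.
DATASET_YAML = """\
path: ../datasets/pedestrian   # assumed dataset root
train: images/train
val: images/val
nc: 1
names: ['person']
"""

with open("pedestrian.yaml", "w") as f:
    f.write(DATASET_YAML)

# Assumes this script is run from a clone of the ultralytics/yolov5 repository.
subprocess.run(
    ["python", "train.py",
     "--img", "640",               # input picture size
     "--batch", "16",
     "--epochs", "100",
     "--data", "pedestrian.yaml",
     "--weights", "yolov5s.pt"],   # start from pretrained weights
    check=True,
)
```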
in the embodiment of the specification, the YOLOv5 neural network model comprises a backbone network, a detection head network, a prediction layer and an anchor frame;
the backbone network is of a multi-level structure and is used for extracting multi-level image characteristics of an input image;
the detection head network is used for extracting target parameter information according to the multi-level image characteristics, and the target parameter information at least comprises target positions, categories and confidence information;
the prediction layer generates target bounding boxes and confidences according to the output of the detection head network, and ranks the targets according to confidence so as to generate feature maps of different scales;
the anchor frame is used for carrying out target prediction on each feature map, and screening and merging prediction results by utilizing non-maximum value inhibition so as to obtain target boundary frame information.
In a further embodiment, the backbone network is provided with a feature pyramid network and a path aggregation network;
the feature pyramid network is connected with the multi-level structure of the backbone network, and the image features of different levels are fused through transverse connection and up-sampling operation;
the path aggregation network is connected with the feature pyramid networks of different levels in a cascading way to fuse the feature information of the shallow layer and the deep layer.
It will be appreciated that the YOLOv5 neural network model uses a number of network structures, including anchor boxes (anchors), backbone (backbone) networks, and head of detection (head) networks for target detection, and the workflow of the YOLOv5 neural network model is as follows:
1. first, the input image is subjected to preprocessing steps, such as resizing, normalization, etc., to accommodate the input requirements of the network.
2. The input image may go through a CSPDarknet53 network, which is the backbone network in YOLOv5, where the CSPDarknet53 is made up of a series of convolution layers, residual blocks, and downsampling layers, gradually extracting the low-to-high-level features of the image, including details and semantic information of the image.
3. At different levels of CSPDarknet53, a Feature Pyramid Network (FPN) was introduced to obtain a multi-scale feature pyramid. The FPN fuses the features of different levels through transverse connection and up-sampling operation to acquire rich semantic information. This enables the network to handle targets of different scales simultaneously, improving the effect of target detection.
Yolov5 also introduced a path aggregation network (PANet) to further refine the feature pyramid network. The PANet is connected with feature pyramids of different levels in a cascading way, and feature information of a shallow layer and feature information of a deep layer are fused, so that a small-size target can be better detected. This improves the perceptibility of the target detection model to various target scales.
5. Based on the "backbone" network architecture, YOLOv5 also adds a detection head and a prediction layer. The detection head is responsible for extracting the position, category and confidence information of the target from the feature pyramid. The prediction layer generates a bounding box and a confidence score of the target according to the output of the detection head.
6. The output of the model is subjected to post-processing steps, such as non-maximum suppression (NMS), to filter and merge overlapping bounding boxes, and rank the targets according to confidence. A series of feature maps of different scales are thus obtained.
7. Next, for each feature map, target prediction is performed using an anchor frame. And obtaining the position and class probability of the target by carrying out class and boundary frame regression prediction on each anchor frame. Under the original yolov5 structure, the following parameters are added for each image frame appearing in the detection result:
Target id number:
id ≥ 0
The id is assigned in the order in which targets enter the video stream.
Center point coordinates:
C = (Cx, Cy)
where Cx and Cy are respectively the abscissa and ordinate of the center point of the identified target's bounding box. This provides a calibrated reference for Kalman-filter tracking.
8. Finally, combining and screening the prediction results of different scales, removing overlapped bounding boxes by using non-maximum suppression (NMS), and sequencing targets according to the confidence level.
YOLOv5 can efficiently detect the position and type of an object in an image. By combining the anchor boxes, the backbone network and the head network structure, an efficient target detection capability is achieved that is suitable for scenes of different scales and complexity.
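The following sketch, offered as an assumption rather than the patent's own code, loads a trained YOLOv5 model through torch.hub and derives the extra per-box fields described above, namely the center-point coordinates C = (Cx, Cy); the persistent id in the actual method is assigned by the tracker in order of appearance, so the index used here is only a placeholder.

```python
import torch

# Hypothetical path to the weights produced by the training step above.
model = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt")


def detect_with_centers(frame):
    """Run detection on one RGB frame and return boxes with center points."""
    results = model(frame)                        # forward pass + non-maximum suppression
    boxes = results.xyxy[0].cpu().numpy()         # rows: x1, y1, x2, y2, confidence, class
    detections = []
    for idx, (x1, y1, x2, y2, conf, cls) in enumerate(boxes):
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0     # center point C = (Cx, Cy)
        detections.append({
            "id": idx,                # placeholder; the persistent id comes from the tracker
            "cls": int(cls),
            "conf": float(conf),
            "center": (cx, cy),
            "box": (float(x1), float(y1), float(x2), float(y2)),
        })
    return detections
```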
In this embodiment of the present disclosure, the determining, according to the target bounding box information and the tracking target determined by the user in advance, the tracking track update data of the unmanned aerial vehicle by matching the target in the current video frame image with the tracking target in the previous video frame image includes:
processing the target boundary box information by using a Kalman filtering algorithm to obtain a motion state variable of a target in each video frame image, wherein the state variable comprises a center coordinate and a speed;
and matching the target in the current video frame with the tracking target in the previous video frame by using the Hungarian algorithm according to the motion state variables of the targets in consecutive video frame images, and determining the tracking track update data of the unmanned aerial vehicle.
It can be understood that the YOLOv5 detection algorithm uses a neural network model to process the input image and generate a detection result containing information such as the target bounding box, class, and confidence. These detection results are typically used to identify the target object in the image. In order to achieve tracking of the target object, the motion state variable of the target object can be predicted and updated through a Kalman filtering algorithm, and the target object can be matched through a Hungary algorithm to determine a subsequent tracking track, wherein the tracking track can be a tracking path determined according to the tracking object, and the tracking path represents the control process of the unmanned aerial vehicle, such as change of a pitch angle, change of a height and the like.
In this specification, the kalman filter algorithm itself does not directly use image data, but uses target bounding box information provided by YOLOv5 as a measurement input. Since the Kalman filtering algorithm is a mathematical filtering algorithm based on a state space model, the Kalman filtering algorithm is mainly used for predicting and updating state variables, and does not directly process image data. The target detection and Kalman filtering algorithm is used cooperatively, and tracking and prediction of the target are realized by transmitting the information of the target boundary box.
The Kalman filtering algorithm uses target boundary box information provided by YOLOv5 as measurement input, predicts and updates the position of a target by combining a dynamic model of the system, and can provide smooth and accurate target position estimation by fusing detection results and estimation of the dynamic model.
Illustratively, the following describes a Kalman filter based position estimation:
1. Construct the state variables, the process model and the observation model of the system.
State variables:
x_k = [v_k^T  z_k  b_a^T]^T
where v_k is the three-dimensional velocity to be estimated, z_k is the altitude of the aircraft along the z-axis, and b_a is the three-dimensional acceleration bias.
Process model:
x_k = A x_{k-1} + u_{k-1} + w_k
where A is the system transition matrix, u_{k-1} is the control input built from the accelerometer readings a_x, a_y, a_z, and w_k is the system noise, which characterizes the uncertainty of the system model; w_k is assumed to be zero-mean Gaussian white noise whose covariance matrix Q_k is diagonal.
Observation model:
z_k = H x_k + v_k
where the observation consists of the horizontal velocity obtained from the visual information in step three and the altitude measured by the altitude sensor, H is the observation matrix, and v_k is the observation noise, which characterizes the uncertainty of the measurement; v_k is assumed to be zero-mean Gaussian white noise whose covariance matrix R_k is diagonal.
2. Filter initialization
The initial state value is:
x_0 = [v_c^T  d_sonar·cosθ·cosφ  0_{3×1}^T]^T
where v_c = [v_x v_y v_z]^T is the initial visual velocity, the initial altitude value is given by the altitude sensor (d_sonar is the altitude sensor reading), and the initial acceleration bias is set to zero.
The initial value P_0 of the state-estimation error covariance is a diagonal matrix.
Let k = 0, \hat{x}_{0|0} = x_0, P_{0|0} = P_0.
3. State one-step prediction
\hat{x}_{k|k-1} = A \hat{x}_{k-1|k-1} + u_{k-1}
4. Error covariance one-step prediction
P_{k|k-1} = A P_{k-1|k-1} A^T + Q_{k-1}
5. Kalman filter gain update
K_k = P_{k|k-1} H^T (H P_{k|k-1} H^T + R_k)^{-1}
6. State update correction
\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k (z_k - H \hat{x}_{k|k-1})
7. Error covariance update correction
P_{k|k} = (I_7 - K_k H) P_{k|k-1}
8. Set k = k + 1 and return to step 3.
Thus, the central coordinates of the object and the state variables such as speed are obtained.
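A compact numpy sketch of the prediction and update recursion in steps 3 to 8 above is shown below; the concrete A, H, Q and R matrices are left to the caller, and the dimensions follow the 7-dimensional state described in this section.

```python
import numpy as np


class KalmanFilter:
    """Minimal linear Kalman filter matching steps 3-8 above."""

    def __init__(self, A, H, Q, R, x0, P0):
        self.A, self.H, self.Q, self.R = A, H, Q, R
        self.x, self.P = x0, P0

    def predict(self, u):
        self.x = self.A @ self.x + u                          # state one-step prediction
        self.P = self.A @ self.P @ self.A.T + self.Q          # covariance one-step prediction
        return self.x

    def update(self, z):
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)              # Kalman gain update
        self.x = self.x + K @ (z - self.H @ self.x)           # state update correction
        self.P = (np.eye(len(self.x)) - K @ self.H) @ self.P  # covariance update correction
        return self.x
```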
Then the Hungarian algorithm is used: it can associate the tracked targets with the detection results in the target detection task, matching the detections in the current frame with the tracked targets in the previous frame according to the principle of maximum-weight matching, thereby achieving continuous tracking and trajectory maintenance of the target.
After the target has been tracked and matched by the Kalman filter and the Hungarian algorithm, the center point and track information of the target are obtained; this information can be used as the input of the visual servo for controlling the motion of the robot or unmanned aerial vehicle.
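The association step can be illustrated with scipy's linear_sum_assignment, a standard implementation of the Hungarian algorithm; the distance-based cost and the gating threshold below are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_tracks(track_centers, det_centers, max_dist=80.0):
    """Associate predicted track centers with detected centers.

    track_centers, det_centers: arrays of shape (N, 2) and (M, 2) in pixels.
    Returns a list of (track_index, detection_index) pairs within max_dist.
    """
    cost = np.linalg.norm(track_centers[:, None, :] - det_centers[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)      # minimum-cost assignment
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < max_dist]
```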
In this embodiment of the present disclosure, performing tracking control on the unmanned aerial vehicle according to tracking track update data of the unmanned aerial vehicle includes:
according to the tracking track updating data of the unmanned aerial vehicle, determining a tracking object of the unmanned aerial vehicle;
determining whether the target frame area of the tracking object exceeds a first threshold value according to the boundary frame information corresponding to the tracking object;
if the target frame area exceeds the first threshold, the motion control quantity of the unmanned aerial vehicle is calculated, where the motion control quantity is a flight control instruction for the unmanned aerial vehicle, for example an instruction for the unmanned aerial vehicle to keep flying in its current state;
the unmanned aerial vehicle is controlled for tracking according to the motion control quantity, and whether the target frame of the tracked object is close to the edge of the input image is judged in real time. Judging whether the target frame is close to the edge of the image is equivalent to judging whether the unmanned aerial vehicle is close to the tracked object: because the camera's field of view enlarges as the distance increases, when the distance and acquisition angle between the tracked object and the unmanned aerial vehicle are appropriate, the target frame of the tracked object also sits in an appropriate position in the input image; whether the target frame of the tracked object is close to the edge region of the input image is therefore used as the criterion;
If the target frame of the tracked object is not close to the edge of the input image, adjusting the pitching angle of the unmanned aerial vehicle according to the target frame area of the tracked object so as to adjust the distance and the acquisition angle between the unmanned aerial vehicle and the tracked object;
if the target frame of the tracking object is close to the edge of the input image, the horizontal control is kept;
and if the area of the target frame does not exceed the first threshold, adjusting the video acquisition angle of the unmanned aerial vehicle so as to redetermine the tracking object.
Further, adjusting a pitch angle of the unmanned aerial vehicle according to a target frame area of the tracked object, including:
judging whether the ratio of the target frame area of the tracking object to the input image area exceeds a preset ratio;
if yes, controlling the unmanned aerial vehicle to fly according to a preset inclination angle;
if not, controlling the unmanned aerial vehicle to keep flying in the original state.
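The branching described above can be summarized in the following sketch; the ratio threshold and the returned command labels are illustrative assumptions, with the actual preset ratio and tilt angle left as configuration values.

```python
def adjust_pitch(box_area, image_area, preset_ratio=0.15):
    """Decide the pitch behaviour from the target-box to image area ratio."""
    if box_area / image_area > preset_ratio:      # ratio exceeds the preset ratio
        return "fly_at_preset_tilt_angle"         # fly according to the preset tilt angle
    return "keep_original_flight_state"           # otherwise keep flying in the original state
```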
In order to realize accurate control of the unmanned aerial vehicle, the association between the three-dimensional camera coordinate system and the two-dimensional image coordinate system of the unmanned aerial vehicle is also required to be established, so that the unmanned aerial vehicle can be controlled based on the acquired images and corresponding control instructions, and the association between the three-dimensional camera coordinate system and the two-dimensional image coordinate system of the unmanned aerial vehicle is established through the following steps:
1. Firstly, the following visual servo model of the multi-rotor unmanned aerial vehicle is established:
1) Multi-rotor unmanned aerial vehicle flight control rigid body model
For simplicity, when modeling the multi-rotor unmanned aerial vehicle it is assumed that the multi-rotor is a rigid body, that it is subject only to gravity and propeller thrust, and that its mass and moment of inertia are constant; the geometric center of the multi-rotor coincides with its center of gravity. The Euler-angle form of the multi-rotor flight control rigid-body model can then be written as:
\dot{p} = v
\dot{v} = g e_3 - (f/m) R e_3
\dot{Θ} = W ω
J \dot{ω} = -ω × (J ω) + G_a + τ
where e_3 is the basis vector of the ground coordinate system, m is the mass of the unmanned aerial vehicle, p denotes the center-of-mass coordinates of the multi-rotor, v denotes the velocity of the multi-rotor, f denotes the magnitude of the total propeller thrust, g is the acceleration of gravity, G_a denotes the gyroscopic moment, τ denotes the moment generated by the propellers about the body axes, J denotes the moment of inertia of the multi-rotor, ω denotes the angular velocity of the body, Θ denotes the attitude angle, W is the transformation matrix from the body angular velocity to the attitude-rate, and R denotes the rotation matrix from the body coordinate system to the world coordinate system.
In the multi-rotor unmanned aerial vehicle flight control model, both the earth coordinate system and the body coordinate system are involved. On the one hand, it is desirable to express the position and velocity of the multi-rotor in the earth coordinate system, which helps the flight operator determine the flight position and speed and is consistent with GPS measurements; on the other hand, the thrust and moments are expressed very intuitively in the body coordinate system, and the sensor measurements are also expressed in the body frame. Having both coordinate systems facilitates calculation and application in different scenarios. A notable characteristic of the multi-rotor flight control rigid-body model is that the thrust direction always coincides with the negative direction of the body axis O_b z_b.
Expressing the multi-rotor rigid-body flight control mathematically facilitates the design, coding and implementation of multi-rotor control algorithms.
2) Visual imaging model
As shown in fig. 4, which shows a state diagram of unmanned aerial vehicle target tracking with a level-looking camera, the gray block is the target, O_m is the center of gravity of the multi-rotor unmanned aerial vehicle, O_c is the center of the camera, d is the distance between the two, and η is the field-of-view angle. The imaging position of a space point p_e on the image plane can be approximated by the pinhole imaging model, i.e. the projection position p_c of the point p_e on the image plane is the intersection of the line through the optical center O and the space point p_e with the image plane. The relationship between the coordinates (x_e, y_e, z_e)^T of the point p_e in the world coordinate system and the pixel coordinates (u, v)^T of the projected point p_c can therefore be described as:
z_e [u v 1]^T = K [R t] [x_e y_e z_e 1]^T,   K = [[α_x, 0, u_0], [0, α_y, v_0], [0, 0, 1]]
where α_x = f/dx is the scale factor on the u-axis and α_y = f/dy is the scale factor on the v-axis (f is the focal length, and dx and dy are the pixel sizes in the u-axis and v-axis directions, respectively); (u_0, v_0), the intersection of the camera optical axis with the image plane, is called the principal point. α_x, α_y, u_0 and v_0 are related only to the camera itself and are called the camera intrinsic parameters. R and t, the rotation matrix and translation vector between the camera coordinate system and the world coordinate system, are called the camera extrinsic parameters.
The visual imaging principle clarifies the imaging mechanism of the camera carried by the aircraft; the mapping of the three-dimensional data acquired by the onboard camera onto the two-dimensional image plane is the basis for establishing the visual servo model.
3) Visual servo model
Visual servoing refers to controlling the motion of a robot using computer vision information. All vision-based servo schemes aim to reduce an error e(t) between the current and the desired image features.
The relation between the rate of change of the feature error and the camera velocity is:
\dot{e} = L_e v_c
where L_e = L_s, L_s ∈ R^{k×6}, is called the image Jacobian (interaction) matrix, and v_c = (ω, v) denotes the instantaneous angular and linear velocity of the camera. Taking \dot{e} as the input, if the image Jacobian matrix is known, the output v_c can be obtained. For a point feature, the image Jacobian matrix is:
L_s = [ x·y      -(1+x^2)   y     -1/z_e    0       x/z_e
        1+y^2    -x·y       -x     0       -1/z_e   y/z_e ]
where the 3-D point coordinate in the camera coordinate system is (x_e, y_e, z_e)^T and the corresponding coordinate on the 2-D image plane is p = (x, y). The goal of visual servoing is to reduce the difference between the current image coordinates of the target and the desired image coordinates; its significance is that, through the image Jacobian matrix, the robot can track the target more accurately, and it is the core module of the multi-rotor visual servo model.
4) Multi-rotor vision servo model
Based on the rigid body model and the visual imaging principle of the multi-rotor unmanned aerial vehicle, the following multi-rotor visual servo model is designed by utilizing a visual servo method.
In the longitudinal channel, define the state x = [e_y v_y v_z θ]^T and the input u = [f ω_x]^T; a linear state-space model of the form
\dot{x}(t) = A x(t) + B u(t)
y(t) = [1 0 0 0] x(t)
can then be obtained, where e_y is the tracking error of the longitudinal channel expressed in the camera frame after vertical decomposition, v_y and v_z are the velocity components in the y and z directions, θ is the attitude angle, f is the thrust, and ω_x is the body angular-velocity input about the x-axis.
In the lateral channel, the multi-rotor lateral model is transformed into the camera coordinate system:
x = [e_x v_x v_z ψ]^T
u = [ω_y ω_z]^T
y(t) = [1 0 0 0] x(t)
where e_x is the tracking error of the lateral channel expressed in the camera frame after horizontal decomposition, v_x and v_z are the velocity components in the x and z directions, ψ is the yaw angle, ω_y and ω_z are the body angular-velocity inputs about the y and z axes, and x(t) denotes the time-dependent state.
First, the multi-rotor nonlinear flight control rigid-body model is established as the basis of multi-rotor control; then, taking the pinhole model as an example, the mathematical expression of visual imaging is introduced; finally, the concept and formulation of visual servoing are introduced, the image Jacobian matrix is derived, and the association between the three-dimensional camera coordinate system and the two-dimensional image coordinate system is established.
Fig. 5 is a schematic diagram of the mapping relationship between the camera coordinate system and the physical coordinate system in the embodiments herein. O represents the optical center of the camera and O_I represents the origin of the image coordinate system. (u, v) are the coordinate axes of the image coordinate system and (X_c, Y_c, Z_c) are the coordinate axes of the camera coordinate system. p_e(x_e, y_e, z_e) represents the coordinates of a three-dimensional point in the camera coordinate system, and p_c(u, v) represents the projection of the point p_e(x_e, y_e, z_e) onto the image.
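The pinhole mapping of Fig. 5 can be illustrated with the short function below, which projects a 3-D point expressed in the camera coordinate system to pixel coordinates using the intrinsic parameters α_x, α_y, u_0 and v_0; the numeric defaults are placeholders, not calibration values from this document.

```python
import numpy as np


def project(point_cam, alpha_x=920.0, alpha_y=920.0, u0=480.0, v0=360.0):
    """Project a 3-D point (camera frame) to pixel coordinates with a pinhole model."""
    x_e, y_e, z_e = point_cam
    u = alpha_x * x_e / z_e + u0      # scale by f/dx and shift to the principal point
    v = alpha_y * y_e / z_e + v0      # scale by f/dy and shift to the principal point
    return np.array([u, v])
```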
Illustratively, on the basis of the model establishment, the method provided by the embodiment of the present specification may include the following steps:
1. Install the libraries required for the program to run, including OpenCV, av, tellopy, etc.
2. Control the Tello unmanned aerial vehicle using the tellopy library, and process the picture data in the video stream using OpenCV.
3. Connect to the unmanned aerial vehicle and acquire the video stream; the connection is attempted up to 3 times until it succeeds.
4. The drone takes off and subscribes to the relevant flight log data.
5. A frame of data is captured and its dimensions (h, w), i.e. the height and width, are obtained.
6. From the ratio of the target's width and height to the image size, determine whether the target is large enough to follow. If the target is large enough, the yaw error is converted into the angle the drone needs to rotate using the yaw-angle coefficient and the servo center point, and the altitude error is converted into the height the drone needs to change using the altitude-control coefficient and the servo center point.
7. And judging whether the object needs to advance or retreat according to the distance between the object and the image boundary. If the target is too far, flying the drone forward to approach the target; if the targets are too close, the drone is flown backwards to maintain distance.
8. And judging whether the unmanned aerial vehicle needs to ascend or descend according to the proportion of the target area and the image size, adjusting the pitching angle, and finally realizing the tracking of the target.
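Steps 5 to 8 can be condensed into the following control-loop sketch; the gain names, thresholds and the returned command triple are illustrative assumptions, and mapping the returned values onto the drone's yaw, vertical and forward/backward commands (for example tellopy's clockwise/up/forward calls) is left to the caller.

```python
def follow_commands(box, frame_w, frame_h,
                    yaw_gain=0.3, height_gain=0.3,
                    min_ratio=0.05, near_ratio=0.25):
    """Turn one tracked bounding box into (yaw, height, forward) command values."""
    x1, y1, x2, y2 = box
    w_ratio = (x2 - x1) / frame_w
    if w_ratio < min_ratio:
        return None                                   # target too small to follow reliably
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    yaw = yaw_gain * (cx - frame_w / 2.0)             # step 6: rotate toward the servo center point
    height = height_gain * (frame_h / 2.0 - cy)       # step 6: climb/descend toward the center
    forward = 20 if w_ratio < near_ratio else -20     # step 7: approach if far, back off if close
    return yaw, height, forward
```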
The embodiments of the specification provide a computer-vision-based unmanned aerial vehicle target recognition and tracking control method that requires no additional sensing equipment and has the advantages of a small computational load and high algorithmic robustness. The method is simple and convenient, easy to maintain, low in manufacturing cost and has good practicability.
The embodiments of the specification provide a computer-vision-based micro unmanned aerial vehicle target recognition and tracking control method, and a real experiment was carried out using a DJI Tello unmanned aerial vehicle; fig. 6 shows the specific workflow diagram of the experiment.
First, visual detection by the unmanned aerial vehicle is performed; the flow is as follows:
1. Download the code from the following open-source repository link: https://github. On the basis of constructing a target detection model with the YOLOv5 neural network architecture, the data set is used to train it according to the detailed process in step one.
2. The YOLOv5 model is called in Python to perform target detection. Turn on the Tello's side switch, connect to the Tello's built-in WiFi, and wait for the yellow light to flash to ensure a successful connection with the micro unmanned aerial vehicle. The real-time video stream is acquired using the UDP data transmission protocol and updated iteratively. Each time the generator is iterated, it obtains the latest video frame from the Tello. Finally, the video stream data of each frame is acquired by looping through the generator object.
3. Each frame of image acquired by the Tello camera must be preprocessed to facilitate model detection. Preprocessing includes resizing, normalizing, scaling, and cropping the image to meet the input requirements of the model.
4. Target detection is performed using the YOLOv5 model. Because the center-point coordinates and the id of the target that the Tello is to strike need to be determined in the subsequent strike process, the identified targets must be numbered and assigned ids, and a GUI interaction interface is written so that the user can conveniently select the target for the unmanned aerial vehicle to track. The center-point coordinates are obtained as follows: the YOLOv5 open-source code provides the class, location (upper-left corner, lower-right corner, width, height) and confidence of each target object. The center-point coordinates can be obtained from the position information by simple algebra and are marked on the interface, as shown in Fig. 7, which is a schematic diagram of image recognition and multi-target tracking in the embodiment herein.
5. The idea of assigning id numbers is as follows: a number is obtained during the multi-object tracking process, and id numbers are assigned in the order in which objects are identified. An id does not change as long as the target remains within the field of view and can continually be matched; if an object is re-identified after being lost, or a new object appears, a new id is assigned. The id values increase sequentially as new targets are identified and are never repeated.
During operation, the computer outputs detection boxes containing the following information: the category, confidence, and center point of each object, and the id values of the different objects. The id number of the target to be tracked and struck is entered in the control panel to select the target. The target bounding box, class, confidence and related results are used to identify the target object in the image. These detection results are then passed as inputs to the Kalman filter algorithm.
6. Next, the position of the tracking target in the current frame is predicted by the Kalman filtering algorithm described in step three, producing predicted target position information. This predicted position information consists of state variables such as the target's center coordinates and velocity. The Kalman filtering algorithm predicts and updates the target position according to the target's motion model and the observation data. Two groups of target position information are then formed: one is the predicted target position output by the Kalman filtering algorithm, and the other is the target position detected in the current frame output by the target detection algorithm. These are used as inputs to the Hungarian algorithm, which matches the detection results in the current frame with the tracking targets in the previous frame. The concrete implementation flow of the Hungarian algorithm in the method is as follows:
1. Initialize the trackers: during initialization, parameters such as the number of trackers, the maximum number of tracking frames, and the distance threshold are defined, together with the state information of each tracker (such as position and number of skipped frames).
2. Compute a matching matrix for the current target detection results: calculate the distance between each detected target and all current trackers, pass the distance matrix to the Hungarian algorithm, and solve for the matching between targets and the trackers of minimum total distance.
3. Update the tracker states according to the matching result: if a target is not matched to any tracker, it is a new target and a new tracker must be created; if a tracker is not matched to any target, the target has left the field of view and the tracker is deleted; if a matched pair's distance exceeds the distance threshold, the match is also discarded, since the distance is too large for a valid match.
4. Depending on the number of unmatched objects, new trackers are created or these objects are ignored.
7. The target is tracked and matched through the Kalman filter and Hungarian algorithm to obtain the target's position and trajectory information, which serves as the input quantity for visual servoing. A minimal matching sketch is given after this list.
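As a sketch of the prediction-detection matching described above (illustrative only: SciPy's linear_sum_assignment is used as the Hungarian solver, and the function and variable names are introduced here rather than taken from the original code), the association between Kalman-predicted tracker centers and currently detected centers can be written as:

import numpy as np
from scipy.optimize import linear_sum_assignment

def match_detections(predicted, detections, dist_threshold=50.0):
    """Associate Kalman-predicted tracker centers with detected centers.

    predicted and detections are arrays of shape (M, 2) and (N, 2) holding
    (cx, cy) pixel coordinates; returns (matches, unmatched_trackers,
    unmatched_detections)."""
    if len(predicted) == 0 or len(detections) == 0:
        return [], list(range(len(predicted))), list(range(len(detections)))
    # Euclidean distance between every tracker prediction and every detection.
    dist = np.linalg.norm(predicted[:, None, :] - detections[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(dist)   # minimum-cost assignment
    matches, used_t, used_d = [], set(), set()
    for r, c in zip(rows, cols):
        if dist[r, c] <= dist_threshold:       # discard matches that are too far apart
            matches.append((r, c))
            used_t.add(r)
            used_d.add(c)
    unmatched_trackers = [i for i in range(len(predicted)) if i not in used_t]
    unmatched_detections = [j for j in range(len(detections)) if j not in used_d]
    return matches, unmatched_trackers, unmatched_detections

# Example: two trackers and two detections.
pred = np.array([[100.0, 120.0], [400.0, 300.0]])
det = np.array([[405.0, 310.0], [102.0, 118.0]])
print(match_detections(pred, det))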
Successive frames are processed to achieve continuous tracking and tracking control of the target, as shown in Figs. 8 and 9; the flow is as follows:
1. A frame of the video stream is acquired, together with the height and width of the image (a frame-acquisition sketch is given after this list).
2. The target object in the frame image is detected using the YOLOv5 model, and the abscissa and ordinate of the center coordinates of the target object, and the width and height of the target object are obtained through the above-described series of operations.
3. Instantiate the object and assign the passed-in Tello object to a variable for communication with and control of the unmanned aerial vehicle. Initialize the variables: the control coefficient of the yaw angle is set to 0.001; the control coefficient of the thrust value is set to -0.005; the control value of the pitch angle is set to 0.45; the ratio of the minimum area of the target detection result to the total image area is set to 0; the ratio of the maximum area of the target detection result to the total image area is set to 1; the proportion of the steering gear center position on the X axis is set to 0.4; the proportion of the steering gear center position on the Y axis is set to 0.5; the range ratio for checking whether the target is located near the image boundary is set to 0.1; the threshold for judging whether the target is located near the center of the Y axis is set to 30; the adjustment factor of the thrust value for downward flight is set to 0.5; the adjustment factor of the thrust value for upward flight is set to 0.5.
4. Judge whether motion control of the unmanned aerial vehicle is required according to the size of the target object. If the area of the target object exceeds the threshold, motion control is performed; otherwise, no control is applied.
5. If control is needed, the control amounts of yaw movement and lifting movement of the unmanned aerial vehicle are calculated.
6. Judge whether the target object is in the edge area of the picture; if so, set the pitch angle of the unmanned aerial vehicle to 0, i.e. keep it level. Otherwise, adjust the pitch angle of the unmanned aerial vehicle according to the area of the target object, so that the drone approaches or moves away from the target. Specifically, if the area of the target object is smaller than the threshold, the pitch angle is set to the previously initialized value; otherwise, the pitch angle is set to 0.
7. If target tracking fails, simple motion control is applied so the drone re-searches for the target. Specifically, the unmanned aerial vehicle is rotated to the left by a certain angle while its height and pitch angle are kept unchanged.
8. Repeat the above steps to realize continuous tracking and tracking control of the target.
9. Finally, tracking by the Tello unmanned aerial vehicle is achieved through the above steps. The implementation results are shown in Fig. 10.
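As a companion to step 1 of this flow, the following sketch shows one way to pull per-frame images from the Tello video stream. It is assumption-laden: it presumes the tellopy drone object exposes connect(), wait_for_connection() and get_video_stream() as in that library's published examples, and it decodes the H.264 stream with PyAV; it is a sketch, not the implementation used in the experiment.

import av
import cv2
import tellopy

def frame_generator(drone):
    """Yield BGR frames decoded from the Tello video stream."""
    container = av.open(drone.get_video_stream())   # assumed tellopy API
    for frame in container.decode(video=0):
        # Convert each decoded frame to an OpenCV-compatible ndarray.
        yield frame.to_ndarray(format='bgr24')

def main():
    drone = tellopy.Tello()
    drone.connect()
    drone.wait_for_connection(60.0)     # wait for the WiFi link to come up
    try:
        for image in frame_generator(drone):
            h, w = image.shape[:2]      # step 1: the frame plus its height and width
            cv2.imshow('tello', image)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
    finally:
        drone.quit()
        cv2.destroyAllWindows()

if __name__ == '__main__':
    main()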
On the basis of the method provided above, the embodiment of the present disclosure further provides a micro unmanned aerial vehicle target recognition and tracking control device based on computer vision, as shown in fig. 11, where the device includes:
the video stream acquisition module 1101 is configured to acquire a video stream acquired by the unmanned aerial vehicle in real time;
a processing module 1102, configured to extract video frame images in the video stream, and process each video frame image to obtain an input image;
the feature extraction module 1103 is configured to input the input image into a pre-trained YOLOv5 neural network model, so as to obtain target bounding box information in the input image, where the target bounding box information includes a target type and a confidence level;
the matching module 1104 is configured to match a target in a current video frame image with a tracking target in a previous video frame image according to the target bounding box information and a tracking target determined in advance by a user, and determine tracking track update data of the unmanned aerial vehicle;
and the control module 1105 is configured to perform tracking control on the unmanned aerial vehicle according to the tracking track update data of the unmanned aerial vehicle.
The beneficial effects obtained by the device are consistent with those obtained by the method, and are not repeated here.
The present embodiment provides a computer device, the internal structure of which can be shown in Fig. 12. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by the processor to implement the computer-vision-based miniature unmanned aerial vehicle target recognition and tracking control method.
It will be appreciated by those skilled in the art that the structure shown in FIG. 12 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program stored on a non-transitory computer-readable storage medium which, when executed, may comprise the steps of the method embodiments described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory can include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM is available in many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processor referred to in the embodiments provided in the present application may be a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, or the like, but is not limited thereto.
It should also be understood that in embodiments herein, the term "and/or" merely describes an association relationship between associated objects, indicating that three relationships may exist. For example, "A and/or B" may mean: A exists alone, both A and B exist, or B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be embodied in electronic hardware, in computer software, or in a combination of the two, and that the elements and steps of the examples have been generally described in terms of function in the foregoing description to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided herein, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices, or elements, or may be an electrical, mechanical, or other form of connection.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the elements may be selected according to actual needs to achieve the objectives of the embodiments herein.
Specific examples are set forth herein to illustrate the principles and embodiments herein and are merely illustrative of the methods herein and their core ideas; also, as will be apparent to those of ordinary skill in the art in light of the teachings herein, many variations are possible in the specific embodiments and in the scope of use, and nothing in this specification should be construed as a limitation on the invention.

Claims (10)

1. The miniature unmanned aerial vehicle target recognition and tracking control method based on computer vision is characterized by comprising the following steps of:
acquiring a video stream acquired by an unmanned aerial vehicle in real time;
extracting video frame images in the video stream, and processing each video frame image to obtain an input image;
inputting the input image into a pre-trained YOLOv5 neural network model to obtain target bounding box information in the input image, wherein the target bounding box information comprises a target type and a confidence level;
according to the target boundary box information and the tracking target determined by the user in advance, matching the target in the current video frame image with the tracking target in the previous video frame image, and determining tracking track updating data of the unmanned aerial vehicle;
and carrying out tracking control on the unmanned aerial vehicle according to the tracking track updating data of the unmanned aerial vehicle.
2. The method of claim 1, wherein acquiring the video stream acquired in real time by the drone comprises:
establishing a data communication protocol with the unmanned aerial vehicle;
acquiring a plurality of data packets sent from the unmanned aerial vehicle according to the data communication protocol;
and extracting video frames from each data packet, and reconstructing the video frames to obtain the video stream of the unmanned aerial vehicle.
3. The method of claim 1, wherein the pre-trained YOLOv5 neural network model is trained by:
acquiring a training data set with labels, wherein the types of objects in the training data set are consistent with the types of tracked objects of the unmanned aerial vehicle;
and training the initial Yolov5 neural network model according to the training data set and a preset loss function to obtain a training convergence Yolov5 neural network model.
4. The method of claim 1, wherein the YOLOv5 neural network model comprises a backbone network, a detection head network, a prediction layer, and an anchor frame;
the backbone network is of a multi-level structure and is used for extracting multi-level image characteristics of an input image;
the detection head network is used for extracting target parameter information according to the multi-level image characteristics, and the target parameter information at least comprises target positions, categories and confidence information;
the prediction layer generates a target boundary box and confidence according to the output of the detection head network, and performs target sequencing according to the confidence so as to generate feature graphs with different scales;
the anchor frame is used for carrying out target prediction on each feature map, and screening and merging prediction results by utilizing non-maximum value inhibition so as to obtain target boundary frame information.
5. The method of claim 4, wherein a feature pyramid network and a path aggregation network are provided in the backbone network;
the feature pyramid network is connected with the multi-level structure of the backbone network, and the image features of different levels are fused through transverse connection and up-sampling operation;
the path aggregation network is connected with the feature pyramid networks of different levels in a cascading way to fuse the feature information of the shallow layer and the deep layer.
6. The method according to claim 1, wherein the matching the target in the current video frame image with the tracking target in the previous video frame image according to the target bounding box information and the tracking target determined in advance by the user, and determining the tracking track updating data of the unmanned aerial vehicle, comprises:
processing the target boundary box information by using a Kalman filtering algorithm to obtain a motion state variable of a target in each video frame image, wherein the state variable comprises a center coordinate and a speed;
and matching the target in the current video frame with the tracking target in the previous video frame by using a Hungary algorithm according to the motion state variable of the target in the continuous video frame images, and determining the tracking track updating data of the unmanned aerial vehicle.
7. The method of claim 4, wherein tracking control of the drone based on the tracking trajectory update data of the drone comprises:
according to the tracking track updating data of the unmanned aerial vehicle, determining a tracking object of the unmanned aerial vehicle;
determining whether the target frame area of the tracking object exceeds a first threshold value according to the boundary frame information corresponding to the tracking object;
if the target frame area exceeds a first threshold value, calculating the motion control quantity of the unmanned aerial vehicle;
controlling the unmanned aerial vehicle to carry out tracking control according to the motion control quantity of the unmanned aerial vehicle, and judging whether a target frame of the tracked object is close to the edge of an input image or not in real time;
if the target frame of the tracking object is close to the edge of the input image, the horizontal control is kept;
if the target frame of the tracking object is not close to the edge of the input image, adjusting the pitching angle of the unmanned aerial vehicle according to the target frame area of the tracking object;
and if the area of the target frame does not exceed the first threshold, adjusting the video acquisition angle of the unmanned aerial vehicle so as to redetermine the tracking object.
8. The method of claim 7, wherein adjusting the pitch angle of the drone based on the target frame area of the tracked object comprises:
Judging whether the ratio of the target frame area of the tracking object to the input image area exceeds a preset ratio;
if yes, controlling the unmanned aerial vehicle to fly according to a preset inclination angle;
if not, controlling the unmanned aerial vehicle to keep flying in the original state.
9. A miniature unmanned aerial vehicle target recognition and tracking control device based on computer vision, the device comprising:
the video stream acquisition module is used for acquiring video streams acquired by the unmanned aerial vehicle in real time;
the processing module is used for extracting video frame images in the video stream and processing each video frame image to obtain an input image;
the feature extraction module is used for inputting the input image into a pre-trained YOLOv5 neural network model so as to obtain target boundary box information in the input image, wherein the target boundary box information comprises a target type and a confidence level;
the matching module is used for matching the target in the current video frame image with the tracking target in the previous video frame image according to the target boundary box information and the tracking target determined by the user in advance, and determining tracking track updating data of the unmanned aerial vehicle;
and the control module is used for carrying out tracking control on the unmanned aerial vehicle according to the tracking track updating data of the unmanned aerial vehicle.
10. A drone control system, the system comprising:
unmanned plane;
a control terminal in communication with the drone for controlling the drone to fly according to the method of any one of claims 1 to 8.
CN202310603409.1A 2023-05-25 2023-05-25 Miniature unmanned aerial vehicle target recognition and tracking control method based on computer vision Pending CN117036989A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310603409.1A CN117036989A (en) 2023-05-25 2023-05-25 Miniature unmanned aerial vehicle target recognition and tracking control method based on computer vision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310603409.1A CN117036989A (en) 2023-05-25 2023-05-25 Miniature unmanned aerial vehicle target recognition and tracking control method based on computer vision

Publications (1)

Publication Number Publication Date
CN117036989A true CN117036989A (en) 2023-11-10

Family

ID=88621461

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310603409.1A Pending CN117036989A (en) 2023-05-25 2023-05-25 Miniature unmanned aerial vehicle target recognition and tracking control method based on computer vision

Country Status (1)

Country Link
CN (1) CN117036989A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117788302A (en) * 2024-02-26 2024-03-29 山东全维地信科技有限公司 Mapping graphic processing system



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination