CN114529584A - Single-target vehicle tracking method based on unmanned aerial vehicle aerial photography - Google Patents

Single-target vehicle tracking method based on unmanned aerial vehicle aerial photography

Info

Publication number
CN114529584A
Authority
CN
China
Prior art keywords
target
frame
tracking
neighborhood
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210156746.6A
Other languages
Chinese (zh)
Inventor
吕艳辉
郭向坤
李彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenyang Ligong University
Original Assignee
Shenyang Ligong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenyang Ligong University
Priority to CN202210156746.6A
Publication of CN114529584A
Legal status: Pending (Current)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/20: Analysis of motion
    • G06T 7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/248: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/10: Image acquisition modality
    • G06T 2207/10016: Video; Image sequence
    • G06T 2207/30: Subject of image; Context of image processing
    • G06T 2207/30181: Earth observation

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a single-target vehicle tracking method based on unmanned aerial vehicle aerial photography, and relates to the technical field of computer vision. Building on the results of target detection, the method adopts an image matching algorithm based on multi-feature fusion that combines the color histogram feature and the HOG feature of the image, which markedly improves, over using a single feature alone, both how well the image is represented and the accuracy of the image matching process. To predict accurately where the target will appear in the video, a K++ neighborhood search algorithm is designed; it reduces the amount of computation, achieves higher precision, and effectively eliminates the interference caused by the appearance of similar targets. When the tracked target becomes completely occluded during tracking, an anti-occlusion algorithm based on vehicle motion state estimation keeps the single-target tracking going. The method can quickly and accurately perform single-target tracking of a given vehicle in video shot by an unmanned aerial vehicle, and has good universality and expandability.

Description

Single-target vehicle tracking method based on unmanned aerial vehicle aerial photography
Technical Field
The invention relates to the technical field of computer vision, in particular to a single-target vehicle tracking method based on unmanned aerial vehicle aerial photography.
Background
Target tracking tasks fall into two categories: multi-target tracking and single-target tracking. Multi-target tracking is the task of tracking all targets, or all individuals of one class of target, in a video sequence. It involves not only continuously tracking each target but also handling recognition, self-occlusion, and mutual occlusion between different targets, and associating detection results with tracking results. Single-target tracking, by contrast, concentrates on one object; because attention in a video sequence usually focuses on the motion of a single object, single-target tracking technology has developed rapidly and attracted intense interest. Among traditional methods that process video frames directly, the representative background subtraction, frame-difference, optical-flow, MeanShift and CamShift algorithms belong to the early single-target tracking methods; they are widely used in practice, run at high FPS (frames per second), and place low demands on device computing power. Later, combinations of detection and tracking began to appear: image features are extracted and a classifier (such as an SVM) is trained with machine-learning methods, so that the trained classifier finds the optimal region in the next frame. One family is based on generative models and the other on discriminative models; together they are referred to as tracking-by-detection. The most representative single-target tracking methods at present are kernelized correlation filtering and the algorithms that combine it with deep learning. These have repeatedly raised the precision and speed records of single-target tracking, but because deep models involve heavy computation and demand capable hardware, and many trackers rely on online fine-tuning, their speed is often unsatisfactory, practical application is limited, and great room for development remains.
Among recent deep-learning single-target tracking algorithms, SiamFC, based on the twin (Siamese) network and proposed at CVPR 2016, pushed single-target tracking into a new era. Exploiting the twin-network structure, the template and the search image pass through the same feature-extraction network, the template's feature vector is cross-correlated with the feature map of the search image to obtain a response map, and the position of maximum response is the position of the target. Almost all subsequent deep-learning single-target tracking algorithms build on this idea. For example, the SiamMask algorithm adds a simple two-channel 1 × 1 convolution on top of the cross-correlation and feeds the two branch outputs to different task heads; another representative work, SiamRPN, fuses the classification and regression of the RPN from Faster R-CNN into SiamFC, greatly improving both tracking precision and speed. The recent DiMP (Learning Discriminative Model Prediction for Tracking) algorithm, published by Martin Danelljan and colleagues, adds an online-trained classifier on top of the SiamFC framework and optimizes the classifier using information from preceding and succeeding video frames, so that the model adjusts in real time and tracking precision improves.
Although twin-network single-target tracking algorithms currently achieve good tracking results, the information available to the network is provided only by the first frame, so the amount of information obtained is in fact very small. The present invention therefore addresses the problems in the current target tracking field of low precision caused by insufficient samples and of tracking speed that struggles to meet real-time requirements.
Disclosure of Invention
The invention aims to overcome the above shortcomings of the prior art by providing a single-target vehicle tracking method based on unmanned aerial vehicle aerial photography that can quickly and accurately track a given vehicle target in video shot by an unmanned aerial vehicle, with good universality and expandability.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
a single-target vehicle tracking method based on unmanned aerial vehicle aerial photography comprises the following steps:
Step 1: load an unmanned aerial vehicle aerial video in which a vehicle is to be tracked and pause at the first frame; manually frame the target vehicle with the mouse, the framed area being the area to be tracked and the target vehicle inside it being the tracking target. Then, starting from the second frame of the video, detect the vehicles appearing in each video frame with a target detection algorithm.
Step 2: judge whether the target vehicle in the current frame is completely occluded; if not, continue to step 3; otherwise, execute step 5.
Step 3: establish a K++ neighborhood around the tracking box, use it to screen out redundant target detection boxes so that only vehicles that may be the tracking target remain inside the K++ neighborhood, and compute the IoU and the center-point offset between the tracking box and each remaining detection box.
Step 4: crop the targets in the screened detection boxes into pictures and match them against the tracking target framed in the first frame using the multi-feature-fusion image matching algorithm; compute and rank the image similarity between the tracking target and each screened target, combine this with the results of step 3 to decide which detected target in the current frame is the target to be tracked, and then update the tracking box. Execute step 6.
Step 5: when the tracking target is occluded, apply the anti-occlusion algorithm based on vehicle motion state estimation. Once tracking has run for more than 20 frames, record the average speed at which the vehicle moves in the video every 20 frames. When the target disappears from view, store the disappearance coordinates, stop recording the target's moving speed, and keep the moving speed of the preceding 20 frames. If the target has been gone for no more than 50 frames, extrapolate its trajectory and coordinates during the disappearance as normal, obtain a K neighborhood around the currently estimated position, record where the target may appear after 50 frames, and set a K neighborhood there to wait to capture the target. If the vehicle is re-detected and captured by the K neighborhood, invoke the image matching algorithm of step 4; if the match succeeds, continue tracking. If it does not succeed, start full-image matching: discard the recorded speed and coordinates, let the tracker search automatically using the tracking procedure of steps 3-4, use the multi-feature-fusion matching algorithm to match the target in the current field of view with the greatest similarity to the initially framed tracking target, and re-establish the K++ neighborhood at that target's coordinates. Execute step 6.
Step 6: judge whether the video has finished; if so, end detection; otherwise, receive the next frame and return to step 2.
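For orientation, the following minimal Python sketch shows how steps 1-6 fit together as a per-frame loop. The helper names (detect_vehicles, is_fully_occluded, kpp_screen, match_target, estimate_position) are hypothetical stand-ins for the detector and the algorithms described above, not code disclosed by the patent.

```python
import cv2

def track(video_path, detect_vehicles, is_fully_occluded,
          kpp_screen, match_target, estimate_position):
    """Per-frame tracking loop (steps 1-6); the five callables are
    hypothetical stand-ins for the components described in the text."""
    cap = cv2.VideoCapture(video_path)
    ok, first = cap.read()                        # step 1: pause on frame 1
    x, y, w, h = cv2.selectROI("frame 1", first)  # manual framing with the mouse
    template = first[y:y + h, x:x + w]            # tracked-target appearance
    track_box = (x, y, w, h)
    while True:
        ok, frame = cap.read()
        if not ok:                                # step 6: video finished
            break
        detections = detect_vehicles(frame)       # e.g. a YOLOv4 detector
        if not is_fully_occluded(track_box, detections):       # step 2
            candidates = kpp_screen(track_box, detections)     # step 3: K++ screening
            track_box = match_target(template, frame,
                                     candidates, track_box)    # step 4: matching
        else:
            track_box = estimate_position(track_box)           # step 5: motion estimate
    cap.release()
```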
The beneficial effects of the above technical solution are as follows. The single-target vehicle tracking method based on unmanned aerial vehicle aerial photography provided by the invention realizes single-target tracking with an image matching algorithm based on multi-feature fusion, a target-prediction K++ neighborhood search algorithm, and an anti-occlusion algorithm based on vehicle motion state estimation. The image matching algorithm fuses the color histogram feature and the HOG feature of the image, markedly improving, over any single feature alone, both how well the image is represented and the accuracy of the image matching process. The target-prediction K++ neighborhood search algorithm screens the target detection results, which reduces the amount of computation; compared with the original K neighborhood search algorithm, it searches for the tracked target within the neighborhood with higher precision and effectively eliminates the interference caused by similar targets. When the tracked target becomes completely occluded during tracking, the anti-occlusion algorithm based on vehicle motion state estimation is applied: provided the vehicle's motion during the occlusion does not differ much from its motion before the occlusion, the algorithm computes the target's speed and coordinates in the video before the occlusion, predicts its motion during the period in which it is out of view, estimates where it is likely to reappear after leaving the occluding object, and thereby re-locates the target. The method can quickly and accurately perform single-target tracking of a given vehicle in video shot by an unmanned aerial vehicle, and has good universality and expandability.
Drawings
Fig. 1 is a flowchart of a single-target vehicle tracking method based on unmanned aerial vehicle aerial photography according to an embodiment of the present invention;
FIG. 2 illustrates the manner of selecting a tracking target according to an embodiment of the present invention; Fig. 2a shows an incorrect frame selection and Fig. 2b a correct one;
FIG. 3 is a diagram illustrating the tracking effect when a small amount of occlusion occurs, according to an embodiment of the present invention; Figs. 3a, 3b and 3c are the images of frames 370, 400 and 420, respectively;
FIG. 4 shows the appearance of a similar vehicle in accordance with an embodiment of the present invention; Figs. 4a, 4b and 4c are the images of frames 146, 219 and 241, respectively;
FIG. 5 illustrates a fully occluded target according to an embodiment of the present invention; Figs. 5a, 5b and 5c are the images of frames 134, 140 and 144, respectively.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
Target tracking is an important task in computer vision; from traditional tracking algorithms to today's deep-learning-based target tracking, many scholars and researchers have invested great time and energy and made vital contributions to the task. Within current computer vision, the deep-learning-based target tracking direction is developing the fastest; this invention likewise starts from the deep-learning target detection network YOLOv4 and combines image matching and target prediction to realize single-target tracking. For single-target tracking, the invention proposes the following idea: first detect the targets and know their category, then add mechanisms between video frames that link the same target from frame to frame and find and label it within its target category, thereby realizing single-target tracking. Accordingly, the main research content of the invention comprises an image matching algorithm based on multi-feature fusion, a target-prediction K++ neighborhood search algorithm, and an anti-occlusion algorithm based on vehicle motion state estimation, summarized in the following three points:
(1) Multi-feature-fusion image matching algorithm (IMF). Starting from the basic features of the image, each basic feature was tested in image similarity calculations to find the mode best suited to matching a target in drone footage of vehicles. After the color histogram feature proved the most suitable for tracking vehicles, its similarity calculation was stress-tested on a road section crowded with vehicles; the result showed that a single image feature cannot express the target well. The contour-oriented HOG feature of the vehicles was therefore fused in, the similarities obtained were combined with corresponding weights into a final similarity score, and after comprehensive judgment the target is screened. On this basis of the image's basic features, the multi-feature-fusion image matching algorithm is proposed.
(2) Target-prediction K++ neighborhood search algorithm. The classification algorithm KNN is studied first, then the K neighborhood search algorithm derived from its classification idea is introduced, and the K++ neighborhood search algorithm is proposed on that basis. The K++ neighborhood search algorithm fuses the idea of the K neighborhood search algorithm with the IoU and the center-point Euclidean distance between the tracking box and the prediction box. Experiments show that, compared with the original K neighborhood search algorithm, it copes better with the appearance of vehicles identical to the tracked target and still guarantees the correctness of the tracked target.
(3) Anti-occlusion algorithm based on vehicle motion state estimation (AOE). Once the target has moved for more than 20 frames, the algorithm records the vehicle's motion information, comprising the average speed and the direction of displacement, every 20 frames; when the target becomes completely occluded, the motion information of the preceding 20 frames is retained and used to estimate the target's motion trend during the occlusion period. Experiments prove that the AOE anti-occlusion algorithm can, to a certain extent, handle the reappearance of a fully occluded target.
As shown in fig. 1, the method for tracking a single-target vehicle based on unmanned aerial vehicle aerial photography in the present embodiment is specifically described as follows.
Step 1: obtain the unmanned aerial vehicle aerial video, pause at the first frame after the video loads, and frame the target to be tracked with the mouse in that first frame; the framed area is the area to be tracked and the target vehicle inside it is the tracking target. The tracking algorithm then keeps locating the target in the subsequent video frames. When framing the target, the selection should cover the target itself as tightly as possible, as in Fig. 2(b), and should not include much information beyond the target, so that its appearance features can be extracted more accurately; Fig. 2(a) selects too much extraneous information. This part is implemented with OpenCV, registering a callback function that listens for mouse events, as sketched below.
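A minimal OpenCV sketch of this selection step follows. It assumes a drag-to-frame interaction and a hypothetical video file name; OpenCV's built-in cv2.selectROI could replace the hand-written callback entirely.

```python
import cv2

drawing = {"start": None, "box": None}

def on_mouse(event, x, y, flags, param):
    # Drag with the left button held down to frame the target vehicle.
    if event == cv2.EVENT_LBUTTONDOWN:
        drawing["start"] = (x, y)
    elif event == cv2.EVENT_LBUTTONUP and drawing["start"] is not None:
        x0, y0 = drawing["start"]
        drawing["box"] = (min(x0, x), min(y0, y), abs(x - x0), abs(y - y0))

cap = cv2.VideoCapture("uav_video.mp4")    # hypothetical input file
ok, first_frame = cap.read()               # pause on the first frame
cv2.namedWindow("frame 1")
cv2.setMouseCallback("frame 1", on_mouse)  # listen for mouse events
while drawing["box"] is None:
    cv2.imshow("frame 1", first_frame)
    cv2.waitKey(20)
x, y, w, h = drawing["box"]
template = first_frame[y:y + h, x:x + w]   # the tracked-target region
cv2.destroyWindow("frame 1")
```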
Step 2: judge whether the target vehicle in the current frame is completely occluded; if not, continue to step 3; if so, perform step 5.
Step 3: use the K++ neighborhood search algorithm to predict the position of the tracking target.
After the tracking target is selected, a target detection algorithm detects all vehicles appearing in the second and subsequent frames of the video, and the K++ neighborhood search algorithm processes the detection results. To strengthen the prediction effect, stricter constraints are added on top of the existing K neighborhood. In this embodiment the concepts of IoU and center-point offset are combined to improve the K neighborhood search algorithm, and the K++ neighborhood search algorithm is designed to predict the position of the tracking target.
The execution process of the K++ neighborhood search algorithm is as follows:
step 3.1: calculating a K neighborhood range corresponding to the tracking frame when K is 2 according to the size of the tracking frame of the previous frame, and reducing the detection range of the current frame to the K neighborhood; the K neighborhood should satisfy formula (1):
Figure BDA0003512513660000051
wherein, WkAnd HkK is the width and height of the neighborhood search area, W and H are the width and height of the target tracking frame of the previous frame, and K is the aspect ratio of the two;
step 3.2: if only one target of the current frame is detected in the K neighborhood, namely at least two thirds of the area of the detection frame of the target is in the range of the K neighborhood, the target is the target of the previous frame, the tracking frame is updated, and the step 4 is continuously executed; if more than two targets appear in the K neighborhood in the current frame, executing the step 3.3;
step 3.3: respectively carrying out similarity calculation on the targets in the K neighborhood and the tracking target to obtain similarity scores and carrying out sequencing;
step 3.4: making IoU Euclidean distance between the target detection frame corresponding to the sequenced similarity score and the tracking frame of the previous frame and the central point; calculation IoU satisfies formula (2):
Figure BDA0003512513660000052
wherein gt is the tracking frame of the previous frame; bb (bounding box) is a detection box of the current frame appearing in the K neighborhood, IoU is calculated by using gt and the detection box in the K neighborhood respectively, and the detection box with the largest value IoU is selected for reservation, which satisfies the formula (3):
IoU(gt,bb)max=Max(IoU(gt,bb1),...,IoU(gt,bbn)) (3)
wherein IoU is the intersection ratio of the detection frame and the previous frame tracking frame, gt is the tracking frame of the previous frame, bb is the detection frame in the K + + neighborhood of the current frame, and n is the number of the detection frames in the K + + neighborhood;
calculating the Euclidean distance of the central point to satisfy the formula (4):
Figure BDA0003512513660000053
wherein d is the Euclidean distance between two points, c is the central point of the frame, and the coordinates are (x, y); c. CgtFor the center point of the previous frame, cbbDetecting the central point of a frame for the current frame; and (4) selecting the detection frame corresponding to the central point with the minimum Euclidean distance, and combining the detection frame with the similarity calculated by the previous image matching and the maximum IoU to judge which detection frame detects the tracking target.
The order of judgment is: first compare image similarity, then exclude similar vehicles by IoU, and finally select the tracking target by the center-point Euclidean distance, as in the sketch below.
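The following Python sketch puts formulas (1), (2) and (4) together for (x, y, w, h) boxes. The scaling form of formula (1) is reconstructed from the definitions above, and similarity is a stand-in for the step 4 matcher; none of the helper names come from the patent itself.

```python
import math

def k_neighborhood(box, k=2.0):
    # formula (1): search area of width K*W and height K*H,
    # centered on the previous frame's tracking box
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0
    return (cx - k * w / 2.0, cy - k * h / 2.0, k * w, k * h)

def inside_ratio(det, region):
    # fraction of the detection box lying inside the K neighborhood;
    # step 3.2 keeps a detection when this is at least two thirds
    x, y, w, h = det
    rx, ry, rw, rh = region
    ix = max(0.0, min(x + w, rx + rw) - max(x, rx))
    iy = max(0.0, min(y + h, ry + rh) - max(y, ry))
    return ix * iy / (w * h)

def iou(a, b):
    # formula (2): intersection area over union area of two boxes
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    return inter / (aw * ah + bw * bh - inter)

def center_dist(a, b):
    # formula (4): Euclidean distance between the two box centers
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return math.hypot(ax + aw / 2 - bx - bw / 2, ay + ah / 2 - by - bh / 2)

def kpp_screen(prev_box, detections, similarity):
    # steps 3.1-3.4: screen by the K neighborhood, then judge by image
    # similarity first, IoU second, and center distance last
    region = k_neighborhood(prev_box)
    cands = [d for d in detections if inside_ratio(d, region) >= 2 / 3]
    return max(cands, key=lambda d: (similarity(d), iou(prev_box, d),
                                     -center_dist(prev_box, d)))
```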
Step 4: after the detection boxes have been screened in step 3, match the tracking target against the targets in the screened detection boxes to select where the current frame's tracking target appears. Image matching uses the multi-feature-fusion image matching algorithm, executed as follows:
step 4.1: after the target to be tracked is selected in a frame mode, extracting color histogram features and HOG features of the tracked target, and converting the two features into feature vectors.
Step 4.2: in subsequent frames, crop the detected objects of the same category into pictures and extract each object's color histogram feature and HOG feature in the same way, obtaining feature vectors. In computing the color histogram feature vector, if each primary color could take all 256 values (0-255), the whole color space would contain about 16 million colors (256 to the third power), resulting in a huge amount of calculation; the range 0-255 is therefore divided into four equal regions: [0,63] is region 0, [64,127] is region 1, [128,191] is region 2, and [192,255] is region 3. Each primary color then takes one of four values, giving 64 color types for the three primary colors (4 to the third power), and any color appearing in the image necessarily belongs to one of these regions. The number of pixels falling in each region is then counted; partitioning in this way reduces the amount of calculation as far as possible and yields a 64-dimensional feature vector, as in the sketch below.
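A NumPy sketch of this 64-bin quantization, assuming an 8-bit BGR image as read by OpenCV; normalizing the counts is an added convenience for the later cosine comparison, not something stated in the text.

```python
import numpy as np

def color_hist_64(bgr):
    """64-dimensional color histogram: each 0-255 channel value is
    quantized into the four regions [0,63], [64,127], [128,191] and
    [192,255], giving 4**3 = 64 combined bins for the three channels."""
    quant = bgr.astype(np.uint8) // 64             # per-channel region index 0..3
    bins = (quant[..., 0].astype(np.int32) * 16
            + quant[..., 1] * 4 + quant[..., 2])   # combined bin index 0..63
    hist = np.bincount(bins.ravel(), minlength=64).astype(np.float64)
    return hist / hist.sum()                       # normalized pixel counts
```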
Step 4.3: compute the color histogram feature similarity and the HOG feature similarity between the tracked target and every target obtained in step 4.2, apply weighted scoring, and finally rank all the scores. The color histogram feature similarity is computed as cosine similarity, which satisfies formula (5):

cos θ = (P · Q) / (|P| · |Q|) = (Σ_{i=1..m} P_i Q_i) / (sqrt(Σ_{i=1..m} P_i^2) · sqrt(Σ_{i=1..m} Q_i^2))    (5)

The smaller the angle between two vectors in space, the more nearly their directions coincide and the more similar they are; correspondingly, the smaller the angle θ, the larger its cosine. The size of the vector angle in the coordinate space can therefore serve as the basis for judging vector similarity, and the same cosine calculation holds for multidimensional vectors. Here P and Q are two multidimensional vectors, P = [P_1, P_2, ..., P_m] and Q = [Q_1, Q_2, ..., Q_m], with m the vector dimension.
The HOG feature similarity uses the HOG feature descriptors, i.e., the feature vectors, and finally computes the Euclidean distance between them; the smaller the distance, the more similar the two pictures. Since a HOG feature vector is n-dimensional, the corresponding Euclidean distance satisfies formula (6):

d(x, y) = sqrt(Σ_{i=1..n} (x_i - y_i)^2)    (6)

where d is the Euclidean distance between the vectors and x_i, y_i are the values of the two vectors at index i of the multidimensional space. As the formula shows, the closer the values at corresponding indexes of the two feature vectors, the smaller the distance d and the more similar the two images (see the sketch below).
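A sketch using OpenCV's stock HOG descriptor; the 64 × 128 window and the resize are assumptions made so that every patch yields an equal-length vector, since the text does not fix the HOG parameters.

```python
import cv2
import numpy as np

hog = cv2.HOGDescriptor()          # default 64 x 128 detection window

def hog_feature(bgr_patch):
    # resize so every candidate yields a vector of the same dimension n
    resized = cv2.resize(bgr_patch, (64, 128))
    return hog.compute(resized).ravel()

def hog_distance(patch_a, patch_b):
    # formula (6): Euclidean distance between two HOG feature vectors;
    # a smaller distance means the two pictures are more similar
    return float(np.linalg.norm(hog_feature(patch_a) - hog_feature(patch_b)))
```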
A weighting coefficient is assigned to the color histogram feature similarity and the HOG feature similarity respectively, to obtain the calculation best suited to measuring similarity between vehicles. Based on the experimental data and the algorithm's application scenario, the weight of the color histogram feature is set to 1 and the weight of the HOG feature to 2, and the candidate-box picture with the greatest similarity is selected as the target to be tracked.
For each screened candidate box participating in the calculation, the cosine similarity of the color histogram features and the Euclidean-distance similarity of the HOG features are computed against the tracked target, each result is multiplied by its corresponding weight, and the products are summed to give the final similarity score, satisfying formula (7):
S_i = W_1 · S(c_i, c_t) + W_2 · S(h_i, h_t)    (7)
where S is the similarity-calculation function of the parameters in parentheses; S_i is the total similarity score between the image in the ith candidate box and the tracking target; W_1 is the color-histogram-feature similarity weight, with value 1; W_2 is the HOG-feature similarity weight, with value 2; S(c_i, c_t) is the color histogram feature similarity of the ith candidate box and the tracking target t, with c_i and c_t their respective color histogram feature vectors; S(h_i, h_t) is the HOG feature similarity of the ith candidate box and the tracking target t, with h_i the HOG feature of the ith candidate box and h_t the HOG feature of the tracked target. Finally, the candidate box with the greatest total similarity score is selected as the real tracking object; a combined sketch follows.
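Putting formulas (5)-(7) together, reusing color_hist_64 and hog_distance from the sketches above. The text gives the HOG term as a similarity, so the conversion of the Euclidean distance d into 1/(1 + d) is an assumed monotone mapping, not specified by the patent.

```python
import numpy as np

W1, W2 = 1.0, 2.0   # weights from the text: color histogram 1, HOG 2

def cosine_similarity(p, q):
    # formula (5): cos(theta) of two feature vectors
    return float(np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q)))

def fused_score(candidate_patch, template_patch):
    # formula (7): S_i = W1 * S(c_i, c_t) + W2 * S(h_i, h_t)
    s_color = cosine_similarity(color_hist_64(candidate_patch),
                                color_hist_64(template_patch))
    d_hog = hog_distance(candidate_patch, template_patch)
    return W1 * s_color + W2 / (1.0 + d_hog)   # assumed distance-to-similarity map

# the candidate with the highest fused score is taken as the tracked object
```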
After the features are fused, the algorithm expresses the template object more fully: contour features are added alongside the apparent color features, ensuring that the tracker still judges correctly when traffic flow is heavy.
Step 4.4: take the highest score from the ranking of step 4.3 and, combining it with the results computed in step 3, select the corresponding target as the position where the current frame's tracking target appears, under the judgment conditions that the IoU between the tracking box and the detection box is largest and the Euclidean distance between their center points is shortest.
Step 5: when the tracking target is occluded, apply the anti-occlusion algorithm based on vehicle motion state estimation. Once tracking has run for more than 20 frames, record the average speed at which the vehicle moves in the video every 20 frames. When the target disappears from view, store the disappearance coordinates, stop recording the target's moving speed, and keep the moving speed of the preceding 20 frames. If the target has been gone for no more than 50 frames (about 3 seconds), extrapolate its trajectory and coordinates during the disappearance as normal, obtain a K neighborhood around the currently estimated position, record where the target may appear after 3 seconds, and set a K neighborhood there to wait to capture the target. If the vehicle is re-detected and captured by the K neighborhood, invoke the image matching algorithm of step 4; if the match succeeds, continue tracking. If it does not succeed, start full-image matching: discard the recorded speed and coordinates, let the tracker search automatically using the tracking procedure of steps 3-4, use the multi-feature-fusion matching algorithm to match the target in the current field of view with the greatest similarity to the initially framed tracking target, update the tracking box, and re-establish the K neighborhood at that target's coordinates. A sketch of the motion-state bookkeeping follows below.
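A minimal sketch of the motion-state bookkeeping behind the AOE algorithm, assuming a linear extrapolation of the box center; the 20-frame window and the 50-frame limit come from the text, while the constant-velocity assumption is the stated precondition of the method.

```python
class MotionEstimator:
    """Record the mean per-frame velocity of the box center over each
    20-frame window; on full occlusion, extrapolate linearly for up to
    50 frames (a sketch, assuming roughly constant vehicle motion)."""

    def __init__(self):
        self.centers = []
        self.velocity = (0.0, 0.0)

    def update(self, center):
        # called once per tracked frame with the box center (x, y)
        self.centers.append(center)
        if len(self.centers) >= 20 and len(self.centers) % 20 == 0:
            (x0, y0), (x1, y1) = self.centers[-20], self.centers[-1]
            self.velocity = ((x1 - x0) / 19.0, (y1 - y0) / 19.0)

    def predict(self, frames_since_lost):
        # estimated center while occluded; used for frames_since_lost <= 50
        x, y = self.centers[-1]
        vx, vy = self.velocity
        return (x + vx * frames_since_lost, y + vy * frames_since_lost)
```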
Step 6: judging whether the video is finished or not; if yes, ending the detection; otherwise, receiving the next frame and returning to the step 2.
As shown in fig. 3, fig. 3a (frame 370) shows the tracking effect with no occlusion, while fig. 3b and fig. 3c (frames 400 and 420) show tracking while the vehicle is partially occluded by a traffic light; the tracking algorithm of this embodiment handles these cases well.
As shown in fig. 4, where fig. 4a, fig. 4b and fig. 4c are the images of frames 146, 219 and 241 respectively, the image matching algorithm of this embodiment combined with the K++ neighborhood search algorithm copes well even when vehicles with similar color and contour appear near the target vehicle.
As shown in fig. 5, where fig. 5a, fig. 5b and fig. 5c are the images of frames 134, 140 and 144 respectively: in frame 134 the vehicle is about to disappear; frame 140 shows the motion estimation of the AOE anti-occlusion algorithm while the vehicle is completely occluded; and in frame 144, as the vehicle emerges from the occlusion, the re-detected target is captured by the K++ neighborhood generated from the estimation box and matched with the tracked target. The AOE anti-occlusion algorithm can thus solve the full-occlusion problem to a certain extent.
The method of this embodiment was used for single-target vehicle tracking and detection on urban roads, expressways and congested road sections; the tracking accuracy is shown in Table 1. The average accuracy of the single-target vehicle tracking algorithm based on unmanned aerial vehicle aerial photography of this embodiment is 91.1%.
TABLE 1 Tracking accuracy for each scene

Scene          Tracking accuracy
Urban road     0.887
Expressway     0.935
Traffic jam    0.912
The accuracy of the method of this embodiment was compared with that of the TLD, SiamFC, SiamRPN++ and DiMP tracking algorithms; the results are shown in Table 2.
TABLE 2 Comparison of tracking accuracy across algorithms

Algorithm        Tracking accuracy
The invention    0.911
TLD              0.735
SiamFC           0.814
SiamRPN++        0.876
DiMP             0.925
As shown in Table 2, the accuracy of the proposed tracker is influenced by the detection results; nevertheless, compared with the traditional TLD algorithm and the twin-network-based single-target tracking algorithms, its tracking accuracy is, by comprehensive evaluation, improved by 5.9% on average.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit of the corresponding technical solutions and scope of the present invention as defined in the appended claims.

Claims (7)

1. A single-target vehicle tracking method based on unmanned aerial vehicle aerial photography, characterized in that it comprises the following steps:
step 1: loading an unmanned aerial vehicle aerial video in which a vehicle is to be tracked and pausing at the first frame; manually framing the target vehicle to be tracked with a mouse, the framed area being the area to be tracked and the target vehicle within it being the tracking target; then, starting from the second frame of the video, executing step 2 and detecting the target vehicles appearing in each video frame;
step 2: judging whether the target vehicle in the current frame is completely occluded; if not, continuing to step 3; otherwise, executing step 5;
step 3: using the K++ neighborhood search algorithm to predict the position of the tracking target: establishing a K++ neighborhood around the tracking box, using the K++ neighborhood to screen out redundant target detection boxes so that only vehicles that may be the tracking target remain within it, and computing the IoU and the center-point offset between the tracking box and each remaining detection box;
step 4: cropping the targets in the screened detection boxes into pictures, matching them against the tracking target framed in the first frame using the multi-feature-fusion image matching algorithm, computing and ranking the image similarity between the tracking target and each screened target, judging, in combination with the results of step 3, which detected target in the current frame is the target to be tracked, then updating the tracking box and executing step 6;
step 5: when the tracking target is occluded, applying the anti-occlusion algorithm based on vehicle motion state estimation: once tracking has run for more than 20 frames, recording the average speed at which the vehicle moves in the video every 20 frames; when the target disappears from view, storing the disappearance coordinates, stopping recording the target's moving speed, and keeping the moving speed of the preceding 20 frames; if the target has been gone for no more than 50 frames, extrapolating its trajectory and coordinates during the disappearance as normal, obtaining a K neighborhood around the currently estimated position, recording where the target may appear after 50 frames, and setting a K neighborhood there to wait to capture the target; if the vehicle is re-detected and captured by the K neighborhood, invoking the image matching algorithm of step 4 and, if the match succeeds, continuing to track; if the match does not succeed, starting full-image matching: discarding the recorded speed and coordinates, letting the tracker search automatically using the tracking procedure of steps 3-4, using the multi-feature-fusion matching algorithm to match the target in the current field of view with the greatest similarity to the initially framed tracking target, and re-establishing the K++ neighborhood at that target's coordinates; executing step 6;
step 6: judging whether the video has finished; if so, ending the detection; otherwise, receiving the next frame and returning to step 2.
2. The unmanned aerial vehicle aerial photography-based single-target vehicle tracking method according to claim 1, wherein: when the target is framed in step 1, only the target itself is selected.
3. The unmanned aerial vehicle aerial photography-based single-target vehicle tracking method according to claim 1, wherein: the specific method of step 3 comprises the following steps:
step 3.1: from the size of the previous frame's tracking box, calculating the K neighborhood range corresponding to the tracking box with K = 2 and narrowing the detection range of the current frame to the K neighborhood; the K neighborhood satisfies formula (1):

W_k = K · W,  H_k = K · H    (1)

wherein W_k and H_k are respectively the width and height of the K neighborhood search area, W and H are respectively the width and height of the previous frame's target tracking box, and K is the scale ratio between the two;
step 3.2: if only one target of the current frame is detected within the K neighborhood, namely at least two thirds of the area of that target's detection box lies within the K neighborhood, the target is the target of the previous frame; updating the tracking box and continuing to execute step 4; if two or more targets appear within the K neighborhood in the current frame, executing step 3.3;
step 3.3: computing the similarity between each target in the K neighborhood and the tracking target to obtain similarity scores and ranking them;
step 3.4: computing the IoU and the center-point Euclidean distance between the previous frame's tracking box and each target detection box taken in ranked-similarity order; taking the detection box whose center point has the smallest Euclidean distance and, together with the image-matching similarity computed earlier and the largest IoU, judging which detection box contains the tracking target, the order of judgment being: first comparing image similarity, then excluding similar vehicles by IoU, and finally selecting the tracking target by the center-point Euclidean distance.
4. The unmanned aerial vehicle aerial photography-based single-target vehicle tracking method according to claim 3, wherein: in step 3.4, IoU is calculated to satisfy equation (2):
IoU(gt, bb) = area(gt ∩ bb) / area(gt ∪ bb)    (2)

wherein gt is the tracking box of the previous frame and bb is a detection box of the current frame within the K neighborhood range; the IoU is calculated between gt and each detection box in the K neighborhood, and the detection box with the largest IoU value is retained, satisfying formula (3):

IoU(gt, bb)_max = Max(IoU(gt, bb_1), ..., IoU(gt, bb_n))    (3)

wherein IoU(·) is the intersection-over-union of a detection box with the previous frame's tracking box, gt is the tracking box of the previous frame, bb_i is the ith detection box appearing in the K++ neighborhood of the current frame, and n is the total number of detection boxes in the K++ neighborhood;
the center-point Euclidean distance is calculated to satisfy formula (4):

d(c_1, c_2) = sqrt((x_1 - x_2)^2 + (y_1 - y_2)^2)    (4)

wherein d(·) is the Euclidean distance between the two points in parentheses, c_1 and c_2 are the center points of the two boxes, with coordinates (x_1, y_1) and (x_2, y_2) respectively; c_gt is the center point of the previous frame's tracking box and c_bb is the center point of a detection box of the current frame.
5. The unmanned aerial vehicle aerial photography-based single-target vehicle tracking method according to claim 1, wherein: the specific method of the step 4 comprises the following steps:
step 4.1: after the target to be tracked is selected, extracting color histogram features and HOG features of the tracked target, and converting the two features into feature vectors;
step 4.2: extracting the detected targets of the same category as pictures in subsequent frames, extracting the color histogram feature and the HOG feature of each target in the same way, and obtaining feature vectors;
step 4.3: respectively calculating the color histogram feature similarity and the HOG feature similarity between the tracked target and all the targets obtained in the step 4.2, carrying out weighted scoring, and finally sorting all the scores; finally, selecting the candidate frame with the maximum similarity total score as a real tracking object for tracking;
step 4.4: taking the highest score from the ranking of step 4.3 and, combining it with the results computed in step 3, selecting the corresponding target as the position where the current frame's tracking target appears, under the judgment conditions that the IoU between the tracking box and the detection box is largest and the Euclidean distance between their center points is shortest.
6. The unmanned aerial vehicle aerial photography-based single-target vehicle tracking method according to claim 5, wherein: in the calculation of the color histogram feature vector in step 4.1 and step 4.2, the value range 0-255 of each primary color is divided into four equal regions: [0,63] is region 0, [64,127] is region 1, [128,191] is region 2, and [192,255] is region 3; after partitioning, each primary color corresponds to one of four values, any color appearing in the image necessarily belongs to one of the four regions, and the number of pixels appearing in each region is counted to obtain a 64-dimensional feature vector.
7. The unmanned aerial vehicle aerial photography-based single-target vehicle tracking method according to claim 5, wherein: in step 4.3, the color histogram feature similarity is computed as cosine similarity, which satisfies formula (5):

cos θ = (P · Q) / (|P| · |Q|) = (Σ_{i=1..m} P_i Q_i) / (sqrt(Σ_{i=1..m} P_i^2) · sqrt(Σ_{i=1..m} Q_i^2))    (5)

the smaller the angle, the larger the corresponding cosine value and the more similar the vectors; P and Q are two multidimensional vectors, P = [P_1, P_2, ..., P_m] and Q = [Q_1, Q_2, ..., Q_m], with m the vector dimension;
the HOG feature similarity uses a HOG feature descriptor, namely feature vectors, and finally calculates Euclidean distance between the feature vectors, wherein the smaller the distance is, the more similar the two pictures are proved; since the HOG feature vector is an n-dimensional vector, the corresponding euclidean distance satisfies equation (6):
Figure FDA0003512513650000032
where d is the Euclidean distance of the vector, xi、yiTwo coordinate values of a vector in a multi-dimensional space;
after the cosine similarity calculation of the color histogram features and the Euclidean-distance similarity calculation of the HOG features, the results are respectively multiplied by the corresponding weights and added to obtain the final similarity score, satisfying formula (7):
S_i = W_1 · S(c_i, c_t) + W_2 · S(h_i, h_t)    (7)

in the formula, S_i is the total similarity score between the image in the ith candidate box and the tracking target; W_1 is the color-histogram-feature similarity weight coefficient; W_2 is the HOG-feature similarity weight coefficient; S is the similarity-calculation function of the parameters in parentheses; S(c_i, c_t) is the color histogram feature similarity function of the ith candidate box and the tracking target t, with c_i and c_t their respective color histogram feature vectors; S(h_i, h_t) is the HOG feature similarity function of the ith candidate box and the tracking target t, with h_i the HOG feature of the ith candidate box and h_t the HOG feature of the tracked target.
CN202210156746.6A 2022-02-21 2022-02-21 Single-target vehicle tracking method based on unmanned aerial vehicle aerial photography Pending CN114529584A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210156746.6A CN114529584A (en) 2022-02-21 2022-02-21 Single-target vehicle tracking method based on unmanned aerial vehicle aerial photography

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210156746.6A CN114529584A (en) 2022-02-21 2022-02-21 Single-target vehicle tracking method based on unmanned aerial vehicle aerial photography

Publications (1)

Publication Number Publication Date
CN114529584A 2022-05-24

Family

ID=81625192

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210156746.6A Pending CN114529584A (en) 2022-02-21 2022-02-21 Single-target vehicle tracking method based on unmanned aerial vehicle aerial photography

Country Status (1)

Country Link
CN (1) CN114529584A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115273078A (en) * 2022-09-30 2022-11-01 南通炜秀环境技术服务有限公司 Sewage treatment method and system based on image data
CN117372719A (en) * 2023-12-07 2024-01-09 四川迪晟新达类脑智能技术有限公司 Target detection method based on screening
CN117689907A (en) * 2024-02-04 2024-03-12 福瑞泰克智能系统有限公司 Vehicle tracking method, device, computer equipment and storage medium
CN117689907B (en) * 2024-02-04 2024-04-30 福瑞泰克智能系统有限公司 Vehicle tracking method, device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN114529584A (en) Single-target vehicle tracking method based on unmanned aerial vehicle aerial photography
Wen et al. Visdrone-sot2018: The vision meets drone single-object tracking challenge results
CN111161317A (en) Single-target tracking method based on multiple networks
CN109214403B (en) Image recognition method, device and equipment and readable medium
Zhu et al. Multi-drone-based single object tracking with agent sharing network
CN113011329A (en) Pyramid network based on multi-scale features and dense crowd counting method
CN111445497B (en) Target tracking and following method based on scale context regression
CN111127519A (en) Target tracking control system and method for dual-model fusion
Bouachir et al. Structure-aware keypoint tracking for partial occlusion handling
CN111429485B (en) Cross-modal filtering tracking method based on self-adaptive regularization and high-reliability updating
CN112329784A (en) Correlation filtering tracking method based on space-time perception and multimodal response
CN112614161A (en) Three-dimensional object tracking method based on edge confidence
CN107330918B (en) Football video player tracking method based on online multi-instance learning
CN112767440A (en) Target tracking method based on SIAM-FC network
CN111091583B (en) Long-term target tracking method
Fan et al. Covered vehicle detection in autonomous driving based on faster rcnn
CN116381672A (en) X-band multi-expansion target self-adaptive tracking method based on twin network radar
CN112560651B (en) Target tracking method and device based on combination of depth network and target segmentation
CN112614158B (en) Sampling frame self-adaptive multi-feature fusion online target tracking method
CN112487927B (en) Method and system for realizing indoor scene recognition based on object associated attention
CN111242980B (en) Point target-oriented infrared focal plane blind pixel dynamic detection method
CN113920155A (en) Moving target tracking algorithm based on kernel correlation filtering
CN113269808A (en) Video small target tracking method and device
CN113129332A (en) Method and apparatus for performing target object tracking
Wang et al. Adaptive weight collaborative complementary learning for robust visual tracking

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination