CN116820131A - Unmanned aerial vehicle tracking method based on target perception ViT - Google Patents

Info

Publication number
CN116820131A
Authority
CN
China
Prior art keywords
target
unmanned aerial
tracking
aerial vehicle
prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310818688.3A
Other languages
Chinese (zh)
Inventor
李水旺
杨向阳
叶恒舟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Technology
Original Assignee
Guilin University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin University of Technology
Priority to CN202310818688.3A
Publication of CN116820131A

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses an unmanned aerial vehicle tracking method based on target perception and relates to the technical field of unmanned aerial vehicle tracking. The adopted tracking framework is a single-stream framework comprising a backbone network and a prediction head. The backbone network uses DeiT-Tiny, a ViT-based model, and achieves target perception by maximizing the mutual information between the template image and its features. The prediction head has three branches, which predict the classification score, the quantization error introduced by downsampling, and the normalized bounding-box size, respectively; each branch consists of four stacked convolution-batch normalization-ReLU layers. The framework is trained on existing target tracking datasets to obtain an unmanned aerial vehicle tracking model, and the trained framework is then deployed to an unmanned aerial vehicle platform for target tracking. By designing and training an unmanned aerial vehicle tracking model based on target perception, the invention enables accurate, efficient, real-time tracking of targets in strong sunlight, targets whose viewing angle changes rapidly, and small distant targets.

Description

Unmanned aerial vehicle tracking method based on target perception ViT
Technical Field
The invention relates to the field of target tracking, in particular to a target tracking method for an unmanned aerial vehicle.
Background
With the development of artificial intelligence, many industries have been affected, and some have changed dramatically. In the unmanned aerial vehicle sector, many companies are working to make unmanned aerial vehicles more intelligent through deep learning, and unmanned aerial vehicle tracking is one such technology. Unmanned aerial vehicle tracking is widely used in disaster relief, traffic monitoring, environmental monitoring, power inspection and similar applications. Unlike unmanned ground vehicles and unmanned ships, an unmanned aerial vehicle is constrained by its take-off weight: the onboard processor and battery must be as light as possible, so both processor performance and battery capacity are limited.
An unmanned aerial vehicle tracker should have two basic qualities: 1) it must cope with severe challenges such as extreme viewing angles, motion blur and heavy occlusion; 2) it must meet the requirements of high efficiency and low power consumption under limited battery capacity and computing resources.
Currently, the trackers most widely used on unmanned aerial vehicles are still based on discriminative correlation filters (DCF); more recently, lightweight trackers based on convolutional neural networks (CNN) with filter pruning have also been used. DCF-based trackers are favored for their high efficiency, but they struggle to achieve high tracking accuracy. CNN-based trackers, on the other hand, are known for high accuracy, but they demand substantial computing resources and are therefore poorly suited to efficiency-critical use. To address this, researchers have introduced lightweight CNN-based trackers for unmanned aerial vehicle tracking as a trade-off. These trackers use filter pruning to reduce the number of network parameters, thereby significantly improving accuracy and efficiency.
In general visual tracking, emerging ViT (Vision Transformer) based trackers have achieved great success by using the attention mechanism, which captures the target position more effectively. However, no ViT-based tracker has yet been proposed in the unmanned aerial vehicle tracking field, probably because general-purpose ViT-based trackers have a large number of model parameters and run slowly, which has discouraged much useful exploration.
Disclosure of Invention
The invention aims to provide an unmanned aerial vehicle tracking method based on target perception ViT for real-time and efficient unmanned aerial vehicle tracking.
The technical scheme of the invention is to design and train the proposed unmanned aerial vehicle tracking model, and deploy the model to an unmanned aerial vehicle platform for target tracking so as to meet the requirements of users.
The ViT-based unmanned aerial vehicle tracking framework is shown in FIG. 1. The framework consists of a backbone network based on target perception ViT and a prediction head.
(1) Backbone network
The backbone network performs feature learning and template-search image coupling simultaneously, allowing the two processes to interact. The input to the framework consists of a target template $Z$ and a search image $X$, which are first split into patches of the same size (16×16) and flattened into sequences, then tokenized by a trainable linear projection layer $\mathcal{E}$ to produce $K$ vectors, expressed as:
$$t^{0}_{1:K} = \mathcal{E}(Z, X) \in \mathbb{R}^{K \times d} \tag{1}$$
where $d$ is the embedding dimension of each vector, and the vector sequences $t^{0}_{1:K_z}$ and $t^{0}_{K_z+1:K}$ represent the template and the search image, respectively, with $K = K_z + K_x$. Let $E^{l}$ denote the transformer block at layer $l$; the vectors are passed from layer $l-1$ to layer $l$ by $t^{l}_{1:K} = E^{l}(t^{l-1}_{1:K})$. The entire transformation can be expressed as:
$$t^{L}_{1:K} = E^{L} \circ E^{L-1} \circ \cdots \circ E^{1}\,(t^{0}_{1:K}) \tag{2}$$
where $\circ$ denotes composition and the backbone comprises $L$ transformer blocks in total.
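The tokenization step described above can be illustrated with a minimal NumPy sketch. It assumes the 128×128 template and 256×256 search sizes given later in the description, a 16×16 patch, and an embedding dimension of `d = 192` (the usual DeiT-Tiny width, which this text does not state). The names `patchify` and `tokenize` and the random projection matrix are hypothetical, and position embeddings are omitted.

```python
import numpy as np

def patchify(image, patch=16):
    """Split a (C, H, W) image into flattened, non-overlapping patch vectors."""
    c, h, w = image.shape
    assert h % patch == 0 and w % patch == 0, "image size must be a multiple of patch"
    gh, gw = h // patch, w // patch
    # (C, gh, P, gw, P) -> (gh, gw, C, P, P) -> (gh * gw, C * P * P)
    blocks = image.reshape(c, gh, patch, gw, patch).transpose(1, 3, 0, 2, 4)
    return blocks.reshape(gh * gw, c * patch * patch)

def tokenize(template, search, proj, patch=16):
    """Project template and search patches with one shared linear layer, then concatenate."""
    t_z = patchify(template, patch) @ proj   # (K_z, d) template tokens
    t_x = patchify(search, patch) @ proj     # (K_x, d) search tokens
    return np.concatenate([t_z, t_x], axis=0), t_z.shape[0]

rng = np.random.default_rng(0)
d = 192                                              # assumed embedding dimension
proj = rng.standard_normal((3 * 16 * 16, d)) * 0.02  # stand-in for the trained projection
Z = rng.standard_normal((3, 128, 128))               # template image
X = rng.standard_normal((3, 256, 256))               # search image
tokens, k_z = tokenize(Z, X, proj)
print(tokens.shape, k_z)   # (320, 192) 64
```

With these sizes the template contributes 8×8 = 64 tokens and the search image 16×16 = 256 tokens, so the backbone processes 320 tokens per image pair.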
The core idea of the framework is that the mutual information between the template image and its features is maximized.
Let $X$ and $Y$ be two random variables. The mutual information between $X$ and $Y$ can be expressed as:
$$I(X; Y) = D_{KL}\big(p(x, y)\,\|\,p(x)\,p(y)\big) \tag{3}$$
where $p(x, y)$ is the joint probability distribution, $p(x)$ and $p(y)$ are the marginal distributions, and $D_{KL}$ denotes the Kullback-Leibler (KL) divergence. In practice, however, mutual information is very difficult to estimate, because only samples are available, not the underlying distributions. We therefore learn the target perception ViT for unmanned aerial vehicle tracking using Deep InfoMax (DIM), which is based on the Jensen-Shannon divergence (JSD) instead of the KL divergence. The estimator is expressed as:
$$\hat{I}^{JSD}_{\theta}(X; Y) = \mathbb{E}_{p(x, y)}\big[-\mathrm{sp}(-T_{\theta}(x, y))\big] - \mathbb{E}_{p(x)\,p(y)}\big[\mathrm{sp}(T_{\theta}(x, y))\big] \tag{4}$$
where $T_{\theta}$ is a neural network with parameters $\theta$, and $\mathrm{sp}(z) = \log(1 + e^{z})$ is the Softplus activation function. In this framework, we set:
$$X = Z, \qquad Y = t^{L}_{1:K_z} \tag{5}$$
where $t^{L}_{1:K_z}$ denotes the template features taken from the backbone network output. The mutual information maximization loss function is defined as:
$$L_{mi} = -\hat{I}^{JSD}_{\theta}\big(Z;\, t^{L}_{1:K_z}\big) \tag{6}$$
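A minimal NumPy sketch of the JSD-based estimator and the resulting loss may help make the sign conventions concrete. It assumes the scores of the network $T_{\theta}$ have already been computed for a batch and arranged in a square matrix whose diagonal holds matched (template, own feature) pairs and whose off-diagonal entries hold mismatched pairs. That pairing scheme and the function names are illustrative assumptions, not details given in the text.

```python
import numpy as np

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return np.logaddexp(0.0, z)

def jsd_mi_estimate(pos_scores, neg_scores):
    """JSD-based estimate: E_joint[-sp(-T)] - E_marginals[sp(T)]."""
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)
    return float(np.mean(-softplus(-pos)) - np.mean(softplus(neg)))

def mi_loss_from_score_matrix(scores):
    """scores[i, j] = T(template_i, feature_j); the loss is the negated estimate."""
    scores = np.asarray(scores, dtype=float)
    matched = np.eye(scores.shape[0], dtype=bool)   # diagonal = joint samples (assumption)
    return -jsd_mi_estimate(scores[matched], scores[~matched])

# A critic that cannot tell pairs apart versus one that separates them well.
uninformed = np.zeros((4, 4))
informed = np.where(np.eye(4, dtype=bool), 5.0, -5.0)
loss_uninformed = mi_loss_from_score_matrix(uninformed)   # equals 2 * log(2)
loss_informed = mi_loss_from_score_matrix(informed)
print(loss_uninformed, loss_informed)
```

The loss falls toward zero as matched pairs receive high scores and mismatched pairs low scores, which is the behaviour that minimizing the mutual information maximization loss encourages.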
(2) Prediction head and loss function
The prediction head is a fully convolutional network with three branches, each consisting of 4 stacked convolution-batch normalization-ReLU layers, and is used to estimate the bounding box of the target. The search-image part $t^{L}_{K_z+1:K}$ of the vectors output by the backbone network is taken and reshaped into a 2-dimensional spatial feature map, which is fed into the prediction head. The outputs are a target classification score $p \in [0, 1]^{\frac{H_x}{P} \times \frac{W_x}{P}}$, a local offset $o \in [0, 1)^{2 \times \frac{H_x}{P} \times \frac{W_x}{P}}$, and a normalized bounding-box size $s \in [0, 1]^{2 \times \frac{H_x}{P} \times \frac{W_x}{P}}$ (where $H_x$ and $W_x$ are the height and width of the search image, respectively, and $P$ is the side length of the patches into which the image is split). The coarse position of the target is given by the maximum classification score, $(x_d, y_d) = \arg\max_{(x, y)} p(x, y)$. The predicted target bounding box is then computed from this coarse position as:
$$(x, y, w, h) = \big(x_d + o(0, x_d, y_d),\; y_d + o(1, x_d, y_d),\; s(0, x_d, y_d),\; s(1, x_d, y_d)\big) \tag{7}$$
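The decoding step can be sketched as follows, under stated assumptions: the three head outputs are NumPy arrays indexed as `[channel, row, column]`, the position is converted to search-image pixels by multiplying by the patch side length, and the normalized size is scaled by the search-image size. The pixel scaling and the function name `decode_box` are illustrative choices, not taken from the text.

```python
import numpy as np

def decode_box(score, offset, size, patch=16):
    """Decode (cx, cy, w, h) in search-image pixels from the three head outputs.

    score:  (Hf, Wf)     classification score map
    offset: (2, Hf, Wf)  local offsets (x, y) in grid-cell units
    size:   (2, Hf, Wf)  box (w, h) normalized by the search-image size
    """
    hf, wf = score.shape
    # Coarse position: grid cell with the maximum classification score.
    y_d, x_d = np.unravel_index(np.argmax(score), score.shape)
    # Refine with the local offset, then convert grid units to pixels (assumption).
    cx = (x_d + offset[0, y_d, x_d]) * patch
    cy = (y_d + offset[1, y_d, x_d]) * patch
    # Scale the normalized size by the search-image width and height.
    w = size[0, y_d, x_d] * wf * patch
    h = size[1, y_d, x_d] * hf * patch
    return float(cx), float(cy), float(w), float(h)

score = np.zeros((16, 16)); score[5, 7] = 0.9
offset = np.zeros((2, 16, 16)); offset[:, 5, 7] = (0.25, 0.5)
size = np.zeros((2, 16, 16)); size[:, 5, 7] = (0.5, 0.25)
box = decode_box(score, offset, size)
print(box)   # (116.0, 88.0, 128.0, 64.0)
```

The offset branch recovers the sub-cell position lost when the 256×256 search image is reduced to a 16×16 grid, which is why only the value at the arg-max cell is read.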
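For the tracking task, we use the weighted focal loss for classification and a combination of the IoU loss and the $L_1$ loss for bounding-box regression. The final total loss function is:
$$L = L_{cls} + \lambda_{iou} L_{iou} + \lambda_{L_1} L_{1} + \lambda_{mi} L_{mi} \tag{8}$$
where $L_{cls}$ is the classification loss, $L_{iou}$ and $L_{1}$ are the regression losses, $L_{mi}$ is the mutual information maximization loss, and $\lambda_{iou}$, $\lambda_{L_1}$ and $\lambda_{mi}$ are constants. After loading ViT weights pre-trained for image classification, the framework is trained end to end with the overall loss function $L$.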
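The loss combination described above can be sketched in a few lines. The sketch makes several assumptions: a plain focal term stands in for the weighted focal loss, the IoU loss takes the form one minus IoU, the $L_1$ loss is averaged over box coordinates, and the weights passed to `total_loss` are placeholders because the constants are not given in this text. All function names are hypothetical.

```python
import numpy as np

def focal_loss(p_t, gamma=2.0):
    """Plain focal term -(1 - p_t)^gamma * log(p_t); gamma is a placeholder value."""
    return float(-((1.0 - p_t) ** gamma) * np.log(p_t))

def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def iou_loss(pred, target):
    """One minus IoU (assumed form of the IoU loss)."""
    return 1.0 - box_iou(pred, target)

def l1_loss(pred, target):
    """Mean absolute difference over the box coordinates."""
    return float(np.mean(np.abs(np.asarray(pred, float) - np.asarray(target, float))))

def total_loss(l_cls, l_iou, l_l1, l_mi, w_iou, w_l1, w_mi):
    """Weighted sum of the four terms; the weights must be supplied by the caller."""
    return l_cls + w_iou * l_iou + w_l1 * l_l1 + w_mi * l_mi

pred, target = (0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)
loss = total_loss(focal_loss(0.5), iou_loss(pred, target), l1_loss(pred, target),
                  l_mi=0.2, w_iou=1.0, w_l1=1.0, w_mi=1.0)  # placeholder weights
print(loss)
```

Keeping the weights as required arguments avoids baking in values that the text does not state.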
Drawings
FIG. 1: Unmanned aerial vehicle tracking framework based on target perception ViT
FIG. 2: Comparison of attention maps with and without target perception
FIG. 3: Prediction box visualization
FIG. 4: Unmanned aerial vehicle tracking test
Detailed Description
The invention relates to an unmanned aerial vehicle tracking method based on target perception ViT, which comprises the following specific steps:
(1) First, a training dataset is prepared, comprising GOT-10k, LaSOT, COCO and TrackingNet, all of which are well known in the field of target tracking.
(2) The unmanned aerial vehicle tracking framework is created. The backbone network uses DeiT-Tiny, a ViT-based network model, and each branch of the prediction head consists of 4 stacked convolution-batch normalization-ReLU layers.
(3) The framework takes two inputs, a template and a search image, which are scaled to 128×128 and 256×256, respectively. The batch size is set to 32. The model is trained with the AdamW optimizer, with the weight decay set to […] and the initial learning rate set to […]. Training runs for 300 epochs, each using 60,000 image pairs, and the learning rate is reduced by a factor of 10 after 240 epochs. After target perception training, the model identifies the target more accurately. The visualized attention maps are shown in FIG. 2: the left is the original image, the middle is the attention map obtained without target perception during training, and the right is the attention map obtained with target perception. As the figure shows, the attention produced by the model is more prominent once target perception is added.
(4) Test datasets are prepared, comprising DTB70, UAVDT, VisDrone2018, UAV123 and UAV123@10fps. These five datasets are existing, challenging unmanned aerial vehicle benchmarks for evaluating unmanned aerial vehicle tracking algorithms; they contain videos captured under violent unmanned aerial vehicle motion, in cluttered scenes with diverse objects, and under varying weather conditions, flying altitudes and camera viewing angles. The first frame of each test video is taken as the template, and every frame is fed into the framework in sequence as the search image; the output is a prediction box for each frame. The prediction boxes can be drawn on the images for visualization, as shown in FIG. 3, where the number in the upper-left corner is the frame index in the video.
(5) Verification on an unmanned aerial vehicle simulation platform. The platform carries the embedded onboard processor Jetson Nano 2G and is a typical unmanned aerial vehicle simulation platform. Our tracking framework is deployed on this platform, and video shot with a real unmanned aerial vehicle is used to test the tracking performance. During the test, GPU and CPU utilization were 52.7% and 18.9%, respectively, and the average speed was 43.6 FPS. We tested targets in strong sunlight, targets with rapidly changing viewing angles, and small distant targets; the results are shown in FIG. 4.
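The training schedule of step (3) and the test protocol of step (4) can be sketched together in plain Python. The sketch makes three assumptions: epochs are counted from zero, so the reduced learning rate applies from epoch 240 onward; the base learning rate is passed in by the caller because its value is not given in this text; and the first frame's box is treated as known. The callables `make_template` and `predict` are hypothetical stand-ins for the cropping step and the trained framework.

```python
def learning_rate(epoch, base_lr, drop_epoch=240, factor=10.0):
    """Step schedule: base_lr before drop_epoch, base_lr / factor afterwards."""
    return base_lr if epoch < drop_epoch else base_lr / factor

def iterations_per_epoch(pairs_per_epoch=60000, batch_size=32):
    """Number of optimizer steps needed to consume one epoch of image pairs."""
    return pairs_per_epoch // batch_size

def track_sequence(frames, init_box, make_template, predict):
    """Use the first frame as the template and predict one box per later frame."""
    template = make_template(frames[0], init_box)
    boxes = [init_box]                      # first-frame box is given (assumption)
    for frame in frames[1:]:
        boxes.append(predict(template, frame))
    return boxes

steps = iterations_per_epoch()              # 1875 steps per epoch
total_steps = 300 * steps                   # 562500 steps over 300 epochs
relative_lr = [learning_rate(e, base_lr=1.0) for e in (0, 239, 240, 299)]

# Dummy stand-ins so the loop can be exercised without a trained model.
frames = list(range(5))
boxes = track_sequence(
    frames,
    init_box=(0, 0, 10, 10),
    make_template=lambda frame, box: ("template", frame, box),
    predict=lambda template, frame: (frame, frame, 10, 10),
)
print(steps, total_steps, relative_lr, len(boxes))
```

Because the template is fixed after the first frame, each later frame costs one forward pass through the framework, which is what the reported frame rate on the onboard processor measures.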

Claims (1)

1. The unmanned aerial vehicle tracking method based on target perception ViT is characterized by comprising the following steps of:
S1: the input of the framework comprises a target template to be tracked and a search image;
S2: the framework comprises a backbone network and a prediction head;
S3: the backbone network uses the ViT-based network model DeiT-Tiny; its input is the $K_z$ vectors obtained by splitting and flattening the target template together with the $K_x$ vectors obtained from the search image, the number of vectors being 8×8, and its output is a feature map;
S4: the prediction head has three branches, which predict the classification score, the downsampling offset, and the normalized bounding-box size, respectively, each branch consisting of four stacked convolution-batch normalization-ReLU layers;
S5: mutual information maximization is performed between the template image before it is fed into the backbone network and the template features output by the backbone network, so as to achieve target perception;
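S6: the loss function $L$ used in model training is calculated by equation (1);
$$L = L_{cls} + \lambda_{iou} L_{iou} + \lambda_{L_1} L_{1} + \lambda_{mi} L_{mi} \tag{1}$$
where $\lambda_{iou}$, $\lambda_{L_1}$ and $\lambda_{mi}$ are three constants, and $L_{cls}$, $L_{iou}$, $L_{1}$ and $L_{mi}$ are the loss of the classification branch, the IoU loss of the regression branch, the $L_1$ loss of the regression branch, and the mutual information maximization loss of the target perception part, respectively;
S7: $L_{cls}$ is defined by equation (2);
$$L_{cls} = -(1 - p_t)^{\gamma} \log(p_t) \tag{2}$$
where $p_t$ is the target prediction value and $\gamma$ is the modulating factor;
S8: $L_{iou}$ is defined by equation (3);
$$L_{iou} = 1 - \frac{I}{U} \tag{3}$$
where $I$ is the area of the intersection of the prediction box and the ground-truth box, and $U$ is the area of their union;
S9: $L_{1}$ is defined by equation (4);
$$L_{1} = \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert \tag{4}$$
where $y_i$ is the target value, $\hat{y}_i$ is the estimated value, and $n$ is the number of samples;
S10: $L_{mi}$ is defined by equation (5);
$$L_{mi} = -\hat{I}^{JSD}\big(Z;\, t^{L}_{1:K_z}\big) \tag{5}$$
where $\hat{I}^{JSD}$ is the mutual information estimate based on the JS divergence (JSD), $Z$ is the target template to be tracked, and $t^{L}_{1:K_z}$ is the template feature after the $L$-layer transformer blocks.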
CN202310818688.3A 2023-07-05 2023-07-05 Unmanned aerial vehicle tracking method based on target perception ViT Pending CN116820131A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310818688.3A CN116820131A (en) 2023-07-05 2023-07-05 Unmanned aerial vehicle tracking method based on target perception ViT

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310818688.3A CN116820131A (en) 2023-07-05 2023-07-05 Unmanned aerial vehicle tracking method based on target perception ViT

Publications (1)

Publication Number Publication Date
CN116820131A true CN116820131A (en) 2023-09-29

Family

ID=88125706

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310818688.3A Pending CN116820131A (en) 2023-07-05 2023-07-05 Unmanned aerial vehicle tracking method based on target perception ViT

Country Status (1)

Country Link
CN (1) CN116820131A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117893574A (en) * 2024-03-14 2024-04-16 大连理工大学 Infrared unmanned aerial vehicle target tracking method based on correlation filtering convolutional neural network

Similar Documents

Publication Publication Date Title
US20220215227A1 (en) Neural Architecture Search Method, Image Processing Method And Apparatus, And Storage Medium
JP2022515895A (en) Object recognition method and equipment
CN112069868A (en) Unmanned aerial vehicle real-time vehicle detection method based on convolutional neural network
CN112949673A (en) Feature fusion target detection and identification method based on global attention
CN114972213A (en) Two-stage mainboard image defect detection and positioning method based on machine vision
CN116343330A (en) Abnormal behavior identification method for infrared-visible light image fusion
CN111462192A (en) Space-time double-current fusion convolutional neural network dynamic obstacle avoidance method for sidewalk sweeping robot
CN113313703A (en) Unmanned aerial vehicle power transmission line inspection method based on deep learning image recognition
CN111046767A (en) 3D target detection method based on monocular image
CN113128564B (en) Typical target detection method and system based on deep learning under complex background
CN116385958A (en) Edge intelligent detection method for power grid inspection and monitoring
CN112613504A (en) Sonar underwater target detection method
CN116820131A (en) Unmanned aerial vehicle tracking method based on target perception ViT
CN115393690A (en) Light neural network air-to-ground observation multi-target identification method
CN115019302A (en) Improved YOLOX target detection model construction method and application thereof
CN114972439A (en) Novel target tracking algorithm for unmanned aerial vehicle
CN112766411A (en) Target detection knowledge distillation method for adaptive regional refinement
CN116740516A (en) Target detection method and system based on multi-scale fusion feature extraction
CN110222822B (en) Construction method of black box prediction model internal characteristic causal graph
CN112132207A (en) Target detection neural network construction method based on multi-branch feature mapping
Zhang et al. Full-scale Feature Aggregation and Grouping Feature Reconstruction Based UAV Image Target Detection
CN117576149A (en) Single-target tracking method based on attention mechanism
CN116152699B (en) Real-time moving target detection method for hydropower plant video monitoring system
CN117392568A (en) Method for unmanned aerial vehicle inspection of power transformation equipment in complex scene
CN116630387A (en) Monocular image depth estimation method based on attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination