CN115578416A - Unmanned aerial vehicle target tracking method, system, medium and electronic equipment - Google Patents

Unmanned aerial vehicle target tracking method, system, medium and electronic equipment

Info

Publication number
CN115578416A
Authority
CN
China
Prior art keywords
aerial vehicle
unmanned aerial
target
feature
target tracking
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211246719.4A
Other languages
Chinese (zh)
Inventor
刘允刚
尹宇肖
满永超
陈琳
李峰忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN202211246719.4A priority Critical patent/CN115578416A/en
Publication of CN115578416A publication Critical patent/CN115578416A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/248 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C 21/00 Navigation; Navigational instruments not provided for in groups G01C 1/00 - G01C 19/00
    • G01C 21/20 Instruments for performing navigational calculations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/62 Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/10 Terrestrial scenes
    • G06V 20/17 Terrestrial scenes taken from planes or by drones
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30232 Surveillance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30248 Vehicle exterior or interior
    • G06T 2207/30252 Vehicle exterior; Vicinity of vehicle
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Remote Sensing (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Automation & Control Theory (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides an unmanned aerial vehicle target tracking method, system, medium and electronic device for the application scenario in which an unmanned aerial vehicle tracks a ground target accurately and in real time. Based on the SiamRPN network, AlexNet is used as the feature extraction network and an improved FPN feature pyramid structure is introduced, which enhances the feature expression capability and improves target tracking precision. Meanwhile, considering the accuracy and timeliness requirements of the unmanned aerial vehicle path planning algorithm, an ego-planner trajectory optimization strategy is used: on the premise of meeting the trajectory optimization accuracy, back-end unconstrained optimization is performed to obtain an optimal trajectory that satisfies the dynamic constraints, so that the unmanned aerial vehicle finally tracks the target.

Description

Unmanned aerial vehicle target tracking method, system, medium and electronic equipment
Technical Field
The disclosure belongs to the technical field of target tracking, and particularly relates to an unmanned aerial vehicle target tracking method, system, medium and electronic device.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
The complexity and variability of unmanned aerial vehicle visual target tracking scenes pose huge challenges to accurate target tracking and rapid obstacle avoidance. Traditional trackers based on correlation filtering are fast, but the hand-crafted features they extract, such as color and grayscale, are coarse, so their tracking precision is low. With the development of deep learning theory, tracking methods that extract features with convolutional neural networks can strike a good balance between tracking precision and speed. At present, most feature extraction networks adopt a ResNet deep residual network, which only reaches about 30 fps on an NVIDIA Titan Xp GPU; this increases the computational burden on the equipment and cannot be applied to real-time onboard target tracking on unmanned aerial vehicles with limited computing resources. In addition, owing to the diversity of unmanned aerial vehicle detection images, a single-stage feature map cannot effectively represent targets of different scales, so the unmanned aerial vehicle cannot accurately detect and track targets whose size changes over a large range.
Feature Pyramid Networks (FPN) are one of the effective methods for solving the multi-scale problem in object detection. Current research on this network mostly focuses on the following aspects: (1) the backbone of the feature extraction network extracts semantic features of different levels from the input image, where shallow layers generate more detail features and deep layers generate more semantic features; (2) the low-resolution feature maps are upsampled; (3) the upsampling result is fused with the backbone feature map of the corresponding scale. Most research methods use an upsampling operation in step (2) and pad the edges of the feature map, which breaks the translation invariance of the convolutional neural network and reduces the tracking accuracy of the unmanned aerial vehicle. Meanwhile, for the processing in step (3), most methods predict the target from layered features, or concatenate several prediction feature layers along one dimension. Such FPN structures trade detection speed for an improvement in detection precision, greatly increasing the computational burden of the unmanned aerial vehicle.
When executing a target tracking task, countering threat interference and rapidly avoiding obstacles in a complex airspace are key to the unmanned aerial vehicle completing its flight mission. Conventional gradient-based path planning algorithms require the construction of a global Euclidean Signed Distance Field map (ESDF map). Because the trajectory only covers a small part of the ESDF map, conventional algorithms spend a lot of time building the map, which limits the use of such motion planning algorithms under limited resources.
Disclosure of Invention
In order to overcome the defects of the prior art, the present disclosure provides a target tracking method, system, medium and electronic device for an unmanned aerial vehicle, so as to realize that the unmanned aerial vehicle can quickly and accurately avoid obstacles; by effectively combining the target tracking and dynamic obstacle avoidance technologies, the application scene of efficiently tracking the target in the complex environment of the unmanned aerial vehicle is realized.
In order to achieve the above object, one or more embodiments of the present disclosure provide the following technical solutions:
in a first aspect, an unmanned aerial vehicle target tracking method is disclosed, including:
the unmanned aerial vehicle receives an unlocking instruction and a tracking instruction, initializes model hyper-parameters, and loads a pre-trained unmanned aerial vehicle target tracking model;
extracting template area image features and search area image features by using an unmanned aerial vehicle target tracking model, and performing multi-feature fusion to obtain a target boundary frame in a search area image;
converting the obtained position coordinates of the target boundary frame into coordinates in a world coordinate system by using an unmanned aerial vehicle monocular camera simulation model, and taking the coordinates as a target value of path planning;
and planning the optimal path in real time by using the obtained path planning target value.
Specifically, the unmanned aerial vehicle target tracking model adopts AlexNet as a feature extraction network.
As an alternative embodiment, a spatially aware sampling strategy is introduced, in which the target position is shifted near the center point of the training sample through uniformly distributed sampling.
According to the further technical scheme, based on an AlexNet feature extraction network, an FPN feature pyramid structure based on a multi-head attention mechanism is established, and multi-feature fusion is achieved;
the method specifically comprises the following steps:
the feature extraction network outputs predicted feature maps C3, C4 and C5 of the third layer, the fourth layer and the fifth layer;
performing 1x1 convolution operation and downsampling operation on the output third-layer prediction feature map and the output fourth-layer prediction feature map C3 and C4 respectively;
cutting the down-sampled feature map by taking the center as a reference to realize the size consistency with the predicted feature map output by the fifth layer to obtain cut feature maps M3 and M4;
and (5) overlapping the cut feature maps M3 and M4 and the predicted feature map C5 output by the fifth layer by using a multi-head attention mechanism, and realizing feature fusion of different scales.
In a further technical scheme, the step of superposing the cut feature maps M3 and M4 and the predicted feature map C5 output by the fifth layer by using a multi-head attention mechanism specifically comprises the following steps:
respectively flattening the cut feature maps M3 and M4 and a prediction feature map C5 output by the fifth layer and performing linear mapping to obtain a query value corresponding to each layer;
performing convolution, normalization and nonlinear transformation on the cut feature map M4, and increasing the channel dimension through a full connection layer to obtain a key value and an evaluation value;
respectively carrying out multi-head attention operation on M3, M4 and C5 layers, wherein the query value uses a query value corresponding to each layer, and the key value and the evaluation value use the key value and the evaluation value obtained after the feature map M4 is processed;
introducing a hyper-parameter vector, and performing linear interpolation in the spatial dimension on the output result of the multi-head attention mechanism.
In a further technical scheme, the fused feature map is input into an RPN network model to perform an up-channel cross-correlation operation, and the predicted feature map confidence and the target bounding box position are output;
and sorting the prediction results by adopting a cosine window and a scale change penalty to obtain a final target boundary frame.
According to a further technical scheme, the real-time planning of the optimal path by using the obtained path planning target value specifically comprises the following steps:
generating a trajectory without considering obstacles using an ego-planner;
based on the local obstacle information, adopting a B-spline curve to carry out track optimization;
and the planner judges trajectory segments where the dynamics are infeasible and activates a refinement process.
According to the further technical scheme, the track optimization is carried out by adopting a B-spline based on the local obstacle information, and the method specifically comprises the following steps:
extracting the information of the obstacles with which the current trajectory collides, and acquiring the control points Q_i of the B-spline curve that pass through the obstacles; each control point Q_i of the line segment in which the collision occurs generates an anchor point p_ij on the obstacle surface and a corresponding gradient vector v_ij pointing away from the obstacle, where i is the index of the control point and j is the index of the {p, v} pair; according to the distance from Q_i to the j-th obstacle,

d_ij = (Q_i − p_ij) · v_ij,

the trajectory is continuously iterated away from the obstacle.
In a second aspect of the present disclosure, an unmanned aerial vehicle target tracking system is provided, including:
the initialization module is used for receiving an unlocking instruction and a tracking instruction by the unmanned aerial vehicle, initializing a model hyperparameter and loading a pre-trained unmanned aerial vehicle target tracking model;
the feature extraction and fusion module is used for extracting template region image features and search region image features, and performing multi-feature fusion by using an unmanned aerial vehicle target tracking model to obtain a target boundary frame in a search region image;
the coordinate transformation module is used for converting the obtained position coordinates of the target boundary frame into coordinates under a world coordinate system by using a monocular camera simulation model of the unmanned aerial vehicle, and the coordinates are used as a target value of path planning;
and the path planning module is used for planning the optimal path by using the obtained path planning target value.
In a third aspect of the present disclosure, a medium is provided, on which a program is stored, and the program, when executed by a processor, implements the steps in the above-mentioned unmanned aerial vehicle target tracking method.
In a fourth aspect of the present disclosure, an electronic device is provided, which includes a memory, a processor, and a program stored in the memory and executable on the processor, where the processor implements the steps in the target tracking method for an unmanned aerial vehicle when executing the program.
The above one or more technical solutions have the following beneficial effects:
the unmanned aerial vehicle target tracking algorithm based on the improved SimRPN network multi-layer feature fusion is researched and developed by adopting a twin network frame, and compared with the existing SimRPN network model, the algorithm is tested on an OTB100 data set, the precision of the algorithm is improved by 3.6%, and the NVIDIA Jetson Xavier onboard reaches about 30fps, so that the requirement of the unmanned aerial vehicle for tracking the target in real time can be met. Considering the requirement of fast obstacle avoidance, the method adopts the ego-planner to realize the local planning path of the unmanned aerial vehicle based on the gradient, effectively estimate and calculate the gradient information in real time and finally generate a smooth track conforming to the dynamic constraint. The unmanned aerial vehicle target tracking algorithm based on the improved SimRPN network multi-layer feature fusion and the ego-planer track optimization strategy which are independently researched and developed are effectively combined, and the problems that the unmanned aerial vehicle obstacle avoidance real-time performance is poor and the target tracking precision is low are solved. The unmanned aerial vehicle has important significance in the application fields of follow-up unmanned aerial vehicle tracking and monitoring, city pursuit and fleeing, battlefield close range reconnaissance and the like.
In the method, the AlexNet model replaces the ResNet50 model with a residual structure, which greatly reduces the parameter count of the model and improves the operating rate of the unmanned aerial vehicle equipment. An FPN multi-feature fusion strategy based on downsampling shallow features replaces the original FPN feature fusion method based on upsampling deep features, which preserves the translation invariance of the convolutional neural network and improves the target detection precision of the unmanned aerial vehicle. A multi-head attention mechanism (Multi-Head Attention) is introduced to enhance the correlation between different prediction feature maps, so that the model pays more attention to and better extracts information about targets of different sizes, greatly increasing training efficiency. A cosine window penalty mechanism is introduced to reduce background interference and improve the robustness of target detection. A monocular camera simulation model is constructed to realize the coordinate transformation of the predicted target box. On the basis of accurate target tracking, the ego-planner path planning algorithm estimates the optimal path and calculates map gradient information in real time, generating a smoother trajectory, so that the unmanned aerial vehicle finally tracks the target in real time with improved tracking accuracy.
In the rviz and gazebo simulation environments, the target position in the first frame is manually selected; the pixel coordinates of the upper-left corner of the target bounding box and the width and height of the target box are extracted, the target image is captured as a template image, and features are extracted with the independently developed target tracking algorithm. The unmanned aerial vehicle acquires camera images in real time, predicts the target bounding box and confidence of the next frame, and performs a coordinate transformation on the predicted target coordinate information, which serves as the endpoint value of the ego-planner trajectory optimizer. Finally, the unmanned aerial vehicle plans an optimal path in real time to meet the requirement of rapid obstacle avoidance.
Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and are not to limit the disclosure.
Fig. 1 is a schematic flow chart of an implementation of the target tracking method of the unmanned aerial vehicle of the present invention;
FIG. 2 is a flow chart of the unmanned aerial vehicle target tracking algorithm of the present invention;
FIG. 3 is a flow chart of the overlay process of the present invention;
FIG. 4 is a diagram of the effect of tracking a target on the rviz-based simulation platform according to the present invention;
FIG. 5 is a diagram illustrating the effect of tracking a target on a gazebo simulation platform according to the present invention.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
The method is based on the SiamRPN network and adopts the AlexNet model to replace the ResNet50 model with a residual structure, improving the operating rate of the unmanned aerial vehicle equipment. An FPN multi-feature fusion strategy based on downsampling shallow features replaces the original FPN method of upsampling deep features, reducing the computation on the equipment, preserving the translation invariance of the convolutional neural network, and improving target detection precision. A multi-head attention mechanism (Multi-Head Attention) is introduced to enhance the correlation between different prediction layers, so that the model pays attention to and extracts information about targets of different sizes, greatly increasing training efficiency. A cosine window penalty mechanism is introduced to reduce background interference and improve the robustness of target detection. A monocular camera simulation model is constructed to realize the coordinate transformation of the predicted target box. On the basis of accurate target tracking, the ego-planner path planning algorithm estimates the optimal path and calculates map gradient information in real time, generating a smoother trajectory and finally realizing target tracking by the unmanned aerial vehicle.
Example one
The embodiment discloses an unmanned aerial vehicle target tracking method, which adopts the following technical scheme:
the unmanned aerial vehicle receives an unlocking instruction and a tracking instruction, initializes model hyper-parameters, and loads a pre-trained unmanned aerial vehicle target tracking model;
extracting template area image features and search area image features by using an unmanned aerial vehicle target tracking model, and performing multi-feature fusion to obtain a target boundary frame in a search area image;
converting the obtained position coordinates of the target bounding box into coordinates under a world coordinate system by using an unmanned aerial vehicle monocular camera simulation model, and taking the coordinates as a target value of path planning;
and planning the optimal path in real time by using the obtained path planning target value.
As shown in fig. 1, the unmanned aerial vehicle first waits to unlock and initializes the relevant parameters of the system, including: initializing the relevant parameters of the monocular camera and the laser radar; loading the pre-trained target tracking model; and loading the rviz and gazebo simulation environments and the unmanned aerial vehicle monocular camera simulation model. When the pre-trained model parameters are initialized, an all-zero initialization image is input at the same time. Finally, the unmanned aerial vehicle waits for an unlocking instruction.
After unlocking, the unmanned aerial vehicle carries out the target tracking task. The specific algorithm flow chart is shown in fig. 2 and comprises the following steps: (1) judging whether a target box has been manually selected; (2) the model extracts the template region image features; (3) the monocular camera acquires image data in real time; (4) the model extracts the search region image features; (5) the FPN network performs multi-feature fusion; (6) the RPN network predicts the feature map confidence and the target box position; (7) the proposals are sorted with a cosine window and a scale change penalty, and the anchor box with the highest confidence is selected to obtain the final target bounding box.
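For illustration, step (7) can be sketched in Python as below. This is a minimal sketch in the spirit of SiamRPN's re-ranking; the hyper-parameter values win_influence and k and the simplified penalty form are assumptions, not values given in this disclosure.

    import numpy as np

    def rank_proposals(scores, sizes, prev_size, win_influence=0.4, k=0.04):
        # Re-rank RPN proposals with a cosine (Hanning) window and a
        # scale-change penalty; win_influence and k are assumed values.
        n = int(np.sqrt(scores.size))                  # score map assumed square
        hann = np.outer(np.hanning(n), np.hanning(n)).ravel()
        change = np.maximum(sizes / prev_size, prev_size / sizes)
        penalty = np.exp(k * (1.0 - change))           # penalise large scale jumps
        pscores = scores * penalty
        pscores = pscores * (1.0 - win_influence) + hann * win_influence
        return int(np.argmax(pscores))                 # index of the best anchor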
On the basis of SiamRPN research, the present embodiment uses an AlexNet network as a feature extraction network.
By analyzing the constraints that affect target detection precision and speed, it is found that a deep network reduces the accuracy of network prediction and, to a greater extent, the real-time performance of target tracking on the unmanned aerial vehicle system. Therefore, in this embodiment, AlexNet is used as the feature extraction network N1, and features are extracted from the search region image (Detection Region Frame) and the template region image (Template Region Frame) respectively, serving as the information sources for multi-feature fusion.
In this embodiment, in order to improve the generalization capability of the network model, a spatially aware sampling strategy is introduced to implement data augmentation.
In this embodiment, the GOT-10k dataset is used as the training set, and a spatially aware sampling strategy is introduced, that is, the target position is shifted near the center point of the training sample through uniformly distributed sampling. Based on the SiamRPN network structure, training samples obtained with this sampling strategy preserve the translation invariance of the convolutional neural network, thereby improving target detection precision.
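A minimal sketch of this sampling strategy follows; the maximum shift range is an assumed value, since the disclosure only specifies that the offset is uniformly distributed around the sample center.

    import numpy as np

    def spatially_aware_shift(center, max_shift=32, rng=None):
        # Displace the target from the crop center by a uniformly
        # distributed offset, so the network cannot learn a fixed
        # center-position prior; max_shift (pixels) is an assumed value.
        rng = rng or np.random.default_rng()
        dx, dy = rng.uniform(-max_shift, max_shift, size=2)
        return center[0] + dx, center[1] + dy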
In this embodiment, an independently developed FPN feature pyramid network structure is introduced by analyzing the constraints that affect target detection precision. Considering methods that efficiently improve target detection precision, an improved FPN feature pyramid structure is established. The structure outputs feature maps containing strong semantic information without greatly increasing the computation, thereby improving detection precision while meeting the real-time target tracking requirements of the unmanned aerial vehicle. The specific steps are as follows:
(1) the feature extraction network N1 outputs the prediction feature maps of the third, fourth and fifth layers;
(2) a 1x1 convolution operation is performed on each of these feature maps;
(3) a downsampling operation is performed on the feature maps output by the third and fourth convolution layers;
(4) the feature maps of different scales are cropped with the center as reference so that their size matches the feature map output by the last layer;
(5) the multiple feature maps are superposed to realize feature fusion across scales (an illustrative sketch follows the list).
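For illustration, the steps above can be sketched in PyTorch as below. The channel counts and pooling factor are assumptions chosen to make the example self-contained; the disclosure itself only fixes the 1x1 convolution, the downsampling, and the center crop to the size of the fifth-layer map C5.

    import torch.nn as nn

    def center_crop(t, size):
        # Crop a feature map to size x size around its spatial center.
        h, w = t.shape[-2:]
        top, left = (h - size) // 2, (w - size) // 2
        return t[..., top:top + size, left:left + size]

    class DownsampleCropFPN(nn.Module):
        # Improved FPN sketch: 1x1 convolution plus downsampling of the
        # shallow maps C3/C4, center-cropped to the size of C5.  Channel
        # counts and the pooling factor are illustrative assumptions.
        def __init__(self, c3_ch=384, c4_ch=384, c5_ch=256, down=2):
            super().__init__()
            self.l3 = nn.Sequential(nn.Conv2d(c3_ch, c5_ch, 1), nn.MaxPool2d(down))
            self.l4 = nn.Sequential(nn.Conv2d(c4_ch, c5_ch, 1), nn.MaxPool2d(down))

        def forward(self, c3, c4, c5):
            m3 = center_crop(self.l3(c3), c5.shape[-1])
            m4 = center_crop(self.l4(c4), c5.shape[-1])
            return m3, m4, c5      # fused downstream by multi-head attention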
In the feature map superposition process, in order to enhance the correlation between feature maps, a feature pyramid fusion network based on multi-head attention is designed, as shown in fig. 3. Most "attention mechanism + feature pyramid" fusion algorithms operate coordinate-wise at the channel level and follow the SE-Net module to fuse features between channels, which greatly increases the computation on the equipment. The method adopts three encoder structures based on the Transformer model, reducing the computational cost and enlarging the range over which feature layers are correlated. In addition, most algorithms only model the correlation between one specific feature layer and the others, neglecting the correlations among all the feature layers, which is unfavorable for detecting targets with fine-grained features. Accordingly, correlation operations are performed on M3, M4 and C5 respectively, which greatly strengthens the fusion of low-level fine-grained features with high-level semantic layers. The specific implementation is as follows (a fusion sketch follows these steps):
(1) flattening different prediction characteristic graphs and performing linear mapping to obtain corresponding query values Q (query);
(2) performing convolution, normalization and nonlinear transformation on the characteristic diagram M4, and increasing the channel dimension through a full connection layer to finally obtain a key value K (key) and an evaluation value V (value);
(3) respectively carrying out multi-head attention operation on M3, M4 and C5 layers, wherein the query value uses a query value corresponding to each layer, and the key value and the evaluation value use the key value and the evaluation value obtained after processing the characteristic diagram M4;
the attention function MultiHead () is used to operate on Q, K, V, which can be expressed as MultiHead (Q, K, V) = Concat (H) 1 ,H 2 ...,H N )W O
Figure BDA0003886961650000091
Wherein C is a normalization parameter,
Figure BDA0003886961650000092
Figure BDA0003886961650000093
W o ∈R C×C represents the linear transformation parameter, N is the number of attention heads (attention head), d head Dimension representing each attention head equal to
Figure BDA0003886961650000094
(4) Taking into account the correlation between layers, a hyper-parameter vector is introduced and linear interpolation along the spatial dimension is performed on the output result of the multi-head attention mechanism; with the preferred value of the hyper-parameter vector, the final precision plot achieves its best result on the OTB100 dataset.
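The fusion described in steps (1)-(4) can be sketched in PyTorch as below. The number of heads, the key/value projection widths, and the exact form of the interpolation (here a learnable per-layer blend of the attention output with its input) are assumptions; the disclosure specifies only that queries come from each layer, that keys and values come from M4, and that a hyper-parameter vector interpolates the attention output.

    import torch
    import torch.nn as nn

    class AttentionFusion(nn.Module):
        # Queries come from each of M3/M4/C5; keys and values are derived
        # from M4; the attention output is blended with its input by a
        # learnable per-layer weight (one reading of the hyper-parameter
        # vector interpolation).  Head count and projections are assumptions.
        def __init__(self, channels=256, heads=4):
            super().__init__()
            self.q = nn.ModuleList(nn.Linear(channels, channels) for _ in range(3))
            self.kv = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                                    nn.BatchNorm2d(channels), nn.ReLU())
            self.k = nn.Linear(channels, channels)
            self.v = nn.Linear(channels, channels)
            self.attn = nn.ModuleList(
                nn.MultiheadAttention(channels, heads, batch_first=True)
                for _ in range(3))
            self.alpha = nn.Parameter(torch.full((3,), 0.5))  # interpolation vector

        def forward(self, maps):                       # maps = [m3, m4, c5]
            kv_src = self.kv(maps[1]).flatten(2).transpose(1, 2)
            k, v = self.k(kv_src), self.v(kv_src)
            fused = []
            for i, m in enumerate(maps):
                b, c, h, w = m.shape
                q = self.q[i](m.flatten(2).transpose(1, 2))
                out, _ = self.attn[i](q, k, v)
                out = out.transpose(1, 2).reshape(b, c, h, w)
                fused.append(self.alpha[i] * out + (1 - self.alpha[i]) * m)
            return fused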
The network structure finally inputs the fused feature map into the RPN network model to perform the up-channel cross-correlation operation. In the process of training on the dataset, hyper-parameters such as the number of iterations, the learning rate, and the loss function are reasonably selected to train and validate the model, and the model is saved.
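For reference, the up-channel cross-correlation of SiamRPN can be sketched as a grouped convolution, as below; the feature shapes in the comments are illustrative assumptions.

    import torch.nn.functional as F

    def up_channel_xcorr(search_feat, template_kernel):
        # Up-channel cross-correlation: the template branch is lifted to
        # out_ch * in_ch channels and reshaped into per-sample kernels
        # that slide over the search features.  Shapes are assumptions.
        b, c, hs, ws = search_feat.shape               # e.g. (B, 256, 20, 20)
        _, oc_c, hk, wk = template_kernel.shape        # (B, out_ch * 256, 4, 4)
        out_ch = oc_c // c
        kernels = template_kernel.reshape(b * out_ch, c, hk, wk)
        # grouped conv: every sample is correlated with its own kernel
        out = F.conv2d(search_feat.reshape(1, b * c, hs, ws), kernels, groups=b)
        return out.reshape(b, out_ch, out.shape[-2], out.shape[-1])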
In a preferred embodiment, because randomly initialized weights of the initial training model may cause the loss to oscillate severely, a StepScheduler warm-up learning rate adjustment strategy is selected, implemented as follows: the learning rate is gradually increased with a stepped strategy during epochs 1-10, and a LogScheduler learning strategy is introduced during epochs 11-100 so that the learning rate decays gradually along a log curve.
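A minimal sketch of this warm-up schedule follows; the start, base, and final learning rates are assumed values, since the disclosure specifies only the stepped ramp over epochs 1-10 and the log-curve decay over epochs 11-100.

    import math

    def lr_at_epoch(epoch, start_lr=1e-3, base_lr=1e-2, warmup=10, total=100):
        # Stepped warm-up over epochs 1-10, then a log-curve decay down
        # to an assumed final rate over epochs 11-100.
        end_lr = base_lr * 1e-2                        # assumed final value
        if epoch <= warmup:
            step = (base_lr - start_lr) / (warmup - 1)
            return start_lr + step * (epoch - 1)
        t = (epoch - warmup) / (total - warmup)
        return base_lr * math.exp(t * math.log(end_lr / base_lr))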
Because positive and negative samples are labeled according to preset prior (anchor) boxes, sample class imbalance is likely to occur and the model would be biased toward optimizing negative samples during training. Therefore, penalty weights are introduced for the positive and negative samples in the loss function so that the positive samples dominate the optimization, improving the training efficiency of the model. In addition, the classification branch uses the cross-entropy loss function F.nll_loss to realize logistic regression, and the position branch uses smooth_l1_loss to realize position regression.
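The combined loss can be sketched as below; the positive-class weight is an assumed value, and the tensor layout is simplified for illustration.

    import torch
    import torch.nn.functional as F

    def rpn_loss(cls_log_prob, cls_label, loc_pred, loc_target, pos_weight=2.0):
        # Class-weighted NLL loss so positive anchors dominate the
        # optimisation, plus smooth-L1 on the box offsets; pos_weight is
        # an assumed value, and the layout (N anchors, 2 classes) is
        # simplified for illustration.
        weight = torch.tensor([1.0, pos_weight], device=cls_log_prob.device)
        cls_loss = F.nll_loss(cls_log_prob, cls_label, weight=weight)
        loc_loss = F.smooth_l1_loss(loc_pred, loc_target)
        return cls_loss + loc_loss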
During training, the up-channel cross-correlation (UP) converges more easily than the depth-wise cross-correlation (DW). In the final evaluation, UP runs at a speed comparable to DW while the precision is improved, so the network structure adopted in this embodiment has both a processing speed advantage and technical advancement.
In this embodiment, a pre-trained model published by HonglinChu is adopted and the improved FPN feature pyramid network is introduced; finally, evaluation is performed on the OTB100 and VOT2016 datasets, with the results shown in Table 1.
Table 1 evaluation of pre-trained models on OTB100 and VOT2016 datasets
As can be seen from the experimental data, this embodiment adopts the up-channel cross-correlation convolution layer, whose transmission frame rate is above 120 FPS; compared with the depth-separable depth-wise cross-correlation layer, this embodiment is higher than the SiamRPN++ model on each index. Specifically:
(1) The success rate is improved by 0.016 and the precision is improved by 0.022 by evaluating on an OTB100 data set;
(2) The accuracy is improved by 0.046 and the robustness is improved by 0.065 by evaluation on the VOT2016 data set.
Compared with models using the same backbone network, this embodiment introduces the improved FPN network structure, so the FPS index does not drop significantly while the precision index reaches the highest. Specifically:
(1) The success rate is improved by 0.036 compared with the lowest index and is improved by 0.02 compared with the highest index by evaluation on an OTB100 data set; the precision is improved by 0.043 compared with the lowest index and is improved by 0.036 compared with the highest index;
(2) And the accuracy is improved by 0.02 compared with the lowest index and is improved by 0.016 compared with the highest index by evaluation on the VOT2016 data set.
In addition, because this embodiment is applied on an unmanned aerial vehicle and the algorithm can only run on the onboard NVIDIA Jetson Xavier, current SiamRPN models based on yolov3/yolov5 or with a ResNet-50 or ResNet-101 backbone cannot run normally due to the limited onboard computing power, whereas the network structure adopted in this embodiment runs normally on the onboard NVIDIA Jetson Xavier.
In conclusion, with the independently developed FPN network structure introduced, the target tracking speed meets the real-time tracking requirement and the target detection precision is optimal, so the method is more advanced than existing models.
In order to obtain a target value of path planning, the embodiment establishes a monocular camera simulation model of the unmanned aerial vehicle based on rviz and gazebo simulation environments. The method comprises the following specific steps:
(1) acquiring position information of an image target frame, wherein the position information comprises X and Y coordinates of the upper left corner of a target boundary frame, height and width, and the coordinates are coordinates under a pixel coordinate system;
(2) based on the ROS robot operating system and the Prometheus unmanned aerial vehicle open-source project, coordinate conversion between different coordinate systems is realized, where the coordinate systems include the world coordinate system, the camera coordinate system, the image coordinate system, and the pixel coordinate system;
the conversion from the pixel coordinate system (u, v) to the image coordinate system (x, y) is given by

[u, v, 1]^T = [[1/dx, 0, u_0], [0, 1/dy, v_0], [0, 0, 1]] [x, y, 1]^T,   (1)

where the pixel width dx and pixel height dy represent the actual physical size (unit: mm) of each pixel in the horizontal direction u and vertical direction v respectively, i.e. the actual size of each photosensitive element, and the center of the image coordinate system is expressed in the pixel coordinate system as (u_0, v_0). A projection point p(x, y) in the image coordinate system is expressed in the pixel coordinate system as

u = x/dx + u_0,  v = y/dy + v_0,

whose matrix form is equation (1).
The conversion between the image coordinate system (x, y) and the camera coordinate system (X_c, Y_c, Z_c) is given by

Z_c [x, y, 1]^T = [[f, 0, 0, 0], [0, f, 0, 0], [0, 0, 1, 0]] [X_c, Y_c, Z_c, 1]^T,   (2)

where f is the effective focal length (the distance from the optical center to the image plane), and the coefficient Z_c is a scale factor: the smaller its value, the larger the x, y corresponding to the same X_c, Y_c.
The conversion from the camera coordinate system (X_c, Y_c, Z_c) to the world coordinate system (X_w, Y_w, Z_w) is given by

[X_c, Y_c, Z_c, 1]^T = [[R, T], [0^T, 1]] [X_w, Y_w, Z_w, 1]^T,   (3)

where 0^T denotes a zero vector, T is a 3×1 translation vector, and R is a 3×3 rotation matrix. From equations (1)-(3), the conversion from the pixel coordinate system (u, v) to the world coordinate system (X_w, Y_w, Z_w) is obtained as equation (4):

Z_c [u, v, 1]^T = [[1/dx, 0, u_0], [0, 1/dy, v_0], [0, 0, 1]] [[f, 0, 0, 0], [0, f, 0, 0], [0, 0, 1, 0]] [[R, T], [0^T, 1]] [X_w, Y_w, Z_w, 1]^T.   (4)
(3) according to equation (4), the solvePnP and solvePnPRansac functions provided by OpenCV are applied to solve for the rotation matrix R and translation vector T of the camera relative to the three-dimensional coordinate system of a known object (an illustrative sketch follows).
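An illustrative sketch of this step with OpenCV is given below; the intrinsic matrix, the 3-D reference points, and the synthetic pose used to manufacture the pixel observations are assumptions, not calibration values from this disclosure.

    import cv2
    import numpy as np

    K = np.array([[400.0, 0.0, 320.0],                 # assumed intrinsics
                  [0.0, 400.0, 240.0],
                  [0.0, 0.0, 1.0]])
    object_pts = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0],
                           [0, 1, 0], [0, 0, 1], [1, 0, 1]], dtype=np.float32)
    # synthetic ground-truth pose, used only to manufacture pixel observations
    rvec_gt = np.array([[0.1], [0.2], [0.3]])
    tvec_gt = np.array([[0.5], [-0.2], [5.0]])
    image_pts, _ = cv2.projectPoints(object_pts, rvec_gt, tvec_gt, K, None)

    ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, None)
    R, _ = cv2.Rodrigues(rvec)                         # rotation vector -> matrix

    def pixel_to_world(u, v, z_c):
        # Invert equation (4) for pixel (u, v) at an assumed depth Z_c.
        cam = z_c * np.linalg.inv(K) @ np.array([u, v, 1.0])
        return R.T @ (cam - tvec.ravel())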
After the camera extrinsic parameters are obtained, the unmanned aerial vehicle can subscribe to the image message topic in real time and convert the position coordinates of the target bounding box predicted by the feature extraction network into coordinates in the world coordinate system, which serve as the target value for path planning, realizing the effective combination of target tracking and dynamic obstacle avoidance.
In order to quickly find the optimal trajectory, this embodiment uses the ego-planner so that only a local ESDF map is built, shortening the map construction time.
The ego-planner trajectory optimization strategy is as follows. In this embodiment, the ROS robot operating system is selected; the siamrpn_tracker node receives the camera image message topic, performs the coordinate transformation of equation (4) on the obtained predicted target box coordinates, and finally publishes the target point coordinate message topic to the planner node. After the uav1_ego_planner_node subscribes to the coordinate topic, it takes it as the endpoint value of the unmanned aerial vehicle path planning and generates an unoptimized B-spline curve without considering obstacles. The node then subscribes to local obstacle messages in real time and performs gradient-based B-spline curve optimization. Finally, the uav1_traj_server node sends the optimized trajectory coordinate message topic to the ego_traj_to_cmd_uav_1 node as control instructions, so that the unmanned aerial vehicle accurately plans a path to the target position. The final simulation examples are shown in fig. 4 and 5.
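A minimal sketch of the tracker-side node wiring is given below, with the tracker forward pass and the equation (4) transform stubbed out; the topic names are assumptions modeled on the node names above.

    import rospy
    from sensor_msgs.msg import Image
    from geometry_msgs.msg import PoseStamped

    class SiamRPNTrackerNode:
        # Tracker-side node wiring; topic names are assumptions.
        def __init__(self):
            rospy.init_node("siamrpn_tracker")
            self.goal_pub = rospy.Publisher("/uav1/goal_point", PoseStamped,
                                            queue_size=1)
            rospy.Subscriber("/uav1/camera/image_raw", Image, self.on_image,
                             queue_size=1)

        def track(self, msg):
            # placeholder: run the SiamRPN forward pass on the frame
            return (0, 0, 0, 0)

        def pixel_to_world(self, box):
            # placeholder: apply the equation (4) transform with R, T
            return (0.0, 0.0, 1.0)

        def on_image(self, msg):
            box = self.track(msg)                      # predicted bounding box
            x, y, z = self.pixel_to_world(box)
            goal = PoseStamped()
            goal.header.stamp = rospy.Time.now()
            goal.pose.position.x = x
            goal.pose.position.y = y
            goal.pose.position.z = z
            self.goal_pub.publish(goal)                # consumed by the planner node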
Based on the ego-planner path planning strategy, this embodiment adopts the idea of gradient-based back-end unconstrained local optimization and uses a B-spline curve for trajectory optimization, balancing dynamic feasibility against the fitting accuracy of the previously dynamically infeasible trajectory. The specific steps are as follows (a sketch of the collision term follows the list):
(1) the planner generates trajectories without considering obstacles;
(2) performing track optimization based on the local obstacle information;
the specific operation is as follows: extracting the information of the obstacle collided with by the current track, and acquiring the control point Q of the b-spline curve passing through the obstacle i Each control point Q of the colliding line segment i An anchor point (anchor po) on the surface of the obstacle is generatedint)p ij And corresponding to a gradient vector pointing in the peripheral direction of the obstacle
Figure BDA0003886961650000131
Where i is the index of the control point and j is the index of the { p, v } pair; the distance from Qi to the jth obstacle is:
Figure BDA0003886961650000132
finally, the planner keeps iterating the trajectory away from the obstacle according to formula (5);
(3) in order to obtain a curve that conforms to the dynamic constraints, the planner judges trajectory segments where the dynamics are infeasible and activates a refinement process; the accuracy penalties for fitting the trajectory in the axial and radial directions are different, which improves the robustness of the planner model.
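A sketch of the collision term in step (2) follows; the safety clearance and the quadratic cost form are assumptions, while the signed distance follows equation (5).

    import numpy as np

    def collision_cost_and_grad(Q, anchors, normals, clearance=0.3):
        # For one control point Q_i, each {p, v} pair contributes the
        # signed distance d_ij = (Q_i - p_ij) . v_ij of equation (5); the
        # trajectory is pushed until every d_ij exceeds a safety
        # clearance.  The clearance and quadratic cost are assumptions.
        cost, grad = 0.0, np.zeros(3)
        for p, v in zip(anchors, normals):
            d = np.dot(Q - p, v)
            if d < clearance:                          # still too close
                cost += (clearance - d) ** 2
                grad += -2.0 * (clearance - d) * v     # pushes Q along v
        return cost, grad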
This embodiment is characterized in that: an unmanned aerial vehicle target tracking algorithm based on improved SiamRPN multi-layer feature fusion is developed with a deep learning approach, so the unmanned aerial vehicle can predict the target bounding box more accurately and quickly. Considering the requirements of high precision and rapid obstacle avoidance, the ego-planner is adopted to realize gradient-based local path planning, effectively estimating and calculating gradient information in real time, and finally generating a smooth trajectory that conforms to the dynamic constraints. This embodiment effectively combines the independently developed target tracking algorithm based on improved SiamRPN multi-layer feature fusion with the ego-planner trajectory optimization strategy, solving the problems of poor real-time performance and low obstacle avoidance accuracy in target tracking. This is of great significance in application fields such as follow-up unmanned aerial vehicle tracking and monitoring, urban pursuit, and battlefield close-range reconnaissance.
Example two
The embodiment discloses unmanned aerial vehicle target tracking system includes:
the initialization module is used for receiving an unlocking instruction and a tracking instruction by the unmanned aerial vehicle, initializing a model hyper-parameter and loading a pre-trained unmanned aerial vehicle target tracking model;
the characteristic extraction and fusion module is used for extracting template area image characteristics and search area image characteristics, and performing multi-characteristic fusion by using an unmanned aerial vehicle target tracking model to obtain a target boundary frame in a search area image;
the coordinate transformation module is used for converting the obtained position coordinates of the target boundary frame into coordinates under a world coordinate system by using a monocular camera simulation model of the unmanned aerial vehicle, and the coordinates are used as a target value of path planning;
and the path planning module is used for planning the optimal path by using the obtained path planning target value.
It should be noted that the initialization module, the feature extraction and fusion module, the coordinate transformation module, and the path planning module correspond to the steps in the first embodiment; the modules realize the same examples and application scenarios as the corresponding steps, but are not limited to the disclosure of the first embodiment. It should also be noted that the modules described above, as parts of a system, may be implemented in a computer system such as a set of computer-executable instructions.
In the foregoing embodiments, the descriptions of the embodiments have different emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The proposed system can be implemented in other ways. For example, the above-described system embodiments are merely illustrative, and for example, the division of the above-described modules is merely a logical functional division, and in actual implementation, there may be another division, for example, a plurality of modules may be combined or may be integrated into another system, or some features may be omitted, or not executed.
EXAMPLE III
The present embodiment provides a medium, on which a program is stored, which when executed by a processor implements the steps in the drone target tracking method described above.
Example four
The embodiment provides an electronic device, which includes a memory, a processor, and a program stored in the memory and executable on the processor, and when the processor executes the program, the steps in the unmanned aerial vehicle target tracking method are implemented.
It will be understood by those skilled in the art that the modules or steps of the present disclosure described above may be implemented by a general purpose computer device, and alternatively, they may be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by the computing device, or they may be separately fabricated into individual integrated circuit modules, or multiple modules or steps thereof may be fabricated into a single integrated circuit module. The present disclosure is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.
Although the embodiments of the present disclosure have been described with reference to the accompanying drawings, it is not intended to limit the scope of the present disclosure, and it should be understood by those skilled in the art that various modifications and variations can be made without inventive changes in the technical solutions of the present disclosure.

Claims (10)

1. An unmanned aerial vehicle target tracking method is characterized by comprising the following steps:
the unmanned aerial vehicle receives an unlocking instruction and a tracking instruction, initializes model hyper-parameters, and loads a pre-trained unmanned aerial vehicle target tracking model;
extracting template area image features and search area image features by using an unmanned aerial vehicle target tracking model, and performing multi-feature fusion to obtain a target boundary frame in a search area image;
converting the obtained position coordinates of the target boundary frame into coordinates in a world coordinate system by using an unmanned aerial vehicle monocular camera simulation model, and taking the coordinates as a target value of path planning;
and planning the optimal path in real time by using the obtained path planning target value.
2. The unmanned aerial vehicle target tracking method of claim 1, wherein the unmanned aerial vehicle target tracking model adopts AlexNet as a feature extraction network;
as an alternative embodiment, a spatially aware sampling strategy is introduced, and the target position is shifted near the center point of the training sample through uniformly distributed sampling.
3. The unmanned aerial vehicle target tracking method of claim 2, wherein a multi-head attention mechanism-based FPN feature pyramid structure is established based on an AlexNet feature extraction network to realize multi-feature fusion;
the method specifically comprises the following steps:
the feature extraction network outputs predicted feature maps C3, C4 and C5 of the third layer, the fourth layer and the fifth layer;
performing 1x1 convolution operation and downsampling operation on the output third-layer prediction feature map and the output fourth-layer prediction feature map C3 and C4 respectively;
cutting the down-sampled feature map by taking the center as a reference to realize the size consistency with the predicted feature map output by the fifth layer, and obtaining cut feature maps M3 and M4;
and (3) overlapping the cut feature maps M3 and M4 and the predicted feature map C5 output by the fifth layer by using a multi-head attention mechanism to realize feature fusion of different scales.
4. The unmanned aerial vehicle target tracking method according to claim 3, wherein the clipped feature maps M3 and M4 are superimposed with a predicted feature map C5 output by a fifth layer by using a multi-head attention mechanism, specifically:
respectively flattening the cut feature maps M3 and M4 and a prediction feature map C5 output by the fifth layer and performing linear mapping to obtain a query value corresponding to each layer;
performing convolution, normalization and nonlinear transformation on the cut characteristic graph M4, and increasing the channel dimension through a full connection layer to obtain a key value and an evaluation value;
respectively carrying out multi-head attention operation on M3, M4 and C5 layers, wherein the query value uses a query value corresponding to each layer, and the key value and the evaluation value use the key value and the evaluation value obtained after the feature map M4 is processed;
introducing a hyper-parameter vector, and performing linear interpolation in the spatial dimension on the output result of the multi-head attention mechanism.
5. The unmanned aerial vehicle target tracking method of claim 3, wherein the fused feature map is input into an RPN network to perform up-channel Cross Correlation operation, and the confidence of the predicted feature map and the position of a target bounding box are output;
and sorting the prediction results by adopting a cosine window and a scale change penalty to obtain a final target bounding box.
6. The method for tracking the unmanned aerial vehicle target according to claim 1, wherein the step of planning the optimal path in real time by using the obtained path planning target value specifically comprises:
generating a trajectory without considering obstacles using an ego-planner;
based on the local obstacle information, adopting a B-spline curve to carry out track optimization;
and the planner judges the trajectory segments with infeasible dynamics and activates a refinement process.
7. The unmanned aerial vehicle target tracking method of claim 6, wherein the trajectory optimization is performed by using a B-spline curve based on the local obstacle information, and specifically comprises:
extracting the information of the obstacles with which the current trajectory collides, and acquiring the control points Q_i of the B-spline curve that pass through the obstacles; each control point Q_i of the colliding line segment generates an anchor point p_ij on the obstacle surface and a corresponding gradient vector v_ij pointing away from the obstacle, where i is the index of the control point and j is the index of the {p, v} pair; according to the distance from Q_i to the j-th obstacle, d_ij = (Q_i − p_ij) · v_ij, the trajectory is continuously iterated away from the obstacle.
8. An unmanned aerial vehicle target tracking system, characterized by includes:
the initialization module is used for receiving an unlocking instruction and a tracking instruction by the unmanned aerial vehicle, initializing a model hyper-parameter and loading a pre-trained unmanned aerial vehicle target tracking model;
the characteristic extraction and fusion module is used for extracting template area image characteristics and search area image characteristics, and performing multi-characteristic fusion by using an unmanned aerial vehicle target tracking model to obtain a target boundary frame in a search area image;
the coordinate conversion module is used for converting the obtained position coordinates of the target boundary frame into coordinates under a world coordinate system by using an unmanned aerial vehicle monocular camera simulation model, and the coordinates are used as a target value of path planning;
and the path planning module is used for planning the optimal path in real time using the obtained path planning target value.
9. A medium having a program stored thereon, wherein the program, when executed by a processor, performs the steps of a method for drone target tracking according to any one of claims 1-7.
10. An electronic device comprising a memory, a processor and a program stored on the memory and executable on the processor, wherein the processor when executing the program performs the steps of a drone target tracking method according to any one of claims 1-7.
CN202211246719.4A 2022-10-12 2022-10-12 Unmanned aerial vehicle target tracking method, system, medium and electronic equipment Pending CN115578416A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211246719.4A CN115578416A (en) 2022-10-12 2022-10-12 Unmanned aerial vehicle target tracking method, system, medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211246719.4A CN115578416A (en) 2022-10-12 2022-10-12 Unmanned aerial vehicle target tracking method, system, medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN115578416A true CN115578416A (en) 2023-01-06

Family

ID=84585879

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211246719.4A Pending CN115578416A (en) 2022-10-12 2022-10-12 Unmanned aerial vehicle target tracking method, system, medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN115578416A (en)


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116309727A (en) * 2023-05-26 2023-06-23 中南大学 Unmanned aerial vehicle target tracking method and simulation system based on deep learning algorithm
CN117079196A (en) * 2023-10-16 2023-11-17 长沙北斗产业安全技术研究院股份有限公司 Unmanned aerial vehicle identification method based on deep learning and target motion trail
CN117079196B (en) * 2023-10-16 2023-12-29 长沙北斗产业安全技术研究院股份有限公司 Unmanned aerial vehicle identification method based on deep learning and target motion trail
CN118210321A (en) * 2024-05-21 2024-06-18 鹰驾科技(深圳)有限公司 Unmanned aerial vehicle pedestrian tracking system based on 360-degree looking around camera
CN118210321B (en) * 2024-05-21 2024-07-26 鹰驾科技(深圳)有限公司 Unmanned aerial vehicle pedestrian tracking system based on 360-degree looking around camera


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination