CN114627156A

CN114627156A - Accurate tracking method for moving targets in consumer-grade unmanned aerial vehicle video

Info

Publication number: CN114627156A
Application number: CN202210296764.4A
Authority: CN
Inventors: 赵长贵
Original assignee: Individual
Current assignee: Individual
Priority date: 2022-03-25
Filing date: 2022-03-25
Publication date: 2022-06-14

Abstract

Taking an association locking (correlation filter) tracker as its basis, this application improves tracking on an unmanned aerial vehicle (UAV) motion platform in three respects: deformation, occlusion, and scale. First, an association locking tracking algorithm based on decision perception is provided. It adopts a tracking framework based on decision perception and fuses results at the decision layer, from coarse to fine, through feature association selection, feature decision perception, and decision perception weight calculation; describing the target with multiple features reduces the tracking drift caused by target deformation. Second, to address the tracking drift and even tracking failure that occur after a target is occluded, an association locking tracking method based on difference superposition detection is provided, which quickly locks the moving target and then tracks it by association locking based on difference superposition detection. Third, a two-dimensional robust target scale estimation method is provided. The method markedly improves tracking accuracy and tracking success rate and estimates the target scale more accurately, improving tracking precision and performance.

Description

Accurate tracking method for moving targets in consumer-grade unmanned aerial vehicle video

Technical Field

This application relates to methods for tracking moving targets in unmanned aerial vehicle (UAV) video, in particular to an accurate tracking method for moving targets in consumer-grade UAV video, and belongs to the technical field of UAV video target tracking.

Background

Target tracking analyzes a sequence of consecutive video images acquired by a visual sensor such as a camera and obtains information such as the position, size, and motion state of one or more specific targets. Visual tracking predicts the subsequent states of a target in an image sequence from a target box marked at initialization.

The emergence and application of intelligent transportation and the rapid development of unmanned vehicles have greatly promoted the development and application of target tracking algorithms. External factors such as illumination change, change of shooting angle, and target occlusion complicate the tracking process, and the main challenges in tracking today are: deformation of the target caused by its own motion or by external factors, occlusion by other moving objects, changes in scene illumination, and motion blur produced by target motion. Because objects and scenes in the real world are variable, the prior art has no method that reliably tracks a target in every scene. Target tracking for UAV video surveillance platforms is therefore a current hotspot of research, development, and application. On the one hand, UAV technology is developing rapidly, UAVs are being applied ever more widely and are gradually entering all kinds of production and everyday scenarios, and there is urgent market demand for target tracking on this platform. On the other hand, when the UAV moves, the target easily deforms and undergoes severe scale change within a short time, which is itself one of the difficulties of target tracking, so a tracking algorithm developed for this application can be generalized to other application scenarios.

In recent years, steadily improving computer performance has provided the basic platform support for the development of visual tracking. The continual emergence of new methods, together with the borrowing of methods from other disciplines, has driven visual tracking forward and produced many results.

Prior-art target appearance models: these comprise a target appearance description method and a model learning method. The appearance description method addresses how to design a descriptor that is both robust and discriminative; the model learning method addresses how to learn the appearance model of a target from existing samples. In image processing, features are extracted to describe an image, including color features, intensity information, texture characteristics, contour information, edge gradients, and more complex scale-invariant features. Features extracted from a local region of the image reflect changes in attributes such as image intensity and color, and some of them can overcome the adverse effects of illumination change, pose rotation, and the like on the target. Within the correlation filter framework, fusing multiple convolutional layers is still an open problem: correlation filter tracking that fuses multiple convolutional layers uses too many parameters, which easily causes overfitting.

The design of a target appearance model either focuses only on the target itself or models both the target and the background. Because the appearance of a tracked target may vary greatly within a video, it is unreliable to estimate the model from the first frame alone and use that single model to locate the target in the remaining images. Good tracking algorithms generally update the model adaptively with useful information from the images, for example by using the tracking result of each frame as training data. In online tracking, the current tracking result is usually taken as a positive sample (sometimes extended to very nearby locations), the area surrounding that location is taken as negative samples, and adaptive appearance modeling is then performed. Model learning methods fall into two broad categories: generative models and discriminative models.

Prior-art target motion models: these estimate where the target may appear. They assume that an object moves continuously in the field of view without abrupt changes in speed, obtain the motion trajectory of the target from its positions in the current and previous frames, and thereby estimate where the target may appear in the next frame. Common algorithms include the optical flow method, Kalman filtering, particle filtering, and the mean shift method.
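As a minimal illustration of the constant-velocity assumption described above, the following Python sketch extrapolates the next position from the two most recent positions. The function name `predict_next` and the tuple representation of positions are assumptions made for illustration; the sketch is not one of the named prior-art algorithms.

```python
def predict_next(prev_pos, curr_pos):
    """Extrapolate the next (x, y) position assuming constant velocity."""
    vx = curr_pos[0] - prev_pos[0]  # displacement per frame along x
    vy = curr_pos[1] - prev_pos[1]  # displacement per frame along y
    return (curr_pos[0] + vx, curr_pos[1] + vy)


# The target moved from (10, 20) to (13, 18); the same step is assumed to repeat.
print(predict_next((10, 20), (13, 18)))
```

A real motion model would also carry an uncertainty estimate; this sketch only shows the extrapolation step.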

Prior-art target model updating: appearance modeling distinguishes the target from the background, and motion estimation gives the position of the target in the next frame; model updating concerns how the algorithm handles problems such as target deformation and occlusion. A direct method is template replacement, which follows a matching approach: after the target is obtained in the current image, the tracking result is used as an observation sample to update the existing model. This method must evaluate the reliability of the template, otherwise tracking drifts once the target is occluded. Learning methods based on incremental subspaces build the target model from the latest tracking results. Template updating by template replacement and by incremental subspace learning is based on generative models, whereas discriminative models treat target model updating as a classification problem whose core is the design of a classifier. All of these model updating methods face the same problem: the reliability of a sample is difficult to determine, and unreliable training samples bring background noise into the target appearance model, contaminating and degrading it.

In summary, prior-art UAV video target tracking still has problems. The difficulties addressed by this application and the problems to be solved lie mainly in the following aspects:

First, when a consumer-grade UAV tracks a target, external factors such as illumination change, target deformation, motion blur, and occlusion increase the tracking difficulty. Because objects and scenes in the real world are variable, no prior-art method reliably tracks a target in every scene, and the difficulties are especially pronounced for robust tracking on a UAV motion platform; compared with the traditional pan-tilt-zoom-; prior information about the target is limited during tracking, and the tracking model must be robust to changes in the appearance of the object. The former easily makes the model too complex, so that pursuing a small training error produces a large test error. As for the latter, the target does not stay constant during tracking but changes under factors such as illumination, occlusion, motion blur, and rotation, and the prior art lacks an effective solution for target deformation during tracking;

Second, the prior art uses a circulant matrix to achieve dense sampling, but in the tracking environment of a consumer-grade UAV the target may deform or even be motion-blurred, so tracking errors arise after the circulant matrix is constructed and cause drift. The uncertainty and complexity of consumer-grade UAV scenes pose many challenges for target tracking, and the difficulties encountered during tracking hinder stable and reliable visual tracking. Occlusion is a particularly hard problem in UAV target tracking. First, the causes of occlusion are complex and varied, there is no prior knowledge, and the duration and degree of occlusion are unpredictable. Partial and total occlusion affect the target appearance model differently, total occlusion seriously contaminates the target appearance, and once the target is occluded, the updating of the target appearance and motion estimation models is directly affected. Second, occlusion itself is difficult to identify: the human eye easily judges that a target is occluded, but this is a major problem for machine learning. If the occluded region cannot be identified, then once occluded samples are used for sample updating, the target appearance model is degraded or even contaminated, which may lead to tracking drift. How to reduce the influence of occlusion on sample updating and on the whole tracking process is an urgent problem.

Third, the prior art uses a circulant matrix to cope with the large amount of computation and uses the fast Fourier transform to reduce the computation needed for training and detection, which makes the approach well suited to tracking, but it has several disadvantages. First, the input image is cyclically shifted and a cyclic sliding window over the training set is used to train the classifier; because of the periodicity assumption, however, the training images overlap periodically, and the discontinuity at the sample edges produces an obvious dividing line after cyclic shifting. This line appears as high-frequency noise in the frequency domain and adversely affects frequency-domain processing. Second, the target response in the filter training stage is independent of the observed image, and in scenes with fast motion, occlusion, motion blur, and the like, cyclic shifts do not correspond one-to-one to the actual translation of the target. The samples used to train the tracking filter are not manually labeled; instead the tracking result of the previous frame, which is itself labeled by the algorithm, is used. As a result, images containing deviations are learned as positive samples, the target detection part of the tracker becomes inaccurate, the target position in the next frame is adversely affected, the updating of the filter parameters is affected, and tracking is ultimately exposed to drift. When the target is partially or even completely occluded, the target appearance model stops working and tracking fails outright. Target distortion caused by motion blur seriously damages the training samples, so that the target appearance model loses its discriminative power and tracking accuracy is reduced.

Fourth, the prior art determines the target position as follows: the position of the maximum filter response value is taken as the target center. This sample update method has two disadvantages. First, whether or not the current image is tracked accurately, the tracking result is carried over to the next frame and affects subsequent tracking. Cyclic shifts do not correspond one-to-one to the actual translation of the target; they only approximate the actual translation in the image and become unreliable under fast motion, occlusion, and the like in real tracking scenes, and using a single centered Gaussian as the target response limits tracker performance and causes unrecoverable drift. Second, prior-art tracking algorithms update samples continuously with a fixed weight: the result of every frame affects subsequent targets, the sample update weight is fixed at initialization, and the influence of each frame on the tracking result is therefore also fixed. In a real tracking scene, however, different frames contribute differently to sample training. In particular, when the target is occluded, the extracted appearance features differ greatly from the original target and are essentially background; once they are mistaken for the target and added to sample training, the training samples are contaminated.

Disclosure of Invention

Taking visual tracking on a UAV motion platform as its scenario, this application builds on an association locking (correlation filter) tracker and improves it in three respects: deformation, occlusion, and scale. First, an association locking tracking algorithm based on decision perception is provided. It adopts a tracking framework based on decision perception and fuses results at the decision layer, from coarse to fine, through feature association selection, feature decision perception, and decision perception weight calculation; describing the target with multiple features reduces the tracking drift caused by target deformation. Second, to address the tracking drift and even tracking failure that occur after a target is occluded, an association locking tracking method based on difference superposition detection is provided, which quickly locks the moving target and then tracks it by association locking based on difference superposition detection. Third, a two-dimensional robust target scale estimation method is provided.

To achieve the above technical advantages, this application adopts the following technical solution:

An accurate tracking method for moving targets in consumer-grade UAV video improves on an association locking tracker in three respects: deformation, occlusion, and scale. First, an association locking tracking algorithm based on decision perception is provided, comprising: (i) a tracking framework based on decision perception, (ii) feature association selection, (iii) feature decision perception, and (iv) decision perception weight calculation. Second, an association locking tracking method based on difference superposition detection is provided, comprising: (i) quickly locking the moving target, and (ii) association locking target tracking based on difference superposition detection. Third, a two-dimensional robust target scale estimation method is provided, comprising: (i) scale estimation based on a perspective projection model, (ii) a two-dimensional scale estimation tracking framework based on association locking, (iii) a two-dimensional scale prediction evaluation strategy, (iv) adaptive search window scale selection, and (v) the scale prediction estimation procedure;

(1) to counter the drift caused by target deformation during tracking, different features are tracked separately and fused at the decision layer to obtain the final result. In one approach, two correlation filters are constructed, one using HOG features, which carry target structure information, and the other using CNs (color names) features, which carry target color information; after the tracking results are computed, the maximum response value of each filter is used at the decision layer to determine its weight in the final result. In the other approach, an association locking tracking framework and a tracking framework based on a color histogram are used, and their results are fused to obtain the final tracking result;

(2) to address target occlusion during tracking, an association locking tracking method based on difference superposition detection is provided. First, the structural risk function in ridge regression is converted into a regularization function, and terms are added to cope better with occlusion. Difference superposition features are then extracted within the search box and superimposed on the original sample, which increases the difference between the target and the surrounding background and at the same time removes the potential risk of the circulant matrix. For sample updating, a method based on the detection result is adopted; for model updating, drift is reduced and a discontinuous update scheme is adopted to cope with target occlusion during tracking;

(3) to address target scale change during tracking, a two-dimensional robust target scale estimation method is provided. First, a rough estimate of the target scale is obtained from sensor data and the intrinsic parameters of the camera. A two-dimensional correlation filter is then designed to estimate the target scale, estimating the length and the width of the target separately, so that the true size of the target in the field of view is estimated accurately when the target deforms and changes scale. In addition, an adaptive search window method is provided: the size of the search box is changed according to the displacement of the target between two adjacent frames, which prevents a fast-moving target from leaving the search window. Samples are also down-sampled for the scale target filter, reducing the amount of data to be computed.

Preferably, the tracking framework based on decision perception: the best possible tracking effect is pursued with a model that is as simple as possible and has few parameters. In the $t$-th image $I_t$, the tracking problem is regarded as finding the most probable (highest-scoring) region $p_t$ in the set of candidate regions $S_t$:

$$p_t = \arg\max_{p \in S_t} f\big(T(I_t, p);\ \theta\big) \qquad \text{(Formula 1)}$$

where $T$ denotes some transformation of the image and $f(T(I_t, p); \theta)$ is the score of the rectangular box $p$ in image $I_t$ under the model parameters $\theta$. The model parameters minimize a loss function:

$$\theta = \arg\min_{\theta'} \big\{ L(\theta'; \chi_t) + \lambda R(\theta') \big\} \qquad \text{(Formula 2)}$$

where the loss function $L(\theta; \chi_t)$ depends on the previous images and the positions of the target in them, the minimization is taken over the whole model parameter space, and $R(\theta)$ with the factor $\lambda$ is a regularization term that prevents the overfitting caused by an overly complex model. Effective tracking thus reduces to the choice of the functions $f$ and $L$;

The position and scale of the target are obtained at initialization, and the region of the candidate image sequence that is most similar to the target is taken as the tracked target. Correlation, which measures the similarity between two signals, is used for this: the similarity between a candidate-region sample and the target is computed by correlating the two signals. The loss function is chosen to compute the distance between two vectors, and ridge regression is used to build a regressor from positive samples that contain the target and negative samples that are entirely background, thereby training the appearance model of the target.
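The combination of correlation and ridge regression described above can be sketched on a one-dimensional toy signal. The sketch assumes a linear kernel, Gaussian-shaped regression labels, and a naive DFT in place of an FFT library; the names `dft`, `gaussian_labels`, `train`, and `detect` and the value of the regularization weight `lam` are illustrative assumptions, not taken from this application.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform, adequate for a short demo signal."""
    n = len(values)
    sign = 1 if inverse else -1
    out = [
        sum(values[k] * cmath.exp(sign * 2j * cmath.pi * m * k / n) for k in range(n))
        for m in range(n)
    ]
    return [c / n for c in out] if inverse else out


def gaussian_labels(n, sigma=1.0):
    """Desired response: 1 for the unshifted sample, decaying with cyclic shift."""
    return [math.exp(-0.5 * (min(i, n - i) / sigma) ** 2) for i in range(n)]


def train(x, y, lam=1e-3):
    """Ridge regression over all cyclic shifts of x, solved per frequency."""
    kxx_f = [abs(c) ** 2 for c in dft(x)]  # DFT of the autocorrelation of x
    return [y_f / (k_f + lam) for y_f, k_f in zip(dft(y), kxx_f)]


def detect(alpha_f, x, z):
    """Correlate template x with candidate z and return (best shift, response)."""
    kxz_f = [a.conjugate() * b for a, b in zip(dft(x), dft(z))]
    response = dft([k * a for k, a in zip(kxz_f, alpha_f)], inverse=True)
    response = [c.real for c in response]
    return response.index(max(response)), response


x = [0.0, 1.0, 3.0, 2.0, 0.5, 0.0, -1.0, 0.2]  # toy 1-D "appearance" signal
alpha_f = train(x, gaussian_labels(len(x)))
z = x[-3:] + x[:-3]                            # the same signal moved 3 samples
shift, _ = detect(alpha_f, x, z)
print(shift)
```

The index of the response maximum recovers the displacement of the candidate relative to the template, which is the sense in which the highest-scoring region is taken as the target.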

Preferably, the feature association selection: multi-feature decision perception is performed on the moving target based on complementary features, and a framework is designed that fuses the features according to the different influence each has on the tracking result, to obtain a more accurate and stable tracking effect;

For the HOG features, the sample region is divided into several sub-regions and a 32-dimensional feature is extracted from each; because the last dimension is always 0 it is discarded, leaving 31 dimensions. The texture features use an RFS filter bank built from fast anisotropic Gaussian filters and have 8 dimensions in total.

Preferably, the feature decision perception: two filters are trained on different input samples. The input image of the first is the target box plus a ring of surrounding background and therefore contains spatial context information; its size is window_sz. HOG features are selected for it to better support difference superposition and to distinguish the target from the background, and the extracted features are multiplied by a cosine window to suppress boundary effects. The second contains only the target: its input image, of size sz, is smaller than the target, and it uses CNs features so that the tracker remains robust when the target deforms greatly. The former adds background information to enlarge the search area of the target, while the latter concentrates on the target alone to improve accuracy. Both filters update their templates online, which makes the tracking result more accurate. Using multiple features for decision perception retains more target information, tracks the target by several means, and copes better with unexpected situations during tracking.
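The cosine-window step for the first filter can be sketched as follows. The sketch assumes a Hann (raised cosine) window formed as an outer product and applied to a single feature channel; the helper names `hann`, `cosine_window`, and `apply_window` are hypothetical.

```python
import math


def hann(n):
    """1-D Hann (raised cosine) window with zeros at both ends."""
    if n == 1:
        return [1.0]
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * i / (n - 1))) for i in range(n)]


def cosine_window(height, width):
    """2-D window as the outer product of two 1-D Hann windows."""
    col, row = hann(height), hann(width)
    return [[cv * rv for rv in row] for cv in col]


def apply_window(feature_map, window):
    """Multiply one feature channel by the window, element by element."""
    return [[f * w for f, w in zip(frow, wrow)]
            for frow, wrow in zip(feature_map, window)]


window = cosine_window(5, 5)
patch = [[1.0] * 5 for _ in range(5)]   # stand-in for one HOG channel
tapered = apply_window(patch, window)
print(tapered[2][2], tapered[0][2])      # centre kept, border suppressed
```

Tapering the features toward zero at the border removes the artificial edge that cyclic shifting would otherwise create.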

Preferably, the decision perception weight calculation: an additive model of multi-feature perception is adopted. Under the additive fusion strategy, the final results of the n decision perceptions are fused additively; the fused result is insensitive to noise, which increases the robustness of the tracking algorithm. The final target position is obtained from the filter response values and the target positions as follows:

where res1 and res2 are the maximum response values of the two filters, and p₁ and p₂ are the target positions corresponding to those maximum response values;
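One plausible reading of the additive fusion is a response-weighted average of the two position estimates. The sketch below assumes that form, with each weight equal to a filter's share of the summed maximum responses; the exact weighting used in this application is not recoverable from the text, and `fuse_positions` is a hypothetical name.

```python
def fuse_positions(p1, res1, p2, res2):
    """Response-weighted average of two (x, y) estimates (assumed fusion form)."""
    total = res1 + res2
    if total <= 0:
        raise ValueError("response values must sum to a positive number")
    w1, w2 = res1 / total, res2 / total   # weight grows with filter confidence
    return (w1 * p1[0] + w2 * p2[0], w1 * p1[1] + w2 * p2[1])


# The HOG tracker is more confident here, so the result lies closer to its estimate.
print(fuse_positions((100.0, 50.0), 0.75, (110.0, 54.0), 0.25))
```

With this weighting, a filter whose response collapses, for example under heavy deformation, contributes correspondingly little to the final position.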

The algorithm flow for HOG and CNs decision perception is as follows:

Input: image sequence; initial target position $p_0$;

Output: position of the target in each frame;

Step 1: initialize, $t = 1$;

Step 2: initialize the trackers Track1 and Track2 with $I_0$ and $p_0$;

Step 3: for each frame $t$, starting from $t = 1$, perform Steps 4 to 9;

Step 4: compute the current search sub-window from the output $p_{t-1}$ of the previous frame;

Step 5: for tracker Track1, use HOG features and target structure information; the search window contains spatial context information and is windowed with a Hanning window;

Step 6: for tracker Track2, use CNs features and the global color information of the target; the search window is the target itself and no windowing is applied;

Step 7: obtain the response maps of the two trackers from the tracking models;

Step 8: find the maximum response value in each response map, which corresponds to the most likely position of the target;

Step 9: fuse the results with the additive model of multi-feature perception to obtain the final tracking result $p_t$ of the target.

Preferably, quickly locking the moving target: the tracking algorithm for quickly locking the moving target correlates the filter W with the candidate region to estimate the target position. An image block that contains the target and is larger than it is used as the search window; cyclically shifting the search window yields different candidate windows, so that cyclic shifts approximate the actual displacement of the target. Finally, a data matrix with circulant structure is constructed to obtain the training samples;
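The circulant data matrix can be illustrated on a short one-dimensional sequence: every row is the base sample shifted cyclically by one more position, so a single sample yields a full set of training samples. The function names `cyclic_shift` and `circulant` are illustrative, and a real search window would be two-dimensional.

```python
def cyclic_shift(x, s):
    """Shift the sequence x to the right by s positions with wrap-around."""
    s %= len(x)
    return list(x[-s:]) + list(x[:-s]) if s else list(x)


def circulant(x):
    """Data matrix whose i-th row is x cyclically shifted by i positions."""
    return [cyclic_shift(x, i) for i in range(len(x))]


base = [1, 2, 3, 4]          # stand-in for one row of the search window
for row in circulant(base):
    print(row)
```

Elements that leave one end of a row re-enter at the other, which is why cyclic shifts only approximate a true translation of the target.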

Let the kernel matrix $K^{z}$ be the kernel correlation between the matching template and the search-window samples. Since the matching template $X$ and the detection sample $Z$ are circulant matrices obtained by cyclically shifting the vectors $x$ and $z$, $K^{z}$ is also a circulant matrix:

$$K^{z} = C(k^{xz}) \qquad \text{(Formula 4)}$$

where $k^{xz}$ is the kernel correlation of the vectors $x$ and $z$. After the kernel function is introduced into least squares, the regression function $f(z)$ is:

$$f(z) = (K^{z})^{\mathsf{T}} \alpha \qquad \text{(Formula 5)}$$

where $\alpha$ is the vector of transformation coefficients. Applying the DFT to both sides gives:

$$\hat{f}(z) = \hat{k}^{xz} \odot \hat{\alpha} \qquad \text{(Formula 6)}$$

where $z$ is the image block of the search window that is expected to contain the target and is used to detect the target position, $x$ is the target model learned from the images, and a hat denotes the DFT. The right-hand side is a linear combination of $k^{xz}$ weighted by the transformation coefficients $\alpha$, and $\odot$ denotes element-by-element multiplication of the vectors. Applying the inverse DFT to Formula 6 gives the response matrix of the filter, whose maximum corresponds to the position of the target. The model is updated as follows:

where $\eta$ is the learning factor; the two coefficient terms are the Fourier transforms of the coefficients $\alpha$ for the current frame $t$ and the previous frame $t-1$, respectively, and the two template terms are the Fourier transforms of the target matching templates for the current frame and the previous frame, respectively.
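A common way to combine the quantities of the current and previous frames with a learning factor is linear interpolation, sketched below. This form is an assumption made for illustration, since the update formula itself is not reproduced in the text; `interpolate` is a hypothetical helper that works on Fourier-domain coefficients or templates alike.

```python
def interpolate(previous, current, eta):
    """Blend the previous model with the newly estimated one, element by element."""
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    return [(1.0 - eta) * p + eta * c for p, c in zip(previous, current)]


alpha_prev = [1 + 0j, 2 + 0j]   # Fourier-domain coefficients from frame t-1
alpha_new = [3 + 0j, 0j]        # coefficients estimated from frame t alone
print(interpolate(alpha_prev, alpha_new, 0.5))
```

A small learning factor keeps the model close to its history, while a large one lets the newest frame dominate.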

Preferably, the association locking target tracking based on difference superposition detection: the method makes further improvements in two parts, the sample training of the filter and the updating of the filter parameters;

1. Difference superposition detection

Regularization parameters are added to prevent the model from overfitting and to reduce the test error, and sparse parameters are used to cope effectively with occlusion. In the sample training stage, difference superposition regions are taken into account so that the parts that most readily attract attention are processed first, which helps tracking performance when the target is occluded. The focus is placed on the background: the statistical redundancy of the input signal is removed, and removing the redundant information leaves the salient target of the image. The log spectrum of the image is used for this, and the difference is obtained by subtracting the average log amplitude spectrum from the log amplitude spectrum of an image.

The original information of the target is kept while the difference superposition information of the image is taken into account: during sample training, $\omega(u) = u + S_{BB}(t)$, where $u$ is the input image and $S_{BB}(t)$ is the salient region;

After the redundant information of the image is removed, a deeper difference superposition detection is performed. Based on a Boolean map representation of the momentary awareness of a scene, a series of Boolean maps is generated from the feature maps of the input image according to prior distributions and random thresholds over the feature space:

$$B_i = \mathrm{THRESH}(\phi(I), \theta), \qquad \phi \sim p_\phi,\ \theta \sim p_\theta \qquad \text{(Formula 8)}$$

where the function $\mathrm{THRESH}(x, \theta)$ binarizes the image with the threshold $\theta$, $\phi(I)$ is a feature map of image $I$ normalized to the range 0 to 255, and $p_\phi$ and $p_\theta$ are prior distributions. The influence of a Boolean map on visual attention is represented by the attention map $A(B)$:

where $I$ is the input image; the mean attention map is then post-processed to obtain the final difference superposition map $S$;
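The Boolean-map step can be sketched as follows. The sketch assumes that a region attracts attention when it is enclosed, that is, not connected to the image border, that each Boolean map is processed together with its inverse, and that the thresholds are passed in explicitly instead of being drawn from a prior distribution. The post-processing that yields the final map is omitted, and the names `surrounded` and `mean_attention` are hypothetical.

```python
from collections import deque


def surrounded(mask):
    """Return the True pixels of `mask` that are not 4-connected to the border."""
    h, w = len(mask), len(mask[0])
    reached = [[False] * w for _ in range(h)]
    queue = deque(
        (r, c)
        for r in range(h)
        for c in range(w)
        if (r in (0, h - 1) or c in (0, w - 1)) and mask[r][c]
    )
    for r, c in queue:
        reached[r][c] = True
    while queue:  # flood fill inward from the border
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < h and 0 <= nc < w and mask[nr][nc] and not reached[nr][nc]:
                reached[nr][nc] = True
                queue.append((nr, nc))
    return [[1 if mask[r][c] and not reached[r][c] else 0 for c in range(w)]
            for r in range(h)]


def mean_attention(feature_map, thresholds):
    """Average the attention maps of B = THRESH(feature_map, theta) and its inverse."""
    if not thresholds:
        raise ValueError("at least one threshold is required")
    h, w = len(feature_map), len(feature_map[0])
    total = [[0.0] * w for _ in range(h)]
    count = 0
    for theta in thresholds:
        boolean_map = [[v > theta for v in row] for row in feature_map]
        inverse = [[not v for v in row] for row in boolean_map]
        for bmap in (boolean_map, inverse):
            attention = surrounded(bmap)
            for r in range(h):
                for c in range(w):
                    total[r][c] += attention[r][c]
            count += 1
    return [[v / count for v in row] for row in total]


image = [[10] * 5 for _ in range(5)]
image[2][2] = 200  # one bright pixel enclosed by dark background
saliency = mean_attention(image, thresholds=[50, 100, 150])
print(saliency[2][2], saliency[0][0])
```

Only the enclosed bright pixel receives a non-zero value, while the background, which touches the border in every Boolean map, receives none.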

2. Target sample update

In the training stage, the response value is taken into account in the sample update weight: the detection result of the tracking algorithm is used rather than a simple binary strategy. First, the peak-to-sidelobe ratio is used to decide whether tracking is valid; then the influence of the current tracking result on tracking in the next frame is determined directly from the maximum response value of that result. Assume that, starting from the second frame, the maximum response value of each image is stored in response_all; the sample is updated as follows:

where $\eta$ is the learning factor and the two template terms are the Fourier transforms of the target matching templates for the current frame and the previous frame, respectively. Because the detection result is taken into account in the sample update, a contaminated target receives a smaller weight, which reduces its influence on subsequent targets;
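The two-stage decision described above (validity check by peak-to-sidelobe ratio, then a weight that follows the maximum response) can be sketched on a one-dimensional response. The exclusion width around the peak, the threshold `psr_min`, the base learning factor `eta`, and the rule of scaling the learning factor by the peak value are all assumptions made for illustration; the exact update formula is not reproduced in the text.

```python
import statistics


def peak_to_sidelobe_ratio(response, exclude=1):
    """PSR of a 1-D cyclic response: (peak - sidelobe mean) / sidelobe std."""
    n = len(response)
    peak_idx = max(range(n), key=response.__getitem__)
    near_peak = {(peak_idx + d) % n for d in range(-exclude, exclude + 1)}
    sidelobe = [v for i, v in enumerate(response) if i not in near_peak]
    std = statistics.pstdev(sidelobe)
    if std == 0:
        return float("inf")
    return (response[peak_idx] - statistics.mean(sidelobe)) / std


def sample_update_rate(response, eta=0.02, psr_min=5.0):
    """Learning rate for this frame: 0 if tracking looks invalid, else eta * peak."""
    if peak_to_sidelobe_ratio(response) < psr_min:
        return 0.0
    return eta * min(1.0, max(response))


clear = [0.1, 0.1, 0.9, 0.1, 0.12, 0.08, 0.1, 0.1]       # sharp, confident peak
occluded = [0.5, 0.52, 0.48, 0.5, 0.51, 0.49, 0.5, 0.5]  # flat, ambiguous response
print(sample_update_rate(clear), sample_update_rate(occluded))
```

A flat response, as produced when the target is occluded, yields a zero rate, so the occluded appearance is not written into the sample.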

3. Tracking model update

A sparse update strategy is adopted for the model: the model is updated once every Ns frames, while the samples are still updated in every frame. Reducing the model update frequency saves time and mitigates model drift. The value finally chosen for Ns is 6.
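The sparse schedule can be written down directly. The sketch assumes frames are numbered from 1 and that the model is refreshed on frames whose index is a multiple of Ns; the phase of the schedule is an assumption, and `update_plan` is a hypothetical name.

```python
def update_plan(num_frames, ns=6):
    """Return (frame, update_sample, update_model) for frames 1..num_frames."""
    return [(t, True, t % ns == 0) for t in range(1, num_frames + 1)]


plan = update_plan(13)
print([t for t, _, update_model in plan if update_model])
```

Samples are refreshed on all 13 frames, whereas the more expensive model update runs on only two of them.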

Preferably, the two-dimensional scale estimation tracking framework based on association locking: two correlation filters of consistent design are defined, a position target filter and a scale target filter, which implement target tracking and target scale change respectively. The former locates the target in the current frame and the latter estimates its scale. The two filters are relatively independent, so different feature types and feature computation methods can be chosen for training and testing them. A two-dimensional filter is adopted so that the length and the width of the target are not forced to change by the same factor; the target scale can then still be estimated accurately even when the target deforms greatly.

The overall framework of the tracker with two-dimensional target robustness scale estimation is divided into two parts, position prediction and scale estimation, as follows. Training samples are obtained by dense sampling in a search area around the target; features are extracted, the samples are Fourier transformed, and a filter is trained in the frequency domain by least-squares regression. A position kernel correlation filter is obtained through online learning; the filter output is then mapped back to the spatial domain by the inverse discrete Fourier transform, and the coordinates of the maximum of the response map are found, this point corresponding to the center position of the target in the sample. The scale prediction process based on the correlation filter is similar to the position prediction process: after the target position is obtained, the area around the target is downsampled and upsampled according to preset scale values to obtain a series of image blocks of different scales; bilinear interpolation is then applied to resize the image blocks to the size of the designed scale model, yielding the training samples; next, HOG features of the image blocks are extracted and a least-squares classifier is trained to obtain the scale target tracker. In addition, after the features of an image block are obtained, they are windowed with a Hamming window to suppress the high-frequency noise in the frequency domain that the image boundary introduces because of the circulant matrix;

when the scale target tracker is applied to a new image, the score over the scale space, i.e., the filter response, is calculated, and the position of the maximum in the response map of the scale target filter gives the final scale of the target;

a position target filter is constructed to obtain the position of the target to be tracked, and a scale target filter is constructed to obtain the target scale at the target center position given by the position target filter; the final target information is passed to the position target filter of the next frame, and the position target filter and the scale target filter operate together, updating their parameters online and improving the performance of the whole tracker.
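The position prediction steps above (least-squares training in the frequency domain, inverse transform, peak search) can be sketched as follows. This is a minimal single-channel linear sketch rather than the multi-feature kernel filter of the application; the function names, the Gaussian label width, and the regularization value are illustrative assumptions. A Hamming-window helper is included for the windowing step, and applying it to the features is left to the caller.

```python
import numpy as np


def gaussian_label(shape, sigma=2.0):
    """Desired response: a Gaussian peaked at index (0, 0), wrapped circularly."""
    h, w = shape
    ys = np.minimum(np.arange(h), h - np.arange(h))  # circular distance to row 0
    xs = np.minimum(np.arange(w), w - np.arange(w))  # circular distance to col 0
    return np.exp(-(ys[:, None] ** 2 + xs[None, :] ** 2) / (2.0 * sigma ** 2))


def hamming_2d(shape):
    """Separable 2-D Hamming window to damp boundary discontinuities."""
    return np.outer(np.hamming(shape[0]), np.hamming(shape[1]))


def train_filter(x, lam=1e-3):
    """Closed-form ridge-regression filter in the frequency domain."""
    F = np.fft.fft2(x)
    G = np.fft.fft2(gaussian_label(x.shape))
    return np.conj(G) * F / (np.conj(F) * F + lam)


def detect(H, z):
    """Correlate the filter with patch z; return the signed peak offset and the map."""
    response = np.real(np.fft.ifft2(np.conj(H) * np.fft.fft2(z)))
    dy, dx = np.unravel_index(np.argmax(response), response.shape)
    h, w = response.shape
    # Map wrapped indices to signed displacements.
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return (int(dy), int(dx)), response
```

For example, training on a patch and detecting on a cyclically shifted copy of it returns that shift as the peak offset, which is the displacement the tracker would add to the previous target center.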

Preferably, the two-dimensional scale prediction evaluation strategy is: the scale change of the target is detected by learning a two-dimensional scale correlation filter, and the target search area is reasonably limited according to the scale change of the target;

the most important step in target scale estimation is to provide target-specific candidate scale transformations: in the scale prediction evaluation process, for the current image the size of the target is P × R and the size of the scale target filter is S × S; for each candidate scale index n, an image block J_n×n is extracted along the length dimension and the width dimension of the target respectively, where a is the filter parameter factor. For targets with large scale changes the candidate scales should differ from one another as much as possible; otherwise the candidate scales need to vary only slightly.
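One way to generate such candidate sizes is sketched below, assuming geometric scale factors a^n with n running symmetrically around zero and applied independently to length and width. The exponent range, the default values of a and S, and the function name are assumptions made for illustration; the source does not reproduce the exact formula.

```python
import numpy as np


def candidate_sizes(P, R, a=1.02, S=5):
    """Return an S x S grid of (length, width) candidates around a P x R target.

    Row i scales the length by a**n_i, column j scales the width by a**n_j,
    with n in {-(S-1)//2, ..., (S-1)//2} (assumed exponent range).
    """
    n = np.arange(S) - (S - 1) // 2
    factors = a ** n.astype(float)
    lengths = P * factors
    widths = R * factors
    return [[(float(l), float(w)) for w in widths] for l in lengths]
```

A larger factor a spreads the candidates further apart, which suits targets whose scale changes quickly; a factor close to 1 gives finely spaced candidates.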

Adaptive search window scale selection: the size of the search box is constrained according to the offset of the target position between two adjacent frames. In a tracker based on the association locking framework, the filter size used for the correlation operation is fixed at initialization; here the size of the search area is instead changed according to the offset of the target position between two adjacent frames. The search window size is selected adaptively by comparing the offset of the target centers in the two adjacent frames against a preset critical value: when the offset is larger than the critical value, the search window is enlarged.
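The thresholding rule above can be sketched as follows. The Euclidean offset, the critical value of 10 pixels, and the growth factor of 1.5 are assumed values chosen for illustration; the text specifies only that the window is enlarged when the offset exceeds a preset critical value.

```python
import math


def adapt_search_window(window, prev_center, cur_center, threshold=10.0, grow=1.5):
    """Enlarge the (height, width) search window when the target moved far.

    `threshold` and `grow` are hypothetical parameters, not from the source.
    """
    offset = math.hypot(cur_center[0] - prev_center[0],
                        cur_center[1] - prev_center[1])
    if offset > threshold:
        return (window[0] * grow, window[1] * grow)
    return window
```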

Preferably, the flow of the scale prediction estimation method is: in the overall tracking algorithm with scale estimation added, the position target filter is first used to obtain the position information, and the scale target filter is then applied to obtain the current scale change of the target; the kernel function used in the algorithm is a Gaussian kernel;

HOG features are adopted in the scale estimation module. First, the image features are extracted to obtain a training sample f^l of the d-dimensional features of the image, where l ∈ {1, 2, …, d}; each feature dimension corresponds to a correlation filter h^l, obtained by minimizing the loss function:

where g is the desired filter output corresponding to the training sample f, and λ ≥ 0 is a regularization parameter that controls the structural error; the filter obtained by solving this equation is:

Adding the regularization factor imposes an l₂ constraint on the filter, which alleviates the problem of zero-frequency components in the spectrum of f and prevents the denominator from becoming zero. Because solving a d × d linear system online for every update is costly, a robust approximation is obtained by updating the numerator A^l_t and the denominator B_t of the correlation filter H^l_t separately:

where η is the learning rate. The final filter output, i.e., the correlation score, is:

the scale transformation corresponding to the maximum value in the correlation response map is the current scale of the target;
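A minimal sketch of the scale filter along one scale dimension is given below. It assumes the standard multi-channel discriminative correlation filter formulation (numerator conj(G)·F^l, denominator Σ_k conj(F^k)·F^k, both kept as running averages with learning rate η), which is consistent with the symbols A, B, η, and λ defined in the text but is not reproduced from it. The class name, the label width, and the default parameter values are hypothetical. The two-dimensional variant described in the application would presumably run the same steps separately for length and width.

```python
import numpy as np


def scale_label(S, sigma=1.0):
    """Desired 1-D output g: a Gaussian centred on the middle scale index."""
    n = np.arange(S) - (S - 1) // 2
    return np.exp(-0.5 * (n / sigma) ** 2)


class ScaleFilter:
    """Multi-channel 1-D correlation filter over S candidate scales (sketch)."""

    def __init__(self, S, lam=1e-2, eta=0.025):
        self.G = np.fft.fft(scale_label(S))
        self.lam = lam
        self.eta = eta
        self.A = None  # numerator, shape (d, S)
        self.B = None  # denominator, shape (S,)

    def update(self, f):
        """f has shape (d, S): a d-dimensional feature for each scale."""
        F = np.fft.fft(f, axis=1)
        A_new = np.conj(self.G)[None, :] * F
        B_new = np.sum(np.conj(F) * F, axis=0).real
        if self.A is None:
            self.A, self.B = A_new, B_new
        else:
            self.A = (1 - self.eta) * self.A + self.eta * A_new
            self.B = (1 - self.eta) * self.B + self.eta * B_new

    def scores(self, z):
        """Correlation score for each candidate scale of test sample z."""
        Z = np.fft.fft(z, axis=1)
        num = np.sum(np.conj(self.A) * Z, axis=0)
        return np.real(np.fft.ifft(num / (self.B + self.lam)))
```

The index of the largest score selects the candidate scale; evaluating the training sample itself peaks at the middle index, i.e., no scale change.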

when extracting features from the target candidate region, the length and the width of the target are scaled separately to obtain the candidate regions; this estimates the scale change of the target accurately but increases the amount of computation. Principal component feature transformation is therefore adopted to reduce the dimensionality: the eigenvalues and eigenvectors of the covariance matrix of the high-dimensional data are computed, the eigenvectors serving as the selected basis vectors. The eigenvalue of each eigenvector reflects the magnitude of the projection of the original data onto that vector, so the larger the eigenvalue, the more information of the original data the corresponding eigenvector retains; redundant information is thereby removed and the amount of data to be computed is reduced;

in addition, the scale target filter is concerned only with the size of the target, so to reduce the amount of computation while keeping the overall information of the target, the target is downsampled to a fixed size before its features are calculated, and the whole target at full resolution does not need to be processed.
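The principal component feature transformation mentioned above can be illustrated with a short NumPy sketch: the covariance matrix of the feature vectors is eigendecomposed and the data are projected onto the eigenvectors with the largest eigenvalues. The function name and the choice of k are illustrative assumptions.

```python
import numpy as np


def pca_project(X, k):
    """Project X (n_samples, n_features) onto its k leading principal axes.

    Returns the projected data, the basis (n_features, k), and the
    corresponding eigenvalues in descending order.
    """
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (X.shape[0] - 1)
    vals, vecs = np.linalg.eigh(cov)       # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1][:k]     # keep the k largest
    basis = vecs[:, order]
    return Xc @ basis, basis, vals[order]
```

When the features are strongly correlated, a small k already reconstructs the centred data closely, which is why the projection reduces the data volume without discarding much information.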

Compared with the prior art, the technical scheme has the following innovation points and advantages:

first, trackers based on correlation filters show superior tracking speed, but the major challenges in tracking, such as drastic appearance change, occlusion, scale change caused by deformation, and scene illumination change, remain considerably difficult. Taking visual tracking on an unmanned aerial vehicle motion platform as the scenario, this application improves the association locking tracker in three aspects, deformation, occlusion, and scale: first, an association locking tracking algorithm based on decision perception is provided, which adopts a tracking framework based on decision perception and, through feature association selection, feature decision perception, and decision perception weight calculation, performs coarse-to-fine fusion at the decision layer and describes the target with multiple features, reducing the tracking drift caused by target deformation due to various factors; second, to resolve the tracking drift and even tracking failure that occur after the target is occluded, an association locking tracking method based on difference superposition detection is provided, which quickly locks the moving target and tracks the association-locked target based on difference superposition detection; third, a two-dimensional target robustness scale estimation method is provided;

secondly, to address the drift caused by target deformation during tracking, decision perception combines multiple features to describe the target from several aspects; different features are used to track separately, and the results are fused at the decision layer to obtain the final result. In one scheme, two correlation filters are constructed, using HOG features, which carry target structure information, and CNs, which carry target color information, respectively; after the tracking results are calculated, the maximum response value of each filter determines its weight in the final result at the decision layer. In the other scheme, an association locking tracker and a tracking framework based on a color histogram are used, and the results of the two are fused to obtain the final tracking result. Under target deformation, fusion at the decision layer achieves better tracking performance than fusion at the feature layer. Tracking the target with multi-feature perception association locking addresses the problems caused by the large deformation and scale change that targets undergo within a short time in consumer-grade unmanned aerial vehicle video, and is of great significance and practical value;

thirdly, to address target occlusion during tracking, the application improves target tracking under occlusion from the aspects of target motion estimation and sample updating, and provides a kernel correlation filter tracking algorithm based on difference superposition: an l₁ component is added to better cope with occlusion; difference superposition features are then extracted in the search box and superposed on the original sample, which increases the difference between the target and the surrounding background while eliminating the potential risk of the circulant matrix; a method based on the detection result is adopted for sample updating, and, to reduce drift in model updating, an intermittent update mode is adopted to cope with occlusion of the target during tracking. The application thereby further improves both the sample training and the filter parameter updating of the filter. Existing DCF-based tracking algorithms sacrifice tracking speed and real-time performance in pursuit of tracking quality, whereas this application improves the tracking algorithm and raises the speed in the model update step;

fourthly, to address target scale change during tracking, the application provides a two-dimensional target robustness scale estimation method. First, a rough scale estimate of the target is obtained from sensor data and the intrinsic parameters of the camera; a two-dimensional correlation filter is then designed to estimate the target scale, estimating the length and the width of the target separately, so that the real size of the target in the field of view is accurately estimated when the target deforms and changes scale. To improve calculation speed, the samples of the scale target filter are downsampled, reducing the amount of data to be computed. Experimental results for situations that may arise during tracking, such as target deformation, scale change, occlusion, and disappearance, show that the method achieves better tracking accuracy and overlap rate and also tracks well after the target has been occluded.

Drawings

Fig. 1 is a perspective imaging schematic based on scale estimation of a perspective projection model.

Fig. 2 is a schematic diagram of a central imaging based on scale estimation of a perspective projection model.

FIG. 3 is a diagram of a two-dimensional scale estimation tracking framework based on associative locking.

Fig. 4 is a graph comparing the ground truth and different tracking results on the boat1 test video.

Fig. 5 is a graph comparing the ground truth and different tracking results on the boat2 test video.

Fig. 6 is a graph comparing the ground truth and different tracking results on the boat3 test video.

Fig. 7 is a graph comparing the ground truth and different tracking results on the car4 test video.

Fig. 8 is a graph comparing the ground truth and different tracking results on the wakeboard5 test video.

Detailed description of the invention

In order to make the objects, features, advantages, and novel features of the present application more comprehensible and easy to implement, specific embodiments are described in detail below with reference to the accompanying drawings. Those skilled in the art may make similar generalizations without departing from the spirit and scope of the present application, and therefore the present application is not limited to the specific embodiments disclosed below.

The rise of consumer-grade unmanned aerial vehicles has brought new application scenarios for target tracking. During tracking, however, external factors such as illumination change, target deformation, motion blur, and occlusion often increase the difficulty. Aiming at the difficulties of robust tracking on a consumer-grade unmanned aerial vehicle motion platform, the application improves the association locking tracker in three aspects, deformation, occlusion, and scale. First, a decision perception-based association locking tracking algorithm is provided, comprising: (i) a tracking framework based on decision perception, (ii) feature association selection, (iii) feature decision perception, and (iv) decision perception weight calculation; fusion is performed from coarse to fine at the decision layer and the target is described by multiple features, reducing the tracking drift caused by target deformation due to various factors. Second, to resolve the tracking drift and even tracking failure that occur after the target is occluded, an association locking tracking method based on difference superposition detection is provided, comprising: (i) quick locking of the moving target and (ii) association-locked target tracking based on difference superposition detection. Third, a two-dimensional target robustness scale estimation method is provided, comprising: (i) scale estimation based on a perspective projection model, (ii) a two-dimensional scale estimation tracking framework based on association locking, (iii) a two-dimensional scale prediction evaluation strategy, (iv) adaptive search window scale selection, and (v) the scale prediction estimation method flow;

(1) To address the drift caused by target deformation during tracking, multiple features are concatenated at the feature layer for decision perception, describing the target from several aspects. Finally, different features are used to track separately and the results are fused at the decision layer to obtain the final result. In one scheme, two correlation filters are constructed, using HOG features, which carry target structure information, and CNs, which carry target color information, respectively; after the tracking results are calculated, the maximum response value of each filter determines its weight in the final result at the decision layer. In the other scheme, an association locking tracker and a tracking framework based on a color histogram are used, and the results of the two are fused to obtain the final tracking result. Under target deformation, fusion at the decision layer has better tracking performance than fusion at the feature layer.

(2) To address target occlusion during tracking, an association locking tracking method based on difference superposition detection is provided. First, the structural risk function in ridge regression is converted into a regularization function and an l₁ component is added to better cope with occlusion; then difference superposition features are extracted in the search box and superposed on the original sample, which increases the difference between the target and the surrounding background and eliminates the potential risk of the circulant matrix; a method based on the detection result is adopted for sample updating, and, to reduce drift in model updating, an intermittent update mode is adopted to cope with occlusion of the target during tracking.

(3) To address target scale change during tracking, a two-dimensional target robustness scale estimation method is provided. First, a rough scale estimate of the target is obtained from sensor data and the intrinsic parameters of the camera; a two-dimensional correlation filter is then designed to estimate the target scale, estimating the length and the width of the target separately, so that the real size of the target in the field of view is accurately estimated when the target deforms and changes scale. In addition, an adaptive search window method is provided, which changes the size of the search box according to the displacement of the target between two adjacent frames, so that a fast-moving target does not leave the search window. To improve calculation speed, the samples of the scale target filter are downsampled, reducing the amount of data to be computed. Experiments show that the method achieves better tracking accuracy and overlap rate and also tracks well after the target has been occluded.

First, associated locking target tracking based on decision perception

Compared with video captured from a conventional pan-tilt platform, targets in video captured by a consumer-grade unmanned aerial vehicle are more likely to undergo large deformation and scale change within a short time, which increases the tracking difficulty. To handle target deformation during tracking, target tracking is performed with multi-feature perception association locking.

(I) Tracking framework based on decision perception

The difficulties in tracking are that prior information about the target is limited and that the tracking model must be robust to changes in the appearance of the object. The former easily leads to an overly complex model, which attains a small training error at the cost of a large test error; as for the latter, the target does not remain unchanged during tracking, since factors such as illumination, occlusion, motion blur, and rotation all change its appearance. The application therefore uses a model that is as simple as possible, with a small number of parameters, to achieve the best tracking effect. In the t-th frame I_t, the tracking problem is regarded as finding the most likely (highest-scoring) region p_t in the set S_t of candidate regions:

where T denotes a transformation of the image and f(T(I_t, p); θ) denotes the score of the rectangular box p in image I_t under model parameters θ, which are chosen to minimize the loss function:

over the whole model parameter space, where R(θ) is the regularization term and λ the regularization factor, which prevent overfitting caused by an overly complex model. The problem of effective tracking is thus converted into the choice of the functions f and L.
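Read together, the two steps above amount to a score maximization over candidate regions and a regularized loss minimization over model parameters. A compact way to write this, assuming the parameter space is denoted Q and the loss L is evaluated on the frames tracked so far (neither notation appears in the text above), is:

```latex
p_t = \arg\max_{p \in S_t} f\bigl(T(I_t, p);\, \theta\bigr),
\qquad
\theta = \arg\min_{\theta \in Q} \bigl\{ L(\theta) + \lambda R(\theta) \bigr\}
```

The first expression selects the candidate box with the highest score; the second trades the fit to past frames against model complexity through λ.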

(II) feature association selection

HOG maps, color histograms, color attributes, and texture features all contribute to target tracking, but any single feature has limitations in complex environments and gives unsatisfactory tracking results. HOG features are suited to describing rigid objects and are insufficient when the target deforms greatly; color features alone cannot clearly separate a target from a background of similar color, and their tracking performance drops sharply when the illumination changes. To cope with large target deformation during tracking, which is likely to cause drift, multi-feature decision perception is performed on the moving target using complementary features, and a suitable framework is designed to fuse them according to the different influence each feature has on the tracking result, so as to obtain a more accurate and stable tracking effect.

(III) feature decision perception

Two filters are trained with different input samples. The input image of the first consists of the target box and a ring of surrounding background, so it contains spatial context information and has size window_sz; HOG features are selected to better bring out the difference between the target and the background, and a cosine window function is applied to the extracted features to eliminate boundary effects. The input image of the second contains only the target and is therefore smaller, of size sz; CNs features are adopted so that the tracker remains robust when the target deforms greatly. The former adds background information to enlarge the search area of the target, while the latter focuses on the target alone to improve accuracy; both filters update their templates online, making the tracking result more accurate. This strategy perceives the target through multiple feature decisions, retains more information about the target, and tracks it by several means, so that unexpected situations during tracking are handled better.

(IV) Decision perception weight calculation

An additive multi-feature perception model is adopted. Under the additive fusion strategy, the final results of the n decision perceptions are fused additively; the fused result is insensitive to noise, which increases the robustness of the tracking algorithm. The final target position is obtained from the filter response values and the target positions as follows:

where res1 and res2 are the maximum response values of the two filters, and p₁ and p₂ are the target positions corresponding to those maximum response values, respectively.
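A minimal sketch of such an additive fusion is given below, assuming each position is weighted by its filter's share of the total maximum response. This normalization is an assumption made for illustration, since the fusion formula itself is not reproduced in the text; the function name is hypothetical.

```python
def fuse_positions(p1, res1, p2, res2):
    """Response-weighted average of two (row, col) position estimates.

    Assumed rule: weight_i = res_i / (res1 + res2).
    """
    total = res1 + res2
    if total <= 0:
        raise ValueError("response values must have a positive sum")
    w1 = res1 / total
    w2 = res2 / total
    return (w1 * p1[0] + w2 * p2[0], w1 * p1[1] + w2 * p2[1])
```

For instance, with res1 = 0.75 and res2 = 0.25 the fused position lies three quarters of the way toward p₁, so the more confident filter dominates without the other being discarded.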

Algorithm flow of HOG and CNs decision perception:

Input: image sequence; target initial position p₀;

Output: position information of the target in each frame of the image

Process 1: initialize, t = 1;

Process 3: for 1 : t

Process 9: fuse with the additive multi-feature perception model to obtain the final tracking result p₀ of the target;

Output: position information of the target in each frame of the image

Second, correlation locking tracking based on difference superposition detection

Tracking algorithms built on the correlation filter framework use a circulant matrix to realize dense sampling: samples are cyclically shifted to construct a circulant data matrix, and, based on the properties of the circulant matrix, fast computation is carried out in the frequency domain with the discrete Fourier transform, which greatly improves computational efficiency.

The uncertainty and complexity of consumer-grade drone scenes present many challenges to target tracking, such as scale change, occlusion, appearance change, motion blur, and lighting effects. These problems hinder robust and reliable visual tracking, and occlusion is a particularly hard problem in unmanned aerial vehicle target tracking, mainly for two reasons. First, the causes of occlusion are complex and varied, no prior knowledge is available, and the time and degree of occlusion are unpredictable; partial and total occlusion affect the target appearance model differently, total occlusion severely contaminates the target appearance, and, because occlusion occurs by chance, the updating of the target appearance and motion estimation models is directly affected once the target is occluded. Second, occlusion is difficult to identify: human eyes easily judge that a target is occluded, but this is a hard problem for machine learning; if the occluded area cannot be identified, then once an occluded sample is used for sample updating, the target appearance model is degraded or even contaminated, which can in turn cause tracking drift. Therefore, reducing the influence of occlusion on sample updating, and on the whole tracking process, is an urgent problem.

The following analyzes the problems existing in the update stage in the tracking framework based on decision perception, improves the target tracking under the shielding condition from the aspects of motion estimation and sample update of the target, and provides a kernel correlation filter tracking algorithm based on difference superposition.

(I) Quick locking of the moving target

The tracking algorithm for quickly locking a moving target correlates a filter W with candidate regions to estimate the position of the target. An image block that contains the target and is larger than the target is used as the search window; the search window is cyclically shifted to obtain different candidate windows, the actual displacement of the target being approximated by the cyclic shifts, and a data matrix with circulant structure is finally constructed to obtain the training samples.

$$K^z = C(k^{xz}) \tag{4}$$

$$f(z) = (K^z)^T \alpha \tag{5}$$

where z is the image block of the search window, i.e., the image block expected to contain the target, in which the target position is to be detected, and x is the target model learned from the image. The right-hand side is a linear combination of k^{xz} weighted by the coefficients α; in the frequency domain (Formula 6) the corresponding elements of the vectors are multiplied one by one. Applying the inverse DFT to Formula 6 gives the response matrix of the filter, the maximum of which corresponds to the position of the target. The model is updated as follows:

where η is the learning factor, and the template terms denote the Fourier transforms of the target matching templates of the current frame and the previous frame, respectively.
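The detection step around Formulas 4 and 5 can be sketched as below, assuming the standard kernelized correlation filter formulation with a Gaussian kernel, which these formulas appear to follow. The function names, the kernel bandwidth, the regularization value, and the linear-interpolation update are illustrative assumptions rather than the application's exact procedure.

```python
import numpy as np


def gaussian_label(shape, sigma=2.0):
    """Regression target: a Gaussian peaked at index (0, 0), wrapped circularly."""
    h, w = shape
    ys = np.minimum(np.arange(h), h - np.arange(h))
    xs = np.minimum(np.arange(w), w - np.arange(w))
    return np.exp(-(ys[:, None] ** 2 + xs[None, :] ** 2) / (2.0 * sigma ** 2))


def gaussian_correlation(x, z, sigma=0.5):
    """Kernel correlation k^{xz} for all cyclic shifts, computed with FFTs."""
    cross = np.real(np.fft.ifft2(np.fft.fft2(z) * np.conj(np.fft.fft2(x))))
    dist = (np.sum(x ** 2) + np.sum(z ** 2) - 2.0 * cross) / x.size
    return np.exp(-np.maximum(dist, 0.0) / sigma ** 2)


def train(x, y, lam=1e-4, sigma=0.5):
    """Solve kernel ridge regression in the frequency domain; returns alpha-hat."""
    kxx = gaussian_correlation(x, x, sigma)
    return np.fft.fft2(y) / (np.fft.fft2(kxx) + lam)


def detect(alpha_f, x, z, sigma=0.5):
    """Response map over all cyclic shifts of z; the peak gives the displacement."""
    kxz = gaussian_correlation(x, z, sigma)
    response = np.real(np.fft.ifft2(alpha_f * np.fft.fft2(kxz)))
    peak = np.unravel_index(np.argmax(response), response.shape)
    return (int(peak[0]), int(peak[1])), response


def update(model_prev, model_cur, eta=0.02):
    """Linear interpolation of a model term with learning factor eta."""
    return (1.0 - eta) * model_prev + eta * model_cur
```

The element-wise product inside `detect` is the frequency-domain counterpart of multiplying by the circulant kernel matrix, which is what makes evaluating every shift inexpensive.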

(II) Advantages and disadvantages of the circulant matrix

The circulant matrix solves the problem of heavy computation well: the data are arranged into a circulant matrix, which simulates pixel-by-pixel motion of the target and expands the sample set, and by the properties of the circulant matrix the information carried by its first row represents the whole matrix. The whole circulant matrix therefore need not be stored; knowing the elements of a single row or column is enough. Furthermore, the algorithm uses the fast Fourier transform, so the computation for training and detection is significantly reduced, which suits tracking in particular, where training data are sparse and computational efficiency is crucial for real-time operation. There are, however, some disadvantages. First, the input image is cyclically shifted and the classifier is trained with a cyclic sliding window over the training set; because of the periodicity assumption, the training images overlap periodically, and because the sample edges are discontinuous, an obvious dividing line appears after cyclic shifting, which shows up as high-frequency noise in the frequency domain and harms frequency-domain processing. Second, the target response in the filter training stage is independent of the observed image, and in scenes with rapid motion, occlusion, motion blur, and the like, the cyclic shifts do not correspond one-to-one to the actual translation of the target.
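The property that the first row determines the whole circulant matrix, and that products with it reduce to FFTs, can be checked numerically with a short sketch. The helper names are illustrative; the row convention (row i is the first row shifted right by i) is one of several equivalent choices.

```python
import numpy as np


def circulant_from_first_row(x):
    """Build the n x n matrix whose i-th row is x cyclically shifted by i."""
    return np.stack([np.roll(x, i) for i in range(len(x))])


def circulant_matvec_fft(x, w):
    """Compute circulant_from_first_row(x) @ w with FFTs only.

    With this row convention the product is the circular cross-correlation
    of x with w, i.e. ifft(conj(fft(x)) * fft(w)).
    """
    return np.real(np.fft.ifft(np.conj(np.fft.fft(x)) * np.fft.fft(w)))
```

The explicit matrix costs O(n²) memory and time for a product, whereas the FFT route needs only the first row and O(n log n) operations, which is the saving the correlation filter framework relies on.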

The samples used to train the tracking filter are not labeled manually; they are taken from the tracking result of the previous frame, i.e., labeled by the algorithm itself. As a result, deviated images are inadvertently learned as positive samples, and when the detection part of the tracker is inaccurate (for example, when the target moves rapidly or is motion-blurred), the target position information of the next frame is adversely affected. Since the target response of the current frame is independent of the observation, such errors affect the parameter update of the filter, and the tracking is ultimately prone to drift. Thus, when the target rotates or deforms, the tracking result easily becomes inaccurate; when the target is half or even completely occluded, tracking fails directly, because the appearance model of the target is essentially ineffective in that case; and motion blur can distort the target, seriously damage the training samples, deprive the appearance model of its discriminative ability, and reduce tracking accuracy.

Target drift is the most important problem in online tracking, and its main cause is inaccuracy in the samples the classifier uses for updating, i.e., error accumulation during tracking: if the tracking result of the previous frame deviates, the error accumulates in the classifier, the result of the next frame is also in error, the target visibly begins to drift, and the tracking finally fails.

(III) associated locking target tracking based on difference superposition detection

This application proposes a correlation filter tracking algorithm based on difference superposition detection, which further improves two parts of the filter: sample training and filter parameter updating.

1. Difference superposition detection

To cope with occlusion of the target, a regularization parameter is added to prevent the model from overfitting and to keep the test error small, and sparse parameters are adopted to handle occlusion effectively. Difference superposition regions are taken into account in the sample training stage, so that the parts most likely to attract attention are processed first and tracking performance improves when the target is occluded. Attention is placed on the background: the statistical redundancy of the input signal is removed, and with the redundant information removed the salient target of the image is obtained. Using the log spectrum of the image, the difference superposition part of an image is obtained by subtracting the average log amplitude spectrum from the log amplitude spectrum of that image.

So as not to lose the details of the target, the original information of the target is kept while the difference superposition information of the image is taken into account: during sample training, ω(u) = u + S_BB(t), where u is the input image and S_BB(t) is the salient region.

B_i＝THRESH(φ(I)，θ)，

φ～p_φ，θ～p_θformula 8

where $I$ is the input image, and the average attention map is used in subsequent processing to obtain the final difference superposition map $S$.
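The thresholding in Formula 8 and the superposition of the map onto the sample can be sketched as follows. This is only an illustration under stated assumptions: the feature map φ(I) is taken to be a plain grayscale image, θ is drawn uniformly between the minimum and maximum intensity, and the pixel-wise mean of the Boolean maps stands in for the average attention map, whose exact computation the text does not give. The function names `boolean_maps`, `mean_map` and `superpose` are hypothetical.

```python
import random


def boolean_maps(feature_map, n_maps=8, seed=0):
    """Build Boolean maps B_i = THRESH(phi(I), theta_i).

    Assumption: theta_i is drawn uniformly between the minimum and maximum
    of the feature map; the source only states theta ~ p_theta.
    """
    rng = random.Random(seed)
    values = [v for row in feature_map for v in row]
    lo, hi = min(values), max(values)
    maps = []
    for _ in range(n_maps):
        theta = rng.uniform(lo, hi)
        maps.append([[1 if v > theta else 0 for v in row] for row in feature_map])
    return maps


def mean_map(maps):
    """Average the Boolean maps pixel by pixel (stand-in for the attention map)."""
    n = len(maps)
    height, width = len(maps[0]), len(maps[0][0])
    return [[sum(m[y][x] for m in maps) / n for x in range(width)]
            for y in range(height)]


def superpose(u, s):
    """Training sample Omega(u) = u + S, added element by element."""
    return [[a + b for a, b in zip(ru, rs)] for ru, rs in zip(u, s)]


# Toy 4 x 4 image with a bright 2 x 2 object on a dark background.
image = [[10, 10, 10, 10],
         [10, 200, 220, 10],
         [10, 210, 230, 10],
         [10, 10, 10, 10]]
S = mean_map(boolean_maps(image))
sample = superpose(image, S)
```

In this toy case the bright block receives non-zero values in `S` while the uniform background stays at zero. How `S` should be scaled relative to the image before superposition is not specified in the text and is left open here.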

2. Target sample update

Candidate image blocks for the next frame are cropped around the target in the current frame, the filter is trained online in real time, and the target position in the next frame is then obtained; updating and tracking alternate in this way throughout. The position of the target is determined by the following strategy: the position of the maximum filter response is the center of the target.

However, this sample update method has the following two disadvantages:

First, regardless of whether the current frame is tracked accurately, the tracking result is accumulated into the next frame and affects subsequent tracking. Cyclic shifts do not correspond one-to-one to the actual translation of the target; they only approximate the actual translation in the image and become unreliable under fast motion, occlusion and similar conditions in real tracking scenes. Using a single centered Gaussian as the target response therefore limits tracker performance and can cause unrecoverable drift.

Second, prior-art tracking algorithms update samples continuously with a fixed weight: the result of every frame affects subsequent frames, and the update weight is fixed at initialization, so the influence of each frame on the tracking result is fixed as well. In a real tracking scene, however, frames contribute differently to sample training. In particular, when the target is occluded, the extracted appearance features differ greatly from the original target and are essentially background; once such a region is mistaken for the target and added to sample training, the training samples are contaminated.

In this application, the response value is taken into account in the sample update weight during the training stage, so that the detection result of the tracking algorithm is used rather than a simple binary strategy. Even when the target is occluded, the information within the target is still used, and target drift after occlusion is avoided. The peak-to-sidelobe ratio is used to determine whether tracking is valid, and the influence of the current tracking result on the tracking of the next frame is then determined directly from the maximum response value of that result. Assuming that, starting from the second frame, the maximum response value of each frame is stored in response_all, the samples are updated as follows:

where η is the learning factor, and the two template terms are the Fourier transforms of the target matching templates of the current frame and the previous frame, respectively. Because the detection result is taken into account in the sample update, a contaminated target sample receives a smaller weight and has less influence on subsequent frames.
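The idea of weighting the sample update by the detection result can be sketched as below. The peak-to-sidelobe ratio follows its usual definition; the update rule itself is an assumption, since the update formula is not reproduced in the text. Here the learning factor η is scaled by the current peak relative to the largest stored peak, and is damped further when the ratio falls below a hypothetical threshold `psr_min`. The names `peak_to_sidelobe_ratio` and `update_template` are illustrative only.

```python
import math


def peak_to_sidelobe_ratio(response, exclude=1):
    """Return (PSR, peak) for a 2-D response map.

    PSR = (peak - mean(sidelobe)) / std(sidelobe), where the sidelobe is every
    cell outside a (2*exclude+1) x (2*exclude+1) window around the peak.
    The map must be larger than that window.
    """
    cells = [(response[y][x], y, x)
             for y in range(len(response)) for x in range(len(response[0]))]
    peak, py, px = max(cells)
    side = [v for v, y, x in cells
            if abs(y - py) > exclude or abs(x - px) > exclude]
    mean = sum(side) / len(side)
    std = math.sqrt(sum((v - mean) ** 2 for v in side) / len(side))
    return (peak - mean) / (std + 1e-12), peak


def update_template(prev, cur, eta, peak, response_all, psr, psr_min=5.0):
    """Blend the previous and current templates with a response-aware weight.

    Assumptions (not from the source): the weight is eta scaled by the current
    peak relative to the largest stored peak, and it is damped in proportion
    to the PSR when the PSR is below psr_min, so a doubtful frame still
    contributes a little instead of being switched off entirely.
    """
    weight = eta * peak / max(response_all)
    if psr < psr_min:
        weight *= max(psr, 0.0) / psr_min
    return [(1.0 - weight) * p + weight * c for p, c in zip(prev, cur)]


response = [[0.0, 0.1, 0.0, 0.1, 0.0],
            [0.1, 0.2, 0.3, 0.2, 0.1],
            [0.0, 0.3, 1.0, 0.3, 0.0],
            [0.1, 0.2, 0.3, 0.2, 0.1],
            [0.0, 0.1, 0.0, 0.1, 0.0]]
psr, peak = peak_to_sidelobe_ratio(response)
strong = update_template([1.0, 1.0], [0.0, 0.0], 0.02, 1.0, [1.0, 0.5], psr)
weak = update_template([1.0, 1.0], [0.0, 0.0], 0.02, 0.5, [1.0, 0.5], psr)
```

A frame with a weaker peak (`weak`) moves the template less than a frame with a strong peak (`strong`), which is the behavior described for contaminated samples. The same blending works unchanged on complex-valued Fourier-domain templates.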

3. Tracking model updates

At present, DCF-based tracking algorithms sacrifice tracking speed and real-time performance in pursuit of tracking quality. This application therefore targets occlusion during tracking and modifies the model update step, improving the tracking algorithm and increasing its speed.

A sparse update strategy is adopted for model updating: the model is updated once every Ns frames, while the samples are still updated in every frame. Reducing the model update frequency saves time and helps avoid model drift, which improves results to a certain extent. Ns cannot be set too large, however, or the model cannot follow changes in the target; Ns is finally set to 6. As more and more parameters are adopted in pursuit of higher accuracy, appearance and motion models become increasingly complex, and because tracking lacks large numbers of training samples, overfitting occurs easily. Not updating the model in every frame effectively prevents model drift and also saves computation time.
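The update schedule described above, with samples refreshed in every frame and the model refreshed only every Ns frames, can be sketched as follows. The function names are hypothetical, and counting frames from 1 is an assumption.

```python
NS = 6  # model update interval; the text settles on Ns = 6


def should_update_model(frame_index, ns=NS):
    """True on every ns-th frame (frames are counted from 1)."""
    return frame_index % ns == 0


def count_updates(num_frames, ns=NS):
    """Return (sample updates, model updates) over a sequence."""
    sample_updates = 0
    model_updates = 0
    for t in range(1, num_frames + 1):
        sample_updates += 1          # samples are updated in every frame
        if should_update_model(t, ns):
            model_updates += 1       # the model is updated sparsely
    return sample_updates, model_updates


print(count_updates(60))  # (60, 10)
```

Over 60 frames the model is retrained 10 times instead of 60, which is where the time saving comes from.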

Third, two-dimensional target robustness scale estimation

Prior-art target tracking algorithms focus on estimating the position of the target, yet a moving target in consumer-grade unmanned aerial vehicle video changes in scale as well as in shape. Predicting only the position of the target limits tracking performance and makes good tracking difficult; estimating the scale change of the target helps improve tracking accuracy. After the position of the target is obtained, a two-dimensional filter is trained to estimate the length and the width of the target separately, so that the scale of the target is estimated more robustly.

(I) Scale estimation based on perspective projection model

When the movement speed of the unmanned aerial vehicle motion platform is known, the scale change of the target can be estimated. The perspective imaging principle is shown in fig. 1, where of is the focal length, on is the image distance, and om is the object distance. According to the optical imaging principle of a convex lens, the object distance is far greater than the image distance, so the focal length and the image distance are taken to be approximately equal and the central imaging model approximately replaces the perspective imaging model, as detailed in fig. 2. In the figure, M is a point in the camera coordinate system and m is the projection of M in the image coordinate system. The vector expression of point M is $M = (x_M, y_M, z_M)^T$ and that of point m is $m = (x_m, y_m)^T$. Under the perspective imaging model, the transformation relationship between the two points is as follows:

the non-linear transformation formula from the camera coordinate system to the image coordinate system is:

If the distance between the motion platform and the target is known, the scale of the target in the image plane can be estimated by combining this distance with the intrinsic parameters of the camera.
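As an illustration of estimating the image-plane scale from a known distance, the sketch below uses the standard central (pinhole) projection, in which an image coordinate equals the focal length times the camera coordinate divided by the depth. This relation is assumed here as the usual form of the central imaging model; it is not copied from the formulas of this application. The focal length in pixels and the function names are hypothetical.

```python
def project_point(point, focal_length):
    """Project a camera-frame point M = (x, y, z) to image coordinates m.

    Standard central projection: x_m = f * x_M / z_M, y_m = f * y_M / z_M.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("the point must lie in front of the camera (z > 0)")
    return (focal_length * x / z, focal_length * y / z)


def image_size(object_size, distance, focal_length):
    """Approximate size in the image plane of an object facing the camera."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    return focal_length * object_size / distance


# A 4 m long target seen from 50 m with a focal length of 1000 pixels.
near = image_size(4.0, 50.0, 1000.0)    # 80.0 pixels
far = image_size(4.0, 100.0, 1000.0)    # 40.0 pixels
```

Doubling the distance halves the projected size, so a known platform speed and distance give a prior on how fast the target scale should change between frames.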

(II) two-dimensional scale estimation tracking framework based on association locking

Two correlation filters of consistent form are designed and defined as a position target filter and a scale target filter, which handle target tracking and scale transformation respectively: the former locates the target in the current frame and the latter estimates its scale. The two filters are relatively independent, so different feature types and feature calculation methods can be selected for training and testing. Because two-dimensional filters are used, the length and the width of the target are not assumed to change in the same way, and the target scale can still be estimated accurately even when the target deforms greatly.

As shown in fig. 3, the overall framework of the tracker with two-dimensional target robustness scale estimation is divided into two parts, position prediction and scale estimation. The specific steps are as follows. Training samples are obtained by dense sampling in a search area around the target, features are extracted, the samples are Fourier transformed, and the filter is trained in the frequency domain by least-squares regression. A position kernel correlation filter is obtained by online learning; the filter output is then mapped back to the spatial domain by the inverse discrete Fourier transform, and the coordinates of the maximum point in the response map are found, this point corresponding to the center of the target in the sample. The correlation-filter-based scale prediction process is similar to the position prediction process: after the position of the target is obtained, the area around the target is downsampled and upsampled according to preset scale values to obtain a series of image blocks of different scales. The image blocks are then resized by bilinear interpolation to match the designed scale model, giving the training samples. Next, features are extracted to obtain the HOG features of the image blocks, and a least-squares classifier is trained to obtain the scale target tracker. After the features of the image blocks are obtained, they are windowed with a Hamming window to suppress the high-frequency noise that image boundaries introduce in the frequency domain because a cyclic matrix is used.

When the scale target tracker is applied to a new image, the score over the scale space, that is, the filter response value, is computed, and the position of the maximum in the response map of the scale target filter corresponds to the final scale of the target.
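The windowing step mentioned above can be illustrated with a separable two-dimensional Hamming window applied to one feature channel. The coefficients 0.54 and 0.46 are the usual Hamming definition; treating the two-dimensional window as the outer product of two one-dimensional windows is an assumption, and the function names are hypothetical.

```python
import math


def hamming(n):
    """1-D Hamming window: w[i] = 0.54 - 0.46 * cos(2*pi*i / (n - 1))."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def hamming_2d(height, width):
    """Separable 2-D window built as the outer product of two 1-D windows."""
    wy, wx = hamming(height), hamming(width)
    return [[a * b for b in wx] for a in wy]


def apply_window(feature, window):
    """Multiply a feature channel by the window element by element."""
    return [[f * w for f, w in zip(frow, wrow)]
            for frow, wrow in zip(feature, window)]


window = hamming_2d(5, 5)
channel = [[1.0] * 5 for _ in range(5)]
windowed = apply_window(channel, window)
```

The window leaves the center of the block almost untouched and attenuates the borders, which reduces the discontinuity created when the block is treated as periodic.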

(III) two-dimensional scale prediction evaluation strategy

The scale change of the target is detected by learning a two-dimensional scale correlation filter, and the target searching area is reasonably limited according to the scale change of the target, so that unnecessary calculation is avoided.

One of the most important steps in target scale estimation is to provide candidate scale transformation values for the target. In the scale prediction evaluation process, for the current image, the size of the target is $P \times R$, the size of the scale target filter is $S \times S$, and for each

image block $J_{n \times n}$, where $a$ is the filter parameter factor. For a target with large scale changes, the candidate scales should be spaced further apart; otherwise, the candidate scales should differ only slightly.
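A possible way to generate the candidate scales is sketched below. Because the index set is not legible in the text, the sketch assumes exponents that run symmetrically around zero, so that the candidate factors are powers of the factor a, and it applies the factors to the length and the width independently to reflect the two-dimensional estimate. The defaults of 17 candidates and a step of 1.02 are the values quoted in the experiment section; the function names are hypothetical.

```python
def scale_factors(num_scales=17, step=1.02):
    """Candidate factors step**n for n = -(S-1)/2, ..., (S-1)/2 (assumed set)."""
    half = (num_scales - 1) // 2
    return [step ** n for n in range(-half, half + 1)]


def candidate_sizes(length, width, num_scales=17, step=1.02):
    """Grid of candidate sizes with length and width scaled independently."""
    factors = scale_factors(num_scales, step)
    return [[(length * a, width * b) for b in factors] for a in factors]


def best_scale(response, factors):
    """Return the (length factor, width factor) at the maximum of the grid."""
    n = len(factors)
    _, i, j = max((response[i][j], i, j) for i in range(n) for j in range(n))
    return factors[i], factors[j]


factors = scale_factors()
grid = candidate_sizes(60.0, 30.0)

# Toy 3 x 3 response whose maximum says: longer and narrower.
small = scale_factors(3)
toy_response = [[0.1, 0.2, 0.1],
                [0.2, 0.5, 0.2],
                [0.9, 0.3, 0.1]]
chosen = best_scale(toy_response, small)
```

A larger `step` spreads the candidates further apart, which suits targets whose scale changes quickly; a step close to 1 keeps the candidates close together.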

(IV) adaptive search window size selection

Tracking based on a discriminative model uses binary classification to separate the target from the background. Taking the motion information of the target into account, the target in the next frame is assumed to appear in a neighborhood centered on the target in the current frame, which makes the choice of the padding value important. On an unmanned aerial vehicle motion platform, the relative displacement Δp of the target center between two adjacent frames varies more than in ordinary scenes, so part of the target in the next frame sometimes falls outside the search sub-window; the tracking result then no longer corresponds to the target, and tracking drift occurs. The algorithm assumes that the position of the target in the next frame lies within a window of size window_sz centered on the target in the current image. Once this spatial context is fixed, if the actual target is not in that area, the target cannot be detected and tracking fails, no matter how robust the classifier model is. If padding is too small, the target is not inside the search box centered on the target in the previous frame; if padding is too large, more background information is retained, false detections occur easily when objects similar to the target appear in the background, and the additional data increases the amount of calculation.

To handle large distances between the target positions in two adjacent frames, this application adopts an adaptive search window strategy that constrains the size of the search box according to the offset between the target positions in the two adjacent frames.

In a tracker based on the association locking framework, the scale of the filter is fixed at initialization when the correlation operation is carried out, while the size of the search area changes with the offset between the target positions in two adjacent frames. The search window size is selected adaptively by comparing the offset of the target centers in two adjacent frames with a preset critical value: when the offset exceeds the critical value, the search window is enlarged.
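The adaptive search window rule can be sketched as below. The text only states that the window is enlarged when the center offset exceeds a preset critical value; the specific choices here (a critical value of half the smaller target side, a growth factor of 1.5, an upper bound of 4, and a window of target size times one plus padding) are assumptions for illustration, and the function names are hypothetical.

```python
import math


def adaptive_padding(prev_center, cur_center, target_size,
                     base_padding=2.0, threshold_ratio=0.5,
                     growth=1.5, max_padding=4.0):
    """Enlarge the padding when the target center moved more than a threshold.

    Assumptions: the critical value is threshold_ratio times the smaller
    target side, and the enlarged padding is capped at max_padding.
    """
    offset = math.hypot(cur_center[0] - prev_center[0],
                        cur_center[1] - prev_center[1])
    critical = threshold_ratio * min(target_size)
    if offset > critical:
        return min(base_padding * growth, max_padding)
    return base_padding


def window_size(target_size, padding):
    """Search window as target size times (1 + padding) in each dimension."""
    width, height = target_size
    return (width * (1.0 + padding), height * (1.0 + padding))


slow = adaptive_padding((100, 100), (103, 104), (40, 20))   # offset 5 px
fast = adaptive_padding((100, 100), (130, 140), (40, 20))   # offset 50 px
search = window_size((40, 20), fast)
```

With these settings a 5-pixel shift keeps the default padding of 2, while a 50-pixel shift raises it to 3 and widens the search window accordingly.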

(V) flow of scale prediction estimation method

In the complete tracking algorithm with scale estimation, the position target filter is first used to obtain the position, and the scale target filter is then applied to obtain the current scale change of the target. The kernel function used in the algorithm is a Gaussian kernel.

The scale transformation corresponding to the maximum value in the correlation response map is the current scale of the target.

When features of the target candidate regions are extracted, the length and the width of the target are scaled separately to obtain the candidate regions. This estimates the scale change of the target accurately but increases the amount of calculation, so principal component analysis is used for dimensionality reduction. The eigenvalues and eigenvectors of the covariance matrix of the high-dimensional data are computed; the eigenvectors are the selected basis vectors, and the eigenvalue of each eigenvector represents the projection of the original data onto that vector. The larger the eigenvalue, the more information of the original data the corresponding eigenvector retains. Redundant information in the data is thereby removed and the amount of data to be computed is reduced.
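The dimensionality reduction described above can be sketched in a few lines. To stay within the standard library, the sketch swaps a full eigendecomposition for power iteration with deflation, which recovers the leading eigenvectors of the covariance matrix one at a time; this substitution, the random starting vector and all function names are assumptions made for illustration.

```python
import math
import random


def covariance(data):
    """Return (covariance matrix, mean-centered rows) for a list of samples."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return cov, centered


def top_eigenpair(mat, iters=500, seed=0):
    """Dominant eigenvalue and eigenvector of a symmetric matrix."""
    d = len(mat)
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-15:  # nothing left to extract along this direction
            break
        v = [x / norm for x in w]
    value = sum(v[i] * sum(mat[i][j] * v[j] for j in range(d)) for i in range(d))
    return value, v


def pca_reduce(data, k):
    """Project the data onto its k leading principal directions."""
    cov, centered = covariance(data)
    d = len(cov)
    basis, values = [], []
    for _ in range(k):
        value, vec = top_eigenpair(cov)
        values.append(value)
        basis.append(vec)
        # Deflation: remove the component that was just found.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    projected = [[sum(r[j] * b[j] for j in range(d)) for b in basis]
                 for r in centered]
    return projected, values


data = [[1.0, 1.0, 0.0], [2.0, 2.1, 0.0], [3.0, 2.9, 0.0],
        [4.0, 4.2, 0.0], [5.0, 4.8, 0.0]]
projected, values = pca_reduce(data, 2)
```

In this toy data set almost all of the variance lies along one direction, so the first eigenvalue dominates and the three-dimensional samples can be represented by one or two coefficients with little loss.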

Fourth, experimental results and analysis

(I) Qualitative result analysis

The whole experiment was carried out by simulation on the MATLAB platform. It mainly addresses scale estimation when the target undergoes scale change, out-of-plane rotation and similar changes during tracking from a motion platform. The video sequences used contain a ship on the water surface, a moving automobile and a person on the water surface.

With DSST as the baseline algorithm, HOG features are used for both position estimation and scale estimation of the target, and the focus is on the scale target filter of the algorithm. The experimental parameters are set as follows: the padding value is 2, the number of scale candidates is 17, and the step length is 1.02. Key frames of the image sequences are extracted, and the tracking effect of the algorithm is judged visually.

As shown in fig. 4, in the boat1 test video the background is simple, the target deforms slowly and the scale changes are small. The tracking results show that the DSST tracking algorithm is affected by the initial shape of the target and its tracking box cannot change with the deformation of the target, whereas the scale estimate of the proposed algorithm is more accurate than that of DSST.

As shown in fig. 5, in the boat2 test video the target deforms greatly: the boat enters the field of view from the side, turns around and then moves away. Judging by the tracking boxes, the results of the two algorithms are similar, but the algorithm proposed in this application is clearly closer to the ground truth and tracks better.

As shown in fig. 6, in the boat3 test video the background is simple and the target mainly deforms. The tracking results show that the DSST tracking algorithm is affected by the initial shape of the target: after the viewing angle changes, the shape of its tracking box cannot follow the deformation of the target, which degrades the overall tracking result. With the method of this application, the tracking box shrinks as the target becomes smaller, and the scale estimate is more accurate than that of DSST.

As shown in fig. 7, in the car4 test video the car drives from a roundabout onto a straight road and is subject to interference such as occlusion, disappearance and similar background. The extracted key frames show that when the target is occluded until it disappears completely and then reappears in the field of view, the DSST algorithm fails to track it, whereas the algorithm of this application still tracks the target; tracking is also unaffected when similar background surrounds the target.

As shown in fig. 8, in the wakeboard5 test video the target undergoes large deformation and large scale change: the person jumps from the shore into the sea and starts wakeboarding, moves quickly and performs actions such as jumping, squatting and standing upright, while moving away from the shooting platform and gradually becoming smaller in the field of view. When the pose of the target changes, the result of this application stays closer to the ground truth of the target, and the tracking result is more accurate.

(II) analysis of quantitative results

The experimental data are analyzed quantitatively using the OPE (one-pass evaluation) method, with distance precision and overlap precision selected as the evaluation metrics. The experimental results show that the proposed method is superior to DSST in both tracking accuracy and success rate; the two-dimensional scale estimation algorithm estimates the target scale more accurately, which improves overall tracking precision and gives the algorithm better tracking performance.
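The two evaluation metrics can be computed as sketched below. Boxes are assumed to be given as (x, y, width, height) with (x, y) the top-left corner, and the thresholds of 20 pixels for distance precision and 0.5 for overlap are the values conventionally used in tracking benchmarks, not values stated in this application. The function names are hypothetical.

```python
import math


def center_error(box_a, box_b):
    """Euclidean distance between the centers of two (x, y, w, h) boxes."""
    ax, ay = box_a[0] + box_a[2] / 2.0, box_a[1] + box_a[3] / 2.0
    bx, by = box_b[0] + box_b[2] / 2.0, box_b[1] + box_b[3] / 2.0
    return math.hypot(ax - bx, ay - by)


def overlap(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[0] + box_a[2], box_b[0] + box_b[2])
    y2 = min(box_a[1] + box_a[3], box_b[1] + box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


def distance_precision(predicted, truth, threshold=20.0):
    """Fraction of frames whose center error is within the threshold."""
    hits = sum(center_error(p, g) <= threshold for p, g in zip(predicted, truth))
    return hits / len(truth)


def overlap_precision(predicted, truth, threshold=0.5):
    """Fraction of frames whose overlap exceeds the threshold."""
    hits = sum(overlap(p, g) > threshold for p, g in zip(predicted, truth))
    return hits / len(truth)


truth = [(0, 0, 10, 10)] * 3
predicted = [(0, 0, 10, 10), (5, 0, 10, 10), (100, 100, 10, 10)]
dp = distance_precision(predicted, truth)
op = overlap_precision(predicted, truth)
```

In this toy sequence two of the three predicted centers are within 20 pixels, but only one box overlaps the ground truth by more than half, which shows why a tracker with poor scale estimates can score well on distance precision and still lose on overlap precision.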

Claims

1. A consumer-grade unmanned aerial vehicle video moving target accurate tracking method, characterized in that improvements are made in three aspects, deformation, occlusion and scale, on the basis of an association locking tracker: first, a decision-perception-based association locking tracking algorithm is provided, comprising: (i) a tracking framework based on decision perception, (ii) feature association selection, (iii) feature decision perception, and (iv) decision perception weight calculation; second, an association locking tracking method based on difference superposition detection is provided, comprising: (i) rapid locking of the moving target, and (ii) association locking target tracking based on difference superposition detection; third, a two-dimensional target robustness scale estimation method is provided, comprising: (i) scale estimation based on a perspective projection model, (ii) a two-dimensional scale estimation tracking framework based on association locking, (iii) a two-dimensional scale prediction evaluation strategy, (iv) adaptive search window size selection, and (v) a scale prediction estimation method flow;

(1) to address drift caused by target deformation during tracking, different features are used to track separately and the results are fused at the decision layer to obtain the final result: in one approach, two correlation filters are constructed, using HOG features carrying target structure information and CNs features carrying target color information respectively, the tracking results are calculated, and the maximum response values of the filters are then used at the decision layer to determine their respective weights in the final result; in the other approach, association locking and a tracking framework based on a color histogram are used, and the results of the two are fused to obtain the final tracking result;

(2) to address target occlusion during tracking, an association locking tracking method based on difference superposition detection is provided: first, the structural risk function in ridge regression is converted into a regularization function and components are added to better handle occlusion; then, difference superposition features are extracted within the search box and superposed on the original sample, which increases the difference between the target and the surrounding background and eliminates the potential risk of the cyclic matrix; for sample updating, a method based on the detection result is adopted; for model updating, drift is reduced and a discontinuous update mode is adopted to handle target occlusion during tracking;

2. The consumer-grade unmanned aerial vehicle video moving target accurate tracking method according to claim 1, wherein the tracking framework based on decision perception is as follows: the best tracking effect is achieved with as simple a model and as few parameters as possible; in the t-th frame $I_t$, the tracking problem is regarded as finding the most likely (highest-scoring) region $p_t$ in the candidate region set $S_t$:

$$p_t = \arg\max_{p \in S_t} f(T(I_t, p); \theta)$$

where $T$ denotes a transformation of the image and $f(T(I_t, p); \theta)$ is the score of rectangular box $p$ in image $I_t$ under model parameters $\theta$; the model parameters minimize a loss function:

3. The consumer-grade unmanned aerial vehicle video moving target accurate tracking method according to claim 1, characterized in that feature association selection: based on the complementary characteristics, multi-characteristic decision perception is carried out on the moving target, and a frame is designed to be fused according to different influences of each characteristic on a tracking result so as to obtain a more accurate and stable tracking effect;

4. The consumer-grade unmanned aerial vehicle video moving target accurate tracking method according to claim 1, wherein the feature decision perception is as follows: two filters are trained with different input samples; for the first, the input image is the target box together with a ring of surrounding background, which contains spatial context information and has size window_sz; HOG features are selected for better difference superposition and to distinguish the target from the background, and a cosine window is applied to the extracted features to eliminate boundary effects; for the second, the input contains only the target, the input image is smaller than the target, namely of size sz, and CNs features are adopted so that the tracker remains robust when the target deforms greatly; the former adds background information to enlarge the search area of the target, and the latter considers only the target to improve accuracy; both filters update their templates online, which makes the tracking result more accurate; sensing the target through multiple feature decisions retains more target information, and tracking the target by multiple means copes better with unexpected situations during tracking.

5. The consumer-grade unmanned aerial vehicle video moving target accurate tracking method according to claim 1, wherein the decision perception weight is calculated as follows: an additive model of multi-feature perception is adopted; under the additive fusion strategy, the final results of the n decision perceptions are fused additively, which is insensitive to noise and increases the robustness of the tracking algorithm; the final target position is obtained from the filter response values and the target positions as follows:

The algorithm flow of HOG and CNs decision perception is as follows:

Input: image sequence; initial target position p₀;

Output: position information of the target in each frame

Step 1: initialize, t = 1;

Step 3: for each frame t of the sequence:

Step 6: for the tracker Track2, CNs features are adopted and the global color information of the target is used; the search window is the target itself, and no windowing is applied;

Output: position information of the target in each frame

6. The consumer-grade unmanned aerial vehicle video moving target accurate tracking method according to claim 1, wherein the moving target is rapidly locked as follows: in the tracking algorithm for rapidly locking the moving target, a filter W is correlated with the candidate region to estimate the position of the target; an image block that contains the target and is larger than the target is used as the search window, the search window is cyclically shifted to obtain different candidate windows, the actual displacement of the target is approximated by the cyclic shifts, and a data matrix with cyclic structure is finally constructed to obtain the training samples;

$$K^z = C(k^{xz}) \qquad \text{(Formula 4)}$$

$$f(z) = (K^z)\,\alpha \qquad \text{(Formula 5)}$$

where η is a learning factor;

7. The consumer-grade unmanned aerial vehicle video moving target accurate tracking method according to claim 1, wherein correlation locking target tracking based on difference superposition detection: the method is further improved in two parts of sample training and filter parameter updating of the filter;

1. difference superposition detection

The original information of the target is retained while the difference superposition information of the image is taken into account; during sample training, $\Omega(u) = u + S_{BB}(t)$, where $u$ is the input image and $S_{BB}(t)$ is the salient region;

$$B_i = \mathrm{THRESH}(\phi(I), \theta), \qquad \phi \sim p_\phi, \quad \theta \sim p_\theta \qquad \text{(Formula 8)}$$

where $I$ is the input image;

2. target sample update

In the training stage, the response value is taken into account in the sample update weight, and the detection result of the tracking algorithm is used rather than a simple binary strategy; the peak-to-sidelobe ratio is used to determine whether tracking is valid, and the influence of the current tracking result on the tracking of the next frame is then determined directly from the maximum response value of that result; assuming that, starting from the second frame, the maximum response value of each frame is stored in response_all, the samples are updated as follows:

where η is the learning factor, and the two template terms are the Fourier transforms of the target matching templates of the current frame and the previous frame, respectively; because the detection result is taken into account in the sample update, a contaminated target sample receives a smaller weight and has less influence on subsequent frames;

3. tracking model updates

A sparse update strategy is adopted for model updating: the model is updated once every Ns frames while the samples are still updated in every frame, which reduces the model update frequency, saves time and avoids model drift; Ns is finally set to 6.

8. The consumer-grade unmanned aerial vehicle video moving target accurate tracking method according to claim 1, wherein the two-dimensional scale estimation tracking framework based on association locking is as follows: two correlation filters of consistent form are designed and defined as a position target filter and a scale target filter, which handle target tracking and scale transformation respectively, the former locating the target in the current frame and the latter estimating its scale; the two filters are relatively independent, and different feature types and feature calculation methods are selected for training and testing; because two-dimensional filters are used, the length and the width of the target are not assumed to change in the same way, and the target scale can still be estimated accurately even when the target deforms greatly;

the overall framework of the tracker with two-dimensional target robustness scale estimation is divided into two parts, position prediction and scale estimation, with the following specific steps: training samples are obtained by dense sampling in a search area around the target, features are extracted, the samples are Fourier transformed, and the filter is trained in the frequency domain by least-squares regression; a position kernel correlation filter is obtained by online learning, the filter output is then mapped back to the spatial domain by the inverse discrete Fourier transform, and the coordinates of the maximum point in the response map are found, this point corresponding to the center of the target in the sample; the correlation-filter-based scale prediction process is similar to the position prediction process: after the position of the target is obtained, the area around the target is downsampled and upsampled according to preset scale values to obtain a series of image blocks of different scales; the image blocks are then resized by bilinear interpolation to match the designed scale model, giving the training samples; next, features are extracted, and after the HOG features of the image blocks are obtained, a least-squares classifier is trained to obtain the scale target tracker; in addition, after the features of the image blocks are obtained, they are windowed with a Hamming window to suppress the high-frequency noise that image boundaries introduce in the frequency domain because a cyclic matrix is used;

9. The consumer-grade unmanned aerial vehicle video moving target accurate tracking method according to claim 1, wherein the two-dimensional scale prediction evaluation strategy is: detecting the scale change of a target by learning a two-dimensional scale correlation filter, and reasonably limiting a target search area according to the scale change of the target;

the most important step in target scale estimation is to provide candidate scale transformation values for the target: in the scale prediction evaluation process, for the current image, the size of the target is $P \times R$, the size of the scale target filter is $S \times S$, and for each

image block $J_{n \times n}$, where $a$ is the filter parameter factor; for a target with large scale changes, the candidate scales should be spaced further apart; otherwise, the candidate scales should differ only slightly;

10. The consumer-grade unmanned aerial vehicle video moving target accurate tracking method according to claim 1, characterized in that the scale prediction estimation method flow is as follows: in the complete tracking algorithm with scale estimation, the position target filter is first used to obtain the position, and the scale target filter is then applied to obtain the current scale change of the target; the kernel function used in the algorithm is a Gaussian kernel;

HOG features are adopted in the scale estimation module; features are first extracted from the image to obtain a training sample $f^l$ of the $d$-dimensional features of the image, where $l \in \{1, 2, \ldots, d\}$; each feature dimension corresponds to a correlation filter $h^l$, and the following loss function is minimized:

where $g$ is the designed filter output corresponding to the training sample $f$, and $\lambda \geq 0$ is a regularization parameter that controls the structural error; the filter obtained by solving this equation is as follows:

a regularization factor is added and an $l_2$ constraint is imposed on the filter, so that zero-valued terms of $f$ in the frequency domain do not lead to a zero denominator; to handle the online update problem of the $d \times d$ system of linear equations and obtain a robust approximation, the numerator $A_t^l$ and the denominator $B_t$ of the correlation filter $H_t^l$ are updated: