CN113033356A

CN113033356A - Scale-adaptive long-term correlation target tracking method

Info

Publication number: CN113033356A
Application number: CN202110265773.2A
Authority: CN
Inventors: 索继东; 王思鹏; 张伟红; 柳晓鸣; 陈晓楠
Original assignee: Dalian Maritime University
Current assignee: Dalian Maritime University
Priority date: 2021-03-11
Filing date: 2021-03-11
Publication date: 2021-06-25
Anticipated expiration: 2041-03-11
Also published as: CN113033356B

Abstract

The invention discloses a scale-adaptive long-term correlation target tracking method. First, the first frame is preprocessed to obtain a temporal context regression model R_c, a target appearance regression model R_t and a detector D_rf. For each subsequent frame, a search area is created around the target position of the previous frame, HOG features are extracted and a correlation filter template is trained, and translation estimation gives the target position in the current frame. A scale pool is then constructed and the optimal scale of the target is estimated adaptively, giving the target state of the current frame. If the maximum response y_s is less than the threshold τ_r, D_rf performs re-detection and the target position is updated. Next, R_c is updated; if the maximum response y_s is greater than the threshold τ_a, R_t is also updated; then D_rf is updated. This yields the predicted target state of the current frame together with the updated R_c, R_t and D_rf. These steps are repeated until the end of the video sequence. Compared with algorithms such as long-term correlation tracking (LCT), the method improves tracking performance and is more robust in a variety of complex environments.

Description

Scale-adaptive long-term correlation target tracking method

Technical Field

The invention belongs to the field of visual target tracking, and particularly relates to a scale-adaptive long-term correlation target tracking method.

Background

Target tracking is a branch of video analysis, i.e., the processing of video image sequences. Given the position and size of the target in the first frame, the task is to determine the target's position and size in every subsequent frame of the sequence and to enclose it accurately with a bounding box. Target tracking draws on mathematics, physics, image processing and related fields, and has broad applications in the military, national defense, intelligent transportation and other areas: for example, missile defense, guidance systems and air traffic control in the military field, and real-time traffic-flow monitoring, traffic accident detection and pedestrian counting in intelligent transportation.

Correlation-filter-based tracking algorithms treat tracking as template matching solved by ridge regression. Kernelized Correlation Filters (KCF) is a correlation filter tracker that adds a kernel function: it extracts multichannel histogram of oriented gradients (HOG) features and constructs positive and negative samples by cyclic shifts. However, KCF cannot handle changes in target scale.

To address this problem, the Discriminative Scale Space Tracking (DSST) algorithm uses a three-dimensional scale-space correlation filter for joint tracking: a two-dimensional discriminative position filter first determines the target position in the video sequence, and a one-dimensional scale filter then evaluates the target output by the position filter to give the optimal scale of the current target. Building on the position-filter and scale-filter framework of DSST, the long-term correlation tracking (LCT) algorithm adds a correlation filter that estimates detection confidence. This detection mechanism gives LCT good tracking performance in videos with occlusion, out-of-view targets and similar attributes, but its performance still needs improvement under large scale variation, low resolution, fast motion, illumination variation and similar conditions.

Disclosure of Invention

In view of this, the present invention provides a scale-adaptive long-term correlation (LCSA) target tracking algorithm that improves the accuracy of existing target tracking algorithms and makes tracking robust to interference from a variety of environmental factors.

Therefore, the invention provides the following technical scheme:

A scale-adaptive long-term correlation target tracking algorithm comprises the following steps:

(1) Initialize the target bounding box: extract the target features from the bounding box in the first frame and initialize a temporal context regression model R_c, a target appearance regression model R_t and a detector D_rf, where R_c is responsible for translation estimation, R_t for scale estimation and D_rf for re-detection;

(2) For frame t, crop a search window in frame t around the target position (x_{t-1}, y_{t-1}) of frame t−1, extract HOG features, and train a correlation filter template;

(3) Perform translation estimation: use R_c and the correlation filter scores obtained from the correlation filter template to compute the response y_t, and estimate the current frame position (x_t, y_t) as the location of the maximum of y_t;

(4) Construct a scale pool using a multi-scale search strategy and adaptively estimate the optimal scale ŝ from the correlation map y_s and R_t, obtaining the initial predicted target state of frame t, X_t = (x_t, y_t, ŝ);

(5) If max(y_s) < τ_r, use D_rf to perform re-detection and find a set of candidate states X; for each state X'_i in X, compute a confidence score y'_i, and if max(y'_i) > τ_t, set X_t = X'_i with i = arg max_i y'_i, where τ_r is the first threshold, τ_t is the second threshold, and y_s denotes the predicted correlation map. This gives the final predicted target state X_t of frame t;

(6) Update R_c;

(7) If max(y_s) > τ_a, update R_t, where τ_a is the third threshold;

(8) Update D_rf;

(9) Repeat (2) to (8) for frame t+1 until the end of the video sequence.
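The per-frame decision logic of steps (5) to (8) can be summarised in a short Python sketch. It is an illustration under assumptions, not the authors' implementation: the names decide and FrameDecision are hypothetical, the candidate states and their confidence scores y'_i are assumed to be produced elsewhere by D_rf and R_t, and the defaults are the threshold values τ_r = 0.15, τ_t = 0.5 and τ_a = 0.38 stated for this method. R_c and D_rf are updated in every frame, so only the conditional R_t update is returned.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

State = Tuple[float, float, float]  # (x, y, s): target position and scale


@dataclass
class FrameDecision:
    state: State               # final predicted target state X_t
    redetected: bool           # True if a D_rf candidate replaced the initial estimate
    update_appearance: bool    # step (7): whether R_t is updated in this frame


def decide(initial_state: State, max_ys: float,
           candidates: Sequence[State], scores: Sequence[float],
           tau_r: float = 0.15, tau_t: float = 0.5, tau_a: float = 0.38) -> FrameDecision:
    """Hypothetical helper for steps (5)-(7); R_c and D_rf are updated every frame."""
    state, redetected = initial_state, False
    # Step (5): a weak scale response triggers re-detection with D_rf.
    if max_ys < tau_r and len(candidates) > 0:
        i = max(range(len(scores)), key=lambda j: scores[j])  # i = arg max_i y'_i
        if scores[i] > tau_t:
            state, redetected = candidates[i], True
    # Step (7): only a confident scale response updates the appearance model R_t.
    return FrameDecision(state, redetected, max_ys > tau_a)
```

Keeping the thresholds as parameters makes it easy to reproduce the three regimes: keep the tracker output, replace it with a confident re-detection, or freeze R_t when the response is ambiguous.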

Further, the correlation filter template trained in step (2) is specifically:

$$ w = \arg\min_{w} \sum_{m,n} \left| \langle \phi(x_{m,n}), w \rangle - y(m,n) \right|^{2} + \lambda \lVert w \rVert^{2} $$

where w denotes the correlation filter, x_{m,n} denotes the training samples obtained by cyclically shifting an image patch x of M × N pixels, y(m,n) is the Gaussian label generated for the training sample x_{m,n}, φ denotes the mapping to kernel space, and λ is the regularization parameter.

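where the coefficient a is given in the Fourier domain by:

$$ \mathcal{A} = \mathcal{F}(a) = \frac{\mathcal{F}(y)}{\mathcal{F}\big(\phi(x) \cdot \phi(x)\big) + \lambda} $$

where F(·) denotes the discrete Fourier transform, x is the M × N image patch and y is the corresponding Gaussian label map.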
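To make the closed-form coefficient formula concrete, the following NumPy sketch (an illustration, not the authors' code) computes A = F(y) / (F(k^{xx}) + λ) with a Gaussian kernel and checks it against a direct solve of the kernel ridge regression (K + λI)a = y over all cyclic shifts of a small patch. Assumptions: the squared distance in the Gaussian kernel is divided by the number of pixels, as in common KCF implementations; a single-channel patch stands in for HOG features; the function names and the label width are hypothetical.

```python
import numpy as np


def gaussian_label(shape, sigma=2.0):
    """Gaussian regression target y(m, n), peaked at the zero shift (0, 0).

    Offsets wrap around, so shift M - 1 counts as -1, matching the cyclic-shift samples.
    """
    M, N = shape
    dm = (np.arange(M) + M // 2) % M - M // 2
    dn = (np.arange(N) + N // 2) % N - N // 2
    return np.exp(-(dm[:, None] ** 2 + dn[None, :] ** 2) / (2.0 * sigma ** 2))


def gaussian_kernel(a, b, sigma):
    # k(a, b) = exp(-||a - b||^2 / (sigma^2 * n)); dividing by the pixel count n is assumed
    return np.exp(-np.sum((a - b) ** 2) / (sigma ** 2 * a.size))


def train_fft(x, y, sigma, lam):
    """Closed form A = F(y) / (F(k^{xx}) + lambda), with k^{xx}(d) = k(roll(x, d), x)."""
    kxx = np.array([[gaussian_kernel(np.roll(x, (m, n), axis=(0, 1)), x, sigma)
                     for n in range(x.shape[1])] for m in range(x.shape[0])])
    return np.fft.fft2(y) / (np.fft.fft2(kxx) + lam)


def train_explicit(x, y, sigma, lam):
    """Reference: solve (K + lambda I) a = y, K[i, j] = k(x_i, x_j) over all cyclic shifts."""
    shifts = [(m, n) for m in range(x.shape[0]) for n in range(x.shape[1])]
    samples = [np.roll(x, s, axis=(0, 1)) for s in shifts]
    K = np.array([[gaussian_kernel(p, q, sigma) for q in samples] for p in samples])
    return np.linalg.solve(K + lam * np.eye(len(samples)), y.ravel()).reshape(x.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.random((4, 5))                      # single-channel stand-in for HOG features
    y = gaussian_label(x.shape, sigma=1.0)
    a_fft = np.real(np.fft.ifft2(train_fft(x, y, sigma=0.5, lam=1e-2)))
    print(np.allclose(a_fft, train_explicit(x, y, sigma=0.5, lam=1e-2)))  # True
```

The two results agree because the kernel matrix over cyclic shifts is block-circulant, so the DFT diagonalises it; that is what reduces training from a dense linear solve to an element-wise division.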

Further, the tracking task computes a correlation map from an image patch z of size M × N in a new frame, and the response in step (3) is determined by:

$$ \hat{y} = \mathcal{F}^{-1}\Big( \mathcal{A} \odot \mathcal{F}\big(\phi(z) \cdot \phi(f)\big) \Big) $$

where f denotes the learned target appearance model and ⊙ denotes the Hadamard (element-wise) product. The predicted target position is found at the maximum of ŷ.
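The detection step can be illustrated with a self-contained NumPy sketch, so it re-declares a compact kernel and training helper. Assumptions, not taken from the source: a single-channel patch replaces HOG features, the Gaussian kernel distance is divided by the pixel count, and the names gaussian_correlation, train and detect are hypothetical. The response is ŷ = F^{-1}(A ⊙ F(k(z, f))), and the peak location, unwrapped to a signed shift, is the estimated displacement.

```python
import numpy as np


def gaussian_correlation(x, z, sigma):
    # k(roll(x, d), z) for every cyclic shift d, via FFT (distance normalised by pixel count)
    c = np.real(np.fft.ifft2(np.conj(np.fft.fft2(x)) * np.fft.fft2(z)))
    d2 = np.maximum(0.0, np.sum(x ** 2) + np.sum(z ** 2) - 2.0 * c)
    return np.exp(-d2 / (sigma ** 2 * x.size))


def train(x, y, sigma, lam):
    # Fourier-domain dual coefficients A = F(y) / (F(k(x, x)) + lambda)
    return np.fft.fft2(y) / (np.fft.fft2(gaussian_correlation(x, x, sigma)) + lam)


def detect(A, f, z, sigma):
    """Return the response map F^-1(A * F(k(z, f))) and the signed shift of its peak."""
    response = np.real(np.fft.ifft2(A * np.fft.fft2(gaussian_correlation(f, z, sigma))))
    dm, dn = np.unravel_index(np.argmax(response), response.shape)
    M, N = response.shape
    # Peaks beyond half the patch size are wrapped-around negative shifts.
    if dm > M // 2:
        dm -= M
    if dn > N // 2:
        dn -= N
    return response, (int(dm), int(dn))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f = rng.random((32, 32))                       # appearance model learned in frame t-1
    off = (np.arange(32) + 16) % 32 - 16           # wrapped offsets 0, 1, ..., -1
    y = np.exp(-(off[:, None] ** 2 + off[None, :] ** 2) / (2 * 2.0 ** 2))
    A = train(f, y, sigma=0.1, lam=1e-4)
    z = np.roll(f, (3, -5), axis=(0, 1))           # new patch: target moved by (3, -5)
    _, shift = detect(A, f, z, sigma=0.1)
    print(shift)                                   # (3, -5)
```

Because the label peaks at the zero shift, the response peak lands directly on the target's displacement, which is then added to (x_{t-1}, y_{t-1}).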

Further, the scale search strategy of step (4) is as follows:

the template size is fixed at S_T = (s_x, s_y) and the scale pool is S = {t_1, t_2, …, t_k}. With s_t denoting the target size, k candidate sizes {t_i s_t | t_i ∈ S} are sampled in the current frame to find a suitable target scale, and bilinear interpolation resizes the sample at each scale to the template size S_T;

the final target scale is selected by the response:

$$ \hat{s} = \arg\max_{t_i \in S} \; \max \; \mathcal{F}^{-1}\Big( \mathcal{A} \odot \mathcal{F}\big(\phi(z^{t_i}) \cdot \phi(f)\big) \Big) $$

where z^{t_i} is the i-th scale sample in the scale pool, of size t_i s_t.
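The scale pool search can be sketched in NumPy as follows. This is a minimal illustration under assumptions: the bilinear resize uses a corner-aligned grid, image borders are handled by replicating edge pixels, `score` is a hypothetical stand-in for the R_t correlation response (any callable returning a response map or scalar), and all function names are illustrative. With the pool [1, 0.99, 1.01] used in the simulation the candidate sizes differ by only about 1%; the usage below exaggerates the pool so the effect is visible.

```python
import numpy as np


def bilinear_resize(patch, out_shape):
    """Resize a 2-D array to out_shape by bilinear interpolation (corner-aligned grid)."""
    H, W = patch.shape
    h, w = out_shape
    rows, cols = np.linspace(0, H - 1, h), np.linspace(0, W - 1, w)
    r0, c0 = np.floor(rows).astype(int), np.floor(cols).astype(int)
    r1, c1 = np.minimum(r0 + 1, H - 1), np.minimum(c0 + 1, W - 1)
    fr, fc = (rows - r0)[:, None], (cols - c0)[None, :]
    top = patch[np.ix_(r0, c0)] * (1 - fc) + patch[np.ix_(r0, c1)] * fc
    bottom = patch[np.ix_(r1, c0)] * (1 - fc) + patch[np.ix_(r1, c1)] * fc
    return top * (1 - fr) + bottom * fr


def crop_centered(image, center, size):
    """Crop a (h, w) window centred at (row, col), replicating edge pixels."""
    h, w = size
    rows = np.clip(np.arange(h) - h // 2 + center[0], 0, image.shape[0] - 1)
    cols = np.clip(np.arange(w) - w // 2 + center[1], 0, image.shape[1] - 1)
    return image[np.ix_(rows, cols)]


def search_scale(image, center, target_size, template_size, pool, score):
    """Sample sizes t_i * s_t, resize each to S_T and keep the t_i with the highest peak."""
    best_t, best_peak = None, -np.inf
    for t in pool:
        size = tuple(max(1, int(round(t * s))) for s in target_size)
        sample = bilinear_resize(crop_centered(image, center, size), template_size)
        peak = float(np.max(score(sample)))
        if peak > best_peak:
            best_t, best_peak = t, peak
    return best_t, best_peak


if __name__ == "__main__":
    template = np.zeros((12, 12))
    template[3:9, 3:9] = 1.0                 # object filled the middle of the old window
    image = np.zeros((60, 60))
    image[24:36, 24:36] = 1.0                # the object has doubled in size

    def ssd_score(sample):
        # Toy stand-in for the R_t response: negative sum of squared differences.
        return -np.sum((sample - template) ** 2)

    print(search_scale(image, (30, 30), (12, 12), (12, 12), [1.0, 2.0, 0.5], ssd_score))
```

Resizing every candidate to S_T lets one learned template score all scales, which is why the scale pool can reuse the translation machinery rather than a separate 1-D scale filter.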

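Further, R_c and R_t are updated as follows:

for R_c and R_t, the appearance model f and the coefficients A are updated frame by frame with learning rate α:

$$ f^{t} = (1-\alpha)\, f^{t-1} + \alpha\, x^{t}, \qquad \mathcal{A}^{t} = (1-\alpha)\, \mathcal{A}^{t-1} + \alpha\, \mathcal{A} $$

where x^{t} is the feature patch of frame t and A on the right-hand side is the coefficient computed from frame t.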
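The frame-by-frame update can be illustrated with a short NumPy sketch. It assumes the linear-interpolation form commonly used by LCT-style correlation trackers, f^t = (1 − α) f^{t−1} + α x^t and likewise for A; the function name update_model is hypothetical, and α = 0.01 is the learning rate quoted for the simulation.

```python
import numpy as np


def update_model(f_prev, A_prev, x_new, A_new, alpha=0.01):
    """Linear-interpolation update of the appearance model f and coefficients A.

    Older frames are forgotten geometrically: each step keeps a (1 - alpha) share
    of the previous model and mixes in an alpha share of the current frame.
    """
    f = (1.0 - alpha) * f_prev + alpha * x_new
    A = (1.0 - alpha) * A_prev + alpha * A_new
    return f, A


if __name__ == "__main__":
    f, A = np.zeros((2, 2)), np.zeros((2, 2), dtype=complex)
    for _ in range(10):  # ten frames that all observe the same patch and coefficients
        f, A = update_model(f, A, np.ones((2, 2)), np.full((2, 2), 2 + 1j))
    print(f[0, 0], 1 - 0.99 ** 10)  # both are about 0.0956
```

A small α makes the model change slowly, which protects it from a few corrupted frames but delays adaptation to genuine appearance changes.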

Further, D_rf is a support vector machine detector;

D_rf is updated as follows:

in each frame, given a training set {(v_i, c_i) | i = 1, 2, …, N} of N samples, where v_i is the feature vector of the i-th sample and c_i ∈ {+1, −1} is its label, the objective function for the hyperplane h of the support vector machine detector is:

$$ \min_{h} \; \frac{\lambda}{2} \lVert h \rVert^{2} + \frac{1}{N} \sum_{i=1}^{N} \ell\big(h; (v_i, c_i)\big) $$

where ℓ(h; (v, c)) = max{0, 1 − c⟨h, v⟩}, ⟨h, v⟩ denotes the inner product of h and v, and λ is the regularization parameter;

the hyperplane parameters are updated with the passive-aggressive algorithm:

$$ h \leftarrow h - \frac{\ell\big(h; (v, c)\big)}{\lVert \nabla_h \ell\big(h; (v, c)\big) \rVert^{2} + \frac{1}{2\tau}} \, \nabla_h \ell\big(h; (v, c)\big) $$

where ∇_h ℓ(h; (v, c)) is the gradient of the loss function with respect to h, and τ ∈ (0, ∞) is a hyper-parameter that controls the update rate of h.
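The online detector update can be sketched with a single passive-aggressive step on the hinge loss ℓ(h; (v, c)) = max{0, 1 − c⟨h, v⟩}. This NumPy sketch assumes the step size loss / (‖∇‖² + 1/(2τ)), i.e. the PA-II variant; the function names are hypothetical, and the regulariser in the batch objective is not part of this per-sample step.

```python
import numpy as np


def hinge_loss(h, v, c):
    """l(h; (v, c)) = max{0, 1 - c <h, v>} for a sample v with label c in {+1, -1}."""
    return max(0.0, 1.0 - c * float(np.dot(h, v)))


def pa_update(h, v, c, tau=1.0):
    """One passive-aggressive step on the hinge loss for a single labelled sample.

    Implements h <- h - l / (||grad||^2 + 1 / (2 tau)) * grad with grad = -c v whenever
    the loss is positive; tau in (0, inf) controls how far h may move per sample.
    """
    loss = hinge_loss(h, v, c)
    if loss == 0.0:
        return h  # passive: the margin is already satisfied, keep the hyperplane
    grad = -c * np.asarray(v, dtype=float)  # gradient of the hinge loss w.r.t. h
    step = loss / (float(np.dot(grad, grad)) + 1.0 / (2.0 * tau))
    return h - step * grad


if __name__ == "__main__":
    h = np.zeros(2)
    for v, c in [(np.array([1.0, 0.0]), +1), (np.array([0.0, 1.0]), -1)]:
        h = pa_update(h, v, c, tau=1.0)
    print(h)  # approximately [0.667, -0.667]
```

Each update costs one inner product, so D_rf can be refreshed every frame without retraining the SVM from scratch; a larger τ lets each sample pull h further toward satisfying its margin.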

Further, τ_r is 0.15, τ_t is 0.5 and τ_a is 0.38.

The invention has the following beneficial effects:

In the scale-adaptive long-term correlation target tracking method provided by the invention, a scale-adaptive strategy is effectively fused with the LCT tracking framework. A scale pool is introduced so that the algorithm can adaptively select the optimal scale when locating the tracked target. The multi-scale search combines stably with the position estimation filter, and the scale estimate is less likely to become severely biased.

Compared with the LCT algorithm, the scale-adaptive long-term correlation target tracking method provided by the invention clearly improves tracking precision on the UAV123 (Unmanned Aerial Vehicle 123) benchmark in classical tracking scenarios such as scale variation, aspect ratio change, low resolution, fast motion, full occlusion, partial occlusion, out-of-view, illumination variation, viewpoint change, camera motion and similar objects.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a flowchart of a scale-adaptive long-term correlation target tracking method according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a scale-adaptive long-term correlation target tracking method according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of an adaptive scale model of a scale-adaptive long-term correlation target tracking method according to an embodiment of the present invention;

FIG. 4 is a graph of the tracking precision of the present invention and other algorithms over the 123 video sequences of the UAV123 data set;

FIG. 5 is a graph of the success rate of the present invention and other algorithms over the 123 video sequences of the UAV123 data set;

FIG. 6 is a graph of tracking accuracy and success rate in the context of Scale Variation in the UAV123 data set by the present invention and other algorithms;

FIG. 7 is a graph of tracking accuracy and success rate for the present invention and other algorithms in the context of Aspect Ratio Change in the UAV123 data set;

FIG. 8 is a graph of tracking accuracy and success rate for the present invention and other algorithms in the context of Low Resolution in the UAV123 data set;

FIG. 9 is a graph of tracking accuracy and success rate for the present invention and other algorithms in the context of Fast Motion in the UAV123 data set;

FIG. 10 is a graph of tracking accuracy and success rate for the present invention and other algorithms in the context of Full Occlusion in the UAV123 dataset;

FIG. 11 is a graph of tracking accuracy and success rate for the present invention and other algorithms in the context of Partial Occlusion in the UAV123 data set;

FIG. 12 is a graph of tracking accuracy and success rate for the present invention and other algorithms in the context of Out-of-View in the UAV123 data set;

FIG. 13 is a graph of tracking accuracy and success rate for the present invention and other algorithms in the context of Background Clutter in the UAV123 data set;

FIG. 14 is a graph of tracking accuracy and success rate for the present invention and other algorithms in the context of Illumination Variation in the UAV123 data set;

FIG. 15 is a graph of tracking accuracy and success rate for the present invention and other algorithms in the context of Viewpoint Change in the UAV123 data set;

FIG. 16 is a graph of tracking accuracy and success rate for the present invention and other algorithms in the context of Camera Motion in the UAV123 data set;

FIG. 17 is a graph of tracking accuracy and success rate for the present invention and other algorithms in the context of Similar Object in the UAV123 data set.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Referring to FIG. 1 and FIG. 2, which respectively show a flowchart and a schematic diagram of the scale-adaptive long-term correlation target tracking method in an embodiment of the present invention, the method includes the following steps:

Preprocessing: read the target position in the first frame to obtain a large-background detection box, a small-background detection box and the Gaussian regression label corresponding to each detection box.

Processing of the first frame: extract the histogram of oriented gradients (HOG) features of the target within the detection box and Fourier-transform them to the frequency domain to obtain x_f; then apply a Gaussian kernel function to obtain the frequency-domain Gaussian response k_f, where the Gaussian kernel function is

$$ k(x, x') = \exp\left( -\frac{\lVert x - x' \rVert^{2}}{\sigma^{2}} \right) $$

Both regression models are mapped to kernel space, with the kernel defined as k(x, x') = φ(x)·φ(x'). Using x_f and k_f, the classifier parameters are computed to obtain the temporal context regression model R_c, the target appearance regression model R_t and the detector D_rf, where R_c is responsible for translation estimation, R_t for scale estimation and D_rf for re-detection.
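The frequency-domain Gaussian response k_f can be obtained for all cyclic shifts at once. The NumPy sketch below (hypothetical function names, single-channel input) follows the dense Gaussian kernel correlation popularised by KCF, with the squared distance additionally divided by the number of pixels, a normalisation common in KCF implementations and assumed here. It checks the FFT result against a shift-by-shift evaluation of the Gaussian kernel.

```python
import numpy as np


def gaussian_correlation_fft(x, z, sigma=0.1):
    """k(roll(x, d), z) for every cyclic shift d, using one FFT cross-correlation."""
    cross = np.real(np.fft.ifft2(np.conj(np.fft.fft2(x)) * np.fft.fft2(z)))  # <roll(x, d), z>
    d2 = np.maximum(0.0, np.sum(x ** 2) + np.sum(z ** 2) - 2.0 * cross)    # ||roll(x, d) - z||^2
    return np.exp(-d2 / (sigma ** 2 * x.size))


def gaussian_correlation_direct(x, z, sigma=0.1):
    """Reference implementation: evaluate the kernel shift by shift (quadratic cost)."""
    out = np.empty(x.shape)
    for m in range(x.shape[0]):
        for n in range(x.shape[1]):
            diff = np.roll(x, (m, n), axis=(0, 1)) - z
            out[m, n] = np.exp(-np.sum(diff ** 2) / (sigma ** 2 * x.size))
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, z = rng.random((6, 7)), rng.random((6, 7))
    print(np.allclose(gaussian_correlation_fft(x, z, 0.5),
                      gaussian_correlation_direct(x, z, 0.5)))  # True
```

Expanding ‖a − b‖² = ‖a‖² + ‖b‖² − 2⟨a, b⟩ turns all shifted inner products into one cross-correlation, which is why the kernel response costs O(n log n) instead of O(n²).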

Processing of the current frame (frame t):

(1) Crop a search window in frame t around the target position (x_{t-1}, y_{t-1}) of frame t−1, extract HOG features, and train a correlation filter template;

The trained correlation filter template is specifically:

$$ w = \arg\min_{w} \sum_{m,n} \left| \langle \phi(x_{m,n}), w \rangle - y(m,n) \right|^{2} + \lambda \lVert w \rVert^{2} $$

where w denotes the correlation filter, x_{m,n} denotes the training samples obtained by cyclically shifting an image patch x of M × N pixels, y(m,n) is the Gaussian label generated for the training sample x_{m,n}, φ denotes the mapping to kernel space, and λ is the regularization parameter.

The solution can also be written as

$$ w = \sum_{m,n} a(m,n)\, \phi(x_{m,n}) $$

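where the coefficient a is given in the Fourier domain by:

$$ \mathcal{A} = \mathcal{F}(a) = \frac{\mathcal{F}(y)}{\mathcal{F}\big(\phi(x) \cdot \phi(x)\big) + \lambda} $$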

(2) Perform translation estimation: use the temporal context regression model R_c and the correlation filter scores to compute the response y_t, and estimate the position (x_t, y_t) in frame t;

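The tracking task computes the correlation map from an image patch z of size M × N in the new frame. The response of step (2) is determined by:

$$ \hat{y} = \mathcal{F}^{-1}\Big( \mathcal{A} \odot \mathcal{F}\big(\phi(z) \cdot \phi(f)\big) \Big) $$

where f is the learned target appearance model and ⊙ is the Hadamard product. The predicted target position is found at the maximum of ŷ.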

(3) As shown in FIG. 3, construct a scale pool using a multi-scale search strategy, and adaptively estimate the optimal scale ŝ from the correlation map y_s and the target appearance regression model R_t, obtaining the initial predicted target state of frame t, X_t = (x_t, y_t, ŝ).

The scale search strategy is as follows: the template size is fixed at S_T = (s_x, s_y) and the scale pool is S = {t_1, t_2, …, t_k}. With s_t denoting the target size, k candidate sizes {t_i s_t | t_i ∈ S} are sampled in the current frame to find a suitable target scale, and bilinear interpolation then resizes the samples at all scales to the template size S_T.

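The final target scale is selected by the response:

$$ \hat{s} = \arg\max_{t_i \in S} \; \max \; \mathcal{F}^{-1}\Big( \mathcal{A} \odot \mathcal{F}\big(\phi(z^{t_i}) \cdot \phi(f)\big) \Big) $$

where z^{t_i} is the i-th scale sample in the scale pool, of size t_i s_t; bilinear interpolation resizes z^{t_i} to S_T.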

(4) Set a first threshold τ_r and a second threshold τ_t, and let y_s denote the predicted correlation map. If max(y_s) < τ_r, use D_rf to perform re-detection and find a set of candidate states X; for each state X'_i in X, compute a confidence score y'_i. If max(y'_i) > τ_t, set X_t = X'_i, where i = arg max_i y'_i. This gives the final predicted target state X_t of frame t.

(5) Update model R_c;

(6) Set a third threshold τ_a; if max(y_s) > τ_a, update model R_t;

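In steps (5) and (6), the appearance model f and the coefficients A of the R_c and R_t models are updated frame by frame with learning rate α as follows:

$$ f^{t} = (1-\alpha)\, f^{t-1} + \alpha\, x^{t}, \qquad \mathcal{A}^{t} = (1-\alpha)\, \mathcal{A}^{t-1} + \alpha\, \mathcal{A} $$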

(7) Update detector D_rf;

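In the embodiment of the present invention, D_rf is a support vector machine (SVM) detector. In each frame, given a training set {(v_i, c_i) | i = 1, 2, …, N} of N samples, where v_i is the feature vector of the i-th sample and c_i ∈ {+1, −1} is its label, the objective function for the hyperplane h of the SVM detector is:

$$ \min_{h} \; \frac{\lambda}{2} \lVert h \rVert^{2} + \frac{1}{N} \sum_{i=1}^{N} \ell\big(h; (v_i, c_i)\big) $$

where ℓ(h; (v, c)) = max{0, 1 − c⟨h, v⟩} and ⟨h, v⟩ denotes the inner product of h and v.

The hyperplane parameters are updated efficiently with the passive-aggressive algorithm:

$$ h \leftarrow h - \frac{\ell\big(h; (v, c)\big)}{\lVert \nabla_h \ell\big(h; (v, c)\big) \rVert^{2} + \frac{1}{2\tau}} \, \nabla_h \ell\big(h; (v, c)\big) $$

where ∇_h ℓ(h; (v, c)) is the gradient of the loss function with respect to h, and τ ∈ (0, ∞) is a hyper-parameter that controls the update rate of h.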

Steps (2) to (4) yield the final predicted target state X_t of frame t;

Steps (5) to (7) yield the updated R_c, R_t and D_rf of the current frame;

(8) Repeat (1) to (7) for frame t+1 until the end of the video sequence.

In the scale-adaptive long-term correlation target tracking method provided by the embodiment of the invention, a scale-adaptive strategy is effectively fused with the LCT tracking framework. A scale pool is introduced so that the algorithm can adaptively select the optimal scale when locating the tracked target. The multi-scale search combines stably with the position estimation filter, and the scale estimate is less likely to become severely biased.

Based on the above embodiments, the present embodiment provides a simulation experiment.

Simulation conditions: the simulation in this embodiment was run on an Intel(R) Core(TM) i3-4170 CPU @ 3.70 GHz with 4.00 GB of memory, using MATLAB R2016a. The experimental parameters were set as follows: regularization parameter λ = 10^-4, Gaussian kernel σ = 0.1, learning rate α = 0.01, thresholds τ_r = 0.15, τ_a = 0.38 and τ_t = 0.5, and scale pool [1, 0.99, 1.01]. The proposed algorithm was then compared with LCT and other existing classical target tracking algorithms.

Simulation content: the proposed method is evaluated on UAV123, a large benchmark data set of 123 videos, using one-pass evaluation (OPE): tracking starts from the target position given in the first frame, and the tracker is not reinitialized after a tracking failure.

FIGS. 4 to 17 show the experimental results, where LCSA denotes the scale-adaptive long-term correlation filter tracking method proposed by the invention, and LCT, KCF_GaussHog, CSK, IVT and DFT denote other established target tracking algorithms. The proposed LCSA algorithm scores 0.40 on the success-rate plot in FIG. 5 and 0.58 on the precision plot in FIG. 4. Taking the long-term correlation tracking algorithm (LCT) as the baseline, the experimental data on UAV123 show that LCSA improves on LCT by 2.56% in AUC success rate and by 5.26% in precision. Although its precision is slightly lower than that of LCT under background clutter on UAV123, LCSA outperforms LCT and other classical target tracking algorithms in classical tracking scenarios such as scale variation, aspect ratio change, low resolution, fast motion, full occlusion, partial occlusion, out-of-view, illumination variation, viewpoint change, camera motion and similar objects.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A scale-adaptive long-term correlation tracking algorithm, comprising the steps of:

and estimating the current frame position (x_t, y_t);

(4) Constructing a scale pool using a multi-scale search strategy, and adaptively estimating the optimal scale ŝ from the correlation map y_s and R_t, to obtain the initial predicted target state of frame t, X_t = (x_t, y_t, ŝ);

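(5) If max(y_s) < τ_r, using D_rf to perform re-detection, finding a set of candidate states X, computing a confidence score y'_i for each state X'_i in X, and, if max(y'_i) > τ_t, setting X_t = X'_i with i = arg max_i y'_i; wherein τ_r is a first threshold and τ_t is a second threshold;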

(6) Updating R_c;

(7) If max(y_s) > τ_a, updating R_t, wherein τ_a is a third threshold;

(8) Updating D_rf;

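2. The scale-adaptive long-term correlation tracking algorithm according to claim 1, wherein the correlation filter template trained in step (2) is specifically:

$$ w = \arg\min_{w} \sum_{m,n} \left| \langle \phi(x_{m,n}), w \rangle - y(m,n) \right|^{2} + \lambda \lVert w \rVert^{2} $$

wherein w denotes the correlation filter, x_{m,n} denotes the training samples obtained by cyclically shifting an image patch x of M × N pixels, y(m,n) is the Gaussian label generated for the training sample x_{m,n}, φ denotes the mapping to kernel space, and λ is a regularization parameter.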

3. The scale-adaptive long-term correlation tracking algorithm according to claim 1, wherein, for the correlation filter template trained in step (2), the coefficient a is defined by the formula:

$$ \mathcal{A} = \mathcal{F}(a) = \frac{\mathcal{F}(y)}{\mathcal{F}\big(\phi(x) \cdot \phi(x)\big) + \lambda} $$

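4. The scale-adaptive long-term correlation tracking algorithm according to claim 3, wherein the tracking task computes a correlation map from an image patch z of size M × N in a new frame, and the response of step (3) is determined according to the following formula:

$$ \hat{y} = \mathcal{F}^{-1}\Big( \mathcal{A} \odot \mathcal{F}\big(\phi(z) \cdot \phi(f)\big) \Big) $$

wherein f denotes the learned target appearance model and ⊙ denotes the Hadamard product, and the predicted position of the target is found at the maximum of ŷ.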

5. The scale-adaptive long-term correlation tracking algorithm according to claim 1, wherein the scale search strategy of step (4) is:

the template size is fixed at S_T = (s_x, s_y) and the scale pool is S = {t_1, t_2, …, t_k}; with s_t denoting the target size, k candidate sizes {t_i s_t | t_i ∈ S} are sampled in the current frame to find a suitable target scale, and bilinear interpolation resizes the samples at all scales to the template size S_T;

the final target scale is selected by the response:

$$ \hat{s} = \arg\max_{t_i \in S} \; \max \; \mathcal{F}^{-1}\Big( \mathcal{A} \odot \mathcal{F}\big(\phi(z^{t_i}) \cdot \phi(f)\big) \Big) $$

wherein z^{t_i} is the i-th scale sample in the scale pool, of size t_i s_t.

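6. The scale-adaptive long-term correlation tracking algorithm according to claim 3, wherein R_c and R_t are updated as follows:

$$ f^{t} = (1-\alpha)\, f^{t-1} + \alpha\, x^{t}, \qquad \mathcal{A}^{t} = (1-\alpha)\, \mathcal{A}^{t-1} + \alpha\, \mathcal{A} $$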

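7. The scale-adaptive long-term correlation tracking algorithm according to claim 1, wherein D_rf is a support vector machine detector;

updating D_rf comprises:

$$ \min_{h} \; \frac{\lambda}{2} \lVert h \rVert^{2} + \frac{1}{N} \sum_{i=1}^{N} \ell\big(h; (v_i, c_i)\big) $$

wherein ℓ(h; (v, c)) = max{0, 1 − c⟨h, v⟩}, ⟨h, v⟩ denotes the inner product of h and v, and λ is a regularization parameter;

updating the hyperplane parameters using the passive-aggressive algorithm:

$$ h \leftarrow h - \frac{\ell\big(h; (v, c)\big)}{\lVert \nabla_h \ell\big(h; (v, c)\big) \rVert^{2} + \frac{1}{2\tau}} \, \nabla_h \ell\big(h; (v, c)\big) $$

wherein ∇_h ℓ(h; (v, c)) is the gradient of the loss function with respect to h, and τ ∈ (0, ∞) is a hyper-parameter that controls the update rate of h.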

8. The scale-adaptive long-term correlation tracking algorithm according to claim 1, wherein τ_r is 0.15, τ_t is 0.5 and τ_a is 0.38.