CN110895820A - KCF-based scale self-adaptive target tracking method - Google Patents

KCF-based scale self-adaptive target tracking method

Info

Publication number
CN110895820A
Authority
CN
China
Prior art keywords
scale
tracking target
image
tracking
target
Prior art date
Legal status
Granted
Application number
CN201910190867.0A
Other languages
Chinese (zh)
Other versions
CN110895820B (en)
Inventor
赵运基
范存良
刘晓光
Current Assignee
Henan University of Technology
Original Assignee
Henan University of Technology
Priority date
Filing date
Publication date
Application filed by Henan University of Technology filed Critical Henan University of Technology
Priority to CN201910190867.0A priority Critical patent/CN110895820B/en
Publication of CN110895820A publication Critical patent/CN110895820A/en
Application granted granted Critical
Publication of CN110895820B publication Critical patent/CN110895820B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/251 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving models
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20016 Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • G06T 2207/20081 Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a KCF-based scale self-adaptive target tracking method, which comprises the following steps: step 1, selecting a tracking target area, creating an initial tracking window and a Padding window, and constructing a tracking target area label; step 2, constructing a kernel correlation filtering model; step 3, constructing a scale feature set of the tracking target from the initial tracking target; step 4, determining the maximum displacement in the response image; step 5, determining the maximum response displacement; step 6, determining the rough position of the tracking target; step 7, extracting SPP-FHOG features at the rough position and correcting the cosine window; step 8, updating the circulant matrix model and the parameters of the kernel correlation filter; and step 9, executing the steps cyclically to realize continuous tracking of the tracking target. The method addresses the scale insensitivity of the KCF algorithm, as well as the loss of the tracking target that occurs in the KCF algorithm when the target moves beyond the Padding window.

Description

KCF-based scale self-adaptive target tracking method
Technical Field
The invention relates to a scale self-adaptive target tracking method based on kernelized correlation filtering (KCF, Kernelized Correlation Filters), in particular to a scale self-adaptive rapid target tracking method based on FHOG (Fused Histogram of Oriented Gradients) feature extraction of the tracked target region and kernel correlation filtering.
Background
Vision-based target tracking is one of the research hotspots in the field of computer vision. In recent years, with further research on deep learning theory, methods that apply deep learning to target tracking have appeared, but deep-learning-based methods are limited by the number of tracking target samples (the tracking target is selected in an initial frame, so the number of tracking target samples is very small, and expanding the positive and negative samples of the tracking target area easily causes overfitting of the deep model); meanwhile, the tracking efficiency of deep-learning-based algorithms is low. Correlation filtering algorithms are widely applied to target tracking, and their tracking efficiency is relatively high. The MOSSE (Minimum Output Sum of Squared Error) algorithm introduced correlation filtering into the target tracking field for the first time; the tracked target is represented as a gray image, and the algorithm is insensitive to scale changes of the tracked target, so the tracking effect is not ideal. On the basis of the MOSSE algorithm, the CSK (Circulant Structure of tracking-by-detection with Kernels) algorithm adopts a dense sampling method and makes full use of the features of the tracking target. CSK also introduces a kernel correlation method that maps the linear space to a high-order nonlinear space, turning a linearly inseparable problem into a linearly separable one. The KCF algorithm provides a multi-feature fusion method on the basis of the MOSSE and CSK algorithms, and at the same time improves tracking efficiency and speed by exploiting the property that the Fourier transform diagonalizes a circulant matrix. Although the KCF algorithm can track the target effectively, it has two obvious defects. First, the candidate region of the tracking target is predicted within the Padding window; if the tracking target moves beyond the Padding window region because of large motion amplitude, the algorithm cannot track the target, i.e. the tracking target is lost. Second, when the scale of the tracking target changes, the window function of the tracking target area cannot change with the target scale, and the tracking target is lost as the tracking target area is updated. Aiming at the defect of scale insensitivity, algorithms such as DSST (Discriminative Scale Space Tracker) provide a method based on a scale pool; this method must search the scale pool to determine the final scale of the tracking target during tracking, the search affects the real-time performance of the tracking algorithm, and the more scales in the scale pool, the lower the search efficiency and the poorer the real-time performance.
Disclosure of Invention
In order to overcome the defects in the KCF algorithm, one objective of the present invention is to provide a method for determining the scale of a tracking target: a target scale space feature set is constructed from the selected tracking target; to improve the robustness of the tracking target features to rotation and scale, the idea of pyramid mean pooling is applied to extract an SPP-FHOG (Spatial Pyramid Pooling-Fused Histogram of Oriented Gradients) feature set of the tracking target; the final scale of the tracking target is determined within this set; and after the scale of the tracking target is determined, the scale space feature set is updated offline (since the SPP-FHOG features are not necessarily linearly separable, the features are subjected to a kernel transformation).
The second objective of the present invention is to provide a processing method for the case where the tracking target area exceeds the Padding area: track the final response image and the corresponding window area in the KCF algorithm, and compute the mean of the response values in the window area; determine whether the Padding area contains the tracking target by comparing against the mean of the response image in the window area in the new frame; if the tracking target is not contained, automatically double the Padding area, then shrink the expanded Padding area to keep consistency with the feature parameters extracted in the original KCF algorithm; determine the position of the tracking target in the new Padding, update the Padding position, and update the window and scale of the tracking target. This method can effectively address the second defect of the conventional KCF algorithm.
In order to achieve one of the above purposes, the invention provides the following technical scheme:
a KCF-based scale adaptive target tracking method comprises the following steps:
step 1, selecting a tracking target area, creating an initial tracking window and a Padding window, constructing a cosine window according to the size of the tracking target and the Padding window, and constructing a tracking target area label;
step 2, extracting FHOG features in the Padding window image, windowing the FHOG features, converting them into the Fourier space, determining the circulant matrix of the initial model, and determining the kernel correlation filter model;
step 3, constructing tracking targets at different scales according to the initial tracking target, solving the FHOG features in the Padding window at the different scales under the condition that the size of the Padding window is unchanged, and finally constructing a scale feature set of the tracking target;
step 4, extracting FHOG features from the current frame according to the Padding size and position of the previous frame, windowing them, constructing the response image of the candidate image according to the KCF algorithm, and determining the maximum response value in the response image;
step 5, determining whether the candidate area contains the tracking target according to the maximum response value and the response threshold, and if not, introducing the expansion mechanism of the Padding window to finally determine the true maximum response and the maximum displacement in the response image;
step 6, determining the rough position of the tracking target according to the maximum displacement and the position of the previous frame;
step 7, extracting SPP-FHOG features at the rough position, determining the accurate scale of the tracking target in the constructed scale feature set from the extracted features, and correcting the cosine window according to the scale;
step 8, after the tracking target is determined, updating the circulant matrix model and the parameters of the kernel correlation filter;
and step 9, executing steps 4, 5, 6, 7 and 8 cyclically to realize continuous tracking of the tracking target.
Further, in step 3, constructing a scale feature set of the tracking target according to the initial tracking target includes:
In the initial frame, the tracking target is determined according to the initial target position and size given in groundtruth_rect.txt of the standard tracking video; the area of this rectangular window is the tracking target.
In the process of calculating the FHOG features of the tracking target area, the Cell size is selected as 4 × 4, so when the scale set of the tracking target area is determined, the scale of the tracking target cannot be too small, otherwise the extraction of the FHOG features of the tracking target area is affected. For the different scales of the tracking targets in different tracking videos, the rule is set as follows: the scale of the tracking target selected in the initial frame is 1; taking the central point of the tracking target as the center, the rectangular frame of the tracking target is reduced according to the scale coefficient, and the target area image extracted from the reduced rectangular frame is the tracking target image corresponding to that scale. For the M_t × N_t image corresponding to the minimum scale of the tracking target, the pixel count of the minimum-scale image should satisfy min(M_t, N_t) ≥ 16. According to this rule, an image set with 15 scale levels is created, the scales of the corresponding images being S_1 … S_15, where the image corresponding to scale S_1 is the minimum-scale image of size M_{t,1} × N_{t,1}. The FHOG feature of the minimum image M_{t,1} × N_{t,1} is m_1 × n_1 × 31, and the FHOG feature of the image corresponding to scale S_i is m_i × n_i × 31. The FHOG feature set of the images at different scales is {m_1 × n_1 × 31, …, m_15 × n_15 × 31}, in which the m_15 × n_15 × 31 FHOG feature has the largest dimension and the m_1 × n_1 × 31 feature the smallest. Finally, the feature description set of the images at the different scales, denoted {F_1, …, F_15}, is constructed.
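By way of illustration, the scale image set of step 3 could be built as in the following minimal Python sketch. The 0.01 scale step is the value given later in the embodiment; the function name and the use of OpenCV for resizing are assumptions, since the patent does not name a resampling method.

```python
import numpy as np
import cv2  # assumed available for resizing

def build_scale_images(target_img, levels=15, step=0.01, min_side=16):
    """Sketch of the 15-level scale image set of step 3. Scale 1 is the
    target selected in the initial frame; each lower level shrinks the
    box about its centre by `step`, so S_1 is the smallest scale."""
    h, w = target_img.shape[:2]
    scales, images = [], []
    for k in range(levels):
        s = 1.0 - step * (levels - 1 - k)     # S_1 ... S_15 = 1.0
        sh, sw = int(round(h * s)), int(round(w * s))
        if min(sh, sw) < min_side:            # enforce min(M_t, N_t) >= 16
            raise ValueError("target too small for a 15-level scale set")
        images.append(cv2.resize(target_img, (sw, sh)))
        scales.append(s)
    return scales, images
```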
Further, in step 3, constructing a sample description set classified based on KPCA includes:
The training sample feature set {F_1, …, F_15}, whose i-th element is an m_i × n_i × 31 feature image, is pooled by pyramid mean pooling. The pooling process is divided into three parts: the first part takes the mean of all the 31-dimensional features over the whole m_i × n_i map, finally obtaining an overall 31-dimensional feature; the second part divides the whole m_i × n_i image into 2 × 2 = 4 areas, computes the mean of the 31-dimensional features within each area, and finally obtains a 4 × 31-dimensional feature; the third part divides the whole m_i × n_i image into 4 × 4 = 16 areas, computes the mean of the 31-dimensional features of each area, and finally constructs a 16 × 31-dimensional feature. Fusing the features of each stage finally yields the (1 + 4 + 16) × 31 = 21 × 31-dimensional pyramid mean-pooled FHOG feature, and the pyramid mean-pooled FHOG feature set, i.e. the SPP-FHOG feature set, is thereby constructed. The FHOG feature set of the training samples {F_1, …, F_15} is thus converted into the 21 × 31-dimensional SPP-FHOG feature set. Each SPP-FHOG feature is then converted into vector form, i.e. each 21 × 31-dimensional feature is converted into a row vector. Finally, the SPP-FHOG features are combined into a training feature description matrix of dimensions 15 × (21 × 31).
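The three-part pooling above maps directly to code. The following minimal numpy sketch (the function name spp_mean_pool is illustrative) pools an m × n × 31 FHOG map into the 21 × 31 SPP-FHOG feature:

```python
import numpy as np

def spp_mean_pool(fhog):
    """Pyramid mean pooling of an (m, n, 31) FHOG map into 21 x 31:
    region means over 1x1, 2x2 and 4x4 grids, stacked row-wise."""
    m, n, c = fhog.shape
    parts = [fhog.reshape(-1, c).mean(axis=0, keepdims=True)]   # 1 x 31 overall mean
    for g in (2, 4):                                            # 2x2 and 4x4 grids
        rows = np.array_split(np.arange(m), g)
        cols = np.array_split(np.arange(n), g)
        for r in rows:
            for col in cols:
                region = fhog[np.ix_(r, col)]
                parts.append(region.reshape(-1, c).mean(axis=0, keepdims=True))
    return np.vstack(parts)                 # (1 + 4 + 16) x 31 = 21 x 31

# stacking the 15 pooled features row-wise gives the 15 x (21*31) matrix:
# X = np.stack([spp_mean_pool(f).ravel() for f in fhog_set])
```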
Further, in step 3, constructing the feature description of the training samples based on the basis vectors of KPCA includes:
The main application of PCA is to reduce the dimensionality of vectors. KPCA builds on PCA: the input space is mapped to a high-dimensional space by a nonlinear function, so that originally linearly inseparable samples become linearly separable. The key point of the method is that introducing the kernel function converts the inner product operation in the nonlinearly transformed feature space into a kernel function evaluation in the original space, which greatly reduces the amount of computation. The KPCA calculation process is:
1) Construct the n indexes of the m original samples into an m × n matrix X = [x_1 x_2 … x_m]^T, i.e. the 15 × (21 × 31)-dimensional matrix form of the SPP-FHOG features;
2) Calculate the kernel matrix: select a Gaussian radial basis kernel function, determine the correlation coefficient, and calculate the kernel matrix K corresponding to the SPP-FHOG features, as shown in formula 1:
K_{pq} = exp( −‖x_p − x_q‖² / (2δ²) )    (formula 1)
where 1 ≤ p ≤ m, 1 ≤ q ≤ m, and δ is an empirical parameter set through experiments;
3) Center the kernel matrix using formula 2:
K_{μν} = K − I_m K − K I_m + I_m K I_m    (formula 2)
where K_{μν} is the kernel matrix after centering of the kernel matrix K, I_m is the m × m matrix whose entries are all 1/m, I_m K is the result of row centering, and K I_m is the result of column centering;
4) Solve the centered kernel matrix K_{μν} by the Jacobi iteration method for its eigenvalues and corresponding eigenvectors, arrange the eigenvectors in descending order of eigenvalue, and adjust the corresponding eigenvalues accordingly, finally obtaining the adjusted eigenvalues λ_1, … λ_m and eigenvectors ν_1, … ν_m;
5) Orthogonalize the eigenvectors by Schmidt orthogonalization, obtaining β_1, … β_m;
6) Calculate the cumulative contribution rates B_1, … B_m of the eigenvalues, as shown in formula 3; for a given extraction rate p, if B_t ≥ p and B_{t−1} < p, then select the first t principal components β_1, … β_t:
B_t = (Σ_{i=1}^{t} λ_i) / (Σ_{i=1}^{m} λ_i)    (formula 3)
7) Compute the projection of the kernel matrix on the t principal components β_1, … β_t; the projection result is expressed as Y = K_{μν} β, where β = (β_1, … β_t).
The finally obtained projection data Y is the result of the KPCA projection operation on the original sample matrix: the SPP-FHOG features, combined into the 15 × (21 × 31)-dimensional training feature description matrix, are converted into the final projection matrix after KPCA dimensionality reduction.
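A compact numpy sketch of steps 1) to 7) is given below for illustration. It assumes the standard Gaussian RBF kernel of formula 1 and replaces the Jacobi iteration with numpy's symmetric eigensolver, which yields the same eigendecomposition (and already-orthonormal eigenvectors, so the Schmidt step is implicit):

```python
import numpy as np

def kpca_fit(X, delta=1.0, rate=0.9):
    """Minimal KPCA along the lines of steps 1)-7); delta and the
    extraction rate are experimental parameters, values assumed here."""
    m = X.shape[0]
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / (2 * delta ** 2))             # formula 1
    Im = np.full((m, m), 1.0 / m)
    Kc = K - Im @ K - K @ Im + Im @ K @ Im         # formula 2: centering
    lam, V = np.linalg.eigh(Kc)                    # ascending eigenvalues
    lam, V = lam[::-1], V[:, ::-1]                 # descending, as in step 4)
    B = np.cumsum(lam) / lam.sum()                 # formula 3: contribution rates
    t = int(np.searchsorted(B, rate)) + 1          # B_t >= rate, B_{t-1} < rate
    beta = V[:, :t]                                # orthonormal principal components
    Y = Kc @ beta                                  # step 7): projection
    return Y, beta, K

# usage: Y, beta, K = kpca_fit(training_matrix)    # training_matrix: 15 x (21*31)
```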
Further, in step 7, the method for determining the scale of the tracking target and correcting the cosine window includes:
The candidate target area in the current frame image is determined according to the size and position of the Padding in the previous frame image. The FHOG features of the candidate region are extracted, Gaussian windowing is applied, the result is converted into the Fourier space, and the final response image is determined according to the kernel correlation theory of KCF. The maximum response value point is searched in the final response image, and the position determined by this point is the final position of the tracking target. After the initial position of the tracking target is determined, the specific scale of the tracking target is still uncertain; therefore, at the position of the tracking target, the FHOG feature of the area is calculated with Cells of size 4 × 4, so that its dimensions are the same as those of the FHOG feature of the minimum-scale image in the scale feature set, namely m_1 × n_1 × 31. Spatial pyramid mean pooling is performed on this FHOG feature, finally converting it into a 21 × 31 SPP-FHOG feature. According to the KPCA-based classification method, the Gaussian-radial-basis kernel correlation between the 21 × 31 SPP-FHOG feature to be measured and the features in the 15 × (21 × 31) scale feature set SPP-FHOG is solved and centered, finally obtaining K_h. The projection of K_h on the t principal components (obtained by the calculation in S14) is computed as Y_h = K_h β. The 2-norm between the projection result Y_h and each sample in the projection result Y of the sample library (containing the 15 projections) is calculated; the scale of the scale sample whose projection gives the minimum norm is the scale corresponding to Y_h, i.e. the scale of the tracking target. After the scale of the tracking target is determined, the cosine window and the labels corresponding to the cosine window are updated according to the scale of the tracking target.
Further, step 8 includes:
According to the model creation method in KCF, the feature circulant matrix X_{f,k} of the Fourier space corresponding to the current frame and the parameters α_{f,k} corresponding to the model are created; then the kernel correlation feature circulant matrix and the model parameters to be used in the (k+1)-th frame are as shown in formula 3 and formula 4:
X_{k+1} = (1 − η) X_k + η X_{k−1}    (formula 3)
α_{k+1} = (1 − u) α_k + u α_{k−1}    (formula 4)
where X_{k+1} is the feature circulant matrix in the (k+1)-th frame image, α_{k+1} is the model parameter used when the correlation filtering operation is performed in the (k+1)-th frame, η is the update coefficient of the model circulant matrix, set through experiments, and u is the update coefficient of the correlation filter model parameters, also determined through experiments.
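The update is a plain linear interpolation, for example (the value u = η is an assumption here; the patent only states that both coefficients are set experimentally, and the embodiment reports η = 0.075):

```python
def update_model(X_k, X_prev, alpha_k, alpha_prev, eta=0.075, u=0.075):
    """Linear-interpolation update of formulas 3/4."""
    X_next = (1 - eta) * X_k + eta * X_prev
    alpha_next = (1 - u) * alpha_k + u * alpha_prev
    return X_next, alpha_next
```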
In order to achieve the second objective, the invention provides the following technical scheme:
A processing method for the case where the tracking target area exceeds the Padding area, comprising:
Further, step 7 includes determining whether the tracking target exists in the response image:
The maximum response value ω_k in the response image of the k-th frame and the displacement corresponding to the maximum response are determined, and the target position in the (k−1)-th frame is summed with the maximum response displacement to determine the final position of the tracking target, thereby determining the scale of the tracking target. After the position and scale of the tracking target are determined, taking the finally determined central position of the tracking target as the central coordinate, an image of the size of the Padding window is selected in the (k+1)-th frame as the candidate area of the tracking target. The response image corresponding to the candidate area image is determined by applying the KCF method, and the maximum response value ω_{k+1} in the response image is searched. If ω_{k+1} ≥ (1 − ε) ω_k − ε ω_{k−1}, where ε is set through experiments, it can be determined that the candidate target area contains the target to be tracked, where ω_{k−1} is the maximum response value in the finally determined response image of the (k−1)-th frame. After the tracking target is determined in the candidate region, features are extracted at the position of the maximum response, and the scale of the tracking target is finally determined according to the scale feature set. If ω_{k+1} < (1 − ε) ω_k − ε ω_{k−1}, the candidate target area does not contain the tracking target. An example of the relationship between the different presence situations of the tracking target in the Padding window and the maximum response value is shown in fig. 6.
Further, step 7 further includes a process for the tracking target exceeding the Padding:
When it is determined that the tracking target is not in the Padding, the Padding area is expanded to 1.5 times the size of the original Padding area, and an image of the 1.5× Padding area is selected in the current frame as the candidate target area EPadding. The candidate target area is then resampled into an image of the original Padding size by linear interpolation and down-sampling. The processed image is taken as the candidate tracking target area for feature extraction, and the response image is calculated. The maximum translation position is searched in the response image, and the maximum translation displacement is mapped back into the EPadding to determine the actual maximum displacement. The position of the tracking target is finally determined according to the position in the previous frame image, and the scale of the tracking target area is then determined.
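A minimal sketch of the EPadding construction follows (OpenCV resizing assumed, and border handling simplified; the patent does not name a library):

```python
import numpy as np
import cv2  # assumed for the interpolation / down-sampling step

def epadding_candidate(frame, center, pad_size, factor=1.5):
    """Cut a window `factor` times the Padding size around the last centre,
    resample it back to the original Padding size, and return it with the
    scale factor needed to map a displacement found in the resampled image
    back into EPadding coordinates."""
    ph, pw = pad_size
    eh, ew = int(round(ph * factor)), int(round(pw * factor))
    cy, cx = center
    y0, x0 = max(0, cy - eh // 2), max(0, cx - ew // 2)   # simplified clamping
    crop = frame[y0:y0 + eh, x0:x0 + ew]
    resized = cv2.resize(crop, (pw, ph), interpolation=cv2.INTER_LINEAR)
    return resized, factor   # displacement_in_epadding = displacement * factor
```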
Drawings
FIG. 1 is a flow chart of a KCF-based scale adaptive target tracking method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of tracking targets and Padding areas in a standard tracking video CarScale;
FIG. 3 is a schematic diagram of FHOG feature calculation;
FIG. 4 is a diagram illustrating cosine window functions;
FIG. 5 is a schematic diagram of SPP-FHOG feature calculation;
fig. 6 is a maximum response value curve of the response image including the target and not including the target.
Detailed Description
The present invention will be further described with reference to the accompanying drawings and the detailed description, and it should be noted that any combination of the embodiments or technical features described below can be used to form a new embodiment without conflict.
Example one
In order to overcome the defects in the conventional KCF algorithm, the invention provides a KCF-based scale self-adaptive target tracking method: under the KCF framework, a tracking target scale feature set is constructed, the scale of the candidate target is determined by comparing the features of the candidate target area with the features of the set, and a mechanism for judging and handling whether the tracking target exists in the candidate target area is introduced, so that the scale insensitivity of the conventional KCF algorithm can be effectively overcome. The overall flow of the system is shown in fig. 1, and includes the following steps:
110. The initial target position and size are read from groundtruth_rect.txt of the standard tracking video library; the area of this rectangular window is the tracking target. The scale of the Padding window is determined according to the size of the initial tracking target area; the region selected by the Padding is the (1 + 1.5)× image region of the initial tracking target scale. CarScale is selected from the standard tracking videos as the tracking image sequence, and the tracking target area and Padding area are determined according to the information in its groundtruth_rect.txt. The tracking target and Padding area settings are shown in fig. 2.
Cells of size 4 × 4 are applied to extract the FHOG features of the Padding area image; a schematic flow chart of the FHOG feature calculation is shown in fig. 3. In the CarScale video, the size of the tracking target area determined in the initial frame is 49 × 28, and the size of the Padding area is 123 × 70. The FHOG feature of the Padding area is 30 × 17 × 31. A cosine window of size 30 × 17 is constructed; a schematic diagram of the cosine window construction result is shown in fig. 4.
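For the 30 × 17 case, the cosine window can be sketched as follows, assuming the separable Hann-type window that KCF conventionally uses (the patent itself only calls it a cosine window):

```python
import numpy as np

def cosine_window(h, w):
    """Separable cosine (Hann) window used as the tracking label; for the
    CarScale example the FHOG map of the 123 x 70 Padding area is
    30 x 17 x 31, so the window is 30 x 17."""
    return np.outer(np.hanning(h), np.hanning(w))

win = cosine_window(30, 17)   # applied channel-wise to the 30x17x31 FHOG map
```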
120. The 30 × 17 × 31 FHOG feature of the area inside the Padding window can be decomposed into FHOG feature images of 31 channels.
For the FHOG feature of the i-th channel, of size (M_p = 30) × (N_p = 17), M_p N_p samples can be generated by cyclic shifting, thus constructing a circulant matrix. The label matrix of the original tracking target is the value of the cosine window function over the M_p × N_p range. The kernel correlation coefficient matrix is then solved. By applying the relation between a circulant matrix and the Fourier transform, the resulting kernel matrix can be expressed as shown in formula 1, where K is the kernel correlation matrix, F_2 is the 2D Fourier transform matrix, k' is the M_p × N_p generating matrix of the block circulant matrix K, and k̂' is the result of the 2D Fourier transform of the generating matrix k':
K = F_2 diag(k̂') F_2^H    (formula 1)
The corresponding kernel-space ridge regression can be expressed as shown in formula 2, where I_m is the m-dimensional identity matrix and λ the regularization coefficient:
α = (K + λ I_m)^(-1) y    (formula 2)
The Fourier transform form of α obtained from formula 2 is shown in formula 3:
α̂ = ŷ ./ (k̂' + λ)    (formula 3)
The corresponding response is as shown in formula 4, where k̂^(xz) denotes the Fourier transform of the generating vector of the block circulant matrix K^{XZ} between the candidate sample and the model:
f(z) = F^(-1)( k̂^(xz) ⊙ α̂ )    (formula 4)
The position of the tracking target and the final tracking result are determined by finding the sample corresponding to the maximum in the response. According to the method described above, the model coefficient matrix α_M of the correlation filtering is finally constructed. The original matrix generating the circulant matrix is X_p, of size M_p × N_p × 31.
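In code, formulas 1 to 4 reduce to element-wise operations in the Fourier domain. The following numpy sketch follows the standard KCF formulation; the normalization by the number of elements inside the Gaussian kernel is the convention of the original KCF implementation and is assumed here:

```python
import numpy as np

def gaussian_correlation(x, z, delta=0.5):
    """Gaussian kernel correlation of two windowed (H, W, C) feature maps,
    computed in the Fourier domain (cf. formula 7 of the embodiment)."""
    xf = np.fft.fft2(x, axes=(0, 1))
    zf = np.fft.fft2(z, axes=(0, 1))
    cross = np.fft.ifft2(np.conj(xf) * zf, axes=(0, 1)).real.sum(axis=2)
    d = (x ** 2).sum() + (z ** 2).sum() - 2 * cross
    return np.exp(-np.clip(d, 0, None) / (delta ** 2 * x.size))

def train(x, y, lam=1e-4, delta=0.5):
    """Ridge regression in the Fourier domain (formulas 2/3): the circulant
    structure turns (K + lam I)^(-1) y into an element-wise division.
    y is the H x W label (here the cosine-window label described above)."""
    kf = np.fft.fft2(gaussian_correlation(x, x, delta))
    return np.fft.fft2(y) / (kf + lam)   # alpha in the Fourier domain
```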
130. Tracking targets at different scales are constructed from the initial tracking target; with the size of the Padding window unchanged, the FHOG features in the Padding window are solved at the different scales, and the scale feature set of the tracking target is finally constructed.
In the initial frame, the tracking target is determined according to the initial target position and size in groundtruth_rect.txt of the standard tracking video; the area of this rectangular window is the tracking target.
In the process of calculating the FHOG features of the tracking target area, the Cell size is selected as 4 × 4, so when the scale set of the tracking target area is determined, the scale of the tracking target cannot be too small, otherwise the extraction of the FHOG features of the tracking target area is affected. For the different scales of the tracking targets in different tracking videos, the rule is set as follows: taking the scale of the tracking target selected in the initial frame as 1 and the central point of the tracking target as the center, the rectangular frame of the tracking target is reduced according to the scale coefficient; the target area image extracted from the reduced rectangular frame is the tracking target image corresponding to that scale, and the scale step between levels is selected as 0.01. For the M_t × N_t image corresponding to the minimum scale of the tracking target, the pixel count of the minimum-scale image should satisfy min(M_t, N_t) ≥ 16. According to this rule, an image set with 15 scale levels is created, the scales of the corresponding images being S_1 … S_15, where the image corresponding to scale S_1 is the minimum-scale image M_{t,1} × N_{t,1}. The FHOG feature of the minimum image M_{t,1} × N_{t,1} is m_1 × n_1 × 31; the FHOG feature of the image corresponding to scale S_i is m_i × n_i × 31. The FHOG feature set of the images at different scales is {m_1 × n_1 × 31, …, m_15 × n_15 × 31}, in which the m_15 × n_15 × 31 feature has the largest dimension and m_1 × n_1 × 31 the smallest. Finally, the feature description set {F_1, …, F_15} of the images at the different scales is constructed.
The training sample scale feature set {F_1, …, F_15} is pooled by pyramid mean pooling. The pooling process is divided into three parts: the first part takes the mean of all the 31-dimensional features over the whole m_i × n_i map, finally obtaining an overall 31-dimensional feature; the second part divides the whole m_i × n_i image into 2 × 2 = 4 areas, computes the mean of the 31-dimensional features within each area, and finally obtains a 4 × 31-dimensional feature; the third part divides the whole m_i × n_i image into 4 × 4 = 16 areas, computes the mean of the 31-dimensional features of each area, and finally constructs a 16 × 31-dimensional feature. Fusing the features of each stage finally yields the (1 + 4 + 16) × 31 = 21 × 31-dimensional pyramid mean-pooled FHOG feature. The construction process of the pyramid mean-pooled FHOG feature, SPP-FHOG (Spatial Pyramid Pooling-Fused Histogram of Oriented Gradients), is shown in fig. 5. The FHOG feature set of the training sample scale feature set {F_1, …, F_15} is converted into the 21 × 31-dimensional SPP-FHOG feature set. Each SPP-FHOG feature is converted into vector form, i.e. each 21 × 31-dimensional feature is converted into a row vector. Finally, the SPP-FHOG features are combined into a training feature description matrix of dimensions 15 × (21 × 31).
The main application of PCA is to reduce the dimensionality of vectors. KPCA builds on PCA: the input space is mapped to a high-dimensional space by a nonlinear function, so that originally linearly inseparable samples become linearly separable. The key point of the method is that introducing the kernel function converts the inner product operation in the nonlinearly transformed feature space into a kernel function evaluation in the original space, which greatly reduces the amount of computation. The KPCA calculation process is:
1) Construct the n indexes of the m original samples into an m × n matrix X = [x_1 x_2 … x_m]^T, i.e. the 15 × (21 × 31)-dimensional matrix form of the SPP-FHOG features;
2) Calculate the kernel matrix: select a Gaussian radial basis kernel function, determine the correlation coefficient, and calculate the kernel matrix K corresponding to the SPP-FHOG features, as shown in formula 1:
K_{pq} = exp( −‖x_p − x_q‖² / (2δ²) )    (formula 1)
where 1 ≤ p ≤ m, 1 ≤ q ≤ m, and δ is an empirical parameter set through experiments;
3) Center the kernel matrix using formula 2:
K_{μν} = K − I_m K − K I_m + I_m K I_m    (formula 2)
where K_{μν} is the kernel matrix after centering of the kernel matrix K, I_m is the m × m matrix whose entries are all 1/m, I_m K is the result of row centering, and K I_m is the result of column centering;
4) Solve the centered kernel matrix K_{μν} by the Jacobi iteration method for its eigenvalues and corresponding eigenvectors, arrange the eigenvectors in descending order of eigenvalue, and adjust the corresponding eigenvalues accordingly, finally obtaining the adjusted eigenvalues λ_1, … λ_m and eigenvectors ν_1, … ν_m;
5) Orthogonalize the eigenvectors by Schmidt orthogonalization, obtaining β_1, … β_m;
6) Calculate the cumulative contribution rates B_1, … B_m of the eigenvalues, as shown in formula 3; for a given extraction rate p, if B_t ≥ p and B_{t−1} < p, then select the first t principal components β_1, … β_t:
B_t = (Σ_{i=1}^{t} λ_i) / (Σ_{i=1}^{m} λ_i)    (formula 3)
7) Compute the projection of the kernel matrix on the t principal components β_1, … β_t; the projection result is expressed as Y = K_{μν} β, where β = (β_1, … β_t).
The finally obtained projection data Y is the result of the KPCA projection operation on the original sample matrix: the SPP-FHOG features, combined into the 15 × (21 × 31)-dimensional training feature description matrix, are converted into the final projection matrix after KPCA dimensionality reduction.
140. A response image and a maximum response displacement are determined.
In the CarScale video, the size of the tracking target area determined in the initial frame is 49 × 28, and the size of the Padding area is 123 × 70. The FHOG feature of the Padding area is 30 × 17 × 31, and a cosine window of size 30 × 17 is constructed; this cosine window is the label of the tracking target area. The label description yf in the frequency domain is obtained by Fourier transform. The 30 × 17 × 31 FHOG feature of the Padding area is windowed and then converted into the Fourier space, and the conversion result is taken as the model Mode_alphaf in the kernel correlation filtering. The kernel matrix Kf of the FHOG features of the target area is determined according to the Gaussian kernel correlation formula, shown in formula 7, where F^(-1) denotes the inverse Fourier transform, x̂* the complex conjugate of the Fourier transform of x, ẑ the Fourier transform of z, and ⊙ the element-wise product. When calculating Kf, the parameter of the Gaussian kernel correlation is set to δ = 0.5:
k^(xz) = exp( −( ‖x‖² + ‖z‖² − 2 F^(-1)( x̂* ⊙ ẑ ) ) / δ² )    (formula 7)
After determining the label yf of the tracking target Padding region and the kernel correlation matrix Kf of the FHOG features, the parameter matrix α_f of the kernel correlation filter is constructed according to formula 8, where λ = 10^(−4):
α_f = yf ./ (Kf + λ)    (formula 8)
After the kernel correlation filter model Mode_alphaf and the parameter matrix α_f of the tracking target are determined, the candidate position of the tracking target in the next frame image is taken to be the same as the tracking target position in the current frame. An image area of Padding size at the same position is extracted from the next frame, the FHOG features of this Padding area are calculated, windowed and Fourier transformed, finally yielding the Fourier description zf of the FHOG features of the candidate area. The Gaussian kernel correlation matrix Kzf between the FHOG feature zf of the candidate target region and the model Mode_alphaf is calculated according to the Gaussian kernel correlation function shown in formula 7. The final response image is determined according to formula 9; the position coordinate corresponding to the maximum value in the response image is the maximum displacement of the tracking target within the Padding window:
RM = F^(-1)( Kzf ⊙ α_f )    (formula 9)
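For illustration, formula 9 and the displacement extraction can be sketched as follows, reusing the gaussian_correlation function from the sketch after formula 4; the wrap-around correction accounts for the circulant shift structure of the response:

```python
import numpy as np

def detect(alpha_f, x, z, delta=0.5):
    """Formula 9: response image of a candidate patch z against the model
    (x is the training patch, alpha_f the Fourier-domain parameters from
    formula 8); returns the response and the peak displacement."""
    kzf = np.fft.fft2(gaussian_correlation(x, z, delta))
    rm = np.fft.ifft2(alpha_f * kzf).real
    dy, dx = np.unravel_index(rm.argmax(), rm.shape)
    # displacements beyond half the window wrap around (circulant shift)
    if dy > rm.shape[0] // 2: dy -= rm.shape[0]
    if dx > rm.shape[1] // 2: dx -= rm.shape[1]
    return rm, (dy, dx)
```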
150. The maximum response displacement is corrected according to the maximum response value in the response image.
In the response image RM, the maximum response value Max_RM is determined. If the maximum response value is smaller than the set threshold RM_T = 0.06, the size of the Padding window in the current frame is doubled, the image area in the current frame is extracted according to the doubled Padding window, and the extracted image is down-sampled so that after processing its size is the same as that of the original Padding window. The FHOG features of the processed image area are extracted, windowing and Fourier transform are applied, finally obtaining the Fourier description zf'; the kernel correlation matrix Kzf' is calculated from the kernel correlation filter model Mode_alphaf and the Gaussian kernel correlation function, and the response image RM' is finally determined according to formula 9. The maximum response displacement is determined in this response image, and the final position of the tracking target is determined accordingly.
If the maximum response value Max_RM in the response image RM is not less than the set threshold RM_T = 0.06, the tracking target is within the candidate Padding window, so it is not necessary to enlarge the Padding window to re-determine the maximum displacement of the tracking target.
160. The specific position of the tracking target is determined.
After the maximum displacement of the tracking target in the Padding window is determined, with the central position of the tracking target in the previous frame being (X_{C,k−1}, Y_{C,k−1}) and the maximum displacement of the tracking target in the current frame being (Δx_k, Δy_k), the final central position of the tracking target in the current frame is:
(X_{C,k}, Y_{C,k}) = (X_{C,k−1} + Δx_k, Y_{C,k−1} + Δy_k)
170. The target scale is determined according to the SPP-FHOG feature set, and the cosine window is corrected.
After the position of the tracking target is determined, taking the target position of the tracking result as the center, the SPP-FHOG feature of the current tracking target area is extracted. This feature is projected with the projection vectors calculated in 130, the most similar projected feature vector is determined in the scale projection data set, and the actual scale of the tracking target is finally determined. The cosine window function is corrected according to this scale. Finally, the features in the Padding area of the tracked target in the current frame are determined, and the circulant matrix is determined.
180. The kernel correlation filter model is updated.
Given the feature circulant matrix X_k of the Fourier space corresponding to the previous frame and the parameters α_k corresponding to the model, the kernel correlation feature circulant matrix and the model parameters required in the (k+1)-th frame are shown in formula 10 and formula 11, where η = 0.075:
X_{k+1} = (1 − η) X_k + η X_{k−1}    (formula 10)
α_{k+1} = (1 − u) α_k + u α_{k−1}    (formula 11)
190. Steps 140 to 180 are executed cyclically, finally realizing target tracking with self-adaptive scale adjustment.
The target tracking method provided by the invention has been verified by experiments on standard tracking videos: the area overlap rate of the tracking results is improved without degrading the tracking center error.
The above-mentioned embodiments are merely preferred embodiments of the present invention, and the protection scope of the present invention is not limited thereby; insubstantial changes and substitutions made by those skilled in the art on the basis of the present invention fall within the protection scope of the present invention.

Claims (4)

1. A KCF-based scale self-adaptive target tracking method, characterized by comprising the following steps:
step 1, selecting a tracking target area, creating an initial tracking window and a Padding window, constructing a cosine window according to the size of the tracking target and the Padding window, and constructing a tracking target area label;
step 2, extracting FHOG features in the Padding window image, windowing the FHOG features, converting them into the Fourier space, determining the circulant matrix of the initial model, and determining the kernel correlation filter model;
step 3, constructing tracking targets at different scales according to the initial tracking target, solving the FHOG features in the Padding window at the different scales under the condition that the size of the Padding window is unchanged, and finally constructing a scale feature set of the tracking target;
step 4, extracting FHOG features from the current frame according to the Padding size and position of the previous frame, windowing them, constructing the response image of the candidate image according to the KCF algorithm, and determining the maximum response value in the response image;
step 5, determining whether the candidate area contains the tracking target according to the maximum response value and the response value threshold, and if not, introducing the expansion mechanism of the Padding window to finally determine the true maximum response and the maximum displacement in the response image;
step 6, determining the rough position of the tracking target according to the maximum displacement and the position of the previous frame;
step 7, extracting SPP-FHOG (Spatial Pyramid Pooling-Fused Histogram of Oriented Gradients) features at the rough position, determining the accurate scale of the tracking target in the constructed scale feature set from the extracted features, and correcting the cosine window according to the scale;
step 8, after the tracking target is determined, updating the circulant matrix model and the parameters of the kernel correlation filter;
and step 9, executing steps 4, 5, 6, 7 and 8 cyclically to realize continuous tracking of the tracking target.
2. The KCF-based scale self-adaptive target tracking method according to claim 1, wherein in step 3, constructing the scale feature set of the tracking target according to the initial tracking target comprises:
in the initial frame, determining the area of the tracking target rectangular window as the tracking target according to the initial target position and size in groundtruth_rect.txt of the standard tracking video;
in the process of calculating the FHOG features of the tracking target area, selecting the Cell size as 4 × 4; for the different scales of the tracking targets in different tracking videos, taking the scale of the tracking target selected in the initial frame as 1 and the central point of the tracking target as the center, reducing the rectangular frame of the tracking target according to the scale coefficient, the target area image extracted from the reduced rectangular frame being the tracking target image corresponding to that scale; for the M_t × N_t image corresponding to the minimum scale of the tracking target, the pixel count of the minimum-scale image satisfying min(M_t, N_t) ≥ 16; creating, according to this rule, an image set with 15 scale levels, the scales of the corresponding images being S_1 … S_15, wherein the image corresponding to scale S_1 is the minimum-scale image M_{t,1} × N_{t,1}; obtaining the FHOG feature m_1 × n_1 × 31 of the minimum image M_{t,1} × N_{t,1}, denoted F_1, and the FHOG feature m_i × n_i × 31 of the image corresponding to scale S_i, denoted F_i, 1 ≤ i ≤ 15; the FHOG feature set of the images at different scales being {m_1 × n_1 × 31, …, m_15 × n_15 × 31}; and finally constructing the feature description set {F_1, …, F_15} of the images at the different scales;
in step 3, constructing a sample description set classified based on KPCA, comprising:
pooling the training sample scale feature set {F_1, …, F_15} by pyramid mean pooling, the pooling process being divided into three parts: the first part taking the mean of all the 31-dimensional features over the whole m_i × n_i map, finally obtaining an overall 31-dimensional feature; the second part dividing the whole m_i × n_i image into 2 × 2 = 4 areas, computing the mean of the 31-dimensional features within each area, finally obtaining a 4 × 31-dimensional feature; the third part dividing the whole m_i × n_i image into 4 × 4 = 16 areas, computing the mean of the 31-dimensional features of each area, finally constructing a 16 × 31-dimensional feature; fusing the features obtained by the three parts to finally obtain the (1 + 4 + 16) × 31 = 21 × 31-dimensional pyramid mean-pooled FHOG feature; converting the FHOG feature set of the sample scale feature set {F_1, …, F_15} into the 21 × 31-dimensional SPP-FHOG feature set; converting the SPP-FHOG feature set into vector form, i.e. converting each 21 × 31-dimensional feature into a row vector, and finally combining the SPP-FHOG features into a training feature description matrix of dimensions 15 × (21 × 31);
in step 3, constructing the feature description of the training samples based on the basis vectors of KPCA, comprising:
1) constructing the n indexes of the m original samples into an m × n matrix X = [x_1 x_2 … x_m]^T, i.e. the 15 × (21 × 31)-dimensional matrix form of the SPP-FHOG features;
2) calculating the kernel matrix: selecting a Gaussian radial basis kernel function, determining the correlation coefficient, and calculating the kernel matrix K corresponding to the SPP-FHOG features, as shown in formula 1:
K_{pq} = exp( −‖x_p − x_q‖² / (2δ²) )    (formula 1)
wherein 1 ≤ p ≤ m, 1 ≤ q ≤ m, and δ is an empirical parameter set through experiments;
3) centering the kernel matrix using formula 2:
K_{μν} = K − I_m K − K I_m + I_m K I_m    (formula 2)
wherein K_{μν} is the kernel matrix after centering of the kernel matrix K, I_m is the m × m matrix whose entries are all 1/m, I_m K is the result of row centering, and K I_m is the result of column centering;
4) solving the centered kernel matrix K_{μν} by the Jacobi iteration method for the eigenvalues and corresponding eigenvectors, arranging the eigenvectors in descending order of eigenvalue and adjusting the corresponding eigenvalues accordingly, finally obtaining the adjusted eigenvalues λ_1, … λ_m and eigenvectors ν_1, … ν_m;
5) orthogonalizing the eigenvectors by Schmidt orthogonalization, obtaining β_1, … β_m;
6) calculating the cumulative contribution rates B_1, … B_m of the eigenvalues as shown in formula 3; for a given extraction rate p, if B_t ≥ p and B_{t−1} < p, selecting t principal components β_1, … β_t:
B_t = (Σ_{i=1}^{t} λ_i) / (Σ_{i=1}^{m} λ_i)    (formula 3)
7) computing the projection of the kernel matrix on the t principal components β_1, … β_t, the projection result being expressed as Y = K_{μν} β, wherein β = (β_1, … β_t);
the finally obtained projection data Y being the result of the KPCA projection operation on the original sample matrix: the SPP-FHOG features, combined into the 15 × (21 × 31)-dimensional training feature description matrix, are converted into the final projection matrix after KPCA dimensionality reduction.
3. The KCF-based scale self-adaptive target tracking method according to claim 2, wherein
in step 7, determining the accurate scale of the tracking target in the constructed scale feature set according to the extracted features, and correcting the cosine window according to the scale, comprises:
determining the candidate target area in the current frame image according to the size and position of the Padding in the previous frame image; extracting the FHOG features of the candidate region, performing Gaussian windowing, converting the processing result into the Fourier space, and determining the final response image according to the kernel correlation theory of KCF; searching the maximum response value point in the final response image, the position determined by this point being the final position of the tracking target; after the initial position of the tracking target is determined, the specific scale of the tracking target being still uncertain, calculating, at the position of the tracking target, the FHOG feature of the area with Cells of size 4 × 4, its dimensions being the same as those of the FHOG feature of the minimum-scale image in the scale feature set, namely m_1 × n_1 × 31; performing spatial pyramid mean pooling on the FHOG feature, finally converting it into a 21 × 31 SPP-FHOG feature; according to the KPCA-based classification method, solving and centering the Gaussian-radial-basis kernel correlation between the 21 × 31 SPP-FHOG feature to be determined and the features of the 15 × (21 × 31) scale feature set SPP-FHOG, finally obtaining K_h; computing the projection of K_h on the t principal components (obtained by the calculation in S14) as Y_h = K_h β; calculating the 2-norm between the projection result Y_h and each sample in the projection result Y of the sample library (containing the 15 projections), the scale of the scale sample whose projection gives the minimum norm being the scale corresponding to Y_h, i.e. the scale of the tracking target; and after the scale of the tracking target is determined, updating the cosine window and the labels corresponding to the cosine window according to the scale of the tracking target;
step 7 comprising determining whether the tracking target exists in the response image:
determining the maximum response value ω_k in the response image of the k-th frame and the displacement corresponding to the maximum response, and summing the target position in the (k−1)-th frame and the maximum response displacement to determine the final position of the tracking target, thereby determining the scale of the tracking target; after the position and scale of the tracking target are determined, selecting, with the finally determined central position of the tracking target as the central coordinate, an image of the size of the Padding window in the (k+1)-th frame as the candidate area of the tracking target; determining the response image corresponding to the candidate area image by applying the KCF method, and searching the maximum response value ω_{k+1} in the response image; if ω_{k+1} ≥ (1 − ε) ω_k − ε ω_{k−1}, where ε is a linearization factor that can be determined experimentally at initialization, determining that the candidate target area contains the target to be tracked, wherein ω_{k−1} is the maximum response value in the finally determined response image of the (k−1)-th frame; after the tracking target is determined in the candidate region, extracting features at the position of the maximum response, and finally determining the scale of the tracking target according to the scale feature set; if ω_{k+1} < (1 − ε) ω_k − ε ω_{k−1}, the candidate target area does not contain the tracking target;
step 7 further comprising a process for the tracking target exceeding the Padding:
when it is determined that the tracking target is not in the Padding, expanding the Padding area to 1.5 times the size of the original Padding area, and selecting an image of the 1.5× Padding area in the current frame as the candidate target area EPadding; resampling the candidate target area into an image of the original Padding size by linear interpolation and down-sampling; taking the processed image as the candidate tracking target area for feature extraction, and calculating the response image; searching the maximum translation position in the response image, and mapping the maximum translation displacement into the EPadding to determine the actual maximum displacement; and finally determining the position of the tracking target according to the position in the previous frame image, and then determining the scale of the tracking target area.
4. The KCF-based scale self-adaptive target tracking method according to claim 1, wherein
in step 8, updating the circulant matrix model and the parameters of the kernel correlation filter comprises:
creating, according to the model creation method in KCF, the feature circulant matrix X_{f,k} of the Fourier space corresponding to the current frame and the parameters α_{f,k} corresponding to the model; the kernel correlation feature circulant matrix and the model parameters to be used in the (k+1)-th frame then being as shown in formula 3 and formula 4:
X_{k+1} = (1 − η) X_k + η X_{k−1}    (formula 3)
α_{k+1} = (1 − u) α_k + u α_{k−1}    (formula 4)
wherein X_{k+1} is the feature circulant matrix in the (k+1)-th frame image, α_{k+1} is the model parameter used when the correlation filtering operation is performed in the (k+1)-th frame, η is the update coefficient of the model circulant matrix, set through experiments, and u is the update coefficient of the correlation filter model parameters, also determined through experiments.
CN201910190867.0A 2019-03-14 2019-03-14 KCF-based scale self-adaptive target tracking method Active CN110895820B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910190867.0A CN110895820B (en) 2019-03-14 2019-03-14 KCF-based scale self-adaptive target tracking method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910190867.0A CN110895820B (en) 2019-03-14 2019-03-14 KCF-based scale self-adaptive target tracking method

Publications (2)

Publication Number Publication Date
CN110895820A true CN110895820A (en) 2020-03-20
CN110895820B CN110895820B (en) 2023-03-24

Family

ID=69785521

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910190867.0A Active CN110895820B (en) 2019-03-14 2019-03-14 KCF-based scale self-adaptive target tracking method

Country Status (1)

Country Link
CN (1) CN110895820B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016026370A1 (en) * 2014-08-22 2016-02-25 Zhejiang Shenghui Lighting Co., Ltd. High-speed automatic multi-object tracking method and system with kernelized correlation filters
CN107016689A (en) * 2017-02-04 2017-08-04 中国人民解放军理工大学 A kind of correlation filtering of dimension self-adaption liquidates method for tracking target
US20180268559A1 (en) * 2017-03-16 2018-09-20 Electronics And Telecommunications Research Institute Method for tracking object in video in real time in consideration of both color and shape and apparatus therefor
CN106991689A (en) * 2017-04-05 2017-07-28 西安电子科技大学 Method for tracking target and GPU based on FHOG and color characteristic accelerate
CN108053419A (en) * 2017-12-27 2018-05-18 武汉蛋玩科技有限公司 Inhibited and the jamproof multiscale target tracking of prospect based on background

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HUA YILUN et al.: "Multi-scale tracking algorithm based on background suppression and foreground anti-interference", Infrared Technology *
LI YUANZHUANG et al.: "A multi-scale target tracking method using kernel correlation filters", Electronic Science and Technology *
HAO SHAOHUA et al.: "Kernel correlation target tracking algorithm based on candidate region detection", Video Engineering *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111400069A (en) * 2020-03-23 2020-07-10 北京经纬恒润科技有限公司 Method, device and system for realizing KCF tracking algorithm
CN111400069B (en) * 2020-03-23 2024-01-26 北京经纬恒润科技股份有限公司 Method, device and system for realizing KCF tracking algorithm
CN111709971A (en) * 2020-05-29 2020-09-25 西安理工大学 Semi-automatic video labeling method based on multi-target tracking
CN112396065A (en) * 2020-10-19 2021-02-23 北京理工大学 Scale-adaptive target tracking method and system based on correlation filtering
CN112435280A (en) * 2020-11-13 2021-03-02 桂林电子科技大学 Moving target detection and tracking method for unmanned aerial vehicle video
CN112991394A (en) * 2021-04-16 2021-06-18 北京京航计算通讯研究所 KCF target tracking method based on cubic spline interpolation and Markov chain
CN112991394B (en) * 2021-04-16 2024-01-19 北京京航计算通讯研究所 KCF target tracking method based on cubic spline interpolation and Markov chain
CN113658216A (en) * 2021-06-24 2021-11-16 北京理工大学 Remote sensing target tracking method based on multi-stage self-adaptive KCF and electronic equipment
CN113822911A (en) * 2021-10-08 2021-12-21 中国人民解放军国防科技大学 Tracking method and device of columnar inclined target, computer equipment and storage medium

Also Published As

Publication number Publication date
CN110895820B (en) 2023-03-24

Similar Documents

Publication Publication Date Title
CN110895820B (en) KCF-based scale self-adaptive target tracking method
CN110097568B (en) Video object detection and segmentation method based on space-time dual-branch network
CN108256562B (en) Salient target detection method and system based on weak supervision time-space cascade neural network
CN111080675B (en) Target tracking method based on space-time constraint correlation filtering
CN108090919B (en) Improved kernel correlation filtering tracking method based on super-pixel optical flow and adaptive learning factor
CN108549839B (en) Adaptive feature fusion multi-scale correlation filtering visual tracking method
CN108776975B (en) Visual tracking method based on semi-supervised feature and filter joint learning
CN112287978A (en) Hyperspectral remote sensing image classification method based on self-attention context network
CN107369166B (en) Target tracking method and system based on multi-resolution neural network
CN108038435B (en) Feature extraction and target tracking method based on convolutional neural network
CN107341820A (en) A kind of fusion Cuckoo search and KCF mutation movement method for tracking target
CN112651998B (en) Human body tracking algorithm based on attention mechanism and double-flow multi-domain convolutional neural network
CN104537686B (en) Tracking and device based on target space-time consistency and local rarefaction representation
CN110084201B (en) Human body action recognition method based on convolutional neural network of specific target tracking in monitoring scene
CN110175649A (en) It is a kind of about the quick multiscale estimatiL method for tracking target detected again
CN109636722B (en) Method for reconstructing super-resolution of online dictionary learning based on sparse representation
CN112183675B (en) Tracking method for low-resolution target based on twin network
CN113312973B (en) Gesture recognition key point feature extraction method and system
CN106530330B (en) Video target tracking method based on low-rank sparse
CN110991321A (en) Video pedestrian re-identification method based on label correction and weighted feature fusion
CN111402303A (en) Target tracking architecture based on KFSTRCF
CN112785622A (en) Long-time tracking method and device for unmanned ship on water surface and storage medium
CN110503090B (en) Character detection network training method based on limited attention model, character detection method and character detector
CN110827327B (en) Fusion-based long-term target tracking method
CN112991394B (en) KCF target tracking method based on cubic spline interpolation and Markov chain

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant