CN111862167B - Rapid robust target tracking method based on sparse compact correlation filter - Google Patents

Info

Publication number
CN111862167B
Authority
CN
China
Prior art keywords
sparse
filter
channel
correlation filter
group
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010705423.9A
Other languages
Chinese (zh)
Other versions
CN111862167A (en)
Inventor
王菡子 (Wang Hanzi)
梁艳杰 (Liang Yanjie)
熊逻 (Xiong Luo)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University
Original Assignee
Xiamen University
Application filed by Xiamen University
Priority to CN202010705423.9A
Publication of CN111862167A
Application granted
Publication of CN111862167B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/277 Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

A fast and robust target tracking method based on a sparse compact correlation filter, relating to computer vision technology. A base sample is constructed from the target and its context, the training samples are all cyclic shifts of the base sample, and a loss function for a multi-channel correlation filter is defined following the discriminative correlation filter (DCF). Borrowing from multi-task learning, an exclusive sparsity regularization term and a group sparsity regularization term are combined into an intra-group-inter-group sparsity regularization term. A temporal consistency constraint is introduced to alleviate the degradation of the DCF over time, and a regression loss function containing both the intra-group-inter-group sparsity term and the temporal regularization term is defined, so that a sparse correlation filter is learned. Channel pruning removes redundant filters as a whole: the D channel filters are ranked by importance and only the top-ranked channel filters are kept for tracking. A Lagrangian function is constructed and the regression loss is optimized with the ADMM algorithm. The method effectively improves the discriminability and interpretability of the filter and achieves high accuracy at high speed.

Description

Rapid robust target tracking method based on sparse compact correlation filter
Technical Field
The invention relates to computer vision technology, and in particular to a fast and robust target tracking method based on a sparse compact correlation filter.
Background
The human visual system has a strong ability to perceive external video: the brain can locate a moving target in a video quickly and accurately. Computer vision aims to mimic this perception and to reach human-level speed and accuracy. Visual tracking is a fundamental problem in computer vision and a core component of visual perception, and its speed and accuracy largely determine the real-time performance and accuracy of a visual perception system. Target tracking is one of the important research directions in computer vision and plays an important role in intelligent video surveillance, human-computer interaction, robot navigation, virtual reality, medical diagnosis, public safety and other fields. The task is to select an object of interest in the initial frame of a video and then predict the state of that object in each subsequent frame. Target tracking is also challenging: the appearance of the target often changes during tracking (for example through occlusion, deformation or rotation), and tracking is further complicated by illumination changes, interference from similar objects in the background, and fast target motion. In recent years, tracking methods based on correlation filtering and on deep learning have become the mainstream research directions because of their good performance.
Correlation-filter-based methods have become one of the research hotspots in target tracking in recent years; they have a clear speed advantage and have achieved good results on numerous benchmark datasets and in various tracking competitions. DCF popularized the use of correlation filtering in target tracking, and many researchers have since improved on it. To handle scale and rotation changes, LDES uses a phase correlation filter to estimate the scale and the rotation angle of the target simultaneously in a polar coordinate system, and MCPF embeds a correlation filter into a particle filter tracking framework to handle scale changes during tracking. To alleviate the spatial boundary effect of the filter, DSARCF and ASRCF introduce, respectively, a dynamic saliency-aware map and an adaptive spatial weight map into filter learning to weight the filter coefficients adaptively. To alleviate filter degradation over time, STRCF and LADCF introduce a temporal regularization term into filter learning to achieve robust and fast tracking. To train the filter with more samples, CACF and BACF use context samples and background samples, respectively, which greatly improves accuracy while preserving real-time speed. To select more robust and discriminative features, HCF applies multi-layer deep features extracted from VGG-Net in a correlation filter tracking framework and achieves accurate and robust tracking by fusing the multi-layer response maps. GFSDCF combines feature selection with filter learning, so that the learned filter is more discriminative, overfitting is alleviated, and tracking accuracy is improved.
ECO uses a decomposition (factorization) matrix to compress the original deep features and trains a continuous convolution filter on them, achieving efficient and accurate tracking. To enhance the discriminability of the response map, LMCF introduces a correlation filter into the Struck tracking framework, combining the high speed of correlation filters with the strong discriminative power of Struck to achieve fast and robust tracking.
In recent years, deep-learning-based methods have become another research hotspot in target tracking owing to their higher accuracy. Current deep-learning-based tracking methods fall roughly into two categories. The first category builds a deep network, trains it offline on selected samples, and tracks the target by fine-tuning the network online; the representative method is MDNet. Within the MDNet framework, VITAL maintains robustness during tracking through adversarial learning, and ADNet uses reinforcement learning to predict the changes of the target state during tracking and adapt to complex tracking environments. Methods in this category are accurate but have difficulty running in real time. The second category converts target tracking into an instance retrieval problem and learns a matching function for instance retrieval by offline training on external video data. SINT and SiamFC learn a deep similarity metric by training Siamese networks offline; DCFNet and CFNet add a differentiable correlation filter layer to a Siamese network to learn, end to end, feature representations suited to correlation filtering; EAST introduces reinforcement learning into a Siamese network to adaptively select the deep features of a particular layer for fast and robust tracking; SiamRPN further introduces a region proposal network (RPN) into a Siamese network to handle changes in target scale and aspect ratio during tracking. Most of these offline-trained methods run in real time, but their accuracy depends on the network and the data used for training.
Disclosure of Invention
The invention aims to provide a fast and robust target tracking method based on a sparse compact correlation filter, which addresses problems of traditional correlation filters such as overfitting and high computational complexity, and improves the robustness of the algorithm to occlusion, deformation, rotation and background clutter.
The invention comprises the following steps:
1) for a given target, construct a base sample from the target and its context; the training samples consist of all cyclic shifts of the base sample, whose labels are given by a Gaussian function; define the DCF loss function for training a multi-channel correlation filter;
2) following multi-task learning, combine an exclusive sparsity regularization term and a group sparsity regularization term into an intra-group-inter-group sparsity regularization term; as in DCF-based trackers, introduce a temporal consistency constraint into filter learning to alleviate the degradation of the DCF over time; define a regression loss function containing the intra-group-inter-group sparsity term and the temporal regularization term, so as to learn a sparse correlation filter;
3) perform channel pruning based on the regression loss function defined in step 2): remove redundant filters as a whole to further accelerate computation, estimating the change in the regression loss caused by removing the filter of a given channel; rank the D channel filters by importance and keep the C top-ranked channel filters for tracking;
4) construct a Lagrangian function and optimize the regression loss with the ADMM algorithm, completing fast and robust target tracking based on the sparse compact correlation filter.
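The training set of step 1) can be illustrated with a small pure-Python sketch that builds a Gaussian label and the cyclic shifts of a single-channel base sample. It is a sketch under stated assumptions, not the patented implementation: the function names `gaussian_label`, `cyclic_shift` and `training_set` are hypothetical, the bandwidth `sigma` is an assumed free parameter, and the label peak is placed at zero shift with circularly wrapped distances, a common DCF convention that the text does not specify.

```python
import math


def gaussian_label(M, N, sigma):
    """Gaussian regression target with its peak (1.0) at zero shift (0, 0).

    Distances are wrapped circularly so the label lines up with the
    cyclic-shift index of each training sample (assumed convention).
    """
    label = [[0.0] * N for _ in range(M)]
    for m in range(M):
        for n in range(N):
            dm = min(m, M - m)  # wrapped shift along the first axis
            dn = min(n, N - n)  # wrapped shift along the second axis
            label[m][n] = math.exp(-(dm * dm + dn * dn) / (2.0 * sigma * sigma))
    return label


def cyclic_shift(base, sm, sn):
    """Cyclically shift a 2-D base sample by (sm, sn); values wrap around the border."""
    M, N = len(base), len(base[0])
    return [[base[(m - sm) % M][(n - sn) % N] for n in range(N)] for m in range(M)]


def training_set(base, label):
    """Pair every cyclic shift of the base sample with the label value of that shift."""
    M, N = len(base), len(base[0])
    return [(cyclic_shift(base, sm, sn), label[sm][sn])
            for sm in range(M) for sn in range(N)]
```

Each (shifted sample, label value) pair is one virtual training example. A real tracker never materializes these M·N samples, because their circulant structure lets the loss be evaluated with FFTs.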
In step 1), the DCF loss function may be defined as follows. In the t-th frame, for a given target, a base sample is constructed from the target and its context, the training samples are all cyclic shifts of the base sample, their labels are given by a Gaussian function, and the DCF loss for training the multi-channel correlation filter is defined as:

$$L(X_t,Y;W_t)=\frac{1}{2}\Big\|\sum_{d=1}^{D}X_t^{d}\circledast W_t^{d}-Y\Big\|_F^2+\frac{\xi}{2}\|W_t\|_F^2,\qquad(1)$$

where $\circledast$ denotes cyclic (circular) convolution, $X_t\in\mathbb{R}^{M\times N\times D}$ and $W_t\in\mathbb{R}^{M\times N\times D}$ are the base sample and the filter of the t-th frame, $X_t^{d}$ and $W_t^{d}$ are their d-th channels, $Y\in\mathbb{R}^{M\times N}$ is the label given by the Gaussian function, M, N and D denote the width, the height and the number of channels, and ξ is a regularization parameter. The goal of filter learning is to minimize the loss function:

$$W_t=\arg\min_{W_t}L(X_t,Y;W_t).$$

In equation (1), all channels of the base-sample features are used to train the multi-channel correlation filter.
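Equation (1) can be sketched by evaluating the loss directly with circular convolutions. The ½ factors and the Frobenius-norm form are assumptions (the standard DCF formulation), not values read from the original formula image; channels are stored channel-first (a list of D arrays of size M×N) for readability, and `circ_conv2d` and `dcf_loss` are hypothetical helper names.

```python
def circ_conv2d(x, w):
    """Circular (cyclic) 2-D convolution of two M x N arrays given as lists of lists."""
    M, N = len(x), len(x[0])
    out = [[0.0] * N for _ in range(M)]
    for m in range(M):
        for n in range(N):
            acc = 0.0
            for i in range(M):
                for j in range(N):
                    # Indices wrap around: this is what makes every cyclic shift a training sample.
                    acc += x[i][j] * w[(m - i) % M][(n - j) % N]
            out[m][n] = acc
    return out


def dcf_loss(X, W, Y, xi):
    """0.5 * ||sum_d X^d (*) W^d - Y||_F^2 + 0.5 * xi * ||W||_F^2 for channel-first X and W."""
    M, N = len(Y), len(Y[0])
    response = [[0.0] * N for _ in range(M)]
    for x_d, w_d in zip(X, W):
        conv = circ_conv2d(x_d, w_d)
        for m in range(M):
            for n in range(N):
                response[m][n] += conv[m][n]  # channel responses are summed
    data_term = sum((response[m][n] - Y[m][n]) ** 2 for m in range(M) for n in range(N))
    reg_term = sum(v * v for w_d in W for row in w_d for v in row)
    return 0.5 * data_term + 0.5 * xi * reg_term
```

With an impulse as the sample, the response equals the filter, so a filter equal to the label gives zero data loss; the naive quadruple loop is for clarity only, and the FFT form used later in the document computes the same quantity far faster.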
In step 2), following multi-task learning, the exclusive sparsity regularization term and the group sparsity regularization term are combined into an intra-group-inter-group sparsity regularization term $R_{EG}(W_t)$:

[Equation (2): formula image not recoverable in this text version]

where $w_t^{(m,n)}\in\mathbb{R}^{D}$ denotes the vector of $W_t$ at position (m, n), $w_t^{(m,n,d)}$ denotes the element of $W_t$ at position (m, n, d), and θ is a weight parameter that balances the exclusive sparsity and group sparsity regularization terms.

The group sparsity term first takes the $\ell_2$ norm over channels and then the $\ell_1$ norm over space; it removes spatially redundant features, so that the filter is sparse in space. The exclusive sparsity term first takes the $\ell_1$ norm over channels and then the $\ell_2$ norm over space; it removes features that are redundant across channels, so that the filter is sparse across channels.
In target tracking, DCF-based trackers introduce a temporal consistency constraint into filter learning to alleviate the degradation of the DCF over time. The temporal regularization term is:

$$R_T(W_t)=\frac{1}{2}\|W_t-W_{t-1}\|_F^2,\qquad(3)$$

where $W_{t-1}$ denotes the filter of the (t-1)-th frame.
The intra-group-inter-group sparsity regularization term $R_{EG}(W_t)$ and the temporal regularization term $R_T(W_t)$ are introduced to define the regression loss function for learning the sparse correlation filter:

$$L(X_t,Y;W_t)=\frac{1}{2}\Big\|\sum_{d=1}^{D}X_t^{d}\circledast W_t^{d}-Y\Big\|_F^2+\lambda R_{EG}(W_t)+\mu R_T(W_t),\qquad(4)$$

where λ and μ are the regularization parameters of $R_{EG}(W_t)$ and $R_T(W_t)$, respectively.
In step 2), the weight parameter in equation (2) is θ = 0.2, and the regularization parameters in equation (4) are $\lambda=1.0\times10^{-4}$ and $\mu=5$.
In step 3), channel pruning based on the regression loss function defined in step 2) evaluates the change in the loss:

$$\Delta L=L(X_t,Y;W'_t)-L(X_t,Y;W_t),\qquad(5)$$

where $W_t$ and $W'_t$ denote the unpruned and the pruned filters, respectively. For a filter with D channels, evaluating this criterion for every possible pruning requires estimating the loss function $2^D$ times before channel pruning is complete.
Instead, the change in the regression loss is computed by removing the filter of one specific channel at a time. Let $H_t^{d}$ denote the response map generated by $W_t^{d}$:

$$H_t^{d}=X_t^{d}\circledast W_t^{d},\qquad D_t=\{X_t,Y\},$$

and let the response-map tensor $Z_t\in\mathbb{R}^{M\times N\times D}$ stack the D response maps. Letting $h_t^{d}$ denote the vectorized form of $H_t^{d}$, we obtain

$$|\Delta L(h_t^{d})|=\big|L(D_t,h_t^{d}=0)-L(D_t,h_t^{d})\big|,\qquad(6)$$

where $L(D_t,h_t^{d}=0)$ denotes the loss after the response-map vector $h_t^{d}$ is pruned, and $L(D_t,h_t^{d})$ denotes the loss without pruning. A first-order Taylor expansion of $L(D_t,h_t^{d}=0)$ around the unpruned response $h_t^{d}$ gives

$$L(D_t,h_t^{d}=0)=L(D_t,h_t^{d})-\Big(\frac{\partial L}{\partial h_t^{d}}\Big)^{\top}h_t^{d}+R_1(h_t^{d}=0),\qquad(7)$$

where $R_1(h_t^{d}=0)$ is the first-order remainder of the Taylor expansion. Substituting equation (7) into equation (6) and dropping $R_1(h_t^{d}=0)$ gives

$$|\Delta L(h_t^{d})|=\Big|\Big(\frac{\partial L}{\partial h_t^{d}}\Big)^{\top}h_t^{d}\Big|.$$

Thus, for each filter $W_t^{d}$, its importance $\Theta(W_t^{d})$ is computed as

$$\Theta(W_t^{d})=\Big|\frac{1}{MN}\sum_{m=1}^{M}\sum_{n=1}^{N}\frac{\partial L}{\partial z_t^{(m,n,d)}}\,z_t^{(m,n,d)}\Big|,$$

where $z_t^{(m,n,d)}$ is the element of the response-map tensor $Z_t$ at position (m, n, d). The D channel filters are ranked by importance $\Theta(W_t^{d})$, and only the C top-ranked channel filters are kept for tracking. This channel selection is performed in the first frame, and only the selected C channel filters are retained in subsequent frames, which significantly reduces the computational complexity.
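The Taylor importance score and the top-C selection of step 3) can be sketched for the data term of the loss alone. The sketch assumes the data term is ½‖Σ_d h^d − y‖² and ignores the regularizers when scoring, so ∂L/∂z equals the residual shared by all channels; response maps are passed already vectorized, and `taylor_importance` and `select_channels` are hypothetical names.

```python
def taylor_importance(responses, y):
    """First-order Taylor importance |(1/MN) * sum_k dL/dz_k * z_k| for each channel.

    responses: list of D vectorized response maps h^d, each of length M*N.
    y: vectorized Gaussian label. Assumes L = 0.5 * ||sum_d h^d - y||^2,
    so dL/dz_k = sum_d h^d_k - y_k for every channel.
    """
    MN = len(y)
    total = [sum(h[k] for h in responses) for k in range(MN)]
    residual = [t - yk for t, yk in zip(total, y)]
    return [abs(sum(r * z for r, z in zip(residual, h)) / MN) for h in responses]


def select_channels(responses, y, C):
    """Rank channels by importance and return the indices of the C highest-scoring ones, sorted."""
    scores = taylor_importance(responses, y)
    ranked = sorted(range(len(scores)), key=lambda d: scores[d], reverse=True)
    return sorted(ranked[:C])
```

Scoring needs a single pass over the D response maps instead of the $2^D$ loss evaluations of exhaustive pruning; per the text, it is run once, in the first frame.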
In step 3), the channel parameter C is 64.
In step 4), the Lagrangian function may be constructed and the regression loss optimized with the ADMM algorithm as follows.
To minimize the regression loss in equation (4) of step 2), the ADMM algorithm is used. Since the sparse compact filter is compressed from D channels to C channels in the initial frame, an auxiliary variable $U_t=W_t$ is introduced and the Lagrangian function is constructed as:

$$\mathcal{L}(W_t,U_t,V_t)=\frac{1}{2}\Big\|\sum_{d=1}^{C}X_t^{d}\circledast W_t^{d}-Y\Big\|_F^2+\lambda R_{EG}(U_t)+\mu R_T(W_t)+\langle V_t,W_t-U_t\rangle+\frac{\gamma}{2}\|W_t-U_t\|_F^2,$$

where $V_t$ is the Lagrange multiplier and γ is the penalty factor. The ADMM algorithm solves for the following variables alternately:
Optimizing the correlation filter $W_t$: first, Parseval's theorem is used to convert the problem in $W_t$ from the spatial domain to the frequency domain:

[Formula image not recoverable in this text version]

where $\hat{\cdot}$ denotes the discrete Fourier transform and $\odot$ denotes element-wise multiplication. Similar to the solutions of STRCF and LADCF, each vector $\hat{w}_t^{(m,n)}$ of $\hat{W}_t$ is solved as follows:

[Formula image not recoverable in this text version; its accompanying definitions refer to the elements of the frequency-domain tensors at position (m, n)]

After $\hat{W}_t$ is computed, $W_t$ is obtained by the inverse Fourier transform.
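The move to the frequency domain rests on two standard DFT facts: Parseval's theorem (energy is preserved up to a factor of 1/N for the unnormalized DFT) and the convolution theorem (circular convolution becomes element-wise multiplication, which is why ⊛ turns into ⊙). The 1-D pure-Python check below illustrates both; it uses a naive O(N²) DFT to stay self-contained, whereas a tracker would use an FFT library, and the helper names are illustrative.

```python
import cmath


def dft(x):
    """Unnormalized DFT: X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def circ_conv(x, w):
    """1-D circular convolution: (x (*) w)[n] = sum_i x[i] * w[(n - i) mod N]."""
    N = len(x)
    return [sum(x[i] * w[(n - i) % N] for i in range(N)) for n in range(N)]


def energy(v):
    """Squared l2 norm of a real or complex sequence."""
    return sum(abs(c) ** 2 for c in v)
```

For example, with x = [1, 2, 0, -1], `energy(x)` is 6 and `energy(dft(x)) / 4` agrees with it up to rounding, and `dft(circ_conv(x, w))` matches the element-wise product of `dft(x)` and `dft(w)`.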
Optimizing the auxiliary variable $U_t$: $U_t$ is obtained by solving the following subproblem:

$$U_t=\arg\min_{U_t}\ \lambda R_{EG}(U_t)+\frac{\gamma}{2}\Big\|U_t-W_t-\frac{V_t}{\gamma}\Big\|_F^2.$$

Each element $u_t^{(m,n,d)}$ of $U_t$ is solved as follows:

[Formula images not recoverable in this text version]

where $(\cdot)_+$ is the shrinkage operator, defined as $(x)_+=\max(0,x)$.
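The shrinkage operator $(x)_+=\max(0,x)$ is the building block of proximal steps for sparsity-inducing terms. Because the exact element-wise solution for $U_t$ is not recoverable here, the sketch below shows only two standard operators built from it: element-wise soft thresholding (the proximal operator of an $\ell_1$ penalty) and group soft thresholding of one channel vector $u^{(m,n)}$ (the proximal operator of an $\ell_2$ group penalty). How the patent combines them, and its threshold values, are not shown; the function names are hypothetical.

```python
import math


def pos(x):
    """Shrinkage operator (x)_+ = max(0, x)."""
    return max(0.0, x)


def soft_threshold(z, tau):
    """Element-wise prox of tau * ||u||_1: sign(z) * (|z| - tau)_+."""
    return [math.copysign(pos(abs(a) - tau), a) for a in z]


def group_shrink(z, tau):
    """Prox of tau * ||u||_2 for one group (e.g. one channel vector): (1 - tau / ||z||_2)_+ * z."""
    norm = math.sqrt(sum(a * a for a in z))
    if norm == 0.0:
        return [0.0] * len(z)
    scale = pos(1.0 - tau / norm)
    return [scale * a for a in z]
```

A group whose norm falls below the threshold is zeroed as a whole, which is how a group penalty removes an entire spatial position, while element-wise soft thresholding zeroes individual channel entries.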
Updating the Lagrange multiplier $V_t$ and the penalty factor γ: given the filter $W_t$ and the auxiliary variable $U_t$, $V_t$ and γ are updated as follows:

[Update formula for $V_t$: image not recoverable in this text version]

$$\gamma^{i+1}=\max(\gamma_{\min},\rho\gamma^{i}),$$

where i is the iteration index, $\gamma_{\min}$ is the minimum value of γ, and ρ is the scale factor.
In step 4), the parameters are $\gamma_{\min}=0.002$, $\gamma^{0}=0.01$ and $\rho=0.2$, and ADMM runs for 2 iterations.
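With the stated values, the penalty schedule $\gamma^{i+1}=\max(\gamma_{\min},\rho\gamma^{i})$ can be traced by hand: starting from $\gamma^{0}=0.01$ with ρ = 0.2, the first update gives 0.002, which already equals $\gamma_{\min}$, so γ stays at 0.002. The sketch below reproduces this. The multiplier update it uses, $V^{i+1}=V^{i}+\gamma^{i}(W^{i+1}-U^{i+1})$, is the standard ADMM dual step and is an assumption, since the original formula for $V_t$ is not recoverable; the function names are hypothetical.

```python
def penalty_schedule(gamma0, rho, gamma_min, iterations):
    """Return [gamma^0, ..., gamma^iterations] with gamma^{i+1} = max(gamma_min, rho * gamma^i)."""
    gammas = [gamma0]
    for _ in range(iterations):
        gammas.append(max(gamma_min, rho * gammas[-1]))
    return gammas


def update_multiplier(V, W, U, gamma):
    """Assumed standard ADMM dual ascent, element-wise: V <- V + gamma * (W - U)."""
    return [v + gamma * (w - u) for v, w, u in zip(V, W, U)]
```

Calling `penalty_schedule(0.01, 0.2, 0.002, 2)` gives approximately [0.01, 0.002, 0.002]; because ρ < 1, the penalty decreases and is clamped at $\gamma_{\min}$.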
Compared with the prior art, the invention has the following advantages:
The sparse and compact correlation filter provided by the invention is used for robust real-time target tracking and effectively alleviates overfitting and high computational complexity during tracking. The proposed intra-group-inter-group sparsity regularization term selects discriminative, target-specific features to train the filter, which improves the discriminability and interpretability of the filter. On the one hand, the new intra-group-inter-group sparsity regularization term keeps the learned filter sparse in both space and channels, so that discriminative target-specific features are activated during tracking; this alleviates overfitting and improves the robustness of the algorithm to occlusion, deformation, rotation and background clutter. On the other hand, a new channel pruning algorithm based on Taylor expansion prunes the filter, retaining a small number of filters with strong responses to the specific target and removing a large number of redundant filters; this further alleviates overfitting and reduces computational complexity. The correlation filter is solved with an efficient ADMM algorithm, which optimizes the filter in only a few iterations. Experimental results on several challenging datasets show that the method achieves good tracking results with high accuracy and high speed. On the OTB-2015 dataset, the DP/AUC scores of the invention are 93.3%/70.0%, and the speed reaches about 20 FPS.
Detailed Description
The present invention is a target tracking method of the correlation filter class; the following embodiment describes it further.
The embodiment of the invention comprises the following steps:
A. In the t-th frame, for a given target, a base sample is constructed from the target and its context; the training samples are all cyclic shifts of the base sample, their labels are given by a Gaussian function, and the DCF loss for training the multi-channel correlation filter is defined as:

$$L(X_t,Y;W_t)=\frac{1}{2}\Big\|\sum_{d=1}^{D}X_t^{d}\circledast W_t^{d}-Y\Big\|_F^2+\frac{\xi}{2}\|W_t\|_F^2,\qquad(1)$$

where $\circledast$ denotes cyclic (circular) convolution, $X_t\in\mathbb{R}^{M\times N\times D}$ and $W_t\in\mathbb{R}^{M\times N\times D}$ are the base sample and the filter of the t-th frame, $X_t^{d}$ and $W_t^{d}$ are their d-th channels, $Y\in\mathbb{R}^{M\times N}$ is the label given by the Gaussian function, M, N and D denote the width, the height and the number of channels, and ξ is a regularization parameter. The goal of filter learning is to minimize the loss function:

$$W_t=\arg\min_{W_t}L(X_t,Y;W_t).$$

In equation (1), all channels of the base-sample features are used to train the multi-channel correlation filter. However, a significant portion of these features are unrelated to the specific target being tracked or are useless for distinguishing the target from the background. To select discriminative, target-specific features for training the filter, and thus alleviate overfitting and high computational complexity, a sparse and compact correlation filter is proposed below for fast and robust tracking.
B. In step A, the correlation filter of each channel is usually trained independently on the feature of that channel. However, different feature channels have different characteristics: some feature channels are mutually exclusive, while others are mutually cooperative. When training a multi-channel correlation filter, mutually exclusive feature channels require the correlation filters of the respective channels to be trained individually, whereas mutually cooperative feature channels require their correlation filters to be trained jointly. The learning of the multi-channel correlation filter can therefore be cast as a multi-task learning problem in which each task corresponds to the correlation filter of one channel.
In multi-task learning, the exclusive sparsity regularization term drives the model parameters of different tasks to compete, which ultimately yields intra-group sparsity; the group sparsity regularization term drives the model parameters of different tasks to cooperate, which ultimately yields inter-group sparsity. For target tracking, both intra-group and inter-group sparsity can effectively alleviate overfitting. To overcome the limitations of using only an exclusive sparsity term or only a group sparsity term, the two are combined into a new regularization term, the intra-group-inter-group sparsity regularization term:
[Equation (2): formula image not recoverable in this text version]

where $w_t^{(m,n)}\in\mathbb{R}^{D}$ denotes the vector of the filter $W_t$ at position (m, n), and $w_t^{(m,n,d)}$ denotes the element of $W_t$ at position (m, n, d). θ is a weight parameter that balances the exclusive sparsity and group sparsity regularization terms.
On the one hand, group sparsity first takes the $\ell_2$ norm over channels and then the $\ell_1$ norm over space, so it effectively removes spatially redundant features and makes the filter sparse in space. On the other hand, exclusive sparsity first takes the $\ell_1$ norm over channels and then the $\ell_2$ norm over space, so it effectively removes features that are redundant across channels and makes the filter sparse across channels. Overall, the proposed intra-group-inter-group sparsity regularization term selects discriminative, target-specific features to train the filter, thereby improving its discriminability and interpretability.
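The two norm orderings described above can be made concrete with a small sketch. Following the text literally, the group term is computed as $\sum_{m,n}\|w^{(m,n)}\|_2$ ($\ell_2$ over channels, then $\ell_1$ over space) and the exclusive term as $\big(\sum_{m,n}\|w^{(m,n)}\|_1^2\big)^{1/2}$ ($\ell_1$ over channels, then $\ell_2$ over space); exclusive-lasso formulations often use the squared form instead. The combination `theta * exclusive + (1 - theta) * group` is a hypothetical weighting, because equation (2) itself is not recoverable, and all function names are illustrative.

```python
import math


def channel_vectors(W):
    """Yield the D-dimensional channel vector w^(m,n) at every position (W is channel-first, D x M x N)."""
    D, M, N = len(W), len(W[0]), len(W[0][0])
    for m in range(M):
        for n in range(N):
            yield [W[d][m][n] for d in range(D)]


def group_term(W):
    """l2 norm over channels, then l1 norm over positions: favors spatial sparsity."""
    return sum(math.sqrt(sum(v * v for v in vec)) for vec in channel_vectors(W))


def exclusive_term(W):
    """l1 norm over channels, then l2 norm over positions: favors sparsity across channels."""
    return math.sqrt(sum(sum(abs(v) for v in vec) ** 2 for vec in channel_vectors(W)))


def combined_term(W, theta):
    """Hypothetical theta-weighting of the two terms (the original equation (2) is not recoverable)."""
    return theta * exclusive_term(W) + (1.0 - theta) * group_term(W)
```

For a fixed filter energy, the group term is smaller when the energy is concentrated at few spatial positions, and the exclusive term is smaller when each channel vector has few non-zero entries, which matches the two kinds of sparsity the text attributes to them.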
In target tracking, some recent DCF-based trackers introduce a temporal consistency constraint into filter learning to alleviate the degradation of the DCF over time. The temporal regularization term usually introduced is:

$$R_T(W_t)=\frac{1}{2}\|W_t-W_{t-1}\|_F^2,\qquad(3)$$

where $W_{t-1}$ denotes the filter of the (t-1)-th frame.
To exploit spatial sparsity, channel sparsity and temporal consistency together, the intra-group-inter-group sparsity regularization term $R_{EG}(W_t)$ and the temporal regularization term $R_T(W_t)$ are both introduced into the regression loss function to learn the sparse correlation filter:

$$L(X_t,Y;W_t)=\frac{1}{2}\Big\|\sum_{d=1}^{D}X_t^{d}\circledast W_t^{d}-Y\Big\|_F^2+\lambda R_{EG}(W_t)+\mu R_T(W_t),\qquad(4)$$

where λ and μ are the regularization parameters of $R_{EG}(W_t)$ and $R_T(W_t)$, respectively. With this regression loss function, the learned filter emphasizes discriminative features and effectively alleviates overfitting.
C. The regression loss function defined in step B makes the learned correlation filter sparse in space and channels and alleviates overfitting. However, the resulting sparse correlation filter is not compact in structure, and its computational complexity remains high. To further accelerate computation, an effective solution is to remove redundant filters as a whole, i.e., channel pruning. Channel pruning is usually based on a criterion for the importance of each filter, and its goal is to minimize the impact of removing filters. Oracle channel pruning is the best criterion for removing redundant filters; it estimates the importance of a filter from the change in the loss:

$$\Delta L=L(X_t,Y;W'_t)-L(X_t,Y;W_t),\qquad(5)$$

where $W_t$ and $W'_t$ denote the unpruned and the pruned filters, respectively. Oracle channel pruning achieves very high accuracy, but its computational complexity is high: for a filter with D channels, it must estimate the loss function $2^D$ times to complete channel pruning.
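Exhaustive Oracle pruning can be sketched as a search over every keep/remove mask, which makes the $2^D$ cost explicit. The toy loss ½‖Σ_{kept d} h^d − y‖² over vectorized response maps, the fixed channel budget C, and the names `masked_loss` and `oracle_prune` are assumptions for illustration.

```python
from itertools import product


def masked_loss(responses, y, keep):
    """0.5 * ||sum of kept response maps - y||^2 for a boolean keep-mask."""
    total = [0.0] * len(y)
    for h, kept in zip(responses, keep):
        if kept:
            for k, v in enumerate(h):
                total[k] += v
    return 0.5 * sum((t - yk) ** 2 for t, yk in zip(total, y))


def oracle_prune(responses, y, C):
    """Evaluate Delta L of equation (5) for all 2**D masks.

    Returns the C-channel mask with the smallest loss change and the number of loss evaluations.
    """
    D = len(responses)
    full = masked_loss(responses, y, (True,) * D)
    best_mask, best_delta, evaluations = None, None, 0
    for keep in product((False, True), repeat=D):
        evaluations += 1
        delta = masked_loss(responses, y, keep) - full
        if sum(keep) == C and (best_delta is None or delta < best_delta):
            best_mask, best_delta = keep, delta
    return best_mask, evaluations
```

The evaluation count doubles with every added channel; with D = 64 it would already be $2^{64}\approx1.8\times10^{19}$, which is why a first-order Taylor approximation is used instead.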
Based on the idea of Oracle channel pruning, Taylor-expansion-based channel pruning computes the change in the regression loss caused by removing the filter of one specific channel. Let $H_t^{d}$ denote the response map generated by $W_t^{d}$:

$$H_t^{d}=X_t^{d}\circledast W_t^{d},\qquad D_t=\{X_t,Y\},$$

and let the response-map tensor $Z_t\in\mathbb{R}^{M\times N\times D}$ stack the D response maps. Letting $h_t^{d}$ denote the vectorized form of $H_t^{d}$, we obtain

$$|\Delta L(h_t^{d})|=\big|L(D_t,h_t^{d}=0)-L(D_t,h_t^{d})\big|,\qquad(6)$$

where $L(D_t,h_t^{d}=0)$ denotes the loss after the response-map vector $h_t^{d}$ is pruned, and $L(D_t,h_t^{d})$ denotes the loss of the response-map vector $h_t^{d}$ without pruning. A first-order Taylor expansion of $L(D_t,h_t^{d}=0)$ around the unpruned response $h_t^{d}$ gives

$$L(D_t,h_t^{d}=0)=L(D_t,h_t^{d})-\Big(\frac{\partial L}{\partial h_t^{d}}\Big)^{\top}h_t^{d}+R_1(h_t^{d}=0),\qquad(7)$$

where $R_1(h_t^{d}=0)$ is the first-order remainder of the Taylor expansion. Substituting equation (7) into equation (6) and dropping $R_1(h_t^{d}=0)$ gives

$$|\Delta L(h_t^{d})|=\Big|\Big(\frac{\partial L}{\partial h_t^{d}}\Big)^{\top}h_t^{d}\Big|.$$

Thus, for each filter $W_t^{d}$, its importance $\Theta(W_t^{d})$ is computed as

$$\Theta(W_t^{d})=\Big|\frac{1}{MN}\sum_{m=1}^{M}\sum_{n=1}^{N}\frac{\partial L}{\partial z_t^{(m,n,d)}}\,z_t^{(m,n,d)}\Big|,$$

where $z_t^{(m,n,d)}$ is the element of the response-map tensor $Z_t$ at position (m, n, d). The D channel filters are ranked by importance $\Theta(W_t^{d})$, and only the C top-ranked channel filters are kept for tracking. This channel selection is performed in the first frame, and only the selected C channel filters are retained in subsequent frames, which significantly reduces the computational complexity.
D. To minimize the regression loss in equation (4) of step B, the problem is optimized with the ADMM algorithm. Since the sparse compact filter is compressed from D channels to C channels in the initial frame, an auxiliary variable $U_t=W_t$ is introduced and the Lagrangian function is constructed as:

$$\mathcal{L}(W_t,U_t,V_t)=\frac{1}{2}\Big\|\sum_{d=1}^{C}X_t^{d}\circledast W_t^{d}-Y\Big\|_F^2+\lambda R_{EG}(U_t)+\mu R_T(W_t)+\langle V_t,W_t-U_t\rangle+\frac{\gamma}{2}\|W_t-U_t\|_F^2,$$

where $V_t$ is the Lagrange multiplier and γ is the penalty factor. The ADMM algorithm solves for the following variables alternately:
Optimizing the filter $W_t$: to optimize $W_t$ efficiently, Parseval's theorem is first applied to convert the problem in $W_t$ from the spatial domain to the frequency domain:

[Formula image not recoverable in this text version]

where $\hat{\cdot}$ denotes the discrete Fourier transform and $\odot$ denotes element-wise multiplication. Similar to the solutions of STRCF and LADCF, each vector $\hat{w}_t^{(m,n)}$ of $\hat{W}_t$ is solved as follows:

[Formula image not recoverable in this text version; its accompanying definitions refer to the elements of the frequency-domain tensors at position (m, n)]

After $\hat{W}_t$ is computed, $W_t$ is obtained by the inverse Fourier transform.
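Because the per-position solution formula survives only as an unrecoverable image, the following is just a sketch of a per-position solve in the style of STRCF, under explicit assumptions: with a unitary DFT, the frequency-domain W-subproblem at position (m, n) is taken to be $\frac12|\hat x^{\top}\hat w-\hat y|^2+\frac{\mu}{2}\|\hat w-\hat w_{t-1}\|^2+\mathrm{Re}\langle\hat v,\hat w-\hat u\rangle+\frac{\gamma}{2}\|\hat w-\hat u\|^2$, whose normal equations $(\bar{\hat x}\hat x^{\top}+(\mu+\gamma)I)\hat w=\bar{\hat x}\hat y+\mu\hat w_{t-1}+\gamma\hat u-\hat v$ are solved in closed form with the Sherman-Morrison formula. Here $\hat x$ stacks the C channel values of $\hat X_t$ at (m, n), and `solve_pixel` is a hypothetical name.

```python
def solve_pixel(x, y, w_prev, u, v, mu, gamma):
    """Solve (conj(x) x^T + c I) w = b, with c = mu + gamma, via Sherman-Morrison.

    x, w_prev, u, v: lists of complex numbers (one entry per channel) at one position.
    y: complex label value at that position.
    """
    c = mu + gamma
    x_conj = [a.conjugate() for a in x]
    # Right-hand side b = conj(x) * y + mu * w_prev + gamma * u - v.
    b = [xc * y + mu * wp + gamma * ud - vd
         for xc, wp, ud, vd in zip(x_conj, w_prev, u, v)]
    x_t_b = sum(a * bd for a, bd in zip(x, b))  # x^T b
    x_norm2 = sum(abs(a) ** 2 for a in x)       # x^T conj(x) = ||x||^2
    # (c I + conj(x) x^T)^{-1} b = (b - conj(x) * (x^T b) / (c + ||x||^2)) / c
    return [(bd - xc * x_t_b / (c + x_norm2)) / c for bd, xc in zip(b, x_conj)]
```

The rank-one structure means each position costs O(C) operations instead of a C×C matrix inversion, which is what keeps this step fast across all M·N positions.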
Optimizing the auxiliary variable $U_t$: to solve for $U_t$, the following subproblem is optimized:

$$U_t=\arg\min_{U_t}\ \lambda R_{EG}(U_t)+\frac{\gamma}{2}\Big\|U_t-W_t-\frac{V_t}{\gamma}\Big\|_F^2.$$

Each element $u_t^{(m,n,d)}$ of $U_t$ is solved as follows:

[Formula images not recoverable in this text version]

where $(\cdot)_+$ is the shrinkage operator, defined as $(x)_+=\max(0,x)$.
Updating the Lagrange multiplier $V_t$ and the penalty factor γ: given the filter $W_t$ and the auxiliary variable $U_t$, $V_t$ and γ are updated as follows:

[Update formula for $V_t$: image not recoverable in this text version]

$$\gamma^{i+1}=\max(\gamma_{\min},\rho\gamma^{i}),$$

where i is the iteration index, $\gamma_{\min}$ is the minimum value of γ, and ρ is the scale factor.
In step B, the weight parameter θ in equation (2) is 0.2, and the regularization parameters in equation (4) are λ = 1.0 × 10^-4 and μ = 5.
In step C, the channel parameter C is 64.
In step D, the parameters are γ_min = 0.002, γ_0 = 0.01 and ρ = 0.2, and ADMM runs for 2 iterations.
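The penalty schedule and multiplier update can be checked with a few lines of plain Python. The multiplier step V ← V + γ(W − U) is the standard ADMM dual update and is assumed here because the source gives it only as an image; `update_multiplier` and `penalty_schedule` are hypothetical names, and the defaults are the step-D values.

```python
def update_multiplier(V, W, U, gamma):
    """Assumed standard ADMM dual step V <- V + gamma * (W - U).

    Works for floats or NumPy arrays of matching shape.
    """
    return V + gamma * (W - U)


def penalty_schedule(gamma0=0.01, rho=0.2, gamma_min=0.002, iters=2):
    """Penalty values gamma_0, ..., gamma_iters using
    gamma_{i+1} = max(gamma_min, rho * gamma_i)."""
    gammas = [gamma0]
    for _ in range(iters):
        gammas.append(max(gamma_min, rho * gammas[-1]))
    return gammas
```

With these defaults ρ·γ_0 = 0.2 × 0.01 = 0.002 already equals γ_min, so γ drops to 0.002 after the first iteration and stays there.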
Table 1 compares the precision, success rate and speed of the present invention with those of several other correlation-filter-based target tracking methods on the OTB100 dataset, where SCCF denotes the method of the present invention.
TABLE 1
Tracking method    CCOT   MCPF   ECO    STRCF  MCCT   ASRCF  LADCF  GFSDCF SCCF
Precision (%)      89.6   87.3   90.9   88.0   91.7   91.9   90.6   92.5   93.3
Success rate (%)   66.6   62.8   68.7   67.5   68.2   68.9   69.6   68.9   70.0
Speed (FPS)        2.1    3.2    8.4    5.2    6.8    24.8   10.6   7.8    19.8
CCOT corresponds to the method proposed by Danelljan, M. et al. (Danelljan, M., Robinson, A., Khan, F.S., Felsberg, M.: Beyond correlation filters: Learning continuous convolution operators for visual tracking. In: ECCV, pp. 472-488, 2016);
MCPF corresponds to the method proposed by Zhang, T. et al. (Zhang, T., Xu, C., Yang, M.H.: Multi-task correlation particle filter for robust object tracking. In: CVPR, pp. 4819-4827, 2017);
ECO corresponds to the method proposed by Danelljan, M. et al. (Danelljan, M., Bhat, G., Khan, F.S., Felsberg, M.: ECO: Efficient convolution operators for tracking. In: CVPR, pp. 6931-6939, 2017);
STRCF corresponds to the method proposed by Li, F. et al. (Li, F., Tian, C., Zuo, W., Zhang, L., Yang, M.H.: Learning spatial-temporal regularized correlation filters for visual tracking. In: CVPR, pp. 4904-4913, 2018);
MCCT corresponds to the method proposed by Wang, N. et al. (Wang, N., Zhou, W., Tian, Q., Hong, R., Wang, M., Li, H.: Multi-cue correlation filters for robust visual tracking. In: CVPR, pp. 4844-4853, 2018);
ASRCF corresponds to the method proposed by Dai, K. et al. (Dai, K., Wang, D., Lu, H., Sun, C., Li, J.: Visual tracking via adaptive spatially-regularized correlation filters. In: CVPR, pp. 4670-4679, 2019);
LADCF corresponds to the method proposed by Xu, T. et al. (Xu, T., Feng, Z., Wu, X., Kittler, J.: Learning adaptive discriminative correlation filters via temporal consistency preserving spatial feature selection for robust visual object tracking. IEEE TIP 28(11), 5596-);
GFSDCF corresponds to the method proposed by Xu, T. et al. (Xu, T., Feng, Z.H., Wu, X.J., Kittler, J.: Joint group feature selection and discriminative filter learning for robust visual object tracking. In: ICCV, pp. 7950-7960, 2019).
The invention learns a sparse and compact correlation filter for fast, robust visual tracking. The learned correlation filter adaptively selects target-relevant features and suppresses redundant and target-irrelevant ones, which effectively alleviates the overfitting and high computational complexity of conventional correlation filters and improves the robustness of the algorithm to occlusion, deformation, rotation and background clutter. Through sparsity and temporal-consistency constraints, the correlation filter adaptively selects discriminative features from a small number of channels that are consistent over time and exhibit regional structure. The resulting filter-learning problem is solved by ADMM, which needs only a few iterations. Experiments on several challenging datasets (OTB-2013, OTB-2015, VOT-2016, VOT-2017 and UAV20L) show that the method achieves favorable performance with both high accuracy and high speed. Specifically, on the OTB-2015 dataset, the tracker achieves an AUC score of 70.0% at approximately 20 FPS when using hand-crafted and CNN features.

Claims (10)

1. A fast robust target tracking method based on a sparse compact correlation filter is characterized by comprising the following steps:
1) for a given target, constructing a base sample from the target and its context, wherein the training samples consist of all cyclic shifts of the base sample, the labels of the cyclically shifted samples are determined by a Gaussian function, and a DCF loss function is defined for training a multi-channel correlation filter;
2) in the multi-task learning setting, combining an exclusive sparse regularization term and a group sparse regularization term into an intra-group-inter-group sparse regularization term; adopting the temporal consistency constraint that DCF-based trackers introduce into filter learning to alleviate the degradation of DCF over time in target tracking; and defining a regression loss function with the intra-group-inter-group sparse regularization term and the temporal regularization term so as to learn a sparse correlation filter;
3) performing channel pruning based on the regression loss function defined in step 2) and removing redundant filters as a whole to further accelerate computation, wherein the change in the regression loss caused by removing the filter of a specific channel is computed, the D channel filters are ranked by importance, and the top C channel filters are selected for tracking;
4) constructing a Lagrangian function and optimizing the regression loss with the ADMM algorithm, thereby completing fast robust target tracking based on the sparse compact correlation filter.
2. The sparse compact correlation filter-based fast robust target tracking method according to claim 1, wherein in step 1), constructing a base sample from a given target and its context, forming the training samples from all cyclic shifts of the base sample, determining their labels with a Gaussian function, and defining the DCF loss function for training the multi-channel correlation filter specifically comprises:
in the t-th frame, for a given target, a base sample is constructed from the target and its context, the training samples consist of all cyclic shifts of the base sample, the labels of the cyclically shifted samples are determined by a Gaussian function, and the loss function for DCF training of the multi-channel correlation filter is defined as follows:
[Equation (1): DCF loss function]
where ⊛ denotes the cyclic convolution operator, X_t ∈ R^(M×N×D) and W_t ∈ R^(M×N×D) are the base sample and the filter of the t-th frame, respectively, Y ∈ R^(M×N) is the label determined by the Gaussian function, M, N and D denote the width, height and number of channels, respectively, and ξ is the regularization parameter; the goal of filter learning is to minimize this loss function with respect to W_t.
In equation (1), all channels of the multi-channel feature representation of the base sample are used to train the multi-channel correlation filter.
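A minimal NumPy sketch of the DCF loss in equation (1), assuming the common form ||Σ_d X^d ⊛ W^d − Y||²_F + ξ·||W||²_F (the equation is only an image in the source). The cyclic convolution is evaluated with the convolution theorem, which is what lets a DCF train on all cyclic shifts of the base sample without building them explicitly. The Gaussian label is centered at (0, 0) and wraps around the borders, a common convention that is assumed here; all function names are hypothetical.

```python
import numpy as np


def gaussian_label(M, N, sigma):
    """Gaussian-shaped label Y (M x N) peaked at (0, 0), wrapping cyclically."""
    m = np.arange(M)
    n = np.arange(N)
    dm = np.minimum(m, M - m)[:, None]  # cyclic row distance to the peak
    dn = np.minimum(n, N - n)[None, :]  # cyclic column distance to the peak
    return np.exp(-(dm ** 2 + dn ** 2) / (2.0 * sigma ** 2))


def cyclic_conv2(x, w):
    """2-D cyclic convolution computed with the FFT (convolution theorem)."""
    return np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(w)))


def dcf_loss(X, W, Y, xi):
    """Assumed multi-channel DCF loss:
    ||sum_d X[..., d] (*) W[..., d] - Y||_F^2 + xi * ||W||_F^2."""
    response = sum(cyclic_conv2(X[..., d], W[..., d]) for d in range(X.shape[-1]))
    return np.sum((response - Y) ** 2) + xi * np.sum(W ** 2)
```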
3. The sparse compact correlation filter-based fast robust target tracking method according to claim 2, wherein in step 2), in the multi-task learning setting, an exclusive sparse regularization term and a group sparse regularization term are combined into an intra-group-inter-group sparse regularization term, as follows:
[Equation (2): intra-group-inter-group sparse regularization term R_EG(W_t)]
where W_t(m, n) denotes the vector of W_t at position (m, n), W_t(m, n, d) denotes the element of W_t at position (m, n, d), and θ is a weight parameter that balances the exclusive sparsity and group sparsity regularization terms;
group sparsity takes the l2 norm across channels and then the l1 norm over spatial positions, which removes spatially redundant features so that the filter is spatially sparse; exclusive sparsity takes the l1 norm across channels and then the l2 norm over spatial positions, which removes redundant features across channels so that the filter is sparse along the channel dimension.
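The two regularizers described above can be written directly in NumPy for a filter W of shape (M, N, D). `group_sparsity` and `exclusive_sparsity` follow the norm orders stated in the claim; the weighting θ·R_E + (1 − θ)·R_G in `intra_inter_group_reg`, and the use of an unsquared outer l2 norm, are assumptions, since equation (2) is only an image.

```python
import numpy as np


def group_sparsity(W):
    """l2 norm across channels at each (m, n), then l1 over spatial positions."""
    return np.sum(np.linalg.norm(W, ord=2, axis=-1))


def exclusive_sparsity(W):
    """l1 norm across channels at each (m, n), then l2 over spatial positions."""
    return np.linalg.norm(np.sum(np.abs(W), axis=-1))


def intra_inter_group_reg(W, theta=0.2):
    """Hypothetical combination theta * exclusive + (1 - theta) * group."""
    return theta * exclusive_sparsity(W) + (1.0 - theta) * group_sparsity(W)
```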
4. The sparse compact correlation filter-based fast robust target tracking method according to claim 3, wherein the weight parameter θ = 0.2.
5. The sparse compact correlation filter-based fast robust target tracking method according to claim 3, wherein in step 2), following DCF-based trackers that introduce a temporal consistency constraint into filter learning to alleviate the degradation of DCF over time in target tracking, the temporal regularization term is introduced as follows:
[Equation: temporal regularization term R_T(W_t)]
where W_(t-1) denotes the filter of the (t-1)-th frame;
the intra-group-inter-group sparse regularization term R_EG(W_t) and the temporal regularization term R_T(W_t) are introduced to define a regression loss function for learning the sparse correlation filter, as follows:
[Equation (4): regression loss function]
where λ and μ are the regularization parameters of R_EG(W_t) and R_T(W_t), respectively.
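The full regression loss of equation (4) can be sketched as follows with the step-B defaults λ = 1.0 × 10^-4, μ = 5 and θ = 0.2. The 0.5 factors, the temporal term R_T(W_t) = ||W_t − W_(t-1)||²_F and the form of R_EG(W_t) are assumptions, because the temporal term and equation (4) are only images in the source; `regression_loss` is a hypothetical name.

```python
import numpy as np


def regression_loss(X, W, W_prev, Y, lam=1e-4, mu=5.0, theta=0.2):
    """Hypothetical instance of equation (4):
    0.5 * data term + lam * R_EG(W) + 0.5 * mu * R_T(W)."""
    Xf = np.fft.fft2(X, axes=(0, 1))
    Wf = np.fft.fft2(W, axes=(0, 1))
    # sum_d X^d (*) W^d via the convolution theorem
    response = np.real(np.fft.ifft2(np.sum(Xf * Wf, axis=-1)))
    data = 0.5 * np.sum((response - Y) ** 2)
    group = np.sum(np.linalg.norm(W, axis=-1))               # l2 over channels, l1 over space
    exclusive = np.linalg.norm(np.sum(np.abs(W), axis=-1))   # l1 over channels, l2 over space
    r_eg = theta * exclusive + (1.0 - theta) * group
    r_t = np.sum((W - W_prev) ** 2)                          # ||W_t - W_(t-1)||_F^2
    return data + lam * r_eg + 0.5 * mu * r_t
```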
6. The sparse compact correlation filter-based fast robust target tracking method according to claim 5, wherein the regularization parameters are λ = 1.0 × 10^-4 and μ = 5.
7. The sparse compact correlation filter-based fast robust target tracking method according to claim 5, wherein in step 3), channel pruning based on the regression loss function defined in step 2) is performed as follows:
ΔL = L(X_t, Y; W_t') - L(X_t, Y; W_t),   (5)
where W_t and W_t' are the filters before and after pruning, respectively; for a filter with D channels, exhaustive channel pruning would require evaluating the loss function 2^D times;
therefore, the change in the regression loss caused by removing the filter of one specific channel is computed instead. Let D_t = {X_t, Y}, and write the response map of each channel as a vector; then:
[Equation (6): loss change caused by pruning one channel]
where the first term is the loss when the current response-map vector is pruned, and the second term is the loss without pruning. A first-order Taylor expansion of the pruned loss at the point given by the unpruned response map is:
[Equation (7): first-order Taylor expansion of the pruned loss]
where the last term is the first-order remainder of the Taylor expansion. Substituting equation (7) into equation (6) and dropping the remainder gives:
[Equation: first-order approximation of the loss change]
Thus, for each channel filter, its importance is computed as:
[Equation: channel importance]
where Z_t(m, n, d) denotes the element of the response-map tensor Z_t at position (m, n, d). The D channel filters are ranked by importance, and only the top C channel filters are selected for tracking.
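The pruning criterion can be sketched under an assumed data loss L = 0.5·||Σ_d Z^d − Y||², where Z^d = X^d ⊛ W^d is the response map of channel d. For this loss the gradient with respect to Z(m, n, d) is simply the residual at (m, n), so the first-order Taylor estimate of the loss change from zeroing channel d is −⟨residual, Z^d⟩, and the exact change differs only by the remainder 0.5·||Z^d||². The mean over (m, n), the absolute value and the names `channel_responses`, `taylor_importance` and `select_channels` are assumptions; only the ranking and top-C selection come from the text.

```python
import numpy as np


def channel_responses(X, W):
    """Per-channel response maps Z[..., d] = X[..., d] (*) W[..., d] (cyclic conv)."""
    Xf = np.fft.fft2(X, axes=(0, 1))
    Wf = np.fft.fft2(W, axes=(0, 1))
    return np.real(np.fft.ifft2(Xf * Wf, axes=(0, 1)))


def taylor_importance(X, W, Y):
    """First-order Taylor importance of each channel for the assumed loss
    L = 0.5 * ||sum_d Z[..., d] - Y||^2. Since dL/dZ[m, n, d] is the residual
    at (m, n), the score is |mean over (m, n) of residual * Z[..., d]|."""
    Z = channel_responses(X, W)
    residual = Z.sum(axis=-1) - Y
    return np.abs(np.mean(residual[..., None] * Z, axis=(0, 1)))


def select_channels(X, W, Y, C):
    """Indices of the C most important channels, most important first."""
    scores = taylor_importance(X, W, Y)
    return np.argsort(scores)[::-1][:C]
```

For the step-C setting one would call `select_channels(X, W, Y, 64)` once in the initial frame; each score costs one pass over the response maps instead of the 2^D loss evaluations of exhaustive pruning.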
8. The sparse compact correlation filter-based fast robust target tracking method according to claim 7, wherein C = 64.
9. The sparse compact correlation filter-based fast robust target tracking method according to claim 7, wherein in step 4), the Lagrangian function is constructed and the regression loss is optimized with the ADMM algorithm as follows:
to minimize the regression loss of equation (4) in step 2), the ADMM algorithm is adopted; since the sparse compact filter is compressed from D channels to C channels in the initial frame, an auxiliary variable U_t = W_t is introduced and the Lagrangian function is constructed as follows:
[Equation: Lagrangian function]
where V_t is the Lagrange multiplier and γ is the penalty factor; the ADMM algorithm alternately solves for the following variables:
optimizing the correlation filter W_t: Parseval's theorem is first applied to transform the W_t subproblem from the time domain to the frequency domain:
[Equation: frequency-domain form of the W_t subproblem]
where the hat (^) denotes the discrete Fourier transform and ⊙ denotes element-wise multiplication; the solution for each channel vector of the frequency-domain filter at position (m, n) is as follows:
[Equation: closed-form solution for each (m, n) vector]
where the terms indexed by (m, n) are the elements of the corresponding frequency-domain variables at position (m, n); after the frequency-domain filter is computed, the inverse Fourier transform is applied to obtain W_t;
optimizing the auxiliary variable U_t: to solve for U_t, the following subproblem is optimized:
[Equation: U_t subproblem]
the solution for each element of U_t at position (m, n, d) is as follows:
[Equations: element-wise solution for U_t]
where (·)_+ is the shrinkage operator, defined as (x)_+ = max(0, x);
updating the Lagrange multiplier V_t and the penalty factor γ: given the filter W_t and the auxiliary variable U_t, V_t and γ are updated as follows:
[Equation: update of the Lagrange multiplier V_t]
γ_{i+1} = max(γ_min, ρ·γ_i), where i is the iteration index, γ_min is the minimum value of γ, and ρ is the scale factor.
10. The sparse compact correlation filter-based fast robust target tracking method according to claim 9, wherein γ_min = 0.002, γ_0 = 0.01, ρ = 0.2, and ADMM runs for 2 iterations.
CN202010705423.9A 2020-07-21 2020-07-21 Rapid robust target tracking method based on sparse compact correlation filter Active CN111862167B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010705423.9A CN111862167B (en) 2020-07-21 2020-07-21 Rapid robust target tracking method based on sparse compact correlation filter

Publications (2)

Publication Number Publication Date
CN111862167A CN111862167A (en) 2020-10-30
CN111862167B true CN111862167B (en) 2022-05-10

Family

ID=73000807

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010705423.9A Active CN111862167B (en) 2020-07-21 2020-07-21 Rapid robust target tracking method based on sparse compact correlation filter

Country Status (1)

Country Link
CN (1) CN111862167B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112767450A (en) * 2021-01-25 2021-05-07 开放智能机器(上海)有限公司 Multi-loss learning-based correlation filtering target tracking method and system
CN113379804B (en) * 2021-07-12 2023-05-09 闽南师范大学 Unmanned aerial vehicle target tracking method, terminal equipment and storage medium
CN114117926B (en) * 2021-12-01 2024-05-14 南京富尔登科技发展有限公司 Robot cooperative control algorithm based on federated learning

Citations (5)

Publication number Priority date Publication date Assignee Title
CN106203495A (en) * 2016-07-01 2016-12-07 广东技术师范学院 Target tracking method based on sparse discriminative learning
CN109859241A (en) * 2019-01-09 2019-06-07 厦门大学 Robust correlation filtering visual tracking method with adaptive feature selection and temporal consistency
CN109859244A (en) * 2019-01-22 2019-06-07 西安微电子技术研究所 Visual tracking method based on convolutional sparse filtering
CN110490907A (en) * 2019-08-21 2019-11-22 上海无线电设备研究所 Moving target tracking method based on multiple target features and an improved correlation filter
CN111126132A (en) * 2019-10-25 2020-05-08 宁波必创网络科技有限公司 Learning target tracking algorithm based on Siamese network

Non-Patent Citations (4)

Title
Correlation Filter Tracking with Adaptive Proposal Selection for Accurate Scale Estimation; Luo Xiong et al.; 《arXiv》; 20200714; full text *
Learning Local Structured Correlation Filters for Visual Tracking via Spatial Joint Regularization; Chenggang Guo et al.; 《IEEE Access》; 20190320; Vol. 7; full text *
Robust Correlation Filter Tracking with Shepherded Instance-Aware Proposals; Yanjie Liang et al.; 《26th ACM International Conference on Multimedia 2018》; 20181026; full text *
Research on Target Tracking Algorithms Based on Correlation Filters; 王欣远; 《China Master's Theses Full-text Database, Information Science and Technology》; 20200215 (No. 2); full text *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant