CN110648351B - Multi-appearance model fusion target tracking method and device based on sparse representation

Info

Publication number
CN110648351B
Authority
CN
China
Prior art keywords
target
sparse
tracking
template
appearance model
Prior art date
Legal status
Active
Application number
CN201910886309.8A
Other languages
Chinese (zh)
Other versions
CN110648351A (en)
Inventor
汪芳
周健
Current Assignee
Anhui University
Original Assignee
Anhui University
Priority date
Filing date
Publication date
Application filed by Anhui University filed Critical Anhui University
Priority to CN201910886309.8A
Publication of CN110648351A
Application granted
Publication of CN110648351B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/20: Analysis of motion
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/10: Image acquisition modality
    • G06T 2207/10016: Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-appearance model fusion target tracking method and device based on sparse representation. The method comprises the following steps: building a particle filter framework; determining a target template set and a candidate target set of the tracking image, and generating a sparse coefficient solution by linear programming over the target template set and trivial templates to construct a global sparse appearance model; establishing a local sparse appearance model; fusing the global sparse appearance model with the local sparse appearance model by calculating two similarity elements, taking their weighted sum as the fusion similarity, computing the reconstruction error after global representation, and taking the weighted sum of the fusion similarity and the reconstruction error as a discrimination function; and assigning each target template in the target template set a weight proportional to its importance and updating the target templates. The invention reduces time complexity, adapts well to large changes in target appearance and large occlusion areas, tracks the target accurately and robustly, handles occlusion, and updates the appearance model efficiently.

Description

Multi-appearance model fusion target tracking method and device based on sparse representation
Technical Field
The invention relates to the technical field of target tracking, and in particular to a multi-appearance model fusion target tracking method based on sparse representation and a corresponding target tracking device.
Background
Target tracking aims at estimating the state of a moving target in a video sequence. Computer-vision-based target tracking is widely applied in security surveillance, unmanned vehicle navigation, human-computer interaction, behavior detection, and the like. Because of the variability of the target and the complexity of the scene, it remains challenging to design a target tracking method that copes with complex, dynamic environments involving partial occlusion, illumination change, deformation, scale change, camera motion, background interference, and viewpoint change.
Current target tracking algorithms fall into two categories. The first treats target tracking as a binary classification problem based on a decision model, trying to separate the tracked target from its surrounding background. The second treats target tracking as a detection problem over a local search area and uses an online-learned classifier to perform tracking. However, existing target tracking methods are based on a single appearance model and easily fail in scenes where the target resembles the background or undergoes occlusion, illumination change, pose change, or rotation.
Disclosure of Invention
To address the problem that existing target tracking methods easily fail when the target resembles the background or undergoes occlusion, illumination change, pose change, or rotation, the invention provides a multi-appearance model fusion target tracking method based on sparse representation and a corresponding device.
The invention is realized by adopting the following technical scheme: a sparse representation-based multi-appearance model fusion target tracking method is used for tracking a tracking target in a tracking image, and comprises the following steps:
(1) building a particle filter framework, and determining the state of the tracking target through a plurality of affine parameters to build a motion model of the state transition of the tracking target;
(2) in the particle filter framework, a target template set and a candidate target set of the tracking image are determined, then the target template set and a trivial template for describing errors are subjected to linear programming to generate a sparse coefficient solution of candidate targets in the candidate target set, and a global sparse appearance model is constructed;
(3) establishing a local sparse appearance model; the method for establishing the local sparse appearance model comprises the following steps of: (3.1) extracting a plurality of overlapped local image blocks I in each target template of the target template set, and combining all the local image blocks into a local dictionary in a group manner; (3.2) firstly, extracting a plurality of overlapped local image blocks II from each candidate target in the candidate target set, and then coding the local image blocks II serving as local features by using the local dictionary to obtain a corresponding coding coefficient matrix; (3.3) dividing the coding coefficient matrix into a plurality of segment coefficients, weighting all the segment coefficients to obtain coefficient vectors, and forming a similarity matrix by all the coefficient vectors;
(4) fusing the global sparse appearance model with the local sparse appearance model, wherein the fusing method comprises the following steps: (4.1) firstly calculating two similarity elements on the main diagonal line of the similarity matrix, and then calculating the weighted sum of the two similarity elements as the fusion similarity of a target area; (4.2) calculating a reconstruction error after global representation according to the sparse coefficient solution; (4.3) calculating a weighted sum of the fusion similarity and the reconstruction error as a discriminant function of the corresponding candidate object;
(5) each target template in the target template set is assigned a weight proportional to its importance, and the target templates in the target template set are updated, the updating method comprising the following steps: (5.1) selecting the target template with the maximum sparse linear representation coefficient as the current optimal template, and resetting the weight of the current optimal template to 1; (5.2) calculating the similarity between the tracking target of the current frame and the current optimal template; (5.3) judging whether the similarity is greater than a similarity threshold; if so, judging whether the occlusion degree of the tracking result is within a threshold range, and if so, updating the corresponding target template: linearly representing an estimated target, updating the target template corresponding to the component with the minimum value in the sparse linear representation coefficient vector, and finally updating the corresponding local dictionary.
Under the particle filter framework, the invention first obtains the global, local, and spatial-structure information of the target by constructing a global sparse appearance model and a local sparse appearance model. The two models are fused by weighted combination of their features to locate the target, which effectively handles the various appearance changes caused by internal and external factors; the global features are added to prevent the loss of the target's global information. Finally, an incremental principal component analysis method with sparse constraints adaptively updates the target template, which provides occlusion-handling capability. The invention thereby solves the problem that existing target tracking methods easily fail when the target resembles the background or undergoes occlusion, illumination change, pose change, or rotation; it reduces time complexity, improves adaptability, and tracks the target accurately with low computational cost and efficient appearance-model updating.
As a further improvement of the above scheme, the expression of the motion model is:
p(x_t | x_{t-1}) = N(x_t; x_{t-1}, ψ)
where x_t is a state vector used to represent the tracking target state determined by a plurality of affine parameters, and ψ denotes a diagonal covariance matrix whose elements are the variances of the affine transformation parameters.
As a further improvement of the above scheme, the expression of the global sparse appearance model is:
y_i = T a_i + I e_i = [T, I] c_i

where T = [T_1, T_2, …, T_n] represents the set of target templates; Y = [y_1, y_2, …, y_m] represents the set of candidate targets; I denotes the trivial template; and c_i = [a_i; e_i] is the sparse coefficient vector of the candidate target y_i to be solved. The global sparse appearance model also imposes a non-negativity constraint on the sparse coefficient vector, and the constraint formula is:

min_{c_i} ||y_i - A c_i||_2^2 + λ ||c_i||_1,  s.t. c_i ≥ 0

where A is the overcomplete basis, A = [T, I]; λ is a parameter that balances the weight between the L1 regularization term and the reconstruction error.
Further, the calculation formula of the sparse linear representation coefficient is as follows:
min_{b_i} ||p_i - D b_i||_2^2 + λ ||b_i||_1,  s.t. b_i ≥ 0

where p_i is the i-th local feature of the candidate target, and b_i is the coding coefficient vector of p_i under the local dictionary D;
the expression of the coefficient vector is:

v_i = (1/C) Σ_k b_i^(k)

where C is a normalization term and b_i^(k) is the k-th segment of b_i; the coding coefficient matrix is represented as B = [b_1, b_2, …, b_N], and the segment coefficients are represented as b_i^(k).
As a further improvement of the above scheme, the calculation formula of the weighted sum of the two similarities is:
β = η_1 Σ diag(V_1) + η_2 Σ diag(V_2)
where η_1 and η_2 are weighting coefficients, and Σ diag(V_1) and Σ diag(V_2) are the sums of the elements on the main diagonals of the similarity matrices obtained with the two different local block sizes.
Further, the expression of the discriminant function is:
f = β - γ ε

where γ is a parameter that adjusts the weight within the discriminant function and ε = ||y_i - T a_i||_2^2 is the reconstruction error after global representation; the candidate particle with the maximum f value is taken as the tracking result.
As a further improvement of the above scheme, the similarity is represented by an included angle between two normalized vectors, and the expression is:
g = angle(y, T_j)
where g is the angle between the two normalized vectors, y represents the tracking result, and T_j represents the current optimal template.
As a further improvement of the above solution, the linear expression formula of the estimation target is:
p = U q + e = [U, I] [q; e]

where I represents the trivial template, p represents the observation vector, U represents a matrix containing a plurality of basis vectors, q represents the coefficient vector of the basis vectors, and e represents the occluded pixels in p; the calculation formula of the sparse coefficient solution is:

min_c ||p - H c||_2^2 + λ ||c||_1

where H = [U, I] and c = [q, e]^T.
The updating formula of the target template is:
T_k = U q
where T_k represents the target template corresponding to the minimum component in the sparse linear representation coefficient vector.
Further, in the particle filter framework, a solving formula of the optimal state of the tracking target at the time t is as follows:
x_t* = arg max_{x_t^i} p(x_t^i | z_{1:t})

where p(x_t | z_{1:t}) is the posterior probability of each sampled particle, computed as:
p(x_t | z_{1:t-1}) = ∫ p(x_t | x_{t-1}) p(x_{t-1} | z_{1:t-1}) dx_{t-1}
p(x_t | z_{1:t}) = p(z_t | x_t) p(x_t | z_{1:t-1}) / p(z_t | z_{1:t-1})

where p(x_t | x_{t-1}) represents the state transition probability, p(x_{t-1} | z_{1:t-1}) represents the posterior probability at the previous moment, and p(z_t | x_t) represents the likelihood probability of the state vector x_t.
The invention also provides a multi-appearance model fusion target tracking device based on sparse representation, which applies any of the above sparse-representation-based multi-appearance model fusion target tracking methods, the device comprising:
the particle filter framework building module is used for building a particle filter framework and determining the state of the tracking target through a plurality of affine parameters so as to build a motion model of state transition of the tracking target;
a global sparse appearance model establishing module, configured to, in the particle filter framework, first determine a target template set and a candidate target set of the tracking image, and then perform linear programming on the target template set and a trivial template for describing an error to generate a sparse coefficient solution for a candidate target in the candidate target set, and construct a global sparse appearance model;
a local sparse appearance model building module for building a local sparse appearance model; the local sparse appearance model building module comprises an extraction combination unit, an extraction coding unit and a division unit; the extraction and combination unit is used for extracting a plurality of overlapped local image blocks I in each target template of the target template set and combining all the local image blocks into one local dictionary in a group mode; the extraction coding unit is used for extracting a plurality of overlapped local image blocks II in each candidate target of the candidate target set, then using the local image blocks II as local features and coding the local features by using the local dictionary to obtain a corresponding coding coefficient matrix; the dividing unit is used for dividing the coding coefficient matrix into a plurality of segment coefficients, weighting all the segment coefficients to obtain coefficient vectors, and finally forming a similarity matrix by all the coefficient vectors;
a fusion module for fusing the global sparse appearance model with the local sparse appearance model; the fusion module comprises a similarity calculation unit, a reconstruction error calculation unit and a discriminant function unit; the similarity calculation unit is used for calculating two similarity elements on a main diagonal line of the similarity matrix, and then calculating a weighted sum of the two similarity elements as the fusion similarity of a target area; the reconstruction error calculation unit is used for calculating a reconstruction error after global representation according to the sparse coefficient solution; the discriminant function unit is used for calculating the weighted sum of the fusion similarity and the reconstruction error and taking the weighted sum as a discriminant function of a corresponding candidate target; and
the template updating module is used for assigning each target template of the target template set a weight proportional to its importance and updating the target templates of the target template set; the template updating module comprises a resetting unit, a similarity calculation unit, and a judgment-and-update unit; the resetting unit is used for selecting the target template with the maximum sparse linear representation coefficient as the current optimal template and resetting its weight to 1; the similarity calculation unit is used for calculating the similarity between the tracking target of the current frame and the current optimal template; the judgment-and-update unit is used for judging whether the similarity is greater than a similarity threshold; if so, it judges whether the occlusion degree of the tracking result is within a threshold range, and if so, updates the corresponding target template: linearly representing an estimated target, updating the target template corresponding to the component with the minimum value in the sparse linear representation coefficient vector, and finally updating the corresponding local dictionary.
Compared with the existing target tracking method, the sparse representation-based multi-appearance model fusion target tracking method and the sparse representation-based multi-appearance model fusion target tracking device have the following beneficial effects:
1. Under the particle filter framework, the multi-appearance model fusion target tracking method based on sparse representation first determines the state of the tracked target through a plurality of affine parameters and constructs the corresponding motion model. It then obtains the global features, local features, and spatial-structure information of the target by constructing a global sparse appearance model and a local sparse appearance model, which copes with large-area occlusion, illumination change, pose change, rotation, and similar challenges and reduces the sensitivity of tracking to occlusion and local deformation. The two models are fused by weighted combination of their features to locate the target, which effectively handles the various appearance changes caused by internal and external factors; global features are also used in the discrimination function of the candidate targets to prevent the loss of the target's global information. Finally, an incremental principal component analysis method with sparse constraints adaptively updates the target template, which reduces the influence of occlusion on model updating during tracking, reduces time complexity, and provides high adaptability when the target appearance changes greatly or the occluded area is large, so that the target is tracked accurately.
2. The sparse-representation-based multi-appearance model fusion target tracking method was compared in performance with a number of existing tracking methods: on 10 classical test sets its average center error is 6.64 pixels and its average overlap rate is 69%, outperforming the other existing tracking methods. Moreover, the experimental results show that the target tracking method robustly tracks the target in various videos containing similar-background interference, occlusion, illumination change, pose change, or rotation.
Drawings
Fig. 1 is a flowchart of a sparse representation-based multi-appearance model fusion target tracking method according to embodiment 1 of the present invention;
fig. 2 is a tracking error curve diagram in the Cross video set tracked by the sparse-representation-based multi-appearance model fusion target tracking method in embodiment 2 of the present invention;
fig. 3 is a tracking error curve diagram of a Fish video set tracked by the sparse representation-based multi-appearance model fusion target tracking method in embodiment 2 of the present invention;
fig. 4 is a graph of tracking error in a Girl video set tracked by the sparse representation-based multi-appearance model fusion target tracking method according to embodiment 2 of the present invention;
fig. 5 is a tracking error curve diagram in the Woman video set tracked by the sparse representation-based multi-appearance model fusion target tracking method in embodiment 2 of the present invention;
fig. 6 is a tracking error curve diagram in a Walking video set tracked by the sparse representation-based multi-appearance model fusion target tracking method in embodiment 2 of the present invention;
fig. 7 is a tracking error graph in the FaceOcc1 video set tracked by the sparse representation-based multi-appearance model fusion target tracking method according to embodiment 2 of the present invention;
fig. 8 is a tracking error curve diagram of the Basketball video set tracked by the sparse representation-based multi-appearance model fusion target tracking method according to embodiment 2 of the present invention;
fig. 9 is a graph of a tracking error in a Freeman3 video set tracked by the sparse representation-based multi-appearance model fusion target tracking method according to embodiment 2 of the present invention;
fig. 10 is a tracking error curve diagram of David2 video set tracked by the sparse representation-based multi-appearance model fusion target tracking method in embodiment 2 of the present invention;
fig. 11 is a tracking error curve diagram of a Dog1 video set tracked by the sparse representation-based multi-appearance model fusion target tracking method in embodiment 2 of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Example 1
Referring to fig. 1, the present invention provides a multi-appearance model fusion target tracking method based on sparse representation, which is used for tracking a target in a tracking image. The target tracking method of this embodiment includes the following steps (1) to (5).
(1) Build a particle filter framework, and determine the state of the tracking target through a plurality of affine parameters to build a motion model of the state transition of the tracking target. In this embodiment, the particle filtering in the particle filter framework is essentially recursive Bayesian estimation by non-parametric Monte Carlo simulation: the posterior probability density of the system state is approximated by a set of random samples with corresponding weights. Particle filtering generally proceeds through the following steps: (i) initialization: simulate X(t) with a large number of particles uniformly distributed in the state space; (ii) prediction: obtain a predicted particle for each particle according to the state transition equation; (iii) correction: evaluate the predicted particles, where particles closer to the true state receive higher weights; (iv) resampling: screen the particles according to their weights, keeping many of the high-weight particles and few of the low-weight particles; (v) filtering: substitute the resampled particles into the state transition equation to obtain new predicted particles. A brief sketch of the resampling step is given below.
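For illustration only, the resampling step (iv) can be sketched in a few lines of Python; the helper name and the use of systematic resampling are assumptions of this sketch, not part of the disclosure:

```python
import numpy as np

def systematic_resample(particles, weights, rng=None):
    """Resample particle states in proportion to their weights (step (iv) above)."""
    rng = rng or np.random.default_rng()
    n = len(weights)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                   # normalize the weights
    positions = (rng.random() + np.arange(n)) / n     # one stratified position per particle
    indices = np.searchsorted(np.cumsum(w), positions)
    return particles[indices]                         # high-weight particles are kept many times
```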
Given the observation set z_{1:t-1} = [z_1, z_2, …, z_{t-1}] up to time t-1, let x_t^i denote the system state of the i-th sampled particle at time t. The optimal state of the tracking target at time t is solved by:

x_t* = arg max_{x_t^i} p(x_t^i | z_{1:t})

The posterior probability p(x_t | z_{1:t}) of each sampled particle is computed as:

p(x_t | z_{1:t-1}) = ∫ p(x_t | x_{t-1}) p(x_{t-1} | z_{1:t-1}) dx_{t-1}
p(x_t | z_{1:t}) = p(z_t | x_t) p(x_t | z_{1:t-1}) / p(z_t | z_{1:t-1})

where p(x_t | x_{t-1}) infers the possible state of the target at time t from the estimated state at time t-1 and is called the state transition probability, p(x_{t-1} | z_{1:t-1}) denotes the posterior probability at the previous moment, and p(z_t | x_t) denotes the likelihood probability of the state vector x_t.
In this embodiment, the expression of the motion model is:
p(x_t | x_{t-1}) = N(x_t; x_{t-1}, ψ)

where x_t is a state vector used to represent the tracking target state determined by the plurality of affine parameters, and ψ denotes a diagonal covariance matrix whose elements are the variances of the affine transformation parameters.
For convenience of explanation, six affine parameters are used in this embodiment to determine the state of the target, and the state vector is defined as x_t = (c_x, c_y, s, θ, α, φ), where the six parameters represent the center-point coordinates, the scale, the rotation angle, the aspect ratio, and the skew of the target, respectively. In other embodiments, the number of affine parameters may be fewer or more than six and can be set according to actual needs. A minimal sketch of this state-transition step follows.
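As an illustration of the motion model p(x_t | x_{t-1}) = N(x_t; x_{t-1}, ψ), the sketch below perturbs each particle's six affine parameters with independent Gaussian noise; the parameter ordering and the standard deviations are assumptions of the sketch, not values taken from this disclosure:

```python
import numpy as np

# Assumed per-parameter standard deviations (illustrative only):
# [center x, center y, scale, rotation angle, aspect ratio, skew]
AFFINE_SIGMA = np.array([4.0, 4.0, 0.01, 0.005, 0.001, 0.001])

def propagate_particles(states, rng=None):
    """Draw x_t ~ N(x_{t-1}, psi) with diagonal covariance psi for every particle."""
    rng = rng or np.random.default_rng()
    noise = rng.normal(scale=AFFINE_SIGMA, size=states.shape)  # states: (num_particles, 6)
    return states + noise
```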
(2) In the particle filter framework, a target template set and a candidate target set of the tracking image are determined; then the target template set and a trivial template describing errors are linearly programmed to generate a sparse coefficient solution for the candidate targets in the candidate target set, and a global sparse appearance model is constructed. In this embodiment, the expression of the global sparse appearance model (i.e., the model that solves the sparse coefficients of any candidate target y_i) is:

y_i = T a_i + I e_i = [T, I] c_i

where T = [T_1, T_2, …, T_n] represents the target template set, Y = [y_1, y_2, …, y_m] represents the candidate target set, I denotes the trivial template, and c_i = [a_i; e_i] denotes the sparse coefficient vector of the candidate target y_i to be solved. The solving process is converted into a linear program, i.e., the sparse coefficients of the candidate target y_i are obtained by solving an L1 minimization problem. The global sparse appearance model also imposes a non-negativity constraint on the sparse coefficient vector, and the constraint formula is:

min_{c_i} ||y_i - A c_i||_2^2 + λ ||c_i||_1,  s.t. c_i ≥ 0

where A is the overcomplete basis, A = [T, I], which describes the target template and can also reflect partial occlusion and noise present in the target; λ is a parameter that balances the weight between the L1 regularization term and the reconstruction error. A minimal illustrative sketch of this sparse coding step follows.
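A minimal sketch of this global sparse coding step is given below; it uses scikit-learn's Lasso with a non-negativity constraint as a stand-in for the L1-minimization solver (the disclosure does not name a particular solver, and the value of λ here is illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso

def global_sparse_code(y, T, lam=0.01):
    """Approximately solve min_c ||y - A c||^2 + lam*||c||_1, c >= 0, with A = [T, I]."""
    d = T.shape[0]                                # dimension of a vectorized template
    A = np.hstack([T, np.eye(d)])                 # overcomplete basis: templates + trivial templates
    solver = Lasso(alpha=lam, positive=True, fit_intercept=False, max_iter=5000)
    solver.fit(A, y)
    c = solver.coef_
    a, e = c[:T.shape[1]], c[T.shape[1]:]         # template coefficients / trivial (error) coefficients
    recon_err = np.linalg.norm(y - T @ a) ** 2    # reconstruction error after global representation
    return a, e, recon_err
```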
(3) Establish a local sparse appearance model; the method comprises the following steps: (3.1) extract a plurality of overlapping local image blocks I from each target template of the target template set, and combine all the local image blocks into one local dictionary; (3.2) extract a plurality of overlapping local image blocks II from each candidate target of the candidate target set, and then encode the local image blocks II, taken as local features, with the local dictionary to obtain the corresponding coding coefficient matrix; (3.3) divide the coding coefficient matrix into a plurality of segment coefficients, weight all the segment coefficients to obtain coefficient vectors, and finally form a similarity matrix from all the coefficient vectors. In this embodiment, the local dictionary is defined as D = [d_1, d_2, …, d_{n×N}], where each d_i is obtained by vectorizing a local image block extracted from T. Each local image block represents a local area of the tracked target, and the set of all local image blocks can represent the complete structure of the target. Because these local blocks are obtained from many global templates, the dictionary characterizes both the commonality and the variability between different templates.
For a candidate target y_i, local image blocks are extracted in the same way as local features y = [p_1, p_2, …, p_N]. These local features are encoded with the learned dictionary D to obtain the corresponding coding coefficients. Under the sparsity assumption, the calculation formula of the sparse linear representation coefficients is:

min_{b_i} ||p_i - D b_i||_2^2 + λ ||b_i||_1,  s.t. b_i ≥ 0

where p_i is the i-th local feature of the candidate target and b_i is the coding coefficient vector of p_i under the local dictionary D. The coding coefficient matrix is represented as B = [b_1, b_2, …, b_N], and the segment coefficients are represented as b_i^(k), where b_i^(k) denotes the k-th segment of b_i. These segment coefficients are weighted to obtain v_i for the i-th block, i.e., the expression of the coefficient vector is:

v_i = (1/C) Σ_k b_i^(k)

where C is a normalization term and b_i^(k) is the k-th segment of b_i. The vectors v_i of all local blocks of a candidate region form a square matrix V. A minimal sketch of this local coding and pooling step is given below.
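The local dictionary construction, patch coding, and segment pooling described above can be sketched as follows; the patch size, step, and pooling-by-summation choice are assumptions of the sketch rather than prescriptions of this disclosure:

```python
import numpy as np
from sklearn.linear_model import Lasso

def extract_patches(img, patch, step):
    """Collect overlapping local blocks of size patch x patch as vectorized columns."""
    h, w = img.shape
    cols = [img[r:r + patch, c:c + patch].ravel()
            for r in range(0, h - patch + 1, step)
            for c in range(0, w - patch + 1, step)]
    return np.array(cols).T                        # shape: (patch*patch, N)

def local_similarity(candidate, templates, patch=16, step=8, lam=0.01):
    """Encode candidate patches over the local dictionary D and pool the aligned segments."""
    D = np.hstack([extract_patches(t, patch, step) for t in templates])  # local dictionary
    P = extract_patches(candidate, patch, step)                          # candidate local features
    n, N = len(templates), P.shape[1]
    solver = Lasso(alpha=lam, positive=True, fit_intercept=False, max_iter=5000)
    V = np.zeros((N, N))
    for i in range(N):
        solver.fit(D, P[:, i])
        b = solver.coef_.reshape(n, N)             # split b_i into one segment per template
        v = b.sum(axis=0)                          # combine the segment coefficients
        V[i] = v / max(v.sum(), 1e-12)             # normalized coefficient vector v_i
    return V                                       # diag(V) aligns each patch with its own position
```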
(4) Fuse the global sparse appearance model with the local sparse appearance model; the fusion method comprises the following steps: (4.1) calculate the sums of the elements on the main diagonals of the similarity matrices (the two similarity elements), and then compute the weighted sum of these two elements as the fusion similarity of a target area; (4.2) calculate the reconstruction error after global representation according to the sparse coefficient solution; (4.3) compute the weighted sum of the fusion similarity and the reconstruction error as the discriminant function of the corresponding candidate target.
Existing tracking methods based on a global appearance model select, as the tracking result, the candidate target with the minimum reconstruction error after sparse linear representation over the target templates. However, such methods only consider the global information of the target and do not fully exploit the sparse coding coefficients to distinguish the target from the background, so the target is easily lost when similar objects appear in the scene, when large-area occlusion occurs, when the illumination changes, or when the target deforms, causing tracking drift. To address these problems, many tracking algorithms based on local appearance models have been proposed; their basic idea is to divide the target template and the candidate targets into blocks and perform sparse representation of the candidate targets over local features. While such models solve some of the problems of global sparse appearance models, the spatial structure of the target under the global representation and the differences between local blocks of different sizes are not considered.
For this reason, in this embodiment the local image blocks are extracted at two different block sizes, forming two local dictionaries of different granularity, and the similarity between the candidate region and the target region is obtained from these local dictionaries at each block size. The coarse-grained local appearance model more easily explains appearance changes of the target caused by large deformation, while the fine-grained local appearance model helps distinguish the target region from the background region. In this embodiment, the similarities obtained at the two sizes are fused to obtain the similarity of a target region, i.e., the calculation formula of the weighted sum of the two similarities is:

β = η_1 Σ diag(V_1) + η_2 Σ diag(V_2)

where η_1 and η_2 are weighting coefficients, and Σ diag(V_1) and Σ diag(V_2) are the sums of the elements on the main diagonals of the similarity matrices corresponding to local blocks of the two different sizes. The elements on the main diagonal of the square matrix V are the sparse coding coefficients of each local block of the candidate target on the local block at the corresponding position in the local dictionary.
To improve tracking stability, this embodiment fuses the global appearance model and the local appearance model. Since both the coarse-grained and the fine-grained features obtained after blocking are local features, and discriminating the candidate targets only on local blocks loses the global information of the target, the global feature is added when determining the tracking result, and the expression of the final candidate-target discriminant function is:

f = β - γ ε

where γ is a parameter that adjusts the weight within the discriminant function and ε = ||y_i - T a_i||_2^2 is the reconstruction error; the candidate particle with the largest f value is taken as the tracking result. A minimal sketch of this fusion step is given below.
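For illustration, the fusion of the two local similarity matrices with the global reconstruction error might be sketched as below; the weights and the exact form of the weighted sum are assumptions of the sketch (the disclosure only states that the discriminant is a weighted sum of the fusion similarity and the reconstruction error):

```python
import numpy as np

def discriminant(V1, V2, recon_err, eta1=0.6, eta2=0.4, gamma=0.3):
    """Score one candidate: coarse/fine diagonal similarities minus a weighted reconstruction error."""
    beta = eta1 * np.trace(V1) + eta2 * np.trace(V2)   # sum of main-diagonal similarity elements
    return beta - gamma * recon_err

def pick_tracking_result(scores):
    """The candidate particle with the largest discriminant value is the tracking result."""
    return int(np.argmax(scores))
```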
(5) Each target template in the target template set is assigned a weight proportional to its importance, and the target templates in the target template set are updated; the updating method comprises the following steps: (5.1) select the target template with the maximum sparse linear representation coefficient as the current optimal template, and reset the weight of the current optimal template to 1; (5.2) calculate the similarity between the tracking target of the current frame and the current optimal template; (5.3) judge whether the similarity is greater than a similarity threshold; if so, judge whether the occlusion degree of the tracking result is within a threshold range, and if so, update the corresponding target template: linearly represent an estimated target, update the target template corresponding to the component with the minimum value in the sparse linear representation coefficient vector, and finally update the corresponding local dictionary.
In general, because the appearance of the target changes during tracking, a fixed target template cannot meet the tracking requirement, and the target template needs to be updated adaptively. However, errors may be introduced into the target template at each update, and their accumulation eventually distorts the target template and causes tracking drift, so the target template should not be updated too frequently. Replacing, every five frames, an old template selected with random probability by a new template reconstructed by incremental PCA does not perform occlusion detection on the tracking result, and choosing the template to be replaced at random makes the algorithm unstable during tracking. To this end, this embodiment provides a new PCA-based incremental updating strategy in which a weight W_i is introduced for each template T_i. The larger the weight, the more important the corresponding template. Initially, the template set is generated from the first template by Gaussian perturbation of the affine parameters, and all template weights in the set are set to 1. The updating method is therefore refined in this embodiment; the specific updating process is as follows:
1) let a be the linear representation coefficient vector of the current-frame target over the template set, obtained from the above calculation; select the template T_j with the maximum coefficient in the linear representation over the template set as the current optimal template and set its weight to 1;
2) calculate the similarity g between the target y tracked in the current frame and the current optimal template, where g is expressed by the angle between the two normalized vectors, namely:
g = angle(y, T_j)
where g is the angle between the two normalized vectors, y represents the tracking result, and T_j represents the current optimal template;
3) when g is greater than the similarity threshold, the tracking result y deviates significantly from the optimal template T_j and the template needs to be updated. However, directly replacing the worst template in the template set with a severely occluded tracking result y easily causes tracking failure, so an occlusion-detection pre-judgment is adopted, and the template is updated only when the occlusion degree of the tracking result y is within the threshold range. The specific steps are as follows:
(I) linearly express the estimated target with the PCA basis vectors and a trivial template:

p = U q + e = [U, I] [q; e]

where I represents the trivial template, p represents the observation vector, U represents a matrix containing a plurality of basis vectors, q represents the coefficient vector of the basis vectors, and e represents the occluded pixels in p; the calculation formula of the sparse coefficient solution is:

min_c ||p - H c||_2^2 + λ ||c||_1

where H = [U, I] and c = [q, e]^T;
(II) update the template T_k corresponding to the component with the minimum value in the coefficient vector a, namely:
T_k = U q
where T_k represents the target template corresponding to the minimum component in the sparse linear representation coefficient vector;
(III) update the local dictionary, i.e., extract local blocks from the updated template set T in the same way to form the local dictionary D. A minimal sketch of this occlusion-checked update is given below.
In summary, compared with the existing target tracking method, the sparse representation-based multi-appearance model fusion target tracking method has the following advantages:
the tracking method of the multi-appearance model fusion target based on sparse representation comprises the steps of firstly determining the state of a tracked target through a plurality of affine parameters under a particle filter frame, constructing a corresponding motion model, then obtaining the global characteristics, the local characteristics and the space structure information of the target through constructing a global sparse appearance model and a local sparse appearance model so as to deal with the problems of large-range shielding, illumination change, posture change, rotation and the like, reducing the sensitivity of the target to shielding and local deformation in the tracking process, then positioning the target through weighting fusion between the characteristics so as to fuse the two models, effectively dealing with various conditions caused by the change of internal and external factors of the target, simultaneously adding global characteristics to generate a discrimination function of a candidate target so as to prevent the global information of the target from being lost, and finally adopting an increment principal component analysis method based on sparse constraint to adaptively update a target template, the method has the advantages that the influence of shielding on model updating is reduced in the tracking process of the updating mode, the time complexity is reduced, and the method has high adaptability under the conditions that the appearance of the target is greatly changed and the shielding area is large, so that the target is accurately tracked.
Example 2
This embodiment provides a multi-appearance model fusion target tracking method based on sparse representation and performs a simulation experiment on the basis of embodiment 1 (in other embodiments the simulation experiment may be omitted, and other experimental schemes may be used to determine the relevant parameters and the tracking performance).
In this embodiment, a simulation experiment is performed on the tracking method on an Intel Core 3.2 GHz, 4 GB memory, MATLAB 2010b platform. The tracking method runs in a particle filter framework with 600 particles; each target image block is 32 × 32 pixels, and the local image blocks extracted from the target region are 16 × 16 pixels and 8 × 8 pixels, respectively. η_1 is the weight coefficient for the 16 × 16 local blocks and η_2 is the weight coefficient for the 8 × 8 local blocks. Ten representative videos were selected for the experiment; these videos contain illumination change, pose change, scale change, occlusion, and other influences to different degrees, so tracking the target in them is challenging. To measure the performance of the tracking method, eight classic tracking methods were selected for comparative target tracking experiments: TLD, ORIA, L1-APG, IVT, MTT, SMS, ASLA, and LSK. Table 1 lists the values of the parameters γ, η_1, and η_2 for each video set used in the simulation experiment.
Table 1 table of parameter settings for different video sets
First, the experimental results of this example by qualitative evaluation are as follows.
Cross: the Cross video contains scale change, deformation, rapid movement, out-of-plane rotation, and similar-target interference, which places high demands on the robustness of the tracker. The ASLA tracking method and the proposed method take local features into consideration, unlike the other tracking methods, and can therefore track the target effectively. When the tracked target is disturbed by similar targets, the LSK, MTT, L1-APG, TLD, ORIA, SMS, and IVT tracking methods all drift to different degrees, while the average overlap rate of the method of this embodiment is higher because it also considers global features.
Fish: the main difficulty of the Fish video is that the light and shade change of the light is relatively sudden and violent. The IVT tracking method, the ASLA tracking method, the LIAPG tracking method and the method of the embodiment can effectively track the target, and when the target has illumination change, the LSK tracking method, the MTT tracking method, the TLD tracking method, the ORIA tracking method and the SMS tracking method have drift of different degrees.
Girl: the main difficulties of Girl video are scale change, object occlusion, rotation. At frame 124, the first rotation of the target, the LSK tracking method, the IVT tracking method, and the ORIA tracking method all drift to different degrees. In frame 440, when the tracking target is occluded, the MIL tracking method and the TLD tracking method also drift to different degrees, but the method of this embodiment always effectively tracks the target.
Woman: the main difficulties of the Woman video are illumination change, scale change, shielding, deformation, motion blur, rapid motion and out-of-plane rotation, and the robustness requirement on the performance of the tracker is high. When the tracking target is shielded, only the method of the embodiment can effectively track the target.
Walking: most tracking methods can effectively track the target; the average overlap rate of the method of this embodiment is higher because multi-template fusion improves its stability.
faceOcc 1: the main difficulty with FaceOcc1 video is target occlusion of different sizes. The MTT tracking method and the method of the embodiment can effectively track the target. When the tracking target is frequently shielded, other tracking methods have drifts of different degrees.
Basketball: the main difficulties of the Basketball video are illumination change, similar target occlusion, deformation, out-of-plane rotation and similar background interference. In 283 frames, under the condition of similar background interference, the MTT tracking method, the TLD tracking method, the ORIA tracking method, the SMS tracking method, the L1APG tracking method and the IVT tracking method all have drifts in different degrees. At frame 578, the tracked object is rotated, and the method of this embodiment can effectively track the object.
Freeman 3: the main difficulties with Freeman3 video are scaling, rotation. The method of the embodiment can effectively track the target, and under the condition of frequent rotation of the tracked target, other tracking methods drift to different degrees.
David 2: the main difficulty with David2 video is rotation. The ASLA tracking method, the MIL tracking method, the IVT tracking method, and the method of the present embodiment can effectively track a target. When the tracking target rotates, the TLD tracking method, the ORIA tracking method, the SMS tracking method, the L1APG tracking method, and the LSK tracking method all drift to different degrees.
Dog 1: the main difficulties of Dog1 video are scale change, rotation. The ORIA tracking method, the ASLA tracking method, and the method of the present embodiment can effectively track a target. Other tracking methods all suffer from varying degrees of drift as the tracked target rotates.
Then, the experimental results of this embodiment under quantitative evaluation are as follows. To objectively evaluate the performance of the proposed tracking method and the eight other tracking methods, the overlap ratio score of the PASCAL VOC detection standard is used; its calculation formula is:

score = area(R_r ∩ R_g) / area(R_r ∪ R_g)

where R_r is the region covered by the tracking bounding box produced by a tracking method in each frame, and R_g is the ground-truth region provided by the test sequence. The overlap ratio ranges between 0 and 1, and larger values indicate more accurate results. Referring to figs. 2-11, the overlap ratios of the different methods on the different video sequences are compared, with the frame number on the abscissa and the overlap ratio on the ordinate. Table 2 compares the average overlap ratios of the different methods on the different video sequences, with the best results highlighted. The overlap ratio of the tracking method provided by this embodiment remains at a high level throughout tracking and has clear advantages over the other eight methods.
TABLE 2 comparison of the overlap ratio of different tracking methods
To further analyze the performance of the algorithm, the center-point error is selected as a second measurement standard; its calculation formula is:

error = sqrt((x_t - x_g)^2 + (y_t - y_g)^2)

where x_t and y_t are the coordinates of the tracking result of a given tracking method, and x_g and y_g are the ground-truth coordinates; the smaller the error value, the closer the tracking result is to the real position. Table 3 shows that the tracking method of the present invention achieves a lower average center error on each video sequence, which verifies its accuracy. A minimal sketch of these two evaluation metrics is given below.
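The two evaluation criteria can be computed as in the following sketch, with boxes given as (x, y, w, h); this is provided only to make the metrics concrete:

```python
import numpy as np

def overlap_ratio(box_a, box_b):
    """PASCAL VOC overlap score between two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def center_error(box_a, box_b):
    """Euclidean distance in pixels between the two box centers."""
    ca = np.array([box_a[0] + box_a[2] / 2.0, box_a[1] + box_a[3] / 2.0])
    cb = np.array([box_b[0] + box_b[2] / 2.0, box_b[1] + box_b[3] / 2.0])
    return float(np.linalg.norm(ca - cb))
```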
TABLE 3 comparison of mean center errors for different tracking methods
In summary, compared with the existing tracking method, the sparse representation-based multi-appearance model fusion target tracking method of the embodiment has the following advantages:
the sparse representation-based multi-appearance model fusion target tracking method is compared with the existing multiple tracking methods in performance, the average central error of the tracking method is 6.64 pixels under 10 classical test sets, and the average overlapping rate is 69%, so that the method is superior to other existing tracking methods. Moreover, experimental results show that the target tracking method can robustly track the target in various videos containing similar background interference, shielding, illumination change, posture change or rotation.
Example 3
The embodiment provides a sparse representation-based multi-appearance model fusion target tracking device, which applies the sparse representation-based multi-appearance model fusion target tracking method of embodiment 1 or embodiment 2. The target tracking device of the embodiment comprises a particle filter framework building module, a global sparse appearance model building module, a local sparse appearance model building module, a fusion module and a template updating module.
The particle filter frame building module is used for building a particle filter frame and determining the state of the tracking target through a plurality of affine parameters to build a motion model of state transfer of the tracking target.
The global sparse appearance model building module is used for firstly determining a target template set and a candidate target set of a tracking image in a particle filter framework, then generating a sparse coefficient solution of the candidate target in the candidate target set by linearly programming the target template set and a trivial template for describing errors, and building a global sparse appearance model.
The local sparse appearance model building module is used for building a local sparse appearance model. The local sparse appearance model building module comprises an extraction combination unit, an extraction coding unit and a division unit. The extraction and combination unit is used for extracting a plurality of first local image blocks with overlapping in each target template of the target template set and combining all the first local image blocks into a local dictionary in a group mode. The extraction and coding unit is used for firstly extracting a plurality of overlapped local image blocks II from each candidate target of the candidate target set, then coding the local image blocks II serving as local features by using the local dictionary, and obtaining corresponding coding coefficient matrixes. The dividing unit is used for dividing the coding coefficient matrix into a plurality of segment coefficients, weighting all the segment coefficients to obtain coefficient vectors, and finally forming a similarity matrix by all the coefficient vectors.
The fusion module is used for fusing the global sparse appearance model with the local sparse appearance model. The fusion module comprises a similarity calculation unit, a reconstruction error calculation unit, and a discriminant function unit. The similarity calculation unit is used for calculating the sums of the elements on the main diagonals of the similarity matrices (the two similarity elements), and then calculating the weighted sum of these two elements as the fusion similarity of a target area. The reconstruction error calculation unit is used for calculating the reconstruction error after global representation according to the sparse coefficient solution. The discriminant function unit is used for calculating the weighted sum of the fusion similarity and the reconstruction error as the discriminant function of the corresponding candidate target.
The template updating module is used for assigning each target template of the target template set a weight proportional to its importance, and for updating the target templates of the target template set. The template updating module comprises a resetting unit, a similarity calculation unit, and a judgment-and-update unit. The resetting unit is used for selecting the target template with the maximum sparse linear representation coefficient as the current optimal template and resetting its weight to 1. The similarity calculation unit is used for calculating the similarity between the tracking target of the current frame and the current optimal template. The judgment-and-update unit is used for judging whether the similarity is greater than a similarity threshold; if so, it judges whether the occlusion degree of the tracking result is within a threshold range, and if so, updates the corresponding target template: linearly representing an estimated target, updating the target template corresponding to the component with the minimum value in the sparse linear representation coefficient vector, and finally updating the corresponding local dictionary.
Example 4
The present embodiments provide a computer terminal comprising a memory, a processor, and a computer program stored on the memory and executable on the processor. The processor implements the steps of the sparse representation-based multi-appearance model fusion target tracking method of embodiment 1 when executing the program.
When applied, the target tracking method of embodiment 1 may be deployed as software, for example as a stand-alone program installed on a computer terminal; the computer terminal may be a computer, a smartphone, a control system, another Internet-of-Things device, or the like. The target tracking method of embodiment 1 may also be designed as an embedded program and installed on a computer terminal such as a single-chip microcomputer.
Example 5
The present embodiment provides a computer-readable storage medium having a computer program stored thereon. The program, when executed by a processor, implements the steps of the sparse representation-based multi-appearance model fusion target tracking method of embodiment 1.
When applied, the target tracking method of embodiment 1 may be deployed as software carried on a computer-readable storage medium, for example as a stand-alone program; the computer-readable storage medium may be a USB drive, such as one designed as a security-key (USB shield) drive whose program starts the whole method through an external trigger.
The present invention is not limited to the above preferred embodiments, and any modifications, equivalent substitutions and improvements made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A multi-appearance model fusion target tracking method based on sparse representation is used for tracking a tracking target in a tracking image, and is characterized by comprising the following steps:
(1) building a particle filter framework, and determining the state of the tracking target through a plurality of affine parameters to build a motion model of the state transition of the tracking target;
(2) in the particle filter framework, a target template set and a candidate target set of the tracking image are determined, then the target template set and a trivial template for describing errors are subjected to linear programming to generate a sparse coefficient solution of candidate targets in the candidate target set, and a global sparse appearance model is constructed;
(3) establishing a local sparse appearance model; the method for establishing the local sparse appearance model comprises the following steps of: (3.1) extracting a plurality of overlapped local image blocks I in each target template of the target template set, and combining all the local image blocks into a local dictionary in a group manner; (3.2) firstly, extracting a plurality of overlapped local image blocks II from each candidate target in the candidate target set, and then coding the local image blocks II serving as local features by using the local dictionary to obtain a corresponding coding coefficient matrix; (3.3) dividing the coding coefficient matrix into a plurality of segment coefficients, weighting all the segment coefficients to obtain coefficient vectors, and forming a similarity matrix by all the coefficient vectors;
(4) fusing the global sparse appearance model with the local sparse appearance model, wherein the fusing method comprises the following steps: (4.1) firstly calculating two similarity elements on the main diagonal line of the similarity matrix, and then calculating the weighted sum of the two similarity elements to be used as the fusion similarity of a target area; (4.2) calculating a reconstruction error after global representation according to the sparse coefficient solution; (4.3) calculating a weighted sum of the fusion similarity and the reconstruction error as a discriminant function of the corresponding candidate object;
(5) each target template in the target template set is assigned a weight value proportional to the importance of that target template, and the target templates in the target template set are updated, the updating method comprising the following steps: (5.1) selecting the target template with the largest sparse linear representation coefficient as the current optimal template, and resetting the weight of the current optimal template to 1; (5.2) calculating the similarity between the tracking target of the current frame and the current optimal template; (5.3) judging whether the similarity is greater than a similarity threshold, if so, judging whether the occlusion degree of the tracking result is within a threshold range, and if so, updating the corresponding target template: the estimated target is linearly represented, the target template corresponding to the smallest component of the sparse linear representation coefficient vector is updated, and finally the corresponding local dictionary is updated.
2. The sparse representation-based multi-appearance model fusion target tracking method according to claim 1, wherein the expression of the motion model is as follows:
p(x_t | x_{t-1}) = N(x_t; x_{t-1}, ψ)
wherein x_t is a state vector representing the tracking target state determined by a plurality of affine parameters; ψ denotes a diagonal covariance matrix whose elements correspond to the variances of the affine transformation parameters; and N(x_t; x_{t-1}, ψ) denotes the Gaussian probability density function of the state vector x_t given the previous state vector x_{t-1} and the diagonal covariance matrix ψ.
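By way of illustration (not part of the claim language), the sketch below samples particle states from the motion model N(x_t; x_{t-1}, ψ) by adding independent Gaussian noise to the previous affine state; the choice of six affine parameters and the variance values on the diagonal of ψ are assumptions of the sketch.

```python
import numpy as np

def propagate_particles(prev_states, rng=None):
    """Draw x_t ~ N(x_t; x_{t-1}, psi) for each particle (illustrative sketch).

    prev_states: (num_particles, 6) affine states, e.g.
                 [x shift, y shift, scale, aspect ratio, rotation, skew]
    """
    rng = np.random.default_rng() if rng is None else rng
    # Diagonal covariance psi: one variance per affine parameter (assumed values).
    psi_diag = np.array([4.0, 4.0, 0.01, 0.005, 0.01, 0.001])
    noise = rng.normal(scale=np.sqrt(psi_diag), size=prev_states.shape)
    return prev_states + noise

# Usage: states = propagate_particles(states)   # one prediction step per frame
```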
3. The sparse representation-based multi-appearance model fusion target tracking method according to claim 1, wherein the expression of the global sparse appearance model is as follows:
y_i = Aα_i = [T, I]α_i
wherein T = [T_1, T_2, …, T_n] represents the set of target templates; Y = [y_1, y_2, …, y_m] represents the set of candidate targets; I denotes the trivial template; and α_i denotes the sparse coefficient vector of the candidate target y_i to be solved. The global sparse appearance model further imposes a non-negativity constraint on the sparse coefficient vectors, and the constraint formula is:
min_{α_i} ||y_i − Aα_i||_2² + λ||α_i||_1,  s.t.  α_i ≥ 0
wherein A is the overcomplete basis, A = [T, I]; and λ is a parameter that balances the weight between the L1 regularization term and the reconstruction error.
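A minimal sketch of the non-negative ℓ1 coding behind the global sparse appearance model, using scikit-learn's Lasso with a positivity constraint as a stand-in solver (the patent does not name a solver); λ and the data layout are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

def global_sparse_code(candidates, templates, lam=0.01):
    """Non-negative L1 coding of each candidate over A = [templates, trivial templates].

    candidates: (d, m) column-stacked candidate targets
    templates:  (d, n) column-stacked target templates
    Returns an (n + d, m) coefficient matrix (template coefficients first).
    """
    d, n = templates.shape
    A = np.hstack([templates, np.eye(d)])          # A = [T, I]
    solver = Lasso(alpha=lam, positive=True, fit_intercept=False, max_iter=5000)
    coefs = []
    for y in candidates.T:
        solver.fit(A, y)                           # approx. min ||y - A a||^2 + lam*||a||_1, a >= 0
        coefs.append(solver.coef_.copy())          # (Lasso rescales the data term by 1/(2d))
    return np.array(coefs).T

# The global reconstruction error of candidate i is then
# np.sum((candidates[:, i] - templates @ coefs[:n, i]) ** 2).
```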
4. The sparse representation-based multi-appearance model fusion target tracking method according to claim 3, wherein the sparse linear representation coefficients are calculated by the formula:
min_{b_i} ||p_i − D b_i||_2² + λ||b_i||_1
in the formula, p_i is the i-th local feature of the candidate target, b_i is the coding coefficient vector of p_i under the local dictionary, and D is the local dictionary;
the expression of the coefficient vector is:
v_i = (1/C) Σ_k b_i^(k)
wherein C is a normalization term and b_i^(k) is the k-th segment of b_i; the coding coefficient matrix is represented as B = [b_1, b_2, …, b_N], and the segment coefficients are represented as b_i^(k).
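An illustrative sketch of the local coding and segment pooling of this claim; the dictionary column ordering (template-major), the non-negative coding, and λ are assumptions carried over for consistency with the sketch after claim 3.

```python
import numpy as np
from sklearn.linear_model import Lasso

def local_similarity_matrix(patches, local_dict, n_templates, lam=0.01):
    """Code each local patch over the local dictionary and pool the per-template segments.

    patches:    (p, N) column-stacked local features of one candidate target
    local_dict: (p, n_templates * N) local dictionary D, columns ordered template-major
    Returns the (N, N) similarity matrix whose columns are the pooled coefficient vectors.
    """
    N = patches.shape[1]
    solver = Lasso(alpha=lam, positive=True, fit_intercept=False, max_iter=5000)
    cols = []
    for p_i in patches.T:
        solver.fit(local_dict, p_i)                        # sparse, non-negative coding b_i
        segments = solver.coef_.reshape(n_templates, N)    # one length-N segment per template
        v = segments.sum(axis=0)                           # accumulate the segment coefficients
        v /= v.sum() + 1e-12                               # normalization term C
        cols.append(v)
    return np.column_stack(cols)
```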
5. The sparse representation-based multi-appearance model fusion target tracking method according to claim 1, wherein the calculation formula of the weighted sum of the two similarity elements is as follows:
β = η_1 Σ diag(V_1) + η_2 Σ diag(V_2)
in the formula, η_1 and η_2 are both weighting coefficients; Σ diag(V_1) and Σ diag(V_2) are the sums of the elements on the main diagonals of the similarity matrices corresponding to local blocks of two different sizes.
6. The sparse representation-based multi-appearance model fusion target tracking method according to claim 5, wherein the discriminant function has an expression as follows:
f = β + γ·ε
wherein γ is a parameter for adjusting the weights within the discriminant function, ε = ||y − Ta||_2² is the reconstruction error after global representation, and the candidate particle with the maximum value of f is taken as the tracking result.
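A sketch of the fusion and candidate selection of claims 5 and 6; the weights η_1, η_2 and the sign of γ (chosen negative so that a smaller reconstruction error raises the discriminant value) are assumptions of the sketch.

```python
import numpy as np

def select_candidate(V1_list, V2_list, recon_errors, eta1=0.6, eta2=0.4, gamma=-1.0):
    """Fuse local similarity with the global reconstruction error and pick the best candidate.

    V1_list, V2_list: per-candidate similarity matrices for two local block sizes
    recon_errors:     per-candidate global reconstruction errors ||y - T a||^2
    gamma:            weight of the error term (negative here so that a smaller error
                      raises the discriminant value - an assumption of this sketch)
    """
    scores = []
    for V1, V2, eps in zip(V1_list, V2_list, recon_errors):
        beta = eta1 * np.trace(V1) + eta2 * np.trace(V2)   # sums of main-diagonal elements
        scores.append(beta + gamma * eps)                  # discriminant f
    best = int(np.argmax(scores))                          # candidate with the maximum f
    return best, scores
```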
7. The sparse representation-based multi-appearance model fusion target tracking method according to claim 1, wherein the similarity is represented by an included angle between two normalized vectors, and the expression is as follows:
g = angle(y, T_j)
wherein g is the angle between the two normalized vectors, y represents the tracking result, and T_j represents the current optimal template.
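A small helper illustrating the similarity measure of this claim, the angle between the normalized tracking result and the current optimal template; purely a sketch.

```python
import numpy as np

def angle_similarity(y, t_j):
    """Angle (radians) between the normalized tracking result y and the optimal template t_j."""
    a = y / (np.linalg.norm(y) + 1e-12)
    b = t_j / (np.linalg.norm(t_j) + 1e-12)
    return float(np.arccos(np.clip(a @ b, -1.0, 1.0)))    # smaller angle = more similar
```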
8. The sparse representation-based multi-appearance model fusion target tracking method according to claim 1, wherein the linear representation formula of the estimation target is as follows:
p = Uq + e = [U, I][q, e]^T
wherein I represents the trivial template, p represents an observation vector, U represents a matrix containing a plurality of basis vectors, q represents the coefficient vector of the basis vectors, and e represents the occluded pixels in p; the calculation formula of the sparse coefficient solution is:
min_c ||p − Hc||_2² + λ||c||_1
wherein H = [U, I] and c = [q, e]^T.
The updating formula of the target template is:
T_k = Uq
wherein T_k represents the target template corresponding to the smallest component of the sparse linear representation coefficient vector.
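A sketch of the occlusion-aware representation of this claim: the estimated target is coded over basis vectors plus trivial templates, and the reconstructed part Uq is used as the replacement template; the Lasso stand-in solver and λ are assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

def represent_and_rebuild(p, U, lam=0.01):
    """Solve min ||p - H c||^2 + lam*||c||_1 with H = [U, I], then return T_k = U q.

    p: (d,) observation vector of the estimated target
    U: (d, r) matrix of basis vectors (e.g., learned from previously tracked targets)
    """
    d, r = p.shape[0], U.shape[1]
    H = np.hstack([U, np.eye(d)])                # H = [U, I]
    solver = Lasso(alpha=lam, fit_intercept=False, max_iter=5000)
    solver.fit(H, p)
    q, e = solver.coef_[:r], solver.coef_[r:]    # c = [q, e]^T
    occlusion_degree = np.count_nonzero(e) / d   # nonzero trivial coefficients mark occluded pixels
    return U @ q, occlusion_degree               # T_k = U q, with the occlusion stripped out
```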
9. The sparse representation-based multi-appearance model fusion target tracking method according to claim 2, wherein in the particle filter framework, the solving formula of the optimal state of the tracking target at the time t is as follows:
x̂_t = arg max_{x_t} p(x_t | z_{1:t})
wherein p(x_t | z_{1:t}) is the posterior probability of each sampled particle, and its calculation formulas are:
p(x_t | z_{1:t-1}) = ∫ p(x_t | x_{t-1}) p(x_{t-1} | z_{1:t-1}) dx_{t-1}
p(x_t | z_{1:t}) = p(z_t | x_t) p(x_t | z_{1:t-1}) / p(z_t | z_{1:t-1})
in the formula, p(x_t | x_{t-1}) represents the state transition probability, p(x_{t-1} | z_{1:t-1}) represents the posterior probability at the previous moment, p(z_t | x_t) represents the likelihood probability of the state vector x_t, and p(z_t | z_{1:t-1}) is the normalizing term, i.e. the predictive probability of the observation z_t given the observations up to the previous moment.
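A compact sketch of one particle-filter cycle corresponding to this claim: resample, propagate through the motion model, weight by the observation likelihood, normalize, and take the maximum-weight particle as the optimal state; using the discriminant value of claim 6 as the likelihood is an assumption of the sketch.

```python
import numpy as np

def particle_filter_step(states, weights, propagate, likelihood, rng=None):
    """One predict/update cycle; returns new particles, weights and the optimal (MAP) state.

    states:     (num_particles, state_dim) particle states x_{t-1}
    weights:    (num_particles,) normalized posterior weights at time t-1
    propagate:  callable sampling from the motion model p(x_t | x_{t-1})
    likelihood: callable giving p(z_t | x_t) (any positive score) for one state
    """
    rng = np.random.default_rng() if rng is None else rng
    # Resample according to the previous posterior, then propagate through the motion model.
    idx = rng.choice(len(states), size=len(states), p=weights)
    states = propagate(states[idx])
    # Update: the posterior weight is proportional to the observation likelihood.
    w = np.array([likelihood(x) for x in states], dtype=float)
    w /= w.sum() + 1e-12                         # p(z_t | z_{1:t-1}) acts as the normalizer
    best = int(np.argmax(w))                     # maximum-posterior particle = optimal state
    return states, w, states[best]
```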
10. A sparse representation-based multi-appearance model fusion target tracking device, which applies the sparse representation-based multi-appearance model fusion target tracking method according to any one of claims 1 to 9, and is characterized by comprising:
the particle filter framework building module is used for building a particle filter framework and determining the state of the tracking target through a plurality of affine parameters so as to build a motion model of state transition of the tracking target;
a global sparse appearance model establishing module, configured to, in the particle filter framework, first determine a target template set and a candidate target set of the tracking image, and then perform linear programming on the target template set and a trivial template for describing an error to generate a sparse coefficient solution for a candidate target in the candidate target set, and construct a global sparse appearance model;
a local sparse appearance model building module for building a local sparse appearance model; the local sparse appearance model building module comprises an extraction combination unit, an extraction coding unit and a division unit; the extraction and combination unit is used for extracting a plurality of overlapped local image blocks I in each target template of the target template set and combining all the local image blocks into one local dictionary in a group mode; the extraction coding unit is used for extracting a plurality of overlapped local image blocks II in each candidate target of the candidate target set, then using the local image blocks II as local features and coding the local features by using the local dictionary to obtain a corresponding coding coefficient matrix; the dividing unit is used for dividing the coding coefficient matrix into a plurality of segment coefficients, weighting all the segment coefficients to obtain coefficient vectors, and finally forming a similarity matrix by all the coefficient vectors;
a fusion module for fusing the global sparse appearance model with the local sparse appearance model; the fusion module comprises a similarity calculation unit, a reconstruction error calculation unit and a discriminant function unit; the similarity calculation unit is used for calculating two similarity elements on a main diagonal line of the similarity matrix, and then calculating a weighted sum of the two similarity elements as the fusion similarity of a target area; the reconstruction error calculation unit is used for calculating a reconstruction error after global representation according to the sparse coefficient solution; the discriminant function unit is used for calculating the weighted sum of the fusion similarity and the reconstruction error and taking the weighted sum as a discriminant function of a corresponding candidate target; and
the template updating module is used for assigning to each target template in the target template set a weight value proportional to the importance of that target template, and for updating the target templates of the target template set; the template updating module comprises a resetting unit, a similarity calculation unit and a judgment-and-update unit; the resetting unit is used for selecting the target template with the largest sparse linear representation coefficient as the current optimal template and resetting the weight of the current optimal template to 1; the similarity calculation unit is used for calculating the similarity between the tracking target of the current frame and the current optimal template; the judgment-and-update unit is used for judging whether the similarity is greater than a similarity threshold, if so, judging whether the occlusion degree of the tracking result is within a threshold range, and if so, updating the corresponding target template: the estimated target is linearly represented, the target template corresponding to the smallest component of the sparse linear representation coefficient vector is updated, and finally the corresponding local dictionary is updated.
CN201910886309.8A 2019-09-19 2019-09-19 Multi-appearance model fusion target tracking method and device based on sparse representation Active CN110648351B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910886309.8A CN110648351B (en) 2019-09-19 2019-09-19 Multi-appearance model fusion target tracking method and device based on sparse representation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910886309.8A CN110648351B (en) 2019-09-19 2019-09-19 Multi-appearance model fusion target tracking method and device based on sparse representation

Publications (2)

Publication Number Publication Date
CN110648351A CN110648351A (en) 2020-01-03
CN110648351B true CN110648351B (en) 2022-02-01

Family

ID=68991987

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910886309.8A Active CN110648351B (en) 2019-09-19 2019-09-19 Multi-appearance model fusion target tracking method and device based on sparse representation

Country Status (1)

Country Link
CN (1) CN110648351B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111462184B (en) * 2020-04-02 2022-09-23 桂林电子科技大学 Online sparse prototype tracking method based on twin neural network linear representation model
CN111985346A (en) * 2020-07-27 2020-11-24 成都恒创新星科技有限公司 Local shielding target tracking method based on structured sparse characteristic
CN112488029B (en) * 2020-12-10 2022-07-08 重庆邮电大学 Vehicle detection method based on air-ground cooperation

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110232705A (en) * 2019-05-17 2019-09-13 沈阳大学 A kind of reversed low-rank sparse learning objective tracking of fusion fractional order variation adjustment

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102592138B (en) * 2011-12-30 2014-04-30 上海电力学院 Object tracking method for intensive scene based on multi-module sparse projection
CN104200226B (en) * 2014-09-01 2017-08-25 西安电子科技大学 Particle filter method for tracking target based on machine learning
CN105513093B (en) * 2015-12-10 2018-02-16 电子科技大学 A kind of method for tracking target represented based on low-rank matrix
CN105931273B (en) * 2016-05-04 2019-01-25 江南大学 Local rarefaction representation method for tracking target based on L0 regularization
CN106529526B (en) * 2016-07-06 2019-12-17 安徽大学 target tracking method based on combination of sparse representation and prior probability
CN106600630A (en) * 2016-12-07 2017-04-26 国网山西省电力公司电力科学研究院 Particle filtering object tracking method based on sparse coding
CN107301652B (en) * 2017-05-04 2020-07-14 西北工业大学 Robust target tracking method based on local sparse representation and particle swarm optimization
US10529079B2 (en) * 2018-02-04 2020-01-07 Applied Research, LLC Target detection, tracking, and classification in compressive measurement domain

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110232705A (en) * 2019-05-17 2019-09-13 沈阳大学 A kind of reversed low-rank sparse learning objective tracking of fusion fractional order variation adjustment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Structured sparse representation appearance model for robust visual tracking; Tianxiang B. et al.; 2011 IEEE International Conference on Robotics and Automation; 2011-08-15; pp. 4399-4404 *

Also Published As

Publication number Publication date
CN110648351A (en) 2020-01-03

Similar Documents

Publication Publication Date Title
CN110648351B (en) Multi-appearance model fusion target tracking method and device based on sparse representation
WO2020215492A1 (en) Multi-bernoulli multi-target video detection and tracking method employing yolov3
US11836931B2 (en) Target detection method, apparatus and device for continuous images, and storage medium
CN110766724B (en) Target tracking network training and tracking method and device, electronic equipment and medium
CN113052873B (en) Single-target tracking method for on-line self-supervision learning scene adaptation
CN112184759A (en) Moving target detection and tracking method and system based on video
JP6756406B2 (en) Image processing equipment, image processing method and image processing program
CN111626134B (en) Dense crowd counting method, system and terminal based on hidden density distribution
Xiao et al. An enhanced adaptive coupled-layer LGTracker++
CN108846850B (en) Target tracking method based on TLD algorithm
CN110147768B (en) Target tracking method and device
Ho et al. Sme-net: Sparse motion estimation for parametric video prediction through reinforcement learning
CN111914627A (en) Vehicle identification and tracking method and device
CN109472813A (en) It is a kind of based on background weighting Mean Shift algorithm and Kalman prediction fusion block tracking
CN107066922B (en) Target tracking method for monitoring homeland resources
CN112837342A (en) Target tracking method, terminal equipment and storage medium
CN110956649A (en) Method and device for tracking multi-target three-dimensional object
CN113129332A (en) Method and apparatus for performing target object tracking
CN111462177B (en) Multi-clue-based online multi-target tracking method and system
Tu et al. Facial image inpainting with variational autoencoder
Lee et al. A memory-and accuracy-aware Gaussian parameter-based stereo matching using confidence measure
JP5241687B2 (en) Object detection apparatus and object detection program
Yuk et al. An efficient pattern-less background modeling based on scale invariant local states
KR101508842B1 (en) Systme and method for providign evolutionary adaptive eye tracking for low-cost human computer interaction application
Tian et al. Tracking vulnerable road users with severe occlusion by adaptive part filter modeling

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant