CN109102521B - Video target tracking method based on parallel attention-dependent filtering - Google Patents


Info

Publication number
CN109102521B
Authority
CN
China
Legal status: Active
Application number
CN201810647331.2A
Other languages
Chinese (zh)
Other versions
CN109102521A (Chinese)
Inventor
宋慧慧
樊佳庆
张开华
刘青山
Current Assignee
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN201810647331.2A priority Critical patent/CN109102521B/en
Publication of CN109102521A publication Critical patent/CN109102521A/en
Application granted granted Critical
Publication of CN109102521B publication Critical patent/CN109102521B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G06T7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/262 - Analysis of motion using transform domain methods, e.g. Fourier domain methods
    • G06T2207/10016 - Image acquisition modality: video; image sequence
    • G06T2207/20056 - Transform domain processing: discrete and fast Fourier transform [DFT, FFT]


Abstract

The invention discloses a video target tracking method based on parallel attention correlation filtering, belonging to the technical field of image processing. The tracking problem is formulated as estimating the probability of the target position; spatially selective attention (SSA) and apparent selective attention (ASA) are integrated, an objective function is obtained using the Log function, and continuous, effective tracking of the video target is realized. First, SSA is modeled: a series of binary maps is generated and filtered to obtain a position response map. Then a series of interference regions is sampled in a semi-local region around the tracked target, an anti-interference distance metric is learned within the correlation filtering, anti-interference metric-regularized correlation filtering is performed, the interference terms are pushed into the negative domain, and the ASA target map is obtained. Finally, the results processed in the local and semi-local regions are fused through the objective function obtained with the Log function to track the target. The method offers more stable and accurate processing, strong adaptability, and good tracking performance.

Description

Video target tracking method based on parallel attention correlation filtering
Technical Field
The invention relates to a video target tracking method based on parallel attention correlation filtering, and belongs to the technical field of image processing.
Background
Visual tracking is a prerequisite for several important computer vision applications, such as video surveillance, behavior recognition, video retrieval, and human-computer interaction. Although visual tracking techniques have advanced significantly in recent years, continuously tracking a generic target given only its location in the first frame remains challenging in unconstrained environments, because the target's appearance is severely affected by interference factors such as occlusion, rapid motion, and deformation.
The task of target tracking is to find the target's location and determine its characteristics, a question of "where" and "what", which relates to the attention selection mechanism of human visual perception. Psychological and cognitive evidence suggests that human visual perception is highly selective, allowing the visual system to focus on rapidly processing the most relevant visual information. Two main mechanisms of visual attention operate in human visual perception: one is spatially selective attention (SSA), which reduces the receptive field of a neuron and increases its sensitivity to a particular location in the visual field; the other is apparent selective attention (ASA), which enhances activity in different regions of the cerebral cortex by specifically processing different types of features, strengthening their response values.
After leaving the eyes, scene input signals entering the cerebral cortex split into a dorsal stream and a ventral stream, the former exploiting spatial relationships (i.e. "where") and the latter emphasizing apparent features (i.e. "what"). Perceptual studies have demonstrated that these two types of processing may run in parallel, and that these mechanisms may play an important role in handling distractors, blur, and occlusion in target tracking. Exploiting these findings to handle the "where" and "what" problems in correlation-filter trackers is of great significance for target tracking in complex environments.
Disclosure of Invention
The technical problem addressed by the invention is that conventional target tracking methods cannot continuously track a generic target; it therefore provides a video target tracking method based on parallel attention correlation filtering.
To solve the above technical problem, the invention provides a video target tracking method based on parallel attention correlation filtering. It formulates the tracking problem as estimating the probability of the target position, integrates spatially selective attention (SSA) and apparent selective attention (ASA), obtains an objective function using the Log function, and realizes continuous and effective tracking of the video target. The method comprises the following steps:
(1) Acquiring the SSA position response map: first, for a tracked target, a series of binary maps is generated in a local region around it to describe the topology between the target and its surrounding scene at different granularities. Arranging these maps from top to bottom by description granularity, from coarse to fine, yields a set of Boolean maps of the tracked target B_i (i = 1, 2, ..., N_b): a coarse-grained Boolean map encodes global shape information, describing large appearance changes of the target, while a fine-grained Boolean map captures detailed spatial structure. Then a binary filter F is defined for the tracked target and applied to the Boolean maps B_i to obtain conditional position response maps; the weights are learned by minimizing a linear regression function, one optimal weight per Boolean map, and the maps are weighted to obtain the final position response map.
(2) Obtaining the ASA target map: first, a series of interference regions is sampled in a semi-local region around the tracked target; the ridge-regression objective function is approximately equated to a metric-learning correlation filter, an anti-interference distance metric is learned within the correlation filtering, and the correlations between positive samples are modeled. Then an anti-interference metric regularization term is introduced and anti-interference metric-regularized correlation filtering is performed on the target image; the anti-interference distance metric is learned in the correlation filtering while useful correlations from true negative samples are taken into account, the interference terms are pushed into the negative domain, and the target tracking map is obtained.
(3) Continuously tracking the video target: an objective function integrating SSA and ASA is obtained through Log-function modeling; the video target is tracked with this function, and the parameters are updated online to realize effective tracking of the video target.
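The three steps above can be sketched as a per-frame loop. This is a minimal illustrative skeleton, not the patent's implementation: the two response-map functions are stand-ins (simple Gaussian bumps) and all names are assumptions; only the log-domain fusion and peak-picking structure follows the description.

```python
import numpy as np

def ssa_response(frame, pos, size=31):
    # Stand-in for the SSA position response map of step (1):
    # a Gaussian bump centered on the previous target position.
    yy, xx = np.mgrid[0:size, 0:size]
    return np.exp(-((xx - pos[0]) ** 2 + (yy - pos[1]) ** 2) / 50.0)

def asa_response(frame, pos, size=31):
    # Stand-in for the ASA target map of step (2).
    yy, xx = np.mgrid[0:size, 0:size]
    return np.exp(-((xx - pos[0]) ** 2 + (yy - pos[1]) ** 2) / 80.0)

def track_step(frame, prev_pos):
    # Step (3): fuse the two maps in the log domain and take the peak.
    p_ssa = ssa_response(frame, prev_pos)
    p_asa = asa_response(frame, prev_pos)
    eps = 1e-12  # avoid log(0)
    fused = np.log(p_ssa + eps) + np.log(p_asa + eps)
    r, c = np.unravel_index(np.argmax(fused), fused.shape)
    return (c, r)  # (x, y)

pos = track_step(frame=None, prev_pos=(15, 10))
```

With both stand-in maps peaked at the previous position, the fused peak is recovered there; in a real tracker the maps would be computed from image features and would shift with the target.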
The video target tracking method based on parallel attention correlation filtering comprises the following specific steps:
(1) Obtaining the SSA position response map
(1.1) For a tracked target, generate, in a local region around it, a series of binary maps describing the topology between the target and its surrounding scene at different granularities (the generating formula appears as an image in the original), where I(j) denotes the j-th pixel intensity, U(·) is a unary coding function, R(·) is a rounding function, φ(I) is the RGB color-channel map of the image block, and T denotes transpose.
Arranging the maps from top to bottom by description granularity, from coarse to fine, yields a set of Boolean maps of the tracked target B_i (i = 1, 2, ..., N_b): a coarse-grained Boolean map encodes global shape information, describing large target appearance changes, while a fine-grained Boolean map captures detailed spatial structure.
(1.2) Weight learning: a binary filter F is defined for the tracked target and applied to the Boolean maps B_i obtained in step (1.1), yielding a set of conditional position response maps. The weights are learned by minimizing a linear regression function (given as an image in the original), so that one optimal weight is learned per Boolean map; weighting the maps gives the final position response map P(B_i, F | I ∈ Ω_o).
Here Ω_o is the region of the scene where the target appears, Ω_b is the background region, d_w and d_h are the width and height of the feature, and β_k is the weight coefficient to be optimized; the regression also involves the classifier parameter vector of the k-th frame and the numbers of non-blank pixels in the target and background regions.
The weight vector is updated online to adapt to apparent changes of the target over time, β_t = (1 − η) β_{t−1} + η β̂_t, where β_t is the updated weight coefficient vector, η is the fusion coefficient, and β̂_t is the weight coefficient vector of the current frame.
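The fusion of the conditional response maps and the online weight update can be sketched in NumPy. The linear-interpolation form of the update, beta_t = (1 - eta) * beta_{t-1} + eta * beta_hat_t, is an assumption consistent with the description of eta as a fusion coefficient; the function names and toy maps are illustrative.

```python
import numpy as np

def fuse_response_maps(cond_maps, beta):
    # Final position response map: weighted sum of the per-Boolean-map
    # conditional response maps (weights beta learned by linear regression).
    return sum(b * m for b, m in zip(beta, cond_maps))

def update_weights(beta_prev, beta_cur, eta=0.006):
    # Assumed online update: linear interpolation with fusion coefficient eta,
    # beta_t = (1 - eta) * beta_{t-1} + eta * beta_hat_t.
    return (1.0 - eta) * np.asarray(beta_prev) + eta * np.asarray(beta_cur)

maps = [np.array([[0.0, 1.0], [0.5, 0.0]]), np.array([[1.0, 0.0], [0.0, 0.5]])]
final = fuse_response_maps(maps, [0.25, 0.75])
beta_t = update_weights([1.0, 0.0], [0.0, 1.0], eta=0.5)
```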
(2) Obtaining the ASA target map
(2.1) Sample a series of interference regions in a semi-local region around the tracked target. The ridge-regression objective function (given as images in the original) is approximately equated to a metric-learning correlation filter, and an anti-interference distance metric is learned within the correlation filtering.
Here x_i is the i-th sample matrix, x̂ is the DFT of the vector x, w_i is the correlation-filter weight corresponding to the i-th sample matrix, w is the vector formed by all w_i, y is a Gaussian-shaped label, d_w′ and d_h′ are the width and height of the feature matrix, λ is the regularization coefficient, and the data term takes the form of a Mahalanobis distance.
(2.2) An anti-interference metric regularization term is introduced into the correlation-filter objective function, giving an anti-interference metric-regularized correlation filtering model. The target image obtained in step (2.1) is further filtered with this model, strengthening the discrimination and tracking of target features; the filtered interference terms are pushed into the negative domain, and the positive-domain target tracking map P(X_i, w_i | I ∈ Ω_o) is obtained.
Here the model operates on the k-th sub-vectors of the filter weight, of the total sample vector, and of the Gaussian-shaped label vector; w_i is the weight vector corresponding to the i-th cyclic sample matrix and is obtained by online updating; the tracking result of the t-th frame is obtained by inverse FFT; I is the identity matrix, λ is the regularization coefficient, and η is the fusion coefficient.
The regularization structure is defined by equations given as images in the original, where x_i is the i-th sample vector, x̃_m^k is the m-th cyclic sample of the k-th base sample, x̃_n^k is the n-th cyclic sample of the k-th base sample, and w_mn is the sample-difference weight measuring the similarity between samples m and n: the larger the weight, the larger the difference between the samples, and the more discriminative the learned apparent features.
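The first action of step (2.1), sampling interference regions in a semi-local ring around the target, might look as follows. The ring radius, sample count, and uniform-angle scheme are illustrative assumptions; the patent does not fix them here.

```python
import numpy as np

def sample_interference_regions(frame, center, target_size, n=8, ring=2.0, seed=0):
    # Sample n patch centers on a ring of radius ring * (target radius)
    # around the target center, clipped to the frame bounds.
    rng = np.random.default_rng(seed)
    h, w = frame.shape[:2]
    r = ring * max(target_size) / 2.0
    angles = rng.uniform(0.0, 2.0 * np.pi, size=n)
    cx, cy = center
    xs = np.clip(cx + r * np.cos(angles), 0, w - 1)
    ys = np.clip(cy + r * np.sin(angles), 0, h - 1)
    return np.stack([xs, ys], axis=1)

frame = np.zeros((120, 160))
centers = sample_interference_regions(frame, center=(80, 60), target_size=(30, 30), n=8)
```

Patches cropped at these centers would serve as the negative (interference) samples pushed into the negative domain by the regularizer.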
(3) Continuously tracking the video target
Modeling with the Log function, the SSA and ASA maps are integrated into an objective function (given as an image in the original),
where P(B_i, F | I ∈ Ω_o) is the SSA position response map obtained above, B_i (i = 1, ..., N_b) is the series of N_b-channel Boolean maps, F is the Boolean filter, P(X_i, w_i | I ∈ Ω_o) is the ASA target map obtained above, ⊛ denotes the spatial correlation operation, β_i is the weight coefficient to be optimized, e^(·) is the exponential function, Ω_o ∈ R² is the target region, o denotes the target appearing in the scene, X_i (i = 1, ..., N_x) is a series of N_x cyclic sample matrices, each obtained by shifting a basic HOG feature-channel vector (all feature channels are assumed to be independently distributed), and w_i is the corresponding ASA filter.
The video target is tracked with this objective function, and the parameters are updated online to achieve effective tracking of the target.
The regularization coefficient λ is set to 0.001 and the fusion coefficient η to 0.006.
The principle of the invention is as follows:
The core of the invention is to formulate the tracking problem as estimating the probability of the target position, seamlessly integrating SSA and ASA:

P(Ω_o) ∝ ∏_{i=1}^{N_b} P(B_i, F | I ∈ Ω_o) · ∏_{i=1}^{N_x} P(X_i, w_i | I ∈ Ω_o)    (1)

Here Ω_o ∈ R² represents the target region and o represents the target appearing in the scene; B_i (i = 1, ..., N_b) is a series of N_b-channel Boolean maps; X_i (i = 1, ..., N_x) is a series of N_x cyclic sample matrices, each obtained by shifting a basic HOG feature-channel vector x_i; and F and w_i are their corresponding filters. Furthermore, for simplicity, all feature channels are assumed to be distributed independently. Finally, taking the Log of both sides of formula (1) gives:

log P(Ω_o) = Σ_{i=1}^{N_b} log P(B_i, F | I ∈ Ω_o) + Σ_{i=1}^{N_x} log P(X_i, w_i | I ∈ Ω_o) + const    (2)

Here P(B_i, F | I ∈ Ω_o) and P(X_i, w_i | I ∈ Ω_o) are defined as exponentials of the weighted filter responses (the exact expressions appear as images in the original), where ⊛ is a spatial correlation operation, β_i is a weight coefficient to be optimized, and e^(·) is an exponential function.
In modeling SSA, the invention first generates a series of binary maps, i.e. Boolean map representations (BMRs), describing the topology between the target and its surrounding scene at different granularities. In fig. 2, from top to bottom, the Boolean maps describe granularity from coarse to fine: a coarse-grained Boolean map encodes global shape information that is robust to large target appearance variations, whereas a fine-grained Boolean map captures spatial structural details that are effective for precise target localization. A predefined binary filter is then applied to these maps to obtain a set of conditional position response maps, each of which is weighted to obtain the final position response map; the goal is to learn an optimal weight for each Boolean map.
BMR is inspired by recent studies of human visual attention, in which the momentary conscious awareness of a scene can be represented by a set of Boolean maps. Specifically, given the RGB color-channel map φ(I) of an image block, the corresponding Boolean map B_i is obtained by element-wise thresholding:

B_i = 𝟙(φ(I) ≥ θ_i),

where the threshold θ_i is drawn from an independent distribution over [0, 255] (yielding a black-and-white binary map) and ≥ denotes the element-wise inequality. For simplicity, the thresholds are set to θ_i = 255(i − 1)/N_b, i.e. sampled from 0 to 255 with a fixed step δ = 255/N_b, since fixed-step sampling is equivalent to uniform sampling in the limit δ → 0. It is then easy to show that the j-th pixel intensity I(j) can be recovered from the Boolean maps (the exact formula appears as an image in the original), where U(·) is a unary coding function (e.g. with 3 discrete layers, U(2) = [1; 1; 0] and U(3) = [1; 1; 1]) and R(·) is a rounding function; φ(I) is the RGB color-channel map of the image block.
In weight learning, the invention learns the weights by minimizing a linear regression function (given as an image in the original), where ||·||_F denotes the Frobenius norm. Minimizing this function is equivalent to minimizing the objective of equation (7), in which formula (5) has been substituted; Ω_o and Ω_b denote the target and background regions respectively. Setting the derivative with respect to β to zero and minimizing yields the solution {β_i} of equation (8).
To adapt to apparent changes of the target over time, the coefficients are updated online as

β_t = (1 − η) β_{t−1} + η β̂_t,    (9)

where β̂_t is calculated by equation (8) from the tracking result at frame t and η is the fusion coefficient.
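The weight-learning step reduces to a (regularized) linear regression over the vectorized conditional response maps; a minimal sketch, in which the regression setup and the toy target are illustrative assumptions:

```python
import numpy as np

def learn_weights(cond_maps, target, lam=0.001):
    # Stack each conditional response map as a column of A and solve the
    # ridge-regularized least squares min_beta ||A beta - y||^2 + lam ||beta||^2
    # in closed form: beta = (A^T A + lam I)^{-1} A^T y.
    A = np.stack([m.ravel() for m in cond_maps], axis=1)
    y = np.asarray(target).ravel()
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)

# Toy check: if the target response is an exact combination of the maps,
# the recovered weights are close to the true ones.
m1 = np.array([[1.0, 0.0], [0.0, 1.0]])
m2 = np.array([[0.0, 1.0], [1.0, 0.0]])
target = 0.3 * m1 + 0.7 * m2
beta = learn_weights([m1, m2], target, lam=1e-8)
```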
To address the interference problem, the invention exploits the principle that ASA in human visual perception emphasizes the learning of apparent features: by learning an anti-interference distance metric, interference terms are pushed into the negative space, the discriminative power of the features is enhanced, tracking becomes robust to distractors, and the target can be well distinguished from interference terms. Learning the correlation filter is approximated as learning a distance metric that models the correlations between positive samples; the anti-interference distance metric is then learned within the correlation filtering, while useful correlations from true negative samples are also considered.
In distance metric learning, learning the correlation filter (CF) is expressed as a spatial ridge-regression objective function:

min_w || Σ_{i=1}^{N_x} X_i w_i − y ||² + λ ||w||²,    (10)

where y is the Gaussian regression target and λ is the regularization coefficient. Note that if w is rescaled to a·w for any a ≠ 0, equation (10) can be reformulated equivalently; apart from rescaling y by the ratio 1/a, the problem is identical to equation (10), and since the maximum-response position is unchanged, it yields the same tracking result.
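The rescaling argument can be checked numerically: scaling the regression target scales the learned filter and its response linearly, so the maximum-response position, and hence the tracking result, is unchanged. The sketch uses the standard single-channel closed-form correlation filter in the Fourier domain, assumed to correspond to equation (10):

```python
import numpy as np

def train_cf(x, y, lam=0.001):
    # Closed-form single-channel correlation filter in the Fourier domain:
    # w_hat = conj(x_hat) * y_hat / (conj(x_hat) * x_hat + lam)
    xf, yf = np.fft.fft2(x), np.fft.fft2(y)
    return np.conj(xf) * yf / (np.conj(xf) * xf + lam)

def respond(w_hat, z):
    # Response map of the filter on patch z.
    return np.real(np.fft.ifft2(w_hat * np.fft.fft2(z)))

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16))
y = np.zeros((16, 16))
y[8, 8] = 1.0                      # desired peak location
w1 = train_cf(x, y)
w2 = train_cf(x, 5.0 * y)          # rescaled regression target
r1, r2 = respond(w1, x), respond(w2, x)
peak1 = tuple(np.unravel_index(np.argmax(r1), r1.shape))
peak2 = tuple(np.unravel_index(np.argmax(r2), r2.shape))
```

The two filters differ only by the factor 5, so their response maps peak at the same position.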
Based on this, to exhibit clearly the relationship between correlation-filter learning and metric learning, a particular setting of equation (10) is chosen and w is rescaled accordingly, which is equivalent to adding a norm constraint on w. With suitable notation, the data term of equation (10) can then be rewritten as a squared Mahalanobis distance (equation (11), given as an image in the original), where 1 denotes the all-ones vector. Therefore, learning the correlation filter can be roughly regarded as learning an optimal distance metric.
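The Mahalanobis-distance view can be illustrated with a small numeric check: for M = L^T L, the squared Mahalanobis distance equals the squared Euclidean distance after the linear map L, which is why a learned linear filter induces a distance metric. The particular M here is illustrative:

```python
import numpy as np

def mahalanobis_sq(u, v, M):
    # Squared Mahalanobis distance (u - v)^T M (u - v).
    d = np.asarray(u) - np.asarray(v)
    return float(d @ M @ d)

# With M = L^T L, the distance equals the squared Euclidean distance
# between the linearly mapped points L u and L v.
L = np.array([[2.0, 0.0], [1.0, 1.0]])
M = L.T @ L
u, v = np.array([1.0, 2.0]), np.array([0.0, 0.0])
d1 = mahalanobis_sq(u, v, M)
d2 = float(np.sum((L @ u - L @ v) ** 2))
```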
However, equation (11) only considers the relationships among the positive samples, so its ability to discriminate the target from the background is limited. To address this, an anti-interference metric regularization term is added to equation (10); it is built from relationships in the negative space and acts as a force pushing the interference terms into that negative space.
To perform anti-interference metric-regularized correlation filtering, a series of interference regions is first sampled from the semi-local region around the target; the interactions among them are then modeled and integrated into equation (10) as a regularization term (equations (12)-(13), given as images in the original), where γ is the regularization coefficient and w_mn is a weight measuring the similarity between samples m and n: the larger the weight, the larger the difference between the samples, and the more discriminative the learned appearance.
Equation (12) can be reformulated as equation (14) (the intermediate expressions appear as images in the original). Its solution involves a block matrix with N_x × N_x blocks built from the cyclic sample matrices (equation (15)). Because a circulant matrix X satisfies

X = F diag(x̂) F^H,    (16)

where F denotes the discrete Fourier transform (DFT) matrix, x̂ denotes the DFT of the base vector x, and F^H = (F*)^T denotes the conjugate transpose, the block matrix of equation (15) can be diagonalized into equation (17), whose blocks are diagonal.
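The circulant property used above, X = F diag(x̂) F^H, is equivalent to the fact that multiplying by a circulant matrix is element-wise multiplication in the Fourier domain, which can be checked directly:

```python
import numpy as np

def circulant(c):
    # Build the circulant matrix whose first column is c:
    # column k is c cyclically shifted down by k.
    n = len(c)
    return np.column_stack([np.roll(c, k) for k in range(n)])

rng = np.random.default_rng(1)
x = rng.standard_normal(8)
v = rng.standard_normal(8)
X = circulant(x)

direct = X @ v
# Circulant multiply via FFT: X v = ifft(fft(x) * fft(v))
via_fft = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(v)))
```

This is what turns the large block system into independent per-frequency problems that can be solved in parallel.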
Substituting (16) into (14), the right-hand term can be re-derived as equation (18). Substituting equations (17) and (18) into equation (14) then gives the FFT of the solution, equation (19), whose i-th element is assembled from the k-th elements of the corresponding frequency-domain sub-vectors. Similarly to equation (9), the filter is obtained by online updating (equation (20)), where the current-frame estimate is calculated by equation (19) using the tracking result of the t-th frame. The block matrix appearing in equation (19) is defined by equation (21). Because its numbers of rows and columns are large, directly computing its inverse in equation (19) is impractical; instead, a matrix-inversion transformation is used to compute the inverse efficiently. Once all diagonal blocks are obtained, they can be computed in parallel, and the optimal solution of equation (14) is obtained by the inverse FFT of the frequency-domain solution.
The invention provides a correlation-filter tracking algorithm based on human visual perception: it reflects the SSA and ASA mechanisms of human visual perception and enhances the robustness and interference resistance of target tracking by processing the local and semi-local background domains in parallel. For the local domain, to model SSA, a simple but efficient BMR is introduced into correlation-filter learning; it delineates the local topology of the target and its scene by randomly binarizing the image color channels and is invariant to various transformations. For the semi-local domain, to model ASA, an anti-interference metric regularization term is introduced into the objective function of the correlation filtering; it acts as a force pushing interference terms into the negative domain, enhancing tracking robustness when challenging distractors similar to the target are encountered. The method is stable and accurate, highly adaptive, and achieves good tracking performance, enabling continuous and effective tracking of video targets.
Drawings
Fig. 1 is a schematic diagram of the present invention.
FIG. 2 is a flow chart of the present invention for modeling SSA.
Detailed Description
The following describes embodiments of the invention in detail with reference to the accompanying drawings; techniques and products not specifically described in the embodiments are conventional techniques or commercially available products in the art.
Example 1: As shown in fig. 1 and fig. 2, the video target tracking method based on parallel attention correlation filtering formulates the tracking problem as estimating the probability of the target position, integrates spatially selective attention (SSA) and apparent selective attention (ASA), and obtains an objective function using the Log function to realize continuous and effective tracking of the video target. It comprises the following steps:
(1) Acquiring the SSA position response map: first, for a tracked target, a series of binary maps is generated in a local region around it to describe the topology between the target and its surrounding scene at different granularities. Arranging these maps from top to bottom by description granularity, from coarse to fine, yields a set of Boolean maps of the tracked target B_i (i = 1, 2, ..., N_b): a coarse-grained Boolean map encodes global shape information, describing large appearance changes of the target, while a fine-grained Boolean map captures detailed spatial structure. Then a binary filter F is defined for the tracked target and applied to the Boolean maps B_i to obtain conditional position response maps; the weights are learned by minimizing a linear regression function, one optimal weight per Boolean map, and the maps are weighted to obtain the final position response map.
(2) Obtaining the ASA target map: first, a series of interference regions is sampled in a semi-local region around the tracked target; the ridge-regression objective function is approximately equated to a metric-learning correlation filter, an anti-interference distance metric is learned within the correlation filtering, and the correlations between positive samples are modeled. Then an anti-interference metric regularization term is introduced and anti-interference metric-regularized correlation filtering is performed on the target image; the interference terms are pushed into the negative domain, and the target tracking map is obtained.
(3) Continuously tracking the video target: an objective function integrating SSA and ASA is obtained through Log-function modeling; the video target is tracked with this function, and the parameters are updated online to realize effective tracking of the video target.
The video target tracking method based on the parallel attention correlation filtering comprises the following specific steps:
(1) obtaining SSA location response graph
(1.1) for a tracked target, in a local region around the tracked target, generating a series of binary maps to describe the topology between the target and its surrounding scene at different granularities by:
Figure GDA0001812976290000111
wherein I (j) represents the jth pixel intensity, U (-) is a univariate function, R (-) represents an integer function,
Figure GDA0001812976290000112
is an RGB color channel map of an image block, T denotes transpose;
arranging the pictures from top to bottom according to coarse-fine description granularity to obtain a group of tracking target Boolean graphs Bi(i=1,2,......,Nb) The coarse-grained Boolean graph encodes the global shape information to describe obvious target appearance change, and the fine-grained Boolean graph describes the detailed structure of the space
(1.2) weight learning: conventionally, a binary filter is defined for the tracked object
Figure GDA0001812976290000113
Acting F on the tracking target Boolean graph B obtained in the step (1.1)iIn the above, a set of conditional position response maps is obtained, and learning weights is accomplished by minimizing a linear regression function as follows, and an optimal weight is learned for each Boolean map
Figure GDA0001812976290000114
Weighting each map to obtain a final position response map P (B)i,F|I∈Ωo):
Figure GDA0001812976290000115
Wherein omegaoIs the area in the scene where the target appears, ΩbIs the background area present in the scene, dwIs the width of the feature, dhIs the height of the feature or features,
Figure GDA0001812976290000116
is the classifier parameter vector for the k-th frame,
Figure GDA0001812976290000117
is the number of non-blank pixels in the target area,
Figure GDA0001812976290000118
is the number of non-blank pixels in the background area, betakIs a weight coefficient to be optimized
Figure GDA0001812976290000119
Need to pass through
Figure GDA00018129762900001110
Updated online to accommodate apparent changes in the target over time, βtIs the weight coefficient vector after the update, η is the fusion coefficient,
Figure GDA00018129762900001111
is the weight coefficient vector of the current frame;
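The weighting and online update of step (1.2) can be sketched as follows, assuming the standard linear blending update β_t = (1 − η)β_{t−1} + ηβ; the function names are hypothetical:

```python
import numpy as np

def fuse_responses(responses, beta):
    """Weight each conditional position response map and sum them into
    the final SSA position response map."""
    return sum(b * r for b, r in zip(beta, responses))

def update_weights(beta_prev, beta_cur, eta=0.3):
    """Online update blending last frame's weights with the current
    frame's weights via fusion coefficient eta."""
    return (1.0 - eta) * beta_prev + eta * beta_cur

responses = [np.ones((2, 2)), 2 * np.ones((2, 2))]
P = fuse_responses(responses, np.array([0.5, 0.5]))           # 0.5*1 + 0.5*2
beta_t = update_weights(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```

With η = 0.3 (the value used in the examples), the update keeps 70% of the previous weights, which damps abrupt appearance changes.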
(2) obtaining an ASA target map
(2.1) a series of interference regions X_i is sampled in a semi-local area around the tracked target; by approximately recasting the following ridge regression objective function as a metric-learning correlation filter, an anti-interference distance metric is learned within the correlation filtering:
[equation images in the original]
wherein X_i is the i-th sample matrix, x̂ is the DFT of the vector x and x̂_i is its i-th row, w_i is the correlation-filter weight corresponding to the i-th sample matrix X_i, w is the vector composed of all w_i, y is a Gaussian-shaped label, d_w′ and d_h′ are the width and height of the feature matrix, λ is the regularization coefficient, and the remaining symbol (an image in the original) denotes the Mahalanobis distance;
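For context, a plain single-channel correlation filter trained by closed-form ridge regression in the Fourier domain looks like the sketch below; the patent's model adds the anti-interference metric regularization on top of this baseline, which is omitted here. The names `train_cf` and `detect` are illustrative:

```python
import numpy as np

def train_cf(x, y, lam=1e-3):
    """Closed-form ridge regression in the Fourier domain for one
    feature channel: w_hat = conj(X) * Y / (conj(X) * X + lam)."""
    X, Y = np.fft.fft2(x), np.fft.fft2(y)
    return np.conj(X) * Y / (np.conj(X) * X + lam)

def detect(w_hat, z):
    """Correlate the trained filter with a search patch z and return
    the real-valued response map."""
    return np.real(np.fft.ifft2(w_hat * np.fft.fft2(z)))

x = np.random.RandomState(0).rand(8, 8)   # training patch features
y = np.zeros((8, 8)); y[0, 0] = 1.0       # label peaked at the origin
resp = detect(train_cf(x, y), x)          # detect on the training patch
peak = np.unravel_index(resp.argmax(), resp.shape)
```

Detecting on the training patch itself reproduces the label's peak location, which is the sanity check commonly used for such filters.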
(2.2) the anti-interference metric regularization term is introduced into the correlation filtering objective function to obtain an anti-interference metric-regularized correlation filtering model [equation image in the original]; this model applies anti-interference metric-regularized correlation filtering to the target image obtained in step (2.1), enhancing the discrimination and tracking of target features, pushing the filtered interference terms into the negative domain, and yielding the positive-space target tracking map P(X_i, w_i | I ∈ Ω_o):
wherein w_k is the k-th sub-vector in the anti-interference metric-regularized correlation filter model, x_k is the k-th sub-vector in the total sample vector, y_k is the k-th sub-vector of the Gaussian-shaped label vector, and w_i, the weight vector corresponding to the i-th cyclic sample matrix, is obtained by online updating; the tracking result of the t-th frame is obtained by taking the inverse FFT of the filter weights; x̂^H denotes the conjugate transpose of x̂, I is the identity matrix, λ is the regularization coefficient, and η is the fusion coefficient;
the anti-interference metric regularization term is defined as:
[equation images in the original]
wherein X_i is the i-th sample vector, x_k^m is the m-th cyclic sample of the k-th base sample, x_k^n is the n-th cyclic sample of the k-th base sample, and w_mn is the sample-difference weight, which measures the dissimilarity between two samples: the greater the weight, the greater the difference between the samples, and the more discriminative the learned appearance features;
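A minimal sketch of sample-difference weights that grow with pairwise dissimilarity, in the spirit of w_mn above (the squared-Euclidean choice and the normalization are assumptions, not the patent's exact definition):

```python
import numpy as np

def sample_difference_weights(samples):
    """Pairwise weights that grow with the squared Euclidean difference
    between samples, so more dissimilar pairs weigh more heavily in the
    anti-interference metric term; normalized to [0, 1]."""
    n = len(samples)
    W = np.zeros((n, n))
    for m in range(n):
        for k in range(n):
            W[m, k] = np.sum((samples[m] - samples[k]) ** 2)
    return W / W.max()

samples = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([3.0, 4.0])]
W = sample_difference_weights(samples)  # W[0, 2] is the largest entry
```

The diagonal is zero (a sample is identical to itself), and the most dissimilar pair receives the maximal weight, matching the stated behavior of w_mn.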
(3) continuously tracking the video target
modeling with a Log function and integrating the SSA and ASA maps yields the following objective function:
[objective-function equation image in the original]
wherein P(B_i, F | I ∈ Ω_o) denotes the obtained SSA position response map, B_i denotes one of a series of N_b Boolean map channels, F denotes the Boolean map filter, P(X_i, w_i | I ∈ Ω_o) denotes the obtained ASA target map, ∗ denotes the spatial correlation operation, β_i denotes the weight coefficient to be optimized, e^(·) denotes the exponential function, Ω_o ∈ R² denotes the target region, o denotes a target appearing in the scene, X_i denotes one of a series of N_x sample matrices obtained by cyclically shifting a base HOG feature-channel vector (all feature channels are independently distributed), and w_i denotes the ASA filter;
the video target is tracked using this objective function, and the parameters are updated online to realize effective tracking of the target.
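The Log-function integration of the SSA and ASA maps can be illustrated as summing log-responses (equivalently, multiplying the maps) and taking the argmax as the estimated target position; this simplified sketch assumes that fusion form:

```python
import numpy as np

def locate_target(p_ssa, p_asa, eps=1e-8):
    """Fuse the SSA position response and the ASA target map by summing
    their logs (i.e., multiplying the maps) and return the argmax as the
    estimated target position."""
    score = np.log(p_ssa + eps) + np.log(p_asa + eps)
    return np.unravel_index(score.argmax(), score.shape)

p_ssa = np.array([[0.1, 0.2], [0.3, 0.9]])  # toy SSA response
p_asa = np.array([[0.2, 0.1], [0.4, 0.8]])  # toy ASA target map
pos = locate_target(p_ssa, p_asa)
```

Because both branches must agree, a location that scores highly in only one attention map is suppressed, which is the point of running SSA and ASA in parallel.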
In this example, the regularization term coefficient λ is 0.001, and the fusion coefficient η is 0.3.
Example 2: as shown in fig. 1 and fig. 2, the video target tracking method based on parallel attention correlation filtering formulates tracking as estimating the probability of the target position, integrates spatial selective attention (SSA) and appearance selective attention (ASA), and obtains an objective function using a Log function to realize continuous, effective tracking of the video target; it comprises the following steps:
(1) acquiring an SSA position response map: first, a series of binary maps is generated for the tracked target to describe the topology between the target and its surrounding scene at different granularities, and the maps are arranged from top to bottom by coarse-to-fine description granularity to obtain a group of tracked-target Boolean maps B_i; the coarse-grained Boolean maps encode global shape information to describe salient appearance changes of the target, while the fine-grained Boolean maps capture detailed spatial structure; then a binary filter F is defined for the tracked target and applied to the Boolean maps B_i to obtain conditional position response maps; the weights are learned by minimizing a linear regression function, an optimal weight is learned for each Boolean map, and the maps are weighted to obtain the final position response map;
(2) obtaining an ASA target map: first, a series of interference regions is sampled in a semi-local area around the tracked target, the ridge regression objective function is approximately recast as a metric-learning correlation filter, an anti-interference distance metric is learned within the correlation filtering, and the correlations among positive samples are modeled; then an anti-interference metric regularization term is introduced, anti-interference metric-regularized correlation filtering is applied to the target image, the useful correlations from real negative samples are also considered, the interference terms are pushed into the negative domain, and the target tracking map is acquired;
(3) continuously tracking the video target: an objective function integrating SSA and ASA is obtained through Log-function modeling; the video target is tracked with this function, and parameters are updated online to realize effective tracking of the video target.
In this example, the procedure is the same as in example 1, where the regularization term coefficient λ is 0.001 and the fusion coefficient η is 0.3.
While the present invention has been described with reference to the accompanying drawings, it is to be understood that the invention is not limited thereto, and that various changes and modifications may be made without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (3)

1. A video target tracking method based on parallel attention correlation filtering, characterized in that: the tracking problem is formulated as estimating the probability of the target position, spatial selective attention SSA and appearance selective attention ASA are integrated, and an objective function is obtained using a Log function to realize continuous, effective tracking of the video target, comprising the following steps:
(1) acquiring an SSA position response map: first, for a tracked target, a series of binary maps is generated in a local region around it to describe the topology between the target and its surrounding scene at different granularities; the maps are arranged from top to bottom by coarse-to-fine description granularity to obtain a group of tracked-target Boolean maps B_i (i = 1, 2, …, N_b), where the coarse-grained Boolean maps encode global shape information to describe salient appearance changes of the target and the fine-grained Boolean maps capture detailed spatial structure; then a binary filter F is defined for the tracked target and applied to the Boolean maps B_i to obtain conditional position response maps; the weights are learned by minimizing a linear regression function, an optimal weight is learned for each Boolean map, and the maps are weighted to obtain the final position response map;
(2) obtaining an ASA target map: first, a series of interference regions is sampled in a semi-local area around the tracked target, the ridge regression objective function is approximately recast as a metric-learning correlation filter, and an anti-interference distance metric is learned within the correlation filtering; then an anti-interference metric regularization term is introduced, anti-interference metric-regularized correlation filtering is applied to the target image, the interference terms are pushed into the negative domain, and the target tracking map is acquired;
(3) continuously tracking the video target: an objective function integrating SSA and ASA is obtained through Log-function modeling; the video target is tracked with this function, and parameters are updated online to realize effective tracking of the video target.
2. The video target tracking method based on parallel attention correlation filtering according to claim 1, characterized in that the method comprises the following specific steps:
(1) obtaining SSA location response graph
(1.1) for a tracked target, in a local region around it, a series of binary maps is generated to describe the topology between the target and its surrounding scene at different granularities by the following equation:
[equation image in the original]
wherein I(j) denotes the intensity of the j-th pixel, U(·) is a univariate (thresholding) function, R(·) denotes a rounding function, I is the RGB color-channel map of the image block, and T denotes transpose;
the maps are arranged from top to bottom by coarse-to-fine description granularity to obtain a group of tracked-target Boolean maps B_i (i = 1, 2, …, N_b); the coarse-grained Boolean maps encode global shape information to describe salient appearance changes of the target, while the fine-grained Boolean maps capture detailed spatial structure;
(1.2) weight learning: a binary filter F is defined for the tracked target and applied to the tracked-target Boolean maps B_i obtained in step (1.1), yielding a set of conditional position response maps; the weights are learned by minimizing the linear regression function below, an optimal weight is learned for each Boolean map, and each map is weighted to obtain the final position response map:
[equation images in the original]
wherein Ω_o is the region of the scene in which the target appears, Ω_b is the background region, d_w is the width and d_h the height of the feature, w_k is the classifier parameter vector for the k-th frame, |Ω_o| and |Ω_b| are the numbers of non-blank pixels in the target and background regions, and β_k is the weight coefficient to be optimized; the weight coefficients are updated online as β_t = (1 − η)β_{t−1} + ηβ, where β_t is the updated weight coefficient vector, η is the fusion coefficient, and β is the weight coefficient vector of the current frame;
(2) obtaining an ASA target map
(2.1) a series of interference regions X_i is sampled in a semi-local area around the tracked target; by approximately recasting the following ridge regression objective function as a metric-learning correlation filter, an anti-interference distance metric is learned within the correlation filtering:
[equation images in the original]
wherein X_i is the i-th sample matrix, x̂ is the DFT of the vector x and x̂_i is its i-th row, w_i is the correlation-filter weight corresponding to the i-th sample matrix X_i, w is the vector composed of all w_i, y is a Gaussian-shaped label, d_w′ and d_h′ are the width and height of the feature matrix, λ is the regularization coefficient, and the remaining symbol (an image in the original) denotes the Mahalanobis distance;
(2.2) the anti-interference metric regularization term is introduced into the correlation filtering objective function to obtain an anti-interference metric-regularized correlation filtering model [equation image in the original]; this model applies anti-interference metric-regularized correlation filtering to the target image obtained in step (2.1), enhancing the discrimination and tracking of target features, pushing the filtered interference terms into the negative domain, and yielding the positive-space target tracking map P(X_i, w_i | I ∈ Ω_o);
Wherein the content of the first and second substances,
Figure DEST_PATH_IMAGE045
is the first of the regular correlation filtering weights of the interference rejection metrickThe number of sub-vectors,
Figure 193407DEST_PATH_IMAGE046
is the first in the total sample vectorkThe number of sub-vectors,
Figure DEST_PATH_IMAGE047
is the first in a Gaussian label vectorkThe number of sub-vectors,
Figure 170721DEST_PATH_IMAGE048
is the firstiThe weight vector corresponding to each cyclic sample matrix,
Figure DEST_PATH_IMAGE049
by passing
Figure 474664DEST_PATH_IMAGE050
The online update is obtained by the online update,
Figure 828416DEST_PATH_IMAGE049
is to ask for
Figure DEST_PATH_IMAGE051
The inverse FFT of (a) results in tracking of the t-th frame,
Figure 268624DEST_PATH_IMAGE052
is that
Figure DEST_PATH_IMAGE053
The conjugate transpose of (a) is performed,
Figure 744736DEST_PATH_IMAGE054
is a matrix of units, and is,
Figure DEST_PATH_IMAGE055
is a coefficient of a regular term that,
Figure 21128DEST_PATH_IMAGE056
is the fusion coefficient;
the anti-interference metric regularization term is defined as:
[equation images in the original]
wherein X_i is the i-th sample vector, x_k^m is the m-th cyclic sample of the k-th base sample, x_k^n is the n-th cyclic sample of the k-th base sample, and w_mn is the sample-difference weight;
(3) continuously tracking the video target
modeling with a Log function and integrating the SSA and ASA maps yields the following objective function:
[objective-function equation image in the original]
wherein P(B_i, F | I ∈ Ω_o) denotes the obtained SSA position response map, B_i denotes one of a series of N_b Boolean map channels, F denotes the Boolean map filter, P(X_i, w_i | I ∈ Ω_o) denotes the obtained ASA target map, ∗ denotes the spatial correlation operation, β_i denotes the weight coefficient to be optimized, e^(·) denotes the exponential function, Ω_o denotes the target region, o denotes a target appearing in the scene, X_i denotes one of a series of N_x sample matrices obtained by cyclically shifting a base HOG feature-channel vector (all feature channels are independently distributed), and w_i denotes the ASA filter;
the video target is tracked using this objective function, and the parameters are updated online to realize effective tracking of the target.
3. The video target tracking method based on parallel attention correlation filtering according to claim 2, characterized in that: the regularization coefficient λ is 0.001 and the fusion coefficient η is 0.3.
CN201810647331.2A 2018-06-22 2018-06-22 Video target tracking method based on parallel attention-dependent filtering Active CN109102521B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810647331.2A CN109102521B (en) 2018-06-22 2018-06-22 Video target tracking method based on parallel attention-dependent filtering


Publications (2)

Publication Number Publication Date
CN109102521A CN109102521A (en) 2018-12-28
CN109102521B true CN109102521B (en) 2021-08-27

Family

ID=64844863


Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109919114A (en) * 2019-03-14 2019-06-21 浙江大学 One kind is based on the decoded video presentation method of complementary attention mechanism cyclic convolution
CN109993777B (en) * 2019-04-04 2021-06-29 杭州电子科技大学 Target tracking method and system based on dual-template adaptive threshold
CN110102050B (en) 2019-04-30 2022-02-18 腾讯科技(深圳)有限公司 Virtual object display method and device, electronic equipment and storage medium
CN110335290B (en) * 2019-06-04 2021-02-26 大连理工大学 Twin candidate region generation network target tracking method based on attention mechanism
CN110443852B (en) * 2019-08-07 2022-03-01 腾讯科技(深圳)有限公司 Image positioning method and related device
CN111428771B (en) * 2019-11-08 2023-04-18 腾讯科技(深圳)有限公司 Video scene classification method and device and computer-readable storage medium
CN113704684B (en) * 2021-07-27 2023-08-29 浙江工商大学 Centralized fusion robust filtering method
CN113808171A (en) * 2021-09-27 2021-12-17 山东工商学院 Unmanned aerial vehicle visual tracking method based on dynamic feature selection of feature weight pool

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105809713A (en) * 2016-03-03 2016-07-27 南京信息工程大学 Object tracing method based on online Fisher discrimination mechanism to enhance characteristic selection

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105809713A (en) * 2016-03-03 2016-07-27 南京信息工程大学 Object tracing method based on online Fisher discrimination mechanism to enhance characteristic selection

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Visual Tracking via Nonlocal Similarity Learning; Qingshan Liu et al.; IEEE Transactions on Circuits and Systems for Video Technology; 2017-05-26; vol. 28, no. 10, pp. 2826–2835 *
Visual Tracking With Weighted Adaptive Local Sparse Appearance Model via Spatio-Temporal Context Learning; Zhetao Li et al.; IEEE Transactions on Image Processing; 2018-05-24; vol. 27, no. 9, pp. 4478–4489 *
Real-time visual tracking algorithm with channel-stability-weighted complementary learning; Fan Jiaqing et al.; Journal of Computer Applications; 2018-06-10; vol. 38, no. 6, pp. 1751–1754 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant