CN109102521B - Video target tracking method based on parallel attention correlation filtering - Google Patents
- Publication number: CN109102521B (application CN201810647331.2A)
- Authority: CN (China)
- Legal status: Active
Classifications
- G06T7/246 — Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T7/262 — Analysis of motion using transform domain methods, e.g. Fourier domain methods
- G06T2207/10016 — Video; image sequence
- G06T2207/20056 — Discrete and fast Fourier transform [DFT, FFT]
Abstract
The invention discloses a video target tracking method based on parallel attention correlation filtering, belonging to the technical field of image processing. The tracking problem is formulated as estimating the probability of the target position; spatial selective attention (SSA) and apparent selective attention (ASA) are integrated, and an objective function is obtained using a log function, realizing continuous and effective tracking of the video target. First, SSA is modeled: a series of binary maps is generated and filtered to obtain a position response map. Then a series of interference regions is sampled in a semi-local area around the tracked target, an anti-interference distance metric is learned within the correlation filtering, anti-interference metric-regularized correlation filtering is performed, the interference terms are pushed into the negative domain, and an ASA target map is obtained. Finally, the results processed in the local and semi-local regions are fused through the objective function obtained with the log function to track the target. The method has the advantages of more stable and accurate processing, strong adaptability, and good tracking performance.
Description
Technical Field
The invention relates to a video target tracking method based on parallel attention correlation filtering, and belongs to the technical field of image processing.
Background
Visual tracking is a prerequisite for several important computer vision applications, such as video surveillance, behavior recognition, video retrieval and human-computer interaction. Although visual tracking techniques have advanced significantly in recent years, continuously tracking a generic target given only its location in the first frame remains challenging in unconstrained environments, because the target's appearance is severely affected by interference factors such as occlusion, rapid motion, and deformation.
The task of target tracking is to find the target location and determine the target's characteristics, a question of "where" and "what", which also relates to the attention selection mechanism in human visual perception. Psychological and cognitive evidence suggests that human visual perception is highly selective, allowing the visual system to focus on rapidly processing the most relevant visual information. There are two main mechanisms of visual attention in human visual perception: one is spatial selective attention (SSA), which narrows a neuron's receptive field and increases its sensitivity to a particular location in the visual field; the other is apparent selective attention (ASA), which enhances activity in different regions of the cerebral cortex by processing different types of features specifically, thereby strengthening response values.
After leaving the eyes, the scene input signals entering the cerebral cortex are split into a dorsal stream and a ventral stream; the former exploits spatial relationships (i.e., where), while the latter emphasizes appearance features (i.e., what). Perceptual studies have demonstrated that these two functions may be processed in parallel, and that these mechanisms may play an important role in coping with distractors, blur and occlusion during target tracking. Exploiting these findings to handle the "where" and "what" problems in a correlation-filter tracker is of great significance for target tracking in complex environments.
Disclosure of Invention
The invention aims to solve the technical problem that conventional target tracking methods cannot continuously track a generic target, and provides a video target tracking method based on parallel attention correlation filtering.
To solve the above technical problem, the present invention provides a video target tracking method based on parallel attention correlation filtering, which formulates the tracking problem as estimating the probability of the target position, integrates spatial selective attention (SSA) and apparent selective attention (ASA), obtains an objective function using a log function, and realizes continuous and effective tracking of a video target. The method comprises the following steps:
(1) Acquiring the SSA position response map: first, for a tracked target, a series of binary maps is generated in a local area around it to describe the topology between the target and its surrounding scene at different granularities. Arranged from top to bottom by description granularity, from coarse to fine, these yield a set of tracked-target Boolean maps B_i (i = 1, 2, ..., N_b); the coarse-grained Boolean maps encode global shape information to describe large appearance changes of the target, while the fine-grained Boolean maps describe detailed spatial structure. Then a binary filter F is defined for the tracked target and applied to each Boolean map B_i to obtain a conditional position response map; weight learning is completed by minimizing a linear regression function, an optimal weight is learned for each Boolean map, and the maps are weighted to obtain the final position response map:
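The Boolean-map generation in step (1) can be sketched as follows. This is a minimal illustration assuming a uint8 RGB channel stack and fixed-step thresholds over [0, 255]; the function name and the choice of N_b are placeholders, not the patent's implementation.

```python
import numpy as np

def boolean_maps(channels, n_b=8):
    """Generate a coarse-to-fine stack of Boolean maps by thresholding
    each color channel at fixed-step thresholds over [0, 255].
    channels: (3, H, W) uint8 RGB channel maps of the image patch."""
    thresholds = np.linspace(0, 255, n_b, endpoint=False)  # fixed step delta
    maps = []
    for theta in thresholds:
        # element-wise inequality (the ">=" comparison in the patent's formula)
        maps.append((channels >= theta).astype(np.uint8))
    return np.stack(maps)  # (n_b, 3, H, W), coarse (low theta) to fine
```

Low thresholds binarize almost everything to 1 (coarse global shape), while high thresholds keep only bright structure (fine detail), matching the coarse-to-fine ordering described above.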
(2) Obtaining the ASA target map: first, a series of interference regions is sampled in a semi-local area around the tracked target; by approximately equating a ridge-regression objective function to metric learning for the correlation filter, an anti-interference distance metric is learned within the correlation filtering, modeling the interrelations between positive samples. Then an anti-interference metric regularization term is introduced, metric-regularized correlation filtering is performed on the target image, and the anti-interference distance metric is learned in the correlation filtering while useful correlations from true negative samples are taken into account; the interference terms are pushed into the negative domain, and the target tracking picture is acquired:
(3) Continuously tracking the video target: an objective function integrating SSA and ASA is obtained through log-function modeling; the video target is tracked with this function, and parameters are updated online to realize effective tracking of the video target.
The video target tracking method based on parallel attention correlation filtering comprises the following specific steps:
(1) obtaining SSA location response graph
(1.1) For a tracked target, in a local region around it, generate a series of binary maps describing the topology between the target and its surrounding scene at different granularities by:
where I(j) denotes the intensity of the j-th pixel, U(·) is a univariate function, R(·) denotes a rounding function, φ(I) is the RGB color-channel map of the image block, and T denotes transpose;
Arranging the maps from top to bottom by coarse-to-fine description granularity yields a set of tracked-target Boolean maps B_i (i = 1, 2, ..., N_b); the coarse-grained Boolean maps encode global shape information to describe large target appearance changes, and the fine-grained Boolean maps describe detailed spatial structure.
(1.2) Weight learning: a binary filter F is defined in the conventional way for the tracked target and applied to the tracked-target Boolean maps B_i obtained in step (1.1), yielding a set of conditional position response maps. Weight learning is completed by minimizing the linear regression function below; an optimal weight is learned for each Boolean map, and the maps are weighted to obtain the final position response map P(B_i, F | I ∈ Ω_o):
where Ω_o is the region of the scene in which the target appears, Ω_b is the background region of the scene, d_w is the width of the feature, d_h is its height, β^k is the classifier parameter vector of the k-th frame, |Ω_o| is the number of non-blank pixels in the target region, |Ω_b| is the number of non-blank pixels in the background region, and β_k is the weight coefficient to be optimized. The weights must be updated online to accommodate appearance changes of the target over time; β_t is the updated weight coefficient vector, η is the fusion coefficient, and the remaining term is the weight coefficient vector of the current frame;
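The online weight update described above appears in the source only as an unreproduced formula image. By its description it blends the previous weights with the current-frame estimate under the fusion coefficient η; the sketch below assumes that standard linear-interpolation form, which is an assumption rather than the patent's verbatim formula.

```python
import numpy as np

def update_weights(beta_prev, beta_curr, eta=0.006):
    # Assumed linear blending: beta_t = (1 - eta) * beta_{t-1} + eta * beta_curr,
    # a running average adapting the fusion weights to appearance change.
    return (1.0 - eta) * beta_prev + eta * beta_curr

def fuse_response(cond_maps, beta):
    # Weight each conditional position response map and sum,
    # producing the final SSA position response map.
    return np.tensordot(beta, cond_maps, axes=1)
```

With the patent's η = 0.006, the weights change slowly, which keeps the tracker stable across frames while still adapting over time.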
(2) obtaining an ASA target map
(2.1) Sample a series of interference regions in a semi-local area around the tracked target. By approximately equating the following ridge-regression objective function to a metric-learning correlation filter, learn an anti-interference distance metric within the correlation filtering;
where x_i is the i-th sample matrix, x̂ is the DFT of the vector x, x̂_i is its i-th row, w_i is the correlation-filter weight corresponding to the i-th sample matrix x_i, w is the vector composed of all w_i, y is a Gaussian-shaped label, d_w′ and d_h′ are the width and height of the feature matrix, λ is the regularization coefficient, and d_M(·,·) denotes the Mahalanobis distance;
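For context, the closed-form Fourier-domain solution of the basic single-channel ridge-regression correlation filter (the standard formulation the above objective extends, before the metric-learning terms are added) can be sketched as follows. This is the textbook solution, not the patent's regularized variant:

```python
import numpy as np

def learn_cf(x, y, lam=0.001):
    """Closed-form single-channel correlation filter over the cyclic shifts
    of patch x: w_hat = conj(x_hat) * y_hat / (|x_hat|^2 + lam)."""
    xh, yh = np.fft.fft2(x), np.fft.fft2(y)
    return np.conj(xh) * yh / (xh * np.conj(xh) + lam)

def respond(w_hat, z):
    """Correlation response of the learned filter on a new patch z;
    the target position estimate is the argmax of this map."""
    return np.real(np.fft.ifft2(w_hat * np.fft.fft2(z)))
```

Applying the filter back to its own training patch approximately reproduces the Gaussian-shaped label y, so the response peak sits at the labeled target position.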
(2.2) Introduce an anti-interference metric regularization term into the correlation-filtering objective function to obtain the anti-interference metric-regularized correlation filtering model. With this model, metric-regularized correlation filtering is further performed on the target image obtained in step (2.1), enhancing the discrimination and tracking of target features; the filtered interference terms are pushed into the negative domain, and the positive-space target tracking picture P(X_i, w_i | I ∈ Ω_o) is obtained:
where ŵ_k is the k-th sub-vector of the anti-interference metric-regularized correlation-filter weight, x̂_k is the k-th sub-vector of the total sample vector, ŷ_k is the k-th sub-vector of the Gaussian-shaped label vector, and w_i is the weight vector corresponding to the i-th cyclic sample matrix, obtained by online updating; the tracking result of the t-th frame is obtained from the inverse FFT of the solution, I is the identity matrix, λ is the regularization coefficient, and η is the fusion coefficient;
where x_i is the i-th sample vector, x_m^(k) is the m-th cyclic sample of the k-th base sample, x_n^(k) is the n-th cyclic sample of the k-th base sample, and w_mn is the sample-difference weight (measuring the similarity between samples m and n: the greater the weight, the greater the difference between the samples, and the more discriminative the learned appearance features);
(3) Continuously tracking the video target
Modeling with the log function and integrating the SSA and ASA maps yields the following objective function:
where P(B_i, F | I ∈ Ω_o) denotes the obtained SSA position response map, {B_i} denotes a series of N_b Boolean map channels, F denotes the Boolean filter, P(X_i, w_i | I ∈ Ω_o) denotes the obtained ASA target map, ∗ denotes the spatial correlation operation, β_i denotes a weight coefficient to be optimized, e^(·) denotes the exponential function, Ω_o ∈ R² denotes the target region, o denotes the target appearing in the scene, {X_i} denotes a series of N_x feature channels obtained by shifting a base HOG feature-channel vector (all feature channels are assumed independently distributed), and w denotes the ASA filter;
The video target is tracked with this objective function, and parameters are updated online to realize effective tracking of the target.
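The fusion in step (3) multiplies the SSA and ASA probabilities, i.e. sums their logs; a minimal sketch of this decision rule (the map names are placeholders):

```python
import numpy as np

def fuse_ssa_asa(p_ssa, p_asa, eps=1e-12):
    """Sum the log of the SSA position response map and the log of the
    ASA target map; the tracked position is the argmax of the fused score.
    eps guards against log(0) on zero-probability cells."""
    score = np.log(p_ssa + eps) + np.log(p_asa + eps)
    return np.unravel_index(np.argmax(score), score.shape)
```

Because log is monotone, the argmax of the summed logs equals the argmax of the product of the two probability maps, which is exactly the integrated objective described above.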
The regularization coefficient λ takes the value 0.001, and the fusion coefficient η takes the value 0.006.
The principle of the invention is as follows:
The core of the invention is to formulate the tracking problem as estimating a target position probability, seamlessly integrating SSA and ASA:
Here Ω_o ∈ R² represents the target region and o represents the target appearing in the scene; {B_i} represents a series of N_b Boolean map channels, and {X_i} is a series of N_x feature channels, each obtained by shifting a base HOG feature-channel vector; F and w are their corresponding filters. Furthermore, for simplicity, all feature channels are assumed to be independently distributed. Finally, taking the log of both sides of formula (1) gives:
Here P(B_i, F | I ∈ Ω_o) and P(X_i, w_i | I ∈ Ω_o) are defined as:
where ∗ is the spatial correlation operation, β_i is a weight coefficient to be optimized, and e^(·) is the exponential function.
In modeling SSA, the present invention first generates a series of binary maps, i.e., Boolean map representations (BMRs), describing the topology between the target and its surrounding scene at different granularities. In fig. 2, from top to bottom, the Boolean maps describe granularity from coarse to fine: a coarse-grained Boolean map encodes global shape information that is robust to large target appearance variations, whereas a fine-grained Boolean map describes spatial structural details that are effective for precise target localization. A predefined binary filter is then applied to these maps to obtain a set of conditional position response maps; the goal is to learn an optimal weight for each Boolean map, and the weighted maps are combined into the final position response map.
The BMR is inspired by recent studies of human visual attention, which suggest that momentary perceptual awareness of a scene can be represented by a set of Boolean maps. Specifically, given the RGB color-channel map φ(I) of an image block, its corresponding Boolean map B_i is obtained from the formula
Here the thresholds θ_i are drawn independently from a uniform distribution over [0, 255] (yielding black-and-white binary maps), and the symbol ≽ denotes element-wise inequality. For simplicity, the thresholds are set to θ_i = 255(i − 1)/N_b, sampling from 0 to 255 with a fixed step δ = 255/N_b, because fixed-step sampling is equivalent to uniform sampling in the limit δ → 0. It is then easy to prove the corresponding decomposition, and the j-th pixel intensity I(j) can be expressed as
where U(·) is a univariate function, e.g. U(2) = [1; 1; 0], U(3) = [1; 1; 1] for 3 discrete levels, and R(·) denotes a rounding function; φ(I) is the RGB color-channel map of the image block.
For weight learning, the present invention learns the weights by minimizing the following linear regression function:
Here ‖·‖_F denotes the Frobenius norm. Clearly, this minimization is equivalent to minimizing the following objective function:
where formula (5) has been substituted into formula (7), and Ω_o and Ω_b represent the target and background regions, respectively.
To address the interference problem, the invention exploits the principle that ASA in human visual perception emphasizes the learning of appearance features: by learning an anti-interference distance metric, interference terms are pushed into the negative space, the discriminative power of the features is enhanced, tracking becomes robust against distractors, and the target is well distinguished from them. Learning the correlation filter is approximated as learning a distance metric to model the correlations between positive samples; the anti-interference distance metric is then learned within the correlation filtering while also considering useful correlations from true negative samples.
For distance metric learning, learning the correlation filter (CF) is expressed as a spatial ridge-regression objective function:
Here y is the Gaussian regression target and λ is the regularization coefficient. Note that if w is rescaled by any a ≠ 0, equation (10) can be reformulated accordingly; apart from rescaling y by the ratio 1/a, the result is equivalent to equation (10), and since the maximum-response position is unchanged, it yields the same tracking result.
On this basis, to make the relationship between correlation-filter learning and metric learning explicit, a setting is imposed in equation (10) and w is reshaped accordingly, which is equivalent to adding the constraint
where d_M(·,·) is the Mahalanobis distance and 1 is the all-ones vector. Therefore, learning the correlation filter can be roughly regarded as learning an optimal distance metric.
However, equation (11) considers only the relationships among positive samples, so its ability to discriminate the target from the background is limited. To solve this problem, an anti-interference metric regularization term is added to equation (10); it is built from the relationships of the negative space and acts as a force pushing the interference terms into the negative space.
To perform anti-interference metric-regularized correlation filtering, a series of interference regions is first sampled from a semi-local area around the target; the interactions between them are then modeled and integrated into equation (10) as a regularization term:
where γ is a regularization coefficient and w_mn is a weight measuring the similarity between samples m and n. The greater the weight, the greater the difference between the samples, making the learned appearance features more discriminative.
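The regularization term itself survives only as an unreproduced formula image, so the sketch below is a hypothetical reading: it penalizes pairs of distractor responses in proportion to difference weights w_mn. Both the pairwise form gamma * sum w_mn (r_m - r_n)^2 and the choice of w_mn here are assumptions for illustration, not the patent's verbatim term.

```python
import numpy as np

def antidistractor_penalty(w, samples, gamma=0.1):
    """Hypothetical sketch of the anti-interference regularizer of eq. (12).
    r_i = w . x_i are the filter responses on distractor samples; pairs with
    larger difference weights w_mn contribute a larger penalty, steering the
    filter toward more discriminative appearance features. The exact form is
    an assumption (the patent's formula is not reproduced in the source)."""
    r = samples @ w                              # filter responses per sample
    diff = r[:, None] - r[None, :]               # pairwise response gaps
    # Assumed difference weights: mean absolute feature gap between samples.
    wmn = np.abs(samples[:, None, :] - samples[None, :, :]).mean(axis=-1)
    return gamma * float(np.sum(wmn * diff ** 2))
```

Identical distractors contribute nothing, while dissimilar ones (large w_mn) dominate the penalty, which matches the stated intent that larger weights mark more discriminative pairs.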
Equation (12) can be reformulated as:
because a circulant matrix X built from the base vector x satisfies
where F denotes the discrete Fourier transform (DFT) matrix, x̂ denotes the DFT of the base vector x, and F^H = (F^*)^T denotes the conjugate transpose. Using this diagonalization, equation (15) can be diagonalized into
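The circulant diagonalization invoked here, X = F diag(x̂) F^H with F the unitary DFT matrix, is what lets the system decouple per frequency; it can be checked numerically:

```python
import numpy as np

def circulant(x):
    """Circulant matrix whose i-th row is x cyclically shifted by i."""
    return np.stack([np.roll(x, i) for i in range(len(x))])

def dft_diagonalize(x):
    """Reassemble the circulant matrix from F diag(x_hat) F^H, where F is
    the unitary DFT matrix and x_hat the (unnormalized) DFT of x."""
    n = len(x)
    F = np.fft.fft(np.eye(n)) / np.sqrt(n)   # unitary DFT matrix
    return F @ np.diag(np.fft.fft(x)) @ F.conj().T
```

Because X is diagonal in the DFT basis, inverting the regularized system reduces to independent scalar divisions per frequency, which is the source of the FFT-domain solution that follows.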
Substituting equations (17) and (18) into equation (14) yields the FFT of its solution:
Here the i-th element of the solution vector corresponds to the k-th element of its sub-vector; similar to formula (9), it is obtained by online updating:
Because the matrix in formula (19) has a large number of rows and columns, computing its inverse directly is impractical. Instead, a transformation is used to compute the inverse. Once all sub-problems are obtained, they can be computed in parallel, and the optimal solution of equation (14) is recovered by the inverse FFT.
The invention provides a correlation-filter tracking algorithm based on human visual perception that reflects the SSA and ASA mechanisms and enhances the robustness and interference resistance of target tracking by processing the local and semi-local background domains in parallel. For the local domain, to model SSA, a simple but efficient BMR is introduced into the correlation-filter learning to delineate the local topology of the target and its scene by randomly binarizing the image color channels, which is invariant to various transformations. For the semi-local domain, to model ASA, an anti-interference metric regularization term is introduced into the objective function of the correlation filtering; it acts as a force pushing interference terms into the negative domain, enhancing tracking robustness when challenging distractors similar to the target are encountered. The method is more stable and accurate, strongly adaptive, tracks well, and realizes continuous and effective tracking of the video target.
Drawings
Fig. 1 is a schematic diagram of the present invention.
FIG. 2 is a flow chart of the present invention for modeling SSA.
Detailed Description
The embodiments of the present invention are described in detail below with reference to the accompanying drawings; technology and products not specified in the embodiments are conventional products available in the art or commercially.
Example 1: as shown in figs. 1 and 2, the video target tracking method based on parallel attention correlation filtering formulates the tracking problem as estimating the probability of the target position, integrates spatial selective attention (SSA) and apparent selective attention (ASA), and obtains an objective function using a log function to realize continuous and effective tracking of a video target; it includes the following steps:
(1) Acquiring the SSA position response map: first, for a tracked target, a series of binary maps is generated in a local area around it to describe the topology between the target and its surrounding scene at different granularities. Arranged from top to bottom by description granularity, from coarse to fine, these yield a set of tracked-target Boolean maps B_i (i = 1, 2, ..., N_b); the coarse-grained Boolean maps encode global shape information to describe large appearance changes of the target, while the fine-grained Boolean maps describe detailed spatial structure. Then a binary filter F is defined for the tracked target and applied to each Boolean map B_i to obtain a conditional position response map; weight learning is completed by minimizing a linear regression function, an optimal weight is learned for each Boolean map, and the maps are weighted to obtain the final position response map:
(2) Obtaining the ASA target map: first, a series of interference regions is sampled in a semi-local area around the tracked target; by approximately equating a ridge-regression objective function to metric learning for the correlation filter, an anti-interference distance metric is learned within the correlation filtering, modeling the interrelations between positive samples. Then an anti-interference metric regularization term is introduced, metric-regularized correlation filtering is performed on the target image, and the anti-interference distance metric is learned in the correlation filtering while useful correlations from true negative samples are taken into account; the interference terms are pushed into the negative domain, and the target tracking picture is acquired:
(3) Continuously tracking the video target: an objective function integrating SSA and ASA is obtained through log-function modeling; the video target is tracked with this function, and parameters are updated online to realize effective tracking of the video target.
The video target tracking method based on parallel attention correlation filtering comprises the following specific steps:
(1) obtaining SSA location response graph
(1.1) For a tracked target, in a local region around it, generate a series of binary maps describing the topology between the target and its surrounding scene at different granularities by:
where I(j) denotes the intensity of the j-th pixel, U(·) is a univariate function, R(·) denotes a rounding function, φ(I) is the RGB color-channel map of the image block, and T denotes transpose;
Arranging the maps from top to bottom by coarse-to-fine description granularity yields a set of tracked-target Boolean maps B_i (i = 1, 2, ..., N_b); the coarse-grained Boolean maps encode global shape information to describe large target appearance changes, and the fine-grained Boolean maps describe detailed spatial structure.
(1.2) Weight learning: a binary filter F is defined in the conventional way for the tracked target and applied to the tracked-target Boolean maps B_i obtained in step (1.1), yielding a set of conditional position response maps. Weight learning is completed by minimizing the linear regression function below; an optimal weight is learned for each Boolean map, and the maps are weighted to obtain the final position response map P(B_i, F | I ∈ Ω_o):
where Ω_o is the region of the scene in which the target appears, Ω_b is the background region of the scene, d_w is the width of the feature, d_h is its height, β^k is the classifier parameter vector of the k-th frame, |Ω_o| is the number of non-blank pixels in the target region, |Ω_b| is the number of non-blank pixels in the background region, and β_k is the weight coefficient to be optimized. The weights must be updated online to accommodate appearance changes of the target over time; β_t is the updated weight coefficient vector, η is the fusion coefficient, and the remaining term is the weight coefficient vector of the current frame;
(2) obtaining an ASA target map
(2.1) Sample a series of interference regions in a semi-local area around the tracked target. By approximately equating the following ridge-regression objective function to a metric-learning correlation filter, learn an anti-interference distance metric within the correlation filtering;
where x_i is the i-th sample matrix, x̂ is the DFT of the vector x, x̂_i is its i-th row, w_i is the correlation-filter weight corresponding to the i-th sample matrix x_i, w is the vector composed of all w_i, y is a Gaussian-shaped label, d_w′ and d_h′ are the width and height of the feature matrix, λ is the regularization coefficient, and d_M(·,·) denotes the Mahalanobis distance;
(2.2) Introduce an anti-interference metric regularization term into the correlation-filtering objective function to obtain the anti-interference metric-regularized correlation filtering model. With this model, metric-regularized correlation filtering is further performed on the target image obtained in step (2.1), enhancing the discrimination and tracking of target features; the filtered interference terms are pushed into the negative domain, and the positive-space target tracking picture P(X_i, w_i | I ∈ Ω_o) is obtained:
where ŵ_k is the k-th sub-vector of the anti-interference metric-regularized correlation-filter model, x̂_k is the k-th sub-vector of the total sample vector, ŷ_k is the k-th sub-vector of the Gaussian-shaped label vector, and w_i is the weight vector corresponding to the i-th cyclic sample matrix, obtained by online updating; the tracking result of the t-th frame is obtained from the inverse FFT of the solution, I is the identity matrix, λ is the regularization coefficient, and η is the fusion coefficient;
where x_i is the i-th sample vector, x_m^(k) is the m-th cyclic sample of the k-th base sample, x_n^(k) is the n-th cyclic sample of the k-th base sample, and w_mn is the sample-difference weight (measuring the similarity between samples m and n: the greater the weight, the greater the difference between the samples, and the more discriminative the learned appearance features);
(3) Continuously tracking the video target
Modeling with the log function and integrating the SSA and ASA maps yields the following objective function:
where P(B_i, F | I ∈ Ω_o) denotes the obtained SSA position response map, {B_i} denotes a series of N_b Boolean map channels, F denotes the Boolean filter, P(X_i, w_i | I ∈ Ω_o) denotes the obtained ASA target map, ∗ denotes the spatial correlation operation, β_i denotes a weight coefficient to be optimized, e^(·) denotes the exponential function, Ω_o ∈ R² denotes the target region, o denotes the target appearing in the scene, {X_i} denotes a series of N_x feature channels obtained by shifting a base HOG feature-channel vector (all feature channels are assumed independently distributed), and w denotes the ASA filter;
The video target is tracked with this objective function, and parameters are updated online to realize effective tracking of the target.
In this example, the regularization coefficient λ takes the value 0.001 and the fusion coefficient η takes the value 0.3.
Example 2: as shown in figs. 1 and 2, the video target tracking method based on parallel attention correlation filtering formulates the tracking problem as estimating the probability of the target position, integrates spatial selective attention (SSA) and apparent selective attention (ASA), and obtains an objective function using a log function to realize continuous and effective tracking of a video target; it includes the following steps:
(1) acquiring an SSA position response diagram:firstly, a series of binary images are generated aiming at a tracking target to describe the topological structure between the target and the surrounding scene thereof under different granularities, and the images are arranged from top to bottom according to the describing granularity from coarse to fine to obtain a group of Boolean images B of the tracking targetiThe coarse-grained Boolean graph encodes and describes the apparent change of an obvious target for the global shape information, and the fine-grained Boolean graph describes the detailed structure of a space; then, a binary filter F is defined for the tracking target, and F is acted on a Boolean graph BiObtaining a conditional position response graph, completing learning weight by minimizing a linear regression function, learning an optimal weight for each Boolean graph, and weighting each graph to obtain a final position response graph:
(2) Obtaining the ASA target map: first, a series of interference regions is sampled in a semi-local area around the tracked target; by approximately equating a ridge-regression objective function to metric learning for the correlation filter, an anti-interference distance metric is learned within the correlation filtering, modeling the interrelations between positive samples. Then an anti-interference metric regularization term is introduced, metric-regularized correlation filtering is performed on the target image, and the anti-interference distance metric is learned in the correlation filtering while useful correlations from true negative samples are taken into account; the interference terms are pushed into the negative domain, and the target tracking picture is acquired:
(3) Continuously tracking the video target: an objective function integrating SSA and ASA is obtained through log-function modeling; the video target is tracked with this function, and parameters are updated online to realize effective tracking of the video target.
Otherwise this example proceeds as example 1, with the regularization coefficient λ = 0.001 and the fusion coefficient η = 0.3.
While the present invention has been described with reference to the accompanying drawings, it is to be understood that the invention is not limited thereto, and that various changes and modifications may be made without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (3)
1. A video target tracking method based on parallel attention-dependent filtering, characterized in that the tracking problem is cast as estimating the probability of the target position, spatial selective attention (SSA) and appearance selective attention (ASA) are integrated, an objective function is obtained with a Log function, and continuous, effective tracking of the video target is achieved, the method comprising the following steps:
(1) acquiring an SSA position response map: first, for the tracking target, generate a series of binary maps in a local region around it to describe the topological structure between the target and its surrounding scene at different granularities; arrange the maps from top to bottom by description granularity, coarse to fine, to obtain a set of Boolean maps of the tracking target, i = 1, 2, ...; the coarse-grained Boolean maps encode global shape information and describe salient appearance changes of the target, while the fine-grained Boolean maps describe the detailed spatial structure; then define a binary filter for the tracking target and apply it to each Boolean map to obtain a conditional position response map; learn the weights by minimizing a linear regression function, learning an optimal weight for each Boolean map, and weight the maps to obtain the final position response map:
(2) obtaining an ASA target map: first, sample a series of interference regions in a semi-local region around the tracking target; approximately equate a ridge-regression objective function to a metric-learning correlation filter and learn an anti-interference distance metric in the correlation filtering; then introduce an anti-interference metric regularization term, perform anti-interference metric-regularized correlation filtering on the target image, push the interference terms into the negative domain, and obtain the target tracking picture:
(3) continuously tracking the video target: obtain an objective function integrating SSA and ASA through Log-function modeling, track the video target with this function, and update the parameters online to achieve effective tracking of the video target.
2. The video target tracking method based on parallel attention-dependent filtering according to claim 1, characterized in that the method comprises the following specific steps:
(1) obtaining the SSA position response map
(1.1) for a tracked target, generate, in a local region around it, a series of binary maps describing the topology between the target and its surrounding scene at different granularities by:
wherein the symbols denote, in order: the intensity of the j-th pixel; a unit function; a rounding function; and the RGB color-channel map of the image block; T denotes transpose;
arranging the maps from top to bottom by description granularity, coarse to fine, yields a set of Boolean maps of the tracked target, i = 1, 2, ...; the coarse-grained Boolean maps encode global shape information and describe salient appearance changes of the target, while the fine-grained Boolean maps describe the detailed spatial structure;
(1.2) weight learning: define a binary filter for the tracked target and apply it to the Boolean maps obtained in step (1.1), yielding a set of conditional position response maps; learn the weights by minimizing the following linear regression function, learning an optimal weight for each Boolean map, and weight the maps to obtain a set of final position response maps:
wherein the symbols denote, in order: the target region appearing in the scene; the background region appearing in the scene; the feature width; the feature height; the classifier parameter vector of the k-th frame; the number of non-blank pixels in the target region; the number of non-blank pixels in the background region; and the weight coefficient to be optimized, which is updated online as a fusion, governed by the fusion coefficient, of the updated weight-coefficient vector and the weight-coefficient vector of the current frame;
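Since the regression and update formulas are omitted images in the source, the weight learning of step (1.2) can only be sketched; below, a hypothetical least-squares fit assigns one weight per conditional response map, and the online update is the usual linear fusion with coefficient η = 0.3 as stated in claim 3.

```python
import numpy as np

def learn_map_weights(responses, label, lam=0.001):
    """Ridge-regression fit of one weight per conditional response map:
    columns of A are the flattened maps; solve (A^T A + lam I) w = A^T y."""
    A = np.stack([r.ravel() for r in responses], axis=1)
    y = label.ravel()
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)

def online_update(w_prev, w_curr, eta=0.3):
    """Linear fusion of previous and current-frame weights (coefficient eta)."""
    return (1.0 - eta) * w_prev + eta * w_curr

rng = np.random.default_rng(0)
responses = [rng.random((16, 16)) for _ in range(6)]   # conditional response maps
g = np.exp(-0.5 * ((np.arange(16) - 8) ** 2) / 4.0)
label = np.outer(g, g)                                 # Gaussian position label
w = learn_map_weights(responses, label)
w = online_update(w, learn_map_weights(responses, label))
final_response = sum(wi * r for wi, r in zip(w, responses))
```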
(2) obtaining an ASA target map
(2.1) sample a series of interference regions in a semi-local area around the tracked target; by approximately equating the following ridge-regression objective function to a metric-learning correlation filter, learn an anti-interference distance metric in the correlation filtering:
wherein the symbols denote, in order: the sample matrix; the DFT of the sample vector; its i-th row; the correlation-filter weight corresponding to the i-th sample matrix; the vector formed by all such weights; the Gaussian label; the width and height of the feature matrix; the regularization-term coefficient; and the Mahalanobis distance;
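For orientation, the ridge-regression correlation filter that step (2.1) starts from has, in its standard single-channel form, a closed-form Fourier-domain solution; this baseline sketch does not include the patent's metric-learning (Mahalanobis) term, whose formula is omitted in the source.

```python
import numpy as np

def train_cf(x, y, lam=0.001):
    """Closed-form filter per Fourier bin: H = conj(X) * Y / (conj(X) * X + lam)."""
    X, Y = np.fft.fft2(x), np.fft.fft2(y)
    return np.conj(X) * Y / (np.conj(X) * X + lam)

def detect(h_hat, z):
    """Correlate the filter with a search patch z; the peak gives the position."""
    return np.real(np.fft.ifft2(h_hat * np.fft.fft2(z)))

x = np.random.default_rng(0).random((32, 32))          # training patch (features)
g = np.exp(-0.5 * ((np.arange(32) - 16) ** 2) / 4.0)
y = np.outer(g, g)                                     # Gaussian label peaked at center
h_hat = train_cf(x, y)
resp = detect(h_hat, x)                                # peak stays near the label center
```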
(2.2) introduce an anti-interference metric regularization term into the correlation-filtering objective function to obtain an anti-interference metric-regularized correlation-filtering model; apply this model to perform anti-interference metric-regularized correlation filtering on the target image obtained in step (2.1), strengthening the discrimination and tracking of target features; push the filtered interference terms into the negative domain and obtain a positive-space target tracking picture:
wherein the symbols denote, in order: the k-th sub-vector of the anti-interference metric-regularized correlation-filtering weights; the k-th sub-vector of the total sample vector; the k-th sub-vector of the Gaussian label vector; the weight vector corresponding to the i-th cyclic sample matrix, obtained by online update; the inverse FFT, which yields the tracking result for the t-th frame; the conjugate transpose; the identity matrix; the regularization-term coefficient; and the fusion coefficient;
wherein the symbols denote, in order: the i-th sample vector; the m-th cyclic sample of the k-th base sample; the n-th cyclic sample of the k-th base sample; and the sample-difference weight;
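One hedged reading of "pushing interference terms into the negative domain" is to add the sampled interference regions as zero-labeled samples in the ridge regression; the per-bin Fourier formulation and the weight `mu` below are illustrative assumptions, not the patent's metric-regularized objective.

```python
import numpy as np

def train_cf_with_distractors(x, y, distractors, lam=0.001, mu=1.0):
    """Ridge CF where each distractor patch d_i contributes a zero-label term:
    minimize |Xh - y|^2 + mu * sum_i |D_i h|^2 + lam |h|^2, per Fourier bin."""
    X, Y = np.fft.fft2(x), np.fft.fft2(y)
    num = np.conj(X) * Y
    den = np.conj(X) * X + lam
    for d in distractors:               # suppress response on interference regions
        D = np.fft.fft2(d)
        den = den + mu * np.conj(D) * D
    return num / den

rng = np.random.default_rng(1)
target = rng.random((32, 32))
g = np.exp(-0.5 * ((np.arange(32) - 16) ** 2) / 4.0)
label = np.outer(g, g)                                  # Gaussian label on the target
distractors = [rng.random((32, 32)) for _ in range(3)]  # semi-local samples
h_hat = train_cf_with_distractors(target, label, distractors)
resp_t = np.real(np.fft.ifft2(h_hat * np.fft.fft2(target)))
```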
(3) continuously tracking the video target
modeling with a Log function, the SSA and ASA maps are integrated to obtain the following objective function:
wherein the symbols denote, in order: the obtained SSA position response map; a series of Boolean maps over the channels; the Boolean-map filter; the obtained ASA target map; the spatial correlation operation; the weight coefficient to be optimized; the exponential function; the target region; the target appearing in the scene; a series of feature channels obtained by shifting a base HOG feature-channel vector, with all feature channels independently distributed; and the ASA filter;
the objective function is used to track the video target, and parameters are updated online to achieve effective tracking of the target.
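The Log-function fusion of the SSA and ASA maps is given only as an omitted formula; a minimal sketch, assuming the objective sums the log-responses of the two branches (equivalently, multiplies the maps) and takes the peak as the target position:

```python
import numpy as np

def fuse_and_locate(r_ssa, r_asa, eps=1e-6):
    """Sum the log-responses of the SSA and ASA maps and return the peak
    location as the estimated target position."""
    fused = np.log(r_ssa + eps) + np.log(r_asa + eps)
    return np.unravel_index(np.argmax(fused), fused.shape), fused

r_ssa = np.zeros((16, 16)); r_ssa[5, 7] = 1.0   # SSA response peaks at (5, 7)
r_asa = np.zeros((16, 16)); r_asa[5, 7] = 0.8   # ASA map agrees
pos, fused = fuse_and_locate(r_ssa, r_asa)
```

The small `eps` keeps the logarithm finite on zero-response pixels; it is an implementation convenience, not part of the claimed method.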
3. The video target tracking method based on parallel attention-dependent filtering according to claim 2, characterized in that the regularization-term coefficient takes the value 0.001 and the fusion coefficient takes the value 0.3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810647331.2A CN109102521B (en) | 2018-06-22 | 2018-06-22 | Video target tracking method based on parallel attention-dependent filtering |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810647331.2A CN109102521B (en) | 2018-06-22 | 2018-06-22 | Video target tracking method based on parallel attention-dependent filtering |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109102521A CN109102521A (en) | 2018-12-28 |
CN109102521B true CN109102521B (en) | 2021-08-27 |
Family
ID=64844863
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810647331.2A Active CN109102521B (en) | 2018-06-22 | 2018-06-22 | Video target tracking method based on parallel attention-dependent filtering |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109102521B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109919114A (en) * | 2019-03-14 | 2019-06-21 | 浙江大学 | One kind is based on the decoded video presentation method of complementary attention mechanism cyclic convolution |
CN109993777B (en) * | 2019-04-04 | 2021-06-29 | 杭州电子科技大学 | Target tracking method and system based on dual-template adaptive threshold |
CN110102050B (en) | 2019-04-30 | 2022-02-18 | 腾讯科技(深圳)有限公司 | Virtual object display method and device, electronic equipment and storage medium |
CN110335290B (en) * | 2019-06-04 | 2021-02-26 | 大连理工大学 | Twin candidate region generation network target tracking method based on attention mechanism |
CN110443852B (en) * | 2019-08-07 | 2022-03-01 | 腾讯科技(深圳)有限公司 | Image positioning method and related device |
CN111428771B (en) * | 2019-11-08 | 2023-04-18 | 腾讯科技(深圳)有限公司 | Video scene classification method and device and computer-readable storage medium |
CN113704684B (en) * | 2021-07-27 | 2023-08-29 | 浙江工商大学 | Centralized fusion robust filtering method |
CN113808171A (en) * | 2021-09-27 | 2021-12-17 | 山东工商学院 | Unmanned aerial vehicle visual tracking method based on dynamic feature selection of feature weight pool |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105809713A (en) * | 2016-03-03 | 2016-07-27 | 南京信息工程大学 | Object tracing method based on online Fisher discrimination mechanism to enhance characteristic selection |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105809713A (en) * | 2016-03-03 | 2016-07-27 | 南京信息工程大学 | Object tracing method based on online Fisher discrimination mechanism to enhance characteristic selection |
Non-Patent Citations (3)
Title |
---|
Visual Tracking via Nonlocal Similarity Learning; Qingshan Liu, et al.; IEEE Transactions on Circuits and Systems for Video Technology; 2017-05-26; Vol. 28, No. 10, pp. 2826-2835 *
Visual Tracking With Weighted Adaptive Local Sparse Appearance Model via Spatio-Temporal Context Learning; Zhetao Li, et al.; IEEE Transactions on Image Processing; 2018-05-24; Vol. 27, No. 9, pp. 4478-4489 *
Real-Time Visual Tracking Algorithm via Channel-Stability-Weighted Complementary Learning; Fan Jiaqing, et al.; Journal of Computer Applications; 2018-06-10; Vol. 38, No. 6, pp. 1751-1754 *
Also Published As
Publication number | Publication date |
---|---|
CN109102521A (en) | 2018-12-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109102521B (en) | Video target tracking method based on parallel attention-dependent filtering | |
Baldwin et al. | Time-ordered recent event (TORE) volumes for event cameras | |
Xiong et al. | Spatiotemporal modeling for crowd counting in videos | |
Babu Sam et al. | Switching convolutional neural network for crowd counting | |
Zhang et al. | Single-image crowd counting via multi-column convolutional neural network | |
CN111814661B (en) | Human body behavior recognition method based on residual error-circulating neural network | |
Smith et al. | Tracking the visual focus of attention for a varying number of wandering people | |
Hsueh et al. | Human behavior recognition from multiview videos | |
CN107590427B (en) | Method for detecting abnormal events of surveillance video based on space-time interest point noise reduction | |
Jeong et al. | Stereo saliency map considering affective factors and selective motion analysis in a dynamic environment | |
Hong et al. | Geodesic regression on the Grassmannian | |
Elayaperumal et al. | Robust visual object tracking using context-based spatial variation via multi-feature fusion | |
Luotamo et al. | Multiscale cloud detection in remote sensing images using a dual convolutional neural network | |
Cao et al. | Learning spatial-temporal representation for smoke vehicle detection | |
Medouakh et al. | Improved object tracking via joint color-LPQ texture histogram based mean shift algorithm | |
CN116740418A (en) | Target detection method based on graph reconstruction network | |
CN113313055A (en) | Non-overlapping vision field cross-camera network pedestrian re-identification method | |
Gori et al. | Semantic video labeling by developmental visual agents | |
Javed et al. | Deep bidirectional correlation filters for visual object tracking | |
Lei et al. | Convolutional restricted Boltzmann machines learning for robust visual tracking | |
Ranganatha et al. | SELECTED SINGLE FACE TRACKING IN TECHNICALLY CHALLENGING DIFFERENT BACKGROUND VIDEO SEQUENCES USING COMBINED FEATURES. | |
US20050259865A1 (en) | Object classification via time-varying information inherent in imagery | |
Meglouli et al. | A new technique based on 3D convolutional neural networks and filtering optical flow maps for action classification in infrared video | |
Priya et al. | A NOVEL METHOD FOR OBJECT DETECTION IN AUTONOMOUS DRIVING SYSTEM USING CSPResNeXt AND YOLO-V4. | |
Medasani et al. | Active learning system for object fingerprinting |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||