CN109102521B - Video target tracking method based on parallel attention-dependent filtering - Google Patents


Info

Publication number
CN109102521B
Authority
CN
China
Legal status: Active
Application number
CN201810647331.2A
Other languages
Chinese (zh)
Other versions
CN109102521A (Chinese)
Inventor
宋慧慧
樊佳庆
张开华
刘青山
Current Assignee
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN201810647331.2A priority Critical patent/CN109102521B/en
Publication of CN109102521A publication Critical patent/CN109102521A/en
Application granted granted Critical
Publication of CN109102521B publication Critical patent/CN109102521B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G06T7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/262 - Analysis of motion using transform domain methods, e.g. Fourier domain methods
    • G06T2207/10016 - Image acquisition modality: video; image sequence
    • G06T2207/20056 - Transform domain processing: discrete and fast Fourier transform [DFT, FFT]


Abstract

The invention discloses a video target tracking method based on parallel attention correlation filtering, belonging to the technical field of image processing. The tracking problem is formulated as estimating the probability of the target position; spatially selective attention (SSA) and apparent selective attention (ASA) are integrated, an objective function is obtained using the Log function, and continuous, effective tracking of the video target is realized. First, SSA is modeled: a series of binary maps is generated and filtered to obtain a position response map. Then a series of interference regions is sampled in a semi-local region around the tracked target, an anti-interference distance metric is learned within the correlation filtering, anti-interference metric-regularized correlation filtering is performed, the interference terms are pushed into the negative domain, and the ASA target map is obtained. Finally, the results processed in the local and semi-local regions are fused through the objective function obtained with the Log function to track the target. The method offers more stable and accurate processing, strong adaptability, and good tracking performance.

Description

Video target tracking method based on parallel attention correlation filtering
Technical Field
The invention relates to a video target tracking method based on parallel attention correlation filtering, and belongs to the technical field of image processing.
Background
Visual tracking is a prerequisite for several important computer vision applications, such as video surveillance, behavior recognition, video retrieval, and human-computer interaction. Although visual tracking techniques have advanced significantly in recent years, continuously tracking a generic target given only its location in the first frame remains challenging in unconstrained environments, because the target's appearance is severely affected by interference factors such as occlusion, rapid motion, and deformation.
The task of target tracking is to find the target's location and determine its characteristics, a question of "where" and "what", which relates to the attention selection mechanism of human visual perception. Psychological and cognitive evidence suggests that human visual perception is highly selective, allowing the visual system to focus on rapidly processing the most relevant visual information. Two main mechanisms of visual attention operate in human visual perception: one is spatially selective attention (SSA), which reduces the receptive field of a neuron and increases its sensitivity to a particular location in the visual field; the other is apparent selective attention (ASA), which enhances activity in different regions of the cerebral cortex by specifically processing different types of features, strengthening their response values.
After leaving the eyes, scene input signals entering the cerebral cortex split into a dorsal stream and a ventral stream, the former exploiting spatial relationships (i.e. "where") and the latter emphasizing apparent features (i.e. "what"). Perceptual studies have demonstrated that these two types of processing may run in parallel, and that these mechanisms may play an important role in handling distractors, blur, and occlusion in target tracking. Exploiting these findings to handle the "where" and "what" problems in correlation-filter trackers is of great significance for target tracking in complex environments.
Disclosure of Invention
The technical problem addressed by the invention is that conventional target tracking methods cannot continuously track a generic target; it therefore provides a video target tracking method based on parallel attention correlation filtering.
To solve the above technical problem, the invention provides a video target tracking method based on parallel attention correlation filtering. It formulates the tracking problem as estimating the probability of the target position, integrates spatially selective attention (SSA) and apparent selective attention (ASA), obtains an objective function using the Log function, and realizes continuous and effective tracking of the video target. The method comprises the following steps:
(1) Acquiring the SSA position response map: first, for a tracked target, a series of binary maps is generated in a local region around it to describe the topology between the target and its surrounding scene at different granularities. Arranging these maps from top to bottom by description granularity, from coarse to fine, yields a set of Boolean maps of the tracked target B_i (i = 1, 2, ..., N_b): a coarse-grained Boolean map encodes global shape information, describing large appearance changes of the target, while a fine-grained Boolean map captures detailed spatial structure. Then a binary filter F is defined for the tracked target and applied to the Boolean maps B_i to obtain conditional position response maps; the weights are learned by minimizing a linear regression function, one optimal weight per Boolean map, and the maps are weighted to obtain the final position response map.
(2) Obtaining the ASA target map: first, a series of interference regions is sampled in a semi-local region around the tracked target; the ridge-regression objective function is approximately equated to a metric-learning correlation filter, an anti-interference distance metric is learned within the correlation filtering, and the correlations between positive samples are modeled. Then an anti-interference metric regularization term is introduced and anti-interference metric-regularized correlation filtering is performed on the target image; the anti-interference distance metric is learned in the correlation filtering while useful correlations from true negative samples are taken into account, the interference terms are pushed into the negative domain, and the target tracking map is obtained.
(3) Continuously tracking the video target: an objective function integrating SSA and ASA is obtained through Log-function modeling; the video target is tracked with this function, and the parameters are updated online to realize effective tracking of the video target.
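The three steps above can be sketched as a per-frame loop. This is a minimal illustrative skeleton, not the patent's implementation: the two response-map functions are stand-ins (simple Gaussian bumps) and all names are assumptions; only the log-domain fusion and peak-picking structure follows the description.

```python
import numpy as np

def ssa_response(frame, pos, size=31):
    # Stand-in for the SSA position response map of step (1):
    # a Gaussian bump centered on the previous target position.
    yy, xx = np.mgrid[0:size, 0:size]
    return np.exp(-((xx - pos[0]) ** 2 + (yy - pos[1]) ** 2) / 50.0)

def asa_response(frame, pos, size=31):
    # Stand-in for the ASA target map of step (2).
    yy, xx = np.mgrid[0:size, 0:size]
    return np.exp(-((xx - pos[0]) ** 2 + (yy - pos[1]) ** 2) / 80.0)

def track_step(frame, prev_pos):
    # Step (3): fuse the two maps in the log domain and take the peak.
    p_ssa = ssa_response(frame, prev_pos)
    p_asa = asa_response(frame, prev_pos)
    eps = 1e-12  # avoid log(0)
    fused = np.log(p_ssa + eps) + np.log(p_asa + eps)
    r, c = np.unravel_index(np.argmax(fused), fused.shape)
    return (c, r)  # (x, y)

pos = track_step(frame=None, prev_pos=(15, 10))
```

With both stand-in maps peaked at the previous position, the fused peak is recovered there; in a real tracker the maps would be computed from image features and would shift with the target.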
The video target tracking method based on parallel attention correlation filtering comprises the following specific steps:
(1) Obtaining the SSA position response map
(1.1) For a tracked target, generate, in a local region around it, a series of binary maps describing the topology between the target and its surrounding scene at different granularities (the generating formula appears as an image in the original), where I(j) denotes the j-th pixel intensity, U(·) is a unary coding function, R(·) is a rounding function, φ(I) is the RGB color-channel map of the image block, and T denotes transpose.
Arranging the maps from top to bottom by description granularity, from coarse to fine, yields a set of Boolean maps of the tracked target B_i (i = 1, 2, ..., N_b): a coarse-grained Boolean map encodes global shape information, describing large target appearance changes, while a fine-grained Boolean map captures detailed spatial structure.
(1.2) Weight learning: a binary filter F is defined for the tracked target and applied to the Boolean maps B_i obtained in step (1.1), yielding a set of conditional position response maps. The weights are learned by minimizing a linear regression function (given as an image in the original), so that one optimal weight is learned per Boolean map; weighting the maps gives the final position response map P(B_i, F | I ∈ Ω_o).
Here Ω_o is the region of the scene where the target appears, Ω_b is the background region, d_w and d_h are the width and height of the feature, and β_k is the weight coefficient to be optimized; the regression also involves the classifier parameter vector of the k-th frame and the numbers of non-blank pixels in the target and background regions.
The weight vector is updated online to adapt to apparent changes of the target over time, β_t = (1 − η) β_{t−1} + η β̂_t, where β_t is the updated weight coefficient vector, η is the fusion coefficient, and β̂_t is the weight coefficient vector of the current frame.
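The fusion of the conditional response maps and the online weight update can be sketched in NumPy. The linear-interpolation form of the update, beta_t = (1 - eta) * beta_{t-1} + eta * beta_hat_t, is an assumption consistent with the description of eta as a fusion coefficient; the function names and toy maps are illustrative.

```python
import numpy as np

def fuse_response_maps(cond_maps, beta):
    # Final position response map: weighted sum of the per-Boolean-map
    # conditional response maps (weights beta learned by linear regression).
    return sum(b * m for b, m in zip(beta, cond_maps))

def update_weights(beta_prev, beta_cur, eta=0.006):
    # Assumed online update: linear interpolation with fusion coefficient eta,
    # beta_t = (1 - eta) * beta_{t-1} + eta * beta_hat_t.
    return (1.0 - eta) * np.asarray(beta_prev) + eta * np.asarray(beta_cur)

maps = [np.array([[0.0, 1.0], [0.5, 0.0]]), np.array([[1.0, 0.0], [0.0, 0.5]])]
final = fuse_response_maps(maps, [0.25, 0.75])
beta_t = update_weights([1.0, 0.0], [0.0, 1.0], eta=0.5)
```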
(2) Obtaining the ASA target map
(2.1) Sample a series of interference regions in a semi-local region around the tracked target. The ridge-regression objective function (given as images in the original) is approximately equated to a metric-learning correlation filter, and an anti-interference distance metric is learned within the correlation filtering.
Here x_i is the i-th sample matrix, x̂ is the DFT of the vector x, w_i is the correlation-filter weight corresponding to the i-th sample matrix, w is the vector formed by all w_i, y is a Gaussian-shaped label, d_w′ and d_h′ are the width and height of the feature matrix, λ is the regularization coefficient, and the data term takes the form of a Mahalanobis distance.
(2.2) An anti-interference metric regularization term is introduced into the correlation-filter objective function, giving an anti-interference metric-regularized correlation filtering model. The target image obtained in step (2.1) is further filtered with this model, strengthening the discrimination and tracking of target features; the filtered interference terms are pushed into the negative domain, and the positive-domain target tracking map P(X_i, w_i | I ∈ Ω_o) is obtained.
Here the model operates on the k-th sub-vectors of the filter weight, of the total sample vector, and of the Gaussian-shaped label vector; w_i is the weight vector corresponding to the i-th cyclic sample matrix and is obtained by online updating; the tracking result of the t-th frame is obtained by inverse FFT; I is the identity matrix, λ is the regularization coefficient, and η is the fusion coefficient.
The regularization structure is defined by equations given as images in the original, where x_i is the i-th sample vector, x̃_m^k is the m-th cyclic sample of the k-th base sample, x̃_n^k is the n-th cyclic sample of the k-th base sample, and w_mn is the sample-difference weight measuring the similarity between samples m and n: the larger the weight, the larger the difference between the samples, and the more discriminative the learned apparent features.
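The first action of step (2.1), sampling interference regions in a semi-local ring around the target, might look as follows. The ring radius, sample count, and uniform-angle scheme are illustrative assumptions; the patent does not fix them here.

```python
import numpy as np

def sample_interference_regions(frame, center, target_size, n=8, ring=2.0, seed=0):
    # Sample n patch centers on a ring of radius ring * (target radius)
    # around the target center, clipped to the frame bounds.
    rng = np.random.default_rng(seed)
    h, w = frame.shape[:2]
    r = ring * max(target_size) / 2.0
    angles = rng.uniform(0.0, 2.0 * np.pi, size=n)
    cx, cy = center
    xs = np.clip(cx + r * np.cos(angles), 0, w - 1)
    ys = np.clip(cy + r * np.sin(angles), 0, h - 1)
    return np.stack([xs, ys], axis=1)

frame = np.zeros((120, 160))
centers = sample_interference_regions(frame, center=(80, 60), target_size=(30, 30), n=8)
```

Patches cropped at these centers would serve as the negative (interference) samples pushed into the negative domain by the regularizer.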
(3) Continuously tracking the video target
Modeling with the Log function, the SSA and ASA maps are integrated into an objective function (given as an image in the original),
where P(B_i, F | I ∈ Ω_o) is the SSA position response map obtained above, B_i (i = 1, ..., N_b) is the series of N_b-channel Boolean maps, F is the Boolean filter, P(X_i, w_i | I ∈ Ω_o) is the ASA target map obtained above, ⊛ denotes the spatial correlation operation, β_i is the weight coefficient to be optimized, e^(·) is the exponential function, Ω_o ∈ R² is the target region, o denotes the target appearing in the scene, X_i (i = 1, ..., N_x) is a series of N_x cyclic sample matrices, each obtained by shifting a basic HOG feature-channel vector (all feature channels are assumed to be independently distributed), and w_i is the corresponding ASA filter.
The video target is tracked with this objective function, and the parameters are updated online to achieve effective tracking of the target.
The regularization coefficient λ is set to 0.001 and the fusion coefficient η to 0.006.
The principle of the invention is as follows:
The core of the invention is to formulate the tracking problem as estimating the probability of the target position, seamlessly integrating SSA and ASA:

P(Ω_o) ∝ ∏_{i=1}^{N_b} P(B_i, F | I ∈ Ω_o) · ∏_{i=1}^{N_x} P(X_i, w_i | I ∈ Ω_o)    (1)

Here Ω_o ∈ R² represents the target region and o represents the target appearing in the scene; B_i (i = 1, ..., N_b) is a series of N_b-channel Boolean maps; X_i (i = 1, ..., N_x) is a series of N_x cyclic sample matrices, each obtained by shifting a basic HOG feature-channel vector x_i; and F and w_i are their corresponding filters. Furthermore, for simplicity, all feature channels are assumed to be distributed independently. Finally, taking the Log of both sides of formula (1) gives:

log P(Ω_o) = Σ_{i=1}^{N_b} log P(B_i, F | I ∈ Ω_o) + Σ_{i=1}^{N_x} log P(X_i, w_i | I ∈ Ω_o) + const    (2)

Here P(B_i, F | I ∈ Ω_o) and P(X_i, w_i | I ∈ Ω_o) are defined as exponentials of the weighted filter responses (the exact expressions appear as images in the original), where ⊛ is a spatial correlation operation, β_i is a weight coefficient to be optimized, and e^(·) is an exponential function.
In modeling SSA, the invention first generates a series of binary maps, i.e. Boolean map representations (BMRs), describing the topology between the target and its surrounding scene at different granularities. In fig. 2, from top to bottom, the Boolean maps describe granularity from coarse to fine: a coarse-grained Boolean map encodes global shape information that is robust to large target appearance variations, whereas a fine-grained Boolean map captures spatial structural details that are effective for precise target localization. A predefined binary filter is then applied to these maps to obtain a set of conditional position response maps, each of which is weighted to obtain the final position response map; the goal is to learn an optimal weight for each Boolean map.
BMR is inspired by recent studies of human visual attention, in which the momentary conscious awareness of a scene can be represented by a set of Boolean maps. Specifically, given the RGB color-channel map φ(I) of an image block, the corresponding Boolean map B_i is obtained by element-wise thresholding:

B_i = 𝟙(φ(I) ≥ θ_i),

where the threshold θ_i is drawn from an independent distribution over [0, 255] (yielding a black-and-white binary map) and ≥ denotes the element-wise inequality. For simplicity, the thresholds are set to θ_i = 255(i − 1)/N_b, i.e. sampled from 0 to 255 with a fixed step δ = 255/N_b, since fixed-step sampling is equivalent to uniform sampling in the limit δ → 0. It is then easy to show that the j-th pixel intensity I(j) can be recovered from the Boolean maps (the exact formula appears as an image in the original), where U(·) is a unary coding function (e.g. with 3 discrete layers, U(2) = [1; 1; 0] and U(3) = [1; 1; 1]) and R(·) is a rounding function; φ(I) is the RGB color-channel map of the image block.
In weight learning, the invention learns the weights by minimizing a linear regression function (given as an image in the original), where ||·||_F denotes the Frobenius norm. Minimizing this function is equivalent to minimizing the objective of equation (7), in which formula (5) has been substituted; Ω_o and Ω_b denote the target and background regions respectively. Setting the derivative with respect to β to zero and minimizing yields the solution {β_i} of equation (8).
To adapt to apparent changes of the target over time, the coefficients are updated online as

β_t = (1 − η) β_{t−1} + η β̂_t,    (9)

where β̂_t is calculated by equation (8) from the tracking result at frame t and η is the fusion coefficient.
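The weight-learning step reduces to a (regularized) linear regression over the vectorized conditional response maps; a minimal sketch, in which the regression setup and the toy target are illustrative assumptions:

```python
import numpy as np

def learn_weights(cond_maps, target, lam=0.001):
    # Stack each conditional response map as a column of A and solve the
    # ridge-regularized least squares min_beta ||A beta - y||^2 + lam ||beta||^2
    # in closed form: beta = (A^T A + lam I)^{-1} A^T y.
    A = np.stack([m.ravel() for m in cond_maps], axis=1)
    y = np.asarray(target).ravel()
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)

# Toy check: if the target response is an exact combination of the maps,
# the recovered weights are close to the true ones.
m1 = np.array([[1.0, 0.0], [0.0, 1.0]])
m2 = np.array([[0.0, 1.0], [1.0, 0.0]])
target = 0.3 * m1 + 0.7 * m2
beta = learn_weights([m1, m2], target, lam=1e-8)
```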
To address the interference problem, the invention exploits the principle that ASA in human visual perception emphasizes the learning of apparent features: by learning an anti-interference distance metric, interference terms are pushed into the negative space, the discriminative power of the features is enhanced, tracking becomes robust to distractors, and the target can be well distinguished from interference terms. Learning the correlation filter is approximated as learning a distance metric that models the correlations between positive samples; the anti-interference distance metric is then learned within the correlation filtering, while useful correlations from true negative samples are also considered.
In distance metric learning, learning the correlation filter (CF) is expressed as a spatial ridge-regression objective function:

min_w || Σ_{i=1}^{N_x} X_i w_i − y ||² + λ ||w||²,    (10)

where y is the Gaussian regression target and λ is the regularization coefficient. Note that if w is rescaled to a·w for any a ≠ 0, equation (10) can be reformulated equivalently; apart from rescaling y by the ratio 1/a, the problem is identical to equation (10), and since the maximum-response position is unchanged, it yields the same tracking result.
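The rescaling argument can be checked numerically: scaling the regression target scales the learned filter and its response linearly, so the maximum-response position, and hence the tracking result, is unchanged. The sketch uses the standard single-channel closed-form correlation filter in the Fourier domain, assumed to correspond to equation (10):

```python
import numpy as np

def train_cf(x, y, lam=0.001):
    # Closed-form single-channel correlation filter in the Fourier domain:
    # w_hat = conj(x_hat) * y_hat / (conj(x_hat) * x_hat + lam)
    xf, yf = np.fft.fft2(x), np.fft.fft2(y)
    return np.conj(xf) * yf / (np.conj(xf) * xf + lam)

def respond(w_hat, z):
    # Response map of the filter on patch z.
    return np.real(np.fft.ifft2(w_hat * np.fft.fft2(z)))

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16))
y = np.zeros((16, 16))
y[8, 8] = 1.0                      # desired peak location
w1 = train_cf(x, y)
w2 = train_cf(x, 5.0 * y)          # rescaled regression target
r1, r2 = respond(w1, x), respond(w2, x)
peak1 = tuple(np.unravel_index(np.argmax(r1), r1.shape))
peak2 = tuple(np.unravel_index(np.argmax(r2), r2.shape))
```

The two filters differ only by the factor 5, so their response maps peak at the same position.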
Based on this, to exhibit clearly the relationship between correlation-filter learning and metric learning, a particular setting of equation (10) is chosen and w is rescaled accordingly, which is equivalent to adding a norm constraint on w. With suitable notation, the data term of equation (10) can then be rewritten as a squared Mahalanobis distance (equation (11), given as an image in the original), where 1 denotes the all-ones vector. Therefore, learning the correlation filter can be roughly regarded as learning an optimal distance metric.
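The Mahalanobis-distance view can be illustrated with a small numeric check: for M = L^T L, the squared Mahalanobis distance equals the squared Euclidean distance after the linear map L, which is why a learned linear filter induces a distance metric. The particular M here is illustrative:

```python
import numpy as np

def mahalanobis_sq(u, v, M):
    # Squared Mahalanobis distance (u - v)^T M (u - v).
    d = np.asarray(u) - np.asarray(v)
    return float(d @ M @ d)

# With M = L^T L, the distance equals the squared Euclidean distance
# between the linearly mapped points L u and L v.
L = np.array([[2.0, 0.0], [1.0, 1.0]])
M = L.T @ L
u, v = np.array([1.0, 2.0]), np.array([0.0, 0.0])
d1 = mahalanobis_sq(u, v, M)
d2 = float(np.sum((L @ u - L @ v) ** 2))
```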
However, equation (11) only considers the relationships among the positive samples, so its ability to discriminate the target from the background is limited. To address this, an anti-interference metric regularization term is added to equation (10); it is built from relationships in the negative space and acts as a force pushing the interference terms into that negative space.
To perform anti-interference metric-regularized correlation filtering, a series of interference regions is first sampled from the semi-local region around the target; the interactions among them are then modeled and integrated into equation (10) as a regularization term (equations (12)-(13), given as images in the original), where γ is the regularization coefficient and w_mn is a weight measuring the similarity between samples m and n: the larger the weight, the larger the difference between the samples, and the more discriminative the learned appearance.
Equation (12) can be reformulated as equation (14) (the intermediate expressions appear as images in the original). Its solution involves a block matrix with N_x × N_x blocks built from the cyclic sample matrices (equation (15)). Because a circulant matrix X satisfies

X = F diag(x̂) F^H,    (16)

where F denotes the discrete Fourier transform (DFT) matrix, x̂ denotes the DFT of the base vector x, and F^H = (F*)^T denotes the conjugate transpose, the block matrix of equation (15) can be diagonalized into equation (17), whose blocks are diagonal.
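The circulant property used above, X = F diag(x̂) F^H, is equivalent to the fact that multiplying by a circulant matrix is element-wise multiplication in the Fourier domain, which can be checked directly:

```python
import numpy as np

def circulant(c):
    # Build the circulant matrix whose first column is c:
    # column k is c cyclically shifted down by k.
    n = len(c)
    return np.column_stack([np.roll(c, k) for k in range(n)])

rng = np.random.default_rng(1)
x = rng.standard_normal(8)
v = rng.standard_normal(8)
X = circulant(x)

direct = X @ v
# Circulant multiply via FFT: X v = ifft(fft(x) * fft(v))
via_fft = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(v)))
```

This is what turns the large block system into independent per-frequency problems that can be solved in parallel.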
Substituting (16) into (14), the right-hand term can be re-derived as equation (18). Substituting equations (17) and (18) into equation (14) then gives the FFT of the solution, equation (19), whose i-th element is assembled from the k-th elements of the corresponding frequency-domain sub-vectors. Similarly to equation (9), the filter is obtained by online updating (equation (20)), where the current-frame estimate is calculated by equation (19) using the tracking result of the t-th frame. The block matrix appearing in equation (19) is defined by equation (21). Because its numbers of rows and columns are large, directly computing its inverse in equation (19) is impractical; instead, a matrix-inversion transformation is used to compute the inverse efficiently. Once all diagonal blocks are obtained, they can be computed in parallel, and the optimal solution of equation (14) is obtained by the inverse FFT of the frequency-domain solution.
The invention provides a correlation-filter tracking algorithm based on human visual perception: it reflects the SSA and ASA mechanisms of human visual perception and enhances the robustness and interference resistance of target tracking by processing the local and semi-local background domains in parallel. For the local domain, to model SSA, a simple but efficient BMR is introduced into correlation-filter learning; it delineates the local topology of the target and its scene by randomly binarizing the image color channels and is invariant to various transformations. For the semi-local domain, to model ASA, an anti-interference metric regularization term is introduced into the objective function of the correlation filtering; it acts as a force pushing interference terms into the negative domain, enhancing tracking robustness when challenging distractors similar to the target are encountered. The method is stable and accurate, highly adaptive, and achieves good tracking performance, enabling continuous and effective tracking of video targets.
Drawings
Fig. 1 is a schematic diagram of the present invention.
FIG. 2 is a flow chart of the present invention for modeling SSA.
Detailed Description
The following describes embodiments of the invention in detail with reference to the accompanying drawings; techniques and products not specifically described in the embodiments are conventional techniques or commercially available products in the art.
Example 1: As shown in fig. 1 and fig. 2, the video target tracking method based on parallel attention correlation filtering formulates the tracking problem as estimating the probability of the target position, integrates spatially selective attention (SSA) and apparent selective attention (ASA), and obtains an objective function using the Log function to realize continuous and effective tracking of the video target. It comprises the following steps:
(1) Acquiring the SSA position response map: first, for a tracked target, a series of binary maps is generated in a local region around it to describe the topology between the target and its surrounding scene at different granularities. Arranging these maps from top to bottom by description granularity, from coarse to fine, yields a set of Boolean maps of the tracked target B_i (i = 1, 2, ..., N_b): a coarse-grained Boolean map encodes global shape information, describing large appearance changes of the target, while a fine-grained Boolean map captures detailed spatial structure. Then a binary filter F is defined for the tracked target and applied to the Boolean maps B_i to obtain conditional position response maps; the weights are learned by minimizing a linear regression function, one optimal weight per Boolean map, and the maps are weighted to obtain the final position response map.
(2) Obtaining the ASA target map: first, a series of interference regions is sampled in a semi-local region around the tracked target; the ridge-regression objective function is approximately equated to a metric-learning correlation filter, an anti-interference distance metric is learned within the correlation filtering, and the correlations between positive samples are modeled. Then an anti-interference metric regularization term is introduced and anti-interference metric-regularized correlation filtering is performed on the target image; the interference terms are pushed into the negative domain, and the target tracking map is obtained.
(3) Continuously tracking the video target: an objective function integrating SSA and ASA is obtained through Log-function modeling; the video target is tracked with this function, and the parameters are updated online to realize effective tracking of the video target.
The video target tracking method based on the parallel attention correlation filtering comprises the following specific steps:
(1) obtaining SSA location response graph
(1.1) for a tracked target, in a local region around the tracked target, generating a series of binary maps to describe the topology between the target and its surrounding scene at different granularities by:
Figure GDA0001812976290000111
wherein I (j) represents the jth pixel intensity, U (-) is a univariate function, R (-) represents an integer function,
Figure GDA0001812976290000112
is an RGB color channel map of an image block, T denotes transpose;
arranging the pictures from top to bottom according to coarse-fine description granularity to obtain a group of tracking target Boolean graphs Bi(i=1,2,......,Nb) The coarse-grained Boolean graph encodes the global shape information to describe obvious target appearance change, and the fine-grained Boolean graph describes the detailed structure of the space
(1.2) weight learning: conventionally, a binary filter is defined for the tracked object
Figure GDA0001812976290000113
Acting F on the tracking target Boolean graph B obtained in the step (1.1)iIn the above, a set of conditional position response maps is obtained, and learning weights is accomplished by minimizing a linear regression function as follows, and an optimal weight is learned for each Boolean map
Figure GDA0001812976290000114
Weighting each map to obtain a final position response map P (B)i,F|I∈Ωo):
Figure GDA0001812976290000115
Wherein omegaoIs the area in the scene where the target appears, ΩbIs the background area present in the scene, dwIs the width of the feature, dhIs the height of the feature or features,
Figure GDA0001812976290000116
is the classifier parameter vector for the k-th frame,
Figure GDA0001812976290000117
is the number of non-blank pixels in the target area,
Figure GDA0001812976290000118
is the number of non-blank pixels in the background area, betakIs a weight coefficient to be optimized
Figure GDA0001812976290000119
Need to pass through
Figure GDA00018129762900001110
Updated online to accommodate apparent changes in the target over time, βtIs the weight coefficient vector after the update, η is the fusion coefficient,
Figure GDA00018129762900001111
is the weight coefficient vector of the current frame;
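The weighting and online update of step (1.2) can be sketched as follows, assuming the standard linear blending update β_t = (1 − η)β_{t−1} + ηβ; the function names are hypothetical:

```python
import numpy as np

def fuse_responses(responses, beta):
    """Weight each conditional position response map and sum them into
    the final SSA position response map."""
    return sum(b * r for b, r in zip(beta, responses))

def update_weights(beta_prev, beta_cur, eta=0.3):
    """Online update blending last frame's weights with the current
    frame's weights via fusion coefficient eta."""
    return (1.0 - eta) * beta_prev + eta * beta_cur

responses = [np.ones((2, 2)), 2 * np.ones((2, 2))]
P = fuse_responses(responses, np.array([0.5, 0.5]))           # 0.5*1 + 0.5*2
beta_t = update_weights(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```

With η = 0.3 (the value used in the examples), the update keeps 70% of the previous weights, which damps abrupt appearance changes.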
(2) obtaining an ASA target map
(2.1) a series of interference regions X_i is sampled in a semi-local area around the tracked target; by approximately recasting the following ridge regression objective function as a metric-learning correlation filter, an anti-interference distance metric is learned within the correlation filtering:
[equation images in the original]
wherein X_i is the i-th sample matrix, x̂ is the DFT of the vector x and x̂_i is its i-th row, w_i is the correlation-filter weight corresponding to the i-th sample matrix X_i, w is the vector composed of all w_i, y is a Gaussian-shaped label, d_w′ and d_h′ are the width and height of the feature matrix, λ is the regularization coefficient, and the remaining symbol (an image in the original) denotes the Mahalanobis distance;
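For context, a plain single-channel correlation filter trained by closed-form ridge regression in the Fourier domain looks like the sketch below; the patent's model adds the anti-interference metric regularization on top of this baseline, which is omitted here. The names `train_cf` and `detect` are illustrative:

```python
import numpy as np

def train_cf(x, y, lam=1e-3):
    """Closed-form ridge regression in the Fourier domain for one
    feature channel: w_hat = conj(X) * Y / (conj(X) * X + lam)."""
    X, Y = np.fft.fft2(x), np.fft.fft2(y)
    return np.conj(X) * Y / (np.conj(X) * X + lam)

def detect(w_hat, z):
    """Correlate the trained filter with a search patch z and return
    the real-valued response map."""
    return np.real(np.fft.ifft2(w_hat * np.fft.fft2(z)))

x = np.random.RandomState(0).rand(8, 8)   # training patch features
y = np.zeros((8, 8)); y[0, 0] = 1.0       # label peaked at the origin
resp = detect(train_cf(x, y), x)          # detect on the training patch
peak = np.unravel_index(resp.argmax(), resp.shape)
```

Detecting on the training patch itself reproduces the label's peak location, which is the sanity check commonly used for such filters.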
(2.2) the anti-interference metric regularization term is introduced into the correlation filtering objective function to obtain an anti-interference metric-regularized correlation filtering model [equation image in the original]; this model applies anti-interference metric-regularized correlation filtering to the target image obtained in step (2.1), enhancing the discrimination and tracking of target features, pushing the filtered interference terms into the negative domain, and yielding the positive-space target tracking map P(X_i, w_i | I ∈ Ω_o):
wherein w_k is the k-th sub-vector in the anti-interference metric-regularized correlation filter model, x_k is the k-th sub-vector in the total sample vector, y_k is the k-th sub-vector of the Gaussian-shaped label vector, and w_i, the weight vector corresponding to the i-th cyclic sample matrix, is obtained by online updating; the tracking result of the t-th frame is obtained by taking the inverse FFT of the filter weights; x̂^H denotes the conjugate transpose of x̂, I is the identity matrix, λ is the regularization coefficient, and η is the fusion coefficient;
the anti-interference metric regularization term is defined as:
[equation images in the original]
wherein X_i is the i-th sample vector, x_k^m is the m-th cyclic sample of the k-th base sample, x_k^n is the n-th cyclic sample of the k-th base sample, and w_mn is the sample-difference weight, which measures the dissimilarity between two samples: the greater the weight, the greater the difference between the samples, and the more discriminative the learned appearance features;
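A minimal sketch of sample-difference weights that grow with pairwise dissimilarity, in the spirit of w_mn above (the squared-Euclidean choice and the normalization are assumptions, not the patent's exact definition):

```python
import numpy as np

def sample_difference_weights(samples):
    """Pairwise weights that grow with the squared Euclidean difference
    between samples, so more dissimilar pairs weigh more heavily in the
    anti-interference metric term; normalized to [0, 1]."""
    n = len(samples)
    W = np.zeros((n, n))
    for m in range(n):
        for k in range(n):
            W[m, k] = np.sum((samples[m] - samples[k]) ** 2)
    return W / W.max()

samples = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([3.0, 4.0])]
W = sample_difference_weights(samples)  # W[0, 2] is the largest entry
```

The diagonal is zero (a sample is identical to itself), and the most dissimilar pair receives the maximal weight, matching the stated behavior of w_mn.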
(3) continuously tracking the video target
modeling with a Log function and integrating the SSA and ASA maps yields the following objective function:
[objective-function equation image in the original]
wherein P(B_i, F | I ∈ Ω_o) denotes the obtained SSA position response map, B_i denotes one of a series of N_b Boolean map channels, F denotes the Boolean map filter, P(X_i, w_i | I ∈ Ω_o) denotes the obtained ASA target map, ∗ denotes the spatial correlation operation, β_i denotes the weight coefficient to be optimized, e^(·) denotes the exponential function, Ω_o ∈ R² denotes the target region, o denotes a target appearing in the scene, X_i denotes one of a series of N_x sample matrices obtained by cyclically shifting a base HOG feature-channel vector (all feature channels are independently distributed), and w_i denotes the ASA filter;
the video target is tracked using this objective function, and the parameters are updated online to realize effective tracking of the target.
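The Log-function integration of the SSA and ASA maps can be illustrated as summing log-responses (equivalently, multiplying the maps) and taking the argmax as the estimated target position; this simplified sketch assumes that fusion form:

```python
import numpy as np

def locate_target(p_ssa, p_asa, eps=1e-8):
    """Fuse the SSA position response and the ASA target map by summing
    their logs (i.e., multiplying the maps) and return the argmax as the
    estimated target position."""
    score = np.log(p_ssa + eps) + np.log(p_asa + eps)
    return np.unravel_index(score.argmax(), score.shape)

p_ssa = np.array([[0.1, 0.2], [0.3, 0.9]])  # toy SSA response
p_asa = np.array([[0.2, 0.1], [0.4, 0.8]])  # toy ASA target map
pos = locate_target(p_ssa, p_asa)
```

Because both branches must agree, a location that scores highly in only one attention map is suppressed, which is the point of running SSA and ASA in parallel.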
In this example, the regularization term coefficient λ is 0.001, and the fusion coefficient η is 0.3.
Example 2: as shown in fig. 1 and fig. 2, the video target tracking method based on parallel attention correlation filtering formulates tracking as estimating the probability of the target position, integrates spatial selective attention (SSA) and appearance selective attention (ASA), and obtains an objective function using a Log function to realize continuous, effective tracking of the video target; it comprises the following steps:
(1) acquiring an SSA position response map: first, a series of binary maps is generated for the tracked target to describe the topology between the target and its surrounding scene at different granularities, and the maps are arranged from top to bottom by coarse-to-fine description granularity to obtain a group of tracked-target Boolean maps B_i; the coarse-grained Boolean maps encode global shape information to describe salient appearance changes of the target, while the fine-grained Boolean maps capture detailed spatial structure; then a binary filter F is defined for the tracked target and applied to the Boolean maps B_i to obtain conditional position response maps; the weights are learned by minimizing a linear regression function, an optimal weight is learned for each Boolean map, and the maps are weighted to obtain the final position response map;
(2) obtaining an ASA target map: first, a series of interference regions is sampled in a semi-local area around the tracked target, the ridge regression objective function is approximately recast as a metric-learning correlation filter, an anti-interference distance metric is learned within the correlation filtering, and the correlations among positive samples are modeled; then an anti-interference metric regularization term is introduced, anti-interference metric-regularized correlation filtering is applied to the target image, the useful correlations from real negative samples are also considered, the interference terms are pushed into the negative domain, and the target tracking map is acquired;
(3) continuously tracking the video target: an objective function integrating SSA and ASA is obtained through Log-function modeling; the video target is tracked with this function, and parameters are updated online to realize effective tracking of the video target.
In this example, the procedure is the same as in example 1, where the regularization term coefficient λ is 0.001 and the fusion coefficient η is 0.3.
While the present invention has been described with reference to the accompanying drawings, it is to be understood that the invention is not limited thereto, and that various changes and modifications may be made without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (3)

1. A video target tracking method based on parallel attention correlation filtering, characterized in that: the tracking problem is formulated as estimating the probability of the target position, spatial selective attention SSA and appearance selective attention ASA are integrated, and an objective function is obtained using a Log function to realize continuous, effective tracking of the video target, comprising the following steps:
(1) acquiring an SSA position response map: first, for a tracked target, a series of binary maps is generated in a local region around it to describe the topology between the target and its surrounding scene at different granularities; the maps are arranged from top to bottom by coarse-to-fine description granularity to obtain a group of tracked-target Boolean maps B_i (i = 1, 2, …, N_b), where the coarse-grained Boolean maps encode global shape information to describe salient appearance changes of the target and the fine-grained Boolean maps capture detailed spatial structure; then a binary filter F is defined for the tracked target and applied to the Boolean maps B_i to obtain conditional position response maps; the weights are learned by minimizing a linear regression function, an optimal weight is learned for each Boolean map, and the maps are weighted to obtain the final position response map;
(2) obtaining an ASA target map: first, a series of interference regions is sampled in a semi-local area around the tracked target, the ridge regression objective function is approximately recast as a metric-learning correlation filter, and an anti-interference distance metric is learned within the correlation filtering; then an anti-interference metric regularization term is introduced, anti-interference metric-regularized correlation filtering is applied to the target image, the interference terms are pushed into the negative domain, and the target tracking map is acquired;
(3) continuously tracking the video target: an objective function integrating SSA and ASA is obtained through Log-function modeling; the video target is tracked with this function, and parameters are updated online to realize effective tracking of the video target.
2. The video target tracking method based on parallel attention correlation filtering according to claim 1, characterized in that the method comprises the following specific steps:
(1) obtaining SSA location response graph
(1.1) for a tracked target, in a local region around it, a series of binary maps is generated to describe the topology between the target and its surrounding scene at different granularities by the following equation:
[equation image in the original]
wherein I(j) denotes the intensity of the j-th pixel, U(·) is a univariate (thresholding) function, R(·) denotes a rounding function, I is the RGB color-channel map of the image block, and T denotes transpose;
the maps are arranged from top to bottom by coarse-to-fine description granularity to obtain a group of tracked-target Boolean maps B_i (i = 1, 2, …, N_b); the coarse-grained Boolean maps encode global shape information to describe salient appearance changes of the target, while the fine-grained Boolean maps capture detailed spatial structure;
(1.2) weight learning: a binary filter F is defined for the tracked target and applied to the tracked-target Boolean maps B_i obtained in step (1.1), yielding a set of conditional position response maps; the weights are learned by minimizing the linear regression function below, an optimal weight is learned for each Boolean map, and each map is weighted to obtain the final position response map:
[equation images in the original]
wherein Ω_o is the region of the scene in which the target appears, Ω_b is the background region, d_w is the width and d_h the height of the feature, w_k is the classifier parameter vector for the k-th frame, |Ω_o| and |Ω_b| are the numbers of non-blank pixels in the target and background regions, and β_k is the weight coefficient to be optimized; the weight coefficients are updated online as β_t = (1 − η)β_{t−1} + ηβ, where β_t is the updated weight coefficient vector, η is the fusion coefficient, and β is the weight coefficient vector of the current frame;
(2) obtaining an ASA target map
(2.1) a series of interference regions X_i is sampled in a semi-local area around the tracked target; by approximately recasting the following ridge regression objective function as a metric-learning correlation filter, an anti-interference distance metric is learned within the correlation filtering:
[equation images in the original]
wherein X_i is the i-th sample matrix, x̂ is the DFT of the vector x and x̂_i is its i-th row, w_i is the correlation-filter weight corresponding to the i-th sample matrix X_i, w is the vector composed of all w_i, y is a Gaussian-shaped label, d_w′ and d_h′ are the width and height of the feature matrix, λ is the regularization coefficient, and the remaining symbol (an image in the original) denotes the Mahalanobis distance;
(2.2) the anti-interference metric regularization term is introduced into the correlation filtering objective function to obtain an anti-interference metric-regularized correlation filtering model [equation image in the original]; this model applies anti-interference metric-regularized correlation filtering to the target image obtained in step (2.1), enhancing the discrimination and tracking of target features, pushing the filtered interference terms into the negative domain, and yielding the positive-space target tracking map P(X_i, w_i | I ∈ Ω_o);
Wherein the content of the first and second substances,
Figure DEST_PATH_IMAGE045
is the first of the regular correlation filtering weights of the interference rejection metrickThe number of sub-vectors,
Figure 193407DEST_PATH_IMAGE046
is the first in the total sample vectorkThe number of sub-vectors,
Figure DEST_PATH_IMAGE047
is the first in a Gaussian label vectorkThe number of sub-vectors,
Figure 170721DEST_PATH_IMAGE048
is the firstiThe weight vector corresponding to each cyclic sample matrix,
Figure DEST_PATH_IMAGE049
by passing
Figure 474664DEST_PATH_IMAGE050
The online update is obtained by the online update,
Figure 828416DEST_PATH_IMAGE049
is to ask for
Figure DEST_PATH_IMAGE051
The inverse FFT of (a) results in tracking of the t-th frame,
Figure 268624DEST_PATH_IMAGE052
is that
Figure DEST_PATH_IMAGE053
The conjugate transpose of (a) is performed,
Figure 744736DEST_PATH_IMAGE054
is a matrix of units, and is,
Figure DEST_PATH_IMAGE055
is a coefficient of a regular term that,
Figure 21128DEST_PATH_IMAGE056
is the fusion coefficient;
the anti-interference metric regularization term is defined as:
[equation images in the original]
wherein X_i is the i-th sample vector, x_k^m is the m-th cyclic sample of the k-th base sample, x_k^n is the n-th cyclic sample of the k-th base sample, and w_mn is the sample-difference weight;
(3) continuously tracking the video target
modeling with a Log function and integrating the SSA and ASA maps yields the following objective function:
[objective-function equation image in the original]
wherein P(B_i, F | I ∈ Ω_o) denotes the obtained SSA position response map, B_i denotes one of a series of N_b Boolean map channels, F denotes the Boolean map filter, P(X_i, w_i | I ∈ Ω_o) denotes the obtained ASA target map, ∗ denotes the spatial correlation operation, β_i denotes the weight coefficient to be optimized, e^(·) denotes the exponential function, Ω_o denotes the target region, o denotes a target appearing in the scene, X_i denotes one of a series of N_x sample matrices obtained by cyclically shifting a base HOG feature-channel vector (all feature channels are independently distributed), and w_i denotes the ASA filter;
the video target is tracked using this objective function, and the parameters are updated online to realize effective tracking of the target.
3. The video target tracking method based on parallel attention correlation filtering according to claim 2, characterized in that: the regularization coefficient λ is 0.001 and the fusion coefficient η is 0.3.
CN201810647331.2A 2018-06-22 2018-06-22 Video target tracking method based on parallel attention-dependent filtering Active CN109102521B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810647331.2A CN109102521B (en) 2018-06-22 2018-06-22 Video target tracking method based on parallel attention-dependent filtering


Publications (2)

Publication Number Publication Date
CN109102521A CN109102521A (en) 2018-12-28
CN109102521B true CN109102521B (en) 2021-08-27

Family

ID=64844863


Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109919114A (en) * 2019-03-14 2019-06-21 浙江大学 One kind is based on the decoded video presentation method of complementary attention mechanism cyclic convolution
CN109993777B (en) * 2019-04-04 2021-06-29 杭州电子科技大学 Target tracking method and system based on dual-template adaptive threshold
CN110102050B (en) 2019-04-30 2022-02-18 腾讯科技(深圳)有限公司 Virtual object display method and device, electronic equipment and storage medium
CN110335290B (en) * 2019-06-04 2021-02-26 大连理工大学 Twin candidate region generation network target tracking method based on attention mechanism
CN110443852B (en) * 2019-08-07 2022-03-01 腾讯科技(深圳)有限公司 Image positioning method and related device
CN111428771B (en) * 2019-11-08 2023-04-18 腾讯科技(深圳)有限公司 Video scene classification method and device and computer-readable storage medium
CN113704684B (en) * 2021-07-27 2023-08-29 浙江工商大学 Centralized fusion robust filtering method
CN113808171A (en) * 2021-09-27 2021-12-17 山东工商学院 Unmanned aerial vehicle visual tracking method based on dynamic feature selection of feature weight pool

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105809713A (en) * 2016-03-03 2016-07-27 南京信息工程大学 Object tracing method based on online Fisher discrimination mechanism to enhance characteristic selection

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105809713A (en) * 2016-03-03 2016-07-27 南京信息工程大学 Object tracing method based on online Fisher discrimination mechanism to enhance characteristic selection

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Visual Tracking via Nonlocal Similarity Learning; Qingshan Liu et al.; IEEE Transactions on Circuits and Systems for Video Technology; 2017-05-26; vol. 28, no. 10, pp. 2826–2835 *
Visual Tracking With Weighted Adaptive Local Sparse Appearance Model via Spatio-Temporal Context Learning; Zhetao Li et al.; IEEE Transactions on Image Processing; 2018-05-24; vol. 27, no. 9, pp. 4478–4489 *
Real-time visual tracking algorithm with channel-stability-weighted complementary learning; Fan Jiaqing et al.; Journal of Computer Applications; 2018-06-10; vol. 38, no. 6, pp. 1751–1754 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant