CN104537686A - Tracking method and device based on target space-time consistency and local sparse representation - Google Patents

Tracking method and device based on target space-time consistency and local sparse representation

Info

Publication number
CN104537686A
CN104537686A (application number CN201410770556.9A)
Authority
CN
China
Prior art keywords
image blocks
candidate
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410770556.9A
Other languages
Chinese (zh)
Other versions
CN104537686B (en)
Inventor
张文生
杨叶辉
谢源
胡文锐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN201410770556.9A priority Critical patent/CN104537686B/en
Publication of CN104537686A publication Critical patent/CN104537686A/en
Application granted granted Critical
Publication of CN104537686B publication Critical patent/CN104537686B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/251 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving models

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a tracking method and device based on target space-time consistency and local sparse representation. The method includes the following steps: first, a positive sample set and a candidate sample set are acquired according to the known tracking result; second, the samples are divided into blocks with a fixed grid and each block is labeled; third, sparse coding is performed on the local sample image blocks, and the average of the codes of the positive-sample image blocks with the same label is calculated; fourth, a target space-time consistency measure is defined, and all candidate samples are scored according to this measure; fifth, the candidate with the highest space-time consistency score is selected from the candidate sample set as the tracking result. By computing the space-time consistency of the samples, the method and device effectively handle occlusion and improve tracking accuracy. In addition, through sparse representation of local image blocks, the influence of external illumination changes and target posture changes on tracking is reduced, and target tracking efficiency is also improved.

Description

Tracking method and device based on target space-time consistency and local sparse representation
Technical Field
The invention belongs to the technical field of image target tracking, and relates to a tracking method and a tracking device based on target space-time consistency and local sparse representation.
Background
The target representation problem is the first problem a tracking method must consider. At present, target representations can be broadly classified into the following categories: ① pixel-level representation, in which the target is represented by image pixel-level features, such as taking the target pixel values directly as input, or representations based on target color, texture, and so on; ② hand-crafted feature-level representation, in which artificially designed feature operators (such as SIFT features, HOG features, Haar-like features and the like) are used to extract features from the target to represent it; ③ representation based on a descriptive model, such as subspace representation, sparse representation, and so on.
However, pixel-level representation is redundant; representing the target directly by its pixels [Duffner, S., Garcia, C.: PixelTrack: a fast adaptive algorithm for tracking non-rigid objects, in: ICCV, 2480-2487, 2013] not only makes the discriminability of the target hard to guarantee but also incurs a correspondingly large computational cost. Description models using hand-crafted features place specific requirements on the characteristics of the target, that is, different targets suit different feature descriptors, and the recognition effect degrades when the same feature representation is applied to different targets. The hand-crafted feature-level representation therefore depends heavily on subjective human choices, so its generalization and robustness are poor. Compared with the first two representation methods, the descriptive model can better overcome the problems set forth above. However, the subspace representation [D. Ross, J. Lim, R. S. Lin, and M. H. Yang, Incremental learning for robust visual tracking, Int. J. Comput. Vis. (IJCV), (2008) 125-141] is sensitive to partial occlusion and outliers. Research has found that this problem can be handled relatively conveniently by representing the target sparsely.
The L1 tracker (L1T) [X. Mei and H. Ling, Robust visual tracking using ℓ1 minimization, in: ICCV, 1436-1443, 2009] is a currently popular sparse-representation-based method that handles occlusion by adding trivial templates to the coding dictionary. However, the method encodes the whole target, so it only uses the global information of the target and does not use the local detail information of the image. Moreover, because the dimension of the raw target is high, coding the whole target is very time-consuming.
Disclosure of Invention
Aiming at the above problems, the invention provides a tracking method based on target space-time consistency and local sparse representation. According to the change characteristics of the target during tracking, the method defines the assumption of space-time consistency of the target between two consecutive frames: (1) Temporal consistency: the target does not change much in two consecutive frames, so the target is considered consistent in appearance in two consecutive frames. (2) Spatial consistency: since the target is similar in two consecutive frames, the position of each part of the target is substantially fixed across the two frames. For example, if the samples are divided into image blocks that do not overlap with each other, then in a sample, once an image block is judged to be a target image block, and several image blocks adjacent to it are also judged to be target blocks, the sample is a target with high confidence. Based on the space-time consistency characteristics defined above, the invention provides a measure for calculating the space-time consistency score of a sample, and thereby addresses the target occlusion problem.
The invention addresses illumination changes, target posture changes and similar problems by sparsely coding the local image blocks of a sample; on this basis, it addresses local occlusion by dividing the target into blocks and defining the space-time consistency of the target image blocks; and it improves coding speed by performing sparse coding only on image blocks of relatively small dimension.
The invention provides a tracking method based on target space-time consistency and local sparse representation, which comprises the following steps:
step one: acquiring a positive sample set for the current frame by Gaussian disturbance according to the position of the tracking result of the previous frame;
step two: dividing each positive sample in the positive sample set into image blocks and marking, and learning a dictionary by using the image blocks obtained by division;
step three: carrying out sparse coding on image blocks obtained by dividing each positive sample in the positive sample set by using the learning dictionary, and calculating average vectors of sparse coding of the image blocks with the same marks;
step four: acquiring a candidate sample set of a current frame from a tracking result of a previous frame by using a motion model with affine transformation state change, and segmenting each candidate sample in the candidate sample set into image blocks and marking the image blocks;
step five: calculating a space-time consistency score of each candidate sample according to the average vector of the image block sparse codes corresponding to the image blocks with the same mark in the positive sample set and the image block sparse code of each candidate sample;
step six: and determining the tracking result of the current frame according to the space-time consistency score of each candidate sample in the candidate sample set based on Bayesian inference.
The invention also provides a tracking device based on the space-time consistency and the local sparse representation of the target, which comprises the following components:
a positive sample collection module: acquiring a positive sample set for the current frame by Gaussian disturbance according to the position of the tracking result of the previous frame;
a segmentation and labeling module: dividing each positive sample in the positive sample set into image blocks and marking, and learning a dictionary by using the image blocks obtained by division;
a sparse coding module: Carrying out sparse coding on the image blocks obtained by dividing each positive sample in the positive sample set by using the learned dictionary, and calculating the average vectors of the sparse codes of the image blocks with the same marks;
a candidate sample segmentation module: acquiring a candidate sample set of a current frame from a tracking result of a previous frame by using a motion model with affine transformation state change, and segmenting each candidate sample in the candidate sample set into image blocks and marking the image blocks;
a scoring module: calculating a space-time consistency score of each candidate sample according to the average vector of the image block sparse codes corresponding to the image blocks with the same mark in the positive sample set and the image block sparse code of each candidate sample;
a tracking module: and determining the tracking result of the current frame according to the space-time consistency score of each candidate sample in the candidate sample set based on Bayesian inference.
Through sparse coding, the tracking method of the invention is more robust to illumination changes, target posture changes and the like, and by dividing the target into blocks and defining the space-time consistency of the target image blocks, the method can effectively handle occlusion. Since the speed of sparse coding is strongly affected by the dimension of the signal to be coded, performing sparse coding on local image blocks of relatively small dimension is faster than tracking methods that code the whole image.
Drawings
FIG. 1 is a schematic diagram of the sampling area and unified sample size in the present invention;
FIG. 2 is a schematic diagram of sample blocking and labeling according to the present invention;
FIG. 3 is a schematic diagram of the steps of calculating a sample spatiotemporal consistency score according to the present invention;
FIG. 4 is a flow chart of a complete tracking method of the present invention;
FIG. 5 is a diagram illustrating the tracking of key frames on a public data set according to the present invention.
Detailed Description
In order to make the technical solution, implementation steps and tracking effect of the present invention more clearly understood, the following detailed description of the embodiments of the present invention is provided in conjunction with the technical solution and the accompanying drawings.
The invention provides a tracking method based on target space-time consistency and local sparse representation. According to the change characteristics of the target during tracking, the method defines the assumption of space-time consistency of the target between two consecutive frames: (1) Temporal consistency: the target does not change much in two consecutive frames, so the target is considered consistent in appearance in two consecutive frames. (2) Spatial consistency: since the target is similar in two consecutive frames, the position of each part of the target is substantially fixed across the two frames. For example, if the samples are divided into image blocks that do not overlap with each other, then in a sample, once an image block is judged to be a target image block, and several image blocks adjacent to it are also judged to be target blocks, the sample is a target with high confidence. Based on the space-time consistency characteristics defined above, the invention provides a measure for calculating the space-time consistency score of a sample, and thereby addresses the target occlusion problem.
The invention addresses illumination changes, target posture changes and similar problems by sparsely coding the local image blocks of a sample; on this basis, it addresses local occlusion by dividing the target into blocks and defining the space-time consistency of the target image blocks; and it improves coding speed by performing sparse coding only on image blocks of relatively small dimension. The method comprises the following concrete steps:
step S1: and acquiring a sample, namely adding Gaussian disturbance to the central position of the tracking result of the previous frame in the step to obtain a positive sample central position set, acquiring the positive sample set on the image of the previous frame according to the central position set, and using the positive sample set for the current frame. The size of the collected positive sample is the same as the target size of the previous frame. Specifically, the method comprises the following steps: since the present invention is a generative model, only a positive sample needs to be collected. The invention utilizes Gaussian disturbance to collect the positive sample around the target tracking result of the previous frame, and the center of the collected positive sample meets LposS-Lpos||<CinnerWherein L ispos=[xpos,ypos]Is the center position, x, of the previous frame tracking resultpos,yposRespectively as abscissa and ordinate; l isposSFor the central position of the collected sample, CinnerThe size of the sample is consistent with the target size of the previous frame. The sample set collected is represented as:where each column is a sample and d and N are the sample dimension and number of samples, respectively. Step S2: division of the sampleAnd cutting and marking. As shown in fig. 2, the target position is defined by a rectangular box. The conventional sample blocking method is different from the present invention in that a series of overlapping image blocks are acquired by sliding a sub-window over a sample region. The invention divides the sample by using the fixed grid, obtains non-overlapped image blocks and can completely represent the sample. By sample segmentation with an h x h grid, for each sample, h is obtained2And each image block. As shown in fig. 2, the image blocks are labeled {1, 2.,. h, according to different positions in the grid2In the following steps, the invention uses these markers to indicate the position of the image block in the sample.
Step S3: and local dictionary learning. The present invention acquires image blocks by sliding on the positive samples acquired in step S1 through a fixed-size sub-window, and learns the dictionary D using these image blocks as training samples. The size of the sub-window for acquiring the training image block is consistent with the size of the image block obtained by using the grid in step S2. The dictionary learning method of the invention is an Online dictionary learning method [ J.Mairal and F.Bach, Online dictionary learning for dictionary coding, in: ICML 689-]. Step S4: the t-th frame candidate sample is acquired based on the resulting position of the known t-1 tracking. The invention uses affine transformation to describe the position relation between two continuous frame targets. The state of the tracking result at frame t-1 can be determined using six parameters of the affine transformation: st-1={s(t-1)x,s(t-1)y,θt-1,sct-1t-1,ψt-1In which(s)(t-1)x,s(t-1)y) Coordinates of x-direction and y-direction in the image plane, theta, for the center of the t-1 th tracking result(t-1)、sc(t-1)(t-1)、ψ(t-1)The t-1 tracking result rotation angle, the scaling scale, the target width-length ratio and the inclination angle parameter are respectively. The candidate sample state for the t-th frame may be determined by counting the number of samples in st-1And adding Gaussian disturbance to obtain the target. Definition ofp(st|st-1)=N(st;st-1Sigma) as a state-change movementModels with which the motion model can be basedst-1Acquiring a tth candidate sample state stAnd obtaining a candidate sample X on the t frame image according to the candidate sample state*Where Σ is the diagonal matrix, and the elements on the diagonal are the variances of each parameter of the previous t-1 frame tracking target affine state. The method acquires P candidate samples as a candidate sample set in each frame and records the candidate sample set as a candidate sample setWherein the 1 st item in the small brackets is a candidate sample, and the 2 nd item is the corresponding affine transformation state.
Step S5: and carrying out sparse coding on the image blocks obtained by dividing the positive sample set and the candidate sample set. Through step S1, a positive sample set is obtained. For any sample X in the sample setiH is obtained by step S22And each image block. Through step S3, an overcomplete sparse dictionary D may be learned. Through step S4, a candidate sample is obtained. In this step, the present invention performs sparse coding on the image blocks in the positive sample set and the candidate sample set (the candidate samples are divided by step S2) by using the dictionary D, and the sparse coding solving model is as follows:
wherein,for image blocks at the jth position of the ith sample in the sample set, dictionary To correspond to image block pijI ∈ {1, 2., N }, j ∈ {1, 2., h }, h ∈2}. λ is control l1A regularized weighting factor. Tong (Chinese character of 'tong')By solving this Lasso regression equation, the sample p is calculatedijConversion to sparse coding alphaij. The present invention utilizes the open source tool LARS [ B.Efron, T.Hastie, and I.Johnstone, Least angle regression, Ann.Stat., 32(2), (2004)407 499.]And solving the model (1).
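A minimal sketch of step S5, assuming scikit-learn's sparse_encode with a LARS-based Lasso solver as a stand-in for the LARS tool cited above; the dictionary layout (one atom per row) follows the sketch of step S3.

from sklearn.decomposition import sparse_encode

def encode_blocks(blocks, dictionary, lam=0.1):
    # Step S5 (sketch): solve the Lasso model (1) for every image block p_ij,
    # min_a 0.5 * ||p - D a||_2^2 + lam * ||a||_1, with a LARS-based solver.
    # blocks: (n_blocks, block_dim); dictionary D: (n_atoms, block_dim).
    return sparse_encode(blocks, dictionary, algorithm='lasso_lars', alpha=lam)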
Step S6: and (4) calculating the average sparse coding of positive samples. Sparse coding of the image blocks by converting the positive sample image blocks into image blocks { a } through step S5ijWhere i ═ 1, 2.., N }, j ═ 1, 2.., h2}. In this step, the average vector of image blocks having the same label is calculated according to the following formula.
ᾱ_j = (1/N) Σ_{i=1}^{N} α_ij    (2)
where ᾱ_j is the average vector of the sparse codes of the image blocks labeled j.
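A one-function sketch of step S6: given one sparse code per positive sample and per grid position, equation (2) is an average over the positive samples. The array layout is an assumption of this sketch.

def average_positive_codes(pos_codes):
    # Step S6 (sketch): pos_codes has shape (N, h*h, n_atoms), one sparse code per
    # positive sample and per grid position; equation (2) averages over the N samples,
    # giving the mean code alpha_bar_j for every grid position j.
    return pos_codes.mean(axis=0)  # shape (h*h, n_atoms)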
Step S7: and calculating the consistency score of the candidate sample. The average vector of sparse coding at each position of the positive sample image block in the grid is obtained through step S6, and then any candidate sample X obtained through step S4 is used*The candidate sample image blocks are blocked and marked by the same grid in step S1, and sparse coded by step S5 to obtain h2Sparse coding of image blocksAccording to the assumption of time domain consistency, the invention designs the following formula to calculate the similarity of the average vectors with the same mark between the sparse codes of the image blocks at different positions and the previous frame, and the similarity is used as the time domain confidence of the candidate samples:
T(X*) = (1/h²) Σ_{j=1}^{h²} sim(ᾱ_j, α*_j)    (3)
where sim(·,·) can be designed as any similarity measure; in the invention, a vector similarity is used to measure the similarity between the sparse code α*_j of the image block at the j-th position of the candidate sample and the average code ᾱ_j of the image blocks at the j-th position of the previous-frame positive samples, where ᾱ_j is calculated as in step S6.
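The following sketch computes the temporal confidence of equation (3). The invention leaves the concrete form of sim(·,·) open; cosine similarity is used here as one admissible choice, which is an assumption of this sketch.

import numpy as np

def temporal_confidence(cand_codes, mean_codes, eps=1e-12):
    # Equation (3) (sketch): average per-position similarity between the candidate's
    # block codes alpha*_j and the previous-frame mean codes alpha_bar_j; cosine
    # similarity is used here as one admissible choice of sim(., .).
    num = (cand_codes * mean_codes).sum(axis=1)
    den = np.linalg.norm(cand_codes, axis=1) * np.linalg.norm(mean_codes, axis=1) + eps
    sims = num / den
    return sims.mean(), sims  # T(X*) and the per-block similarities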
Analyzing equation (3), we find that the more similar a candidate sample is to the previous-frame positive samples, the larger T(X*) is. Thus, formula (3) enforces the temporal consistency of the target image blocks. However, if the target is occluded or changes its posture during tracking, a local part of the target may become very dissimilar to the previous-frame samples, and a large error may occur if the similarity calculated by formula (3) is used alone as the discrimination index. Accordingly, the invention defines the spatial consistency of the target image blocks, which is based on the following assumption: if an image block of a candidate sample is regarded as a target image block, then the more of the image blocks surrounding it are also regarded as target image blocks, the higher the spatial confidence of the sample. The formula is defined as follows:

S(X*) = (1/h²) Σ_{j=1}^{h²} o*_j · (1/|N(j)|) Σ_{k∈N(j)} 1(o*_k, o*_j)    (4)

where N(j) is the neighbor image block set of the j-th image block of candidate sample X*, |N(j)| is the number of elements in the neighbor image block set, and 1(·,·) is an indicator function whose output is 1 if its two inputs are the same and 0 otherwise. o*_j is used for judging whether the j-th image block of candidate sample X* is a target image block, and is defined as follows:

o*_j = 1 if sim(ᾱ_j, α*_j) > τ, and o*_j = 0 otherwise    (5)

where o*_j = 1 means that the j-th image block of X* is regarded as a target image block with high confidence, and τ is a threshold controlling the confidence of the above similarity.
Formula (4) ensures that if the j-th image block is judged to be a target block, then the more of its neighbors are also considered target blocks, the larger S(X*) is; if the j-th image block is judged not to be a target block, its neighbors have no influence on S(X*). This 'no-penalty' treatment guarantees both the spatial consistency and the connection between an image block and its neighboring blocks, so the spatial information among the sample image blocks can compensate when the local temporal consistency fails.
By combining the time domain confidence coefficient and the space domain confidence coefficient, the invention obtains the final space-time consistency scoring measure of the candidate sample:
f(X*)=T(X*)+S(X*) (6)
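A sketch of formulas (4)-(6) under one plausible reading of the spatial term: a block whose similarity exceeds τ is judged a target block (formula (5)), a target block scores the fraction of its neighbors that agree with it, non-target blocks contribute nothing, and the final score is the sum of the temporal and spatial confidences (formula (6)). The exact normalization of formula (4) is an assumption of this sketch.

def spatial_confidence(sims, neighbors, tau=0.55):
    # Formulas (4)-(5) (sketch, one plausible reading): a block whose similarity
    # exceeds tau is judged a target block; a target block scores the fraction of
    # its neighbors that agree with it, and non-target blocks contribute nothing.
    is_target = [1 if s > tau else 0 for s in sims]
    score = 0.0
    for j, nbrs in enumerate(neighbors):
        if is_target[j] == 1:
            agree = sum(1 for k in nbrs if is_target[k] == 1)
            score += agree / len(nbrs)
    return score / len(sims)  # S(X*)

def spatiotemporal_score(t_conf, s_conf):
    # Formula (6): f(X*) = T(X*) + S(X*).
    return t_conf + s_conf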
step S8: and determining a tracking result from the candidate sample set based on Bayesian inference. The tracking results obtained from the 1 st frame to the t-th frame are shown ass1:t={s1,s2,...,st-1And the tracking result is corresponding to the affine transformation state. Let us say that P candidate samples are obtained at the t-th frame using the state transition motion model in step S4WhereinFor the ith candidate sample of the tth frame,for its corresponding affine transformation state. The invention determines the target of the t frame from the candidate sample set omega according to the maximum posterior probability
<math> <mrow> <munder> <mi>max</mi> <mrow> <mo>{</mo> <msub> <mover> <mi>X</mi> <mo>^</mo> </mover> <mi>t</mi> </msub> <mo>,</mo> <msub> <mi>x</mi> <mi>t</mi> </msub> <mo>}</mo> <mo>&Element;</mo> <mi>&Omega;</mi> </mrow> </munder> <mi>p</mi> <mrow> <mo>(</mo> <msub> <mi>s</mi> <mi>t</mi> </msub> <mo>|</mo> <msub> <mover> <mi>X</mi> <mo>^</mo> </mover> <mrow> <mn>1</mn> <mo>:</mo> <mi>t</mi> </mrow> </msub> <mo>)</mo> </mrow> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>7</mn> <mo>)</mo> </mrow> </mrow> </math>
According to the Bayesian inference criterion,the solution can be recursively solved in the following way:
<math> <mrow> <mi>p</mi> <mrow> <mo>(</mo> <msub> <mi>s</mi> <mi>t</mi> </msub> <mo>|</mo> <msub> <mover> <mi>X</mi> <mo>^</mo> </mover> <mrow> <mn>1</mn> <mo>:</mo> <mi>t</mi> </mrow> </msub> <mo>)</mo> </mrow> <mo>&Proportional;</mo> <mi>p</mi> <mrow> <mo>(</mo> <msub> <mover> <mi>X</mi> <mo>^</mo> </mover> <mi>t</mi> </msub> <mo>|</mo> <msub> <mi>s</mi> <mrow> <mo>-</mo> <mn>1</mn> </mrow> </msub> <mo>)</mo> </mrow> <mi>p</mi> <mrow> <mo>(</mo> <msub> <mi>s</mi> <mrow> <mi>t</mi> <mo>-</mo> <mn>1</mn> </mrow> </msub> <mo>|</mo> <msub> <mover> <mi>X</mi> <mo>^</mo> </mover> <mrow> <mn>1</mn> <mo>:</mo> <mi>t</mi> <mo>-</mo> <mn>1</mn> </mrow> </msub> <mo>)</mo> </mrow> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>8</mn> <mo>)</mo> </mrow> </mrow> </math>
wherein <math> <mrow> <mi>p</mi> <mrow> <mo>(</mo> <msub> <mi>s</mi> <mi>t</mi> </msub> <mo>|</mo> <msub> <mover> <mi>X</mi> <mo>^</mo> </mover> <mrow> <mn>1</mn> <mo>:</mo> <mi>t</mi> <mo>-</mo> <mn>1</mn> </mrow> </msub> <mo>)</mo> </mrow> <mo>=</mo> <mo>&Integral;</mo> <mi>p</mi> <mrow> <mo>(</mo> <msub> <mi>s</mi> <mi>t</mi> </msub> <mo>|</mo> <msub> <mi>s</mi> <mrow> <mi>t</mi> <mo>-</mo> <mn>1</mn> </mrow> </msub> <mo>)</mo> </mrow> <mi>p</mi> <mrow> <mo>(</mo> <msub> <mi>s</mi> <mrow> <mi>t</mi> <mo>-</mo> <mn>1</mn> </mrow> </msub> <mo>|</mo> <msub> <mover> <mi>X</mi> <mo>^</mo> </mover> <mrow> <mn>1</mn> <mo>:</mo> <mi>t</mi> <mo>-</mo> <mn>1</mn> </mrow> </msub> <mo>)</mo> </mrow> <mi>d</mi> <msub> <mi>s</mi> <mrow> <mi>t</mi> <mo>-</mo> <mn>1</mn> </mrow> </msub> <mo>,</mo> </mrow> </math> According to step S4, the motion model is p (S)t|st-1)=N(st;st-1Sigma). In the inventionSpatiotemporal consistency score defined as the goal:
p ( X ^ t | s t - 1 ) = f ( X ^ t ) - - - ( 9 )
thus p(s)t|st-1) Substituting equation (9) into equation (8) and solving recursively, equation (7) can be converted into
<math> <mrow> <munder> <mi>max</mi> <mrow> <mo>{</mo> <msub> <mover> <mi>X</mi> <mo>^</mo> </mover> <mi>t</mi> </msub> <mo>,</mo> <msub> <mi>s</mi> <mi>t</mi> </msub> <mo>}</mo> <mo>&Element;</mo> <mi>&Omega;</mi> </mrow> </munder> <mi>f</mi> <mrow> <mo>(</mo> <msub> <mover> <mi>X</mi> <mo>^</mo> </mover> <mi>t</mi> </msub> <mo>)</mo> </mrow> <munderover> <mi>&Pi;</mi> <mrow> <mi>i</mi> <mo>=</mo> <mn>1</mn> </mrow> <mrow> <mi>t</mi> <mo>-</mo> <mn>1</mn> </mrow> </munderover> <mi>p</mi> <mrow> <mo>(</mo> <msub> <mi>s</mi> <mrow> <mi>i</mi> <mo>+</mo> <mn>1</mn> </mrow> </msub> <mo>|</mo> <msub> <mi>s</mi> <mi>i</mi> </msub> <mo>)</mo> </mrow> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>10</mn> <mo>)</mo> </mrow> </mrow> </math>
The candidate sample that maximizes equation (10) may be selected from the candidate sample set Ω as the tracking result of the t-th frame by equation (10).
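A sketch of step S8 under one reading of equation (10): the transition factors p(s_{i+1} | s_i) for i < t-1 involve only past, already-fixed states and are therefore common to every candidate, so the maximisation over the candidate set reduces to f(X̂_t) · p(s_t | s_{t-1}); both this reduction and the diagonal-Gaussian density below are assumptions of the sketch, not statements of the invention.

import numpy as np

def gaussian_transition(s_t, s_prev, sigma):
    # p(s_t | s_{t-1}) = N(s_t; s_{t-1}, Sigma) with diagonal Sigma (sketch).
    d = (s_t - s_prev) / sigma
    return float(np.exp(-0.5 * np.dot(d, d)) / np.prod(sigma * np.sqrt(2.0 * np.pi)))

def select_tracking_result(candidates, states, scores, s_prev, sigma):
    # Equation (10) (sketch): the transition factors over past frames are shared by
    # all candidates, so the maximisation reduces to f(X_t) * p(s_t | s_{t-1}).
    posterior = [f * gaussian_transition(s, s_prev, sigma)
                 for f, s in zip(scores, states)]
    best = int(np.argmax(posterior))
    return candidates[best], states[best]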
In the above method, the acquisition and segmentation of the candidate sample set in step 4 are completed before step 5, and the order relationship with steps 1-3 is not particularly limited, and in different embodiments, may be selected according to actual situations.
The effectiveness of the method is illustrated using 8 image sequence data sets publicly available on the Internet; the 8 sequences together contain 5707 frames and cover difficulties that may be encountered in practical applications, such as external illumination changes, target posture changes, occlusion, target scale changes and fast target motion. The data set can be downloaded at the following link: http://cvlab.hanyang.ac.kr/tracker_benchmark_v10.html.
FIG. 1 corresponds to step S1. In each frame, the invention collects N = 60 positive samples around the tracking target using Gaussian disturbance; the centers of the positive samples are restricted to lie inside the circle of radius C_inner = 3 pixels centered at the previous-frame target center position (shown as the circular shaded region in FIG. 1). The size of the acquisition window is consistent with the target size of the previous frame. All collected positive samples are normalized to a uniform size of 32 × 32.
The way the samples are segmented and marked is shown in FIG. 2. The invention divides the samples with a 4 × 4 grid, so each sample is divided into 16 non-overlapping image blocks, and each image block is labeled with an Arabic numeral from 1 to 16 according to its position in the grid.
FIG. 3 shows the calculation process of the space-time consistency score, which is mainly divided into the following steps: ① collecting positive samples; ② dividing and labeling the positive samples; ③ training the dictionary with the positive-sample image blocks, coding the image blocks with the dictionary, and calculating the average vector of the sparse codes of the positive-sample image blocks with the same label; ④ acquiring the next-frame target candidate sample set through the state-change model of step S4, and dividing, labeling and coding all candidate samples in the same way; ⑤ calculating the space-time consistency scores of all candidate samples in the candidate sample set using formulas (3) to (6) of step S7.
The neighbor sets of the image blocks are illustrated in the space-time consistency score computation stage of FIG. 3. As shown by the arrows in the spatial-confidence box of the figure, the neighbor set of the image block labeled 1 consists of the image blocks labeled 2, 5 and 6, while the neighbor set of the image block labeled 11 consists of the image blocks labeled 6, 7, 8, 10, 12, 14, 15 and 16. The neighbor sets of the other image blocks are defined analogously.
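The neighbor sets described above correspond to 8-connectivity on the 4 × 4 grid; a small sketch that reproduces the example of FIG. 3 is given below.

def grid_neighbors(h_grid=4):
    # Neighbor sets (sketch): 8-connected blocks on the h x h grid, matching the
    # example of FIG. 3 (block 1 -> {2, 5, 6}; block 11 -> {6, 7, 8, 10, 12, 14, 15, 16}).
    # Blocks are labelled 1..h^2 row by row; the returned indices are 0-based.
    neighbors = []
    for r in range(h_grid):
        for c in range(h_grid):
            nbrs = [rr * h_grid + cc
                    for rr in range(max(0, r - 1), min(h_grid, r + 2))
                    for cc in range(max(0, c - 1), min(h_grid, c + 2))
                    if (rr, cc) != (r, c)]
            neighbors.append(nbrs)
    return neighbors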
τ in equation (5) is set to 0.55.
FIG. 4 is the tracking flowchart of the invention. Positive samples are first collected according to the known target position in frame 1 (step S1) and the dictionary is initialized (steps S2-S3); the candidate sample set is then collected on frame 2 according to step S4; the space-time consistency scores of all candidate samples are calculated according to steps S5-S7; and finally Bayesian inference (step S8) is used to select, from the candidate samples, the one that maximizes formula (10) as the tracking result of frame 2. Once the tracking result of a new frame is known, the operations of collecting positive samples and so on are carried out again. By repeating these steps, the tracking result of every subsequent frame can be obtained.
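Putting the earlier sketches together, the following loop follows the flow of FIG. 4. It reuses the sketch functions defined above, trains the dictionary directly on the grid blocks for brevity (the invention uses a sliding sub-window), and crops candidates using only the translation and scale components of the affine state; these simplifications and the parameter values are assumptions of the sketch, not part of the invention.

import numpy as np

def crop_candidate(frame, s, base_size, out_size=32):
    # Simplified candidate cropping (sketch): uses only the translation and scale
    # components of the affine state and ignores rotation, aspect ratio and skew.
    w = max(2, int(round(base_size[0] * s[3])))
    h = max(2, int(round(base_size[1] * s[3])))
    x0 = int(np.clip(round(s[0] - w / 2), 0, frame.shape[1] - w))
    y0 = int(np.clip(round(s[1] - h / 2), 0, frame.shape[0] - h))
    return resize_to(frame[y0:y0 + h, x0:x0 + w], out_size)

def track_sequence(frames, init_center, init_size, h_grid=4, tau=0.55):
    # End-to-end loop following FIG. 4, composed from the sketches above;
    # returns one affine state per tracked frame.
    state = np.array([init_center[0], init_center[1], 0.0, 1.0, 1.0, 0.0])
    sigma = np.array([4.0, 4.0, 0.01, 0.01, 0.002, 0.001])  # illustrative
    neighbors = grid_neighbors(h_grid)
    results = []
    for t in range(1, len(frames)):
        pos = collect_positive_samples(frames[t - 1], state[:2], init_size)        # steps S1-S2
        pos_blocks = np.stack([split_into_blocks(s, h_grid) for s in pos])
        D = learn_local_dictionary(pos_blocks.reshape(-1, pos_blocks.shape[-1]))    # step S3
        mean_codes = average_positive_codes(                                        # steps S5-S6
            np.stack([encode_blocks(b, D) for b in pos_blocks]))
        cand_states = sample_candidate_states(state, sigma=sigma)                   # step S4
        scores, cands = [], []
        for s in cand_states:
            cand = crop_candidate(frames[t], s, init_size, out_size=pos.shape[1])
            codes = encode_blocks(split_into_blocks(cand, h_grid), D)
            t_conf, sims = temporal_confidence(codes, mean_codes)                   # step S7
            scores.append(spatiotemporal_score(t_conf,
                          spatial_confidence(sims, neighbors, tau)))
            cands.append(cand)
        _, state = select_tracking_result(cands, cand_states, scores, state, sigma)  # step S8
        results.append(state.copy())
    return results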
FIG. 5 is an experimental result of key frames on a public image dataset sequence with tracking results represented by boxes according to the present invention. Since the target or external environment changes most drastically in these key frames, tracking drift is most likely to occur in these key frames, resulting in tracking failure. The representation of the invention in these key frames shows the robustness and effectiveness of the method.
In FIG. 5 (a) and (g), external illumination change is the main difficulty in tracking these two data sets, especially in (a) where the target enters a shadow around frame 188 and leaves it around frame 230; experiments show that the invention is robust when tracking under similar illumination changes.
In (c), (f) and (h) of FIG. 5, the target undergoes various posture changes. For example, in (c) there are the side face at frame #146, the side of the body at frame #171, the glasses being taken off around frame #294, and the expression change at frame #305. The target in (f) undergoes an almost complete change of pose during the turn, and there is also a large amount of non-planar motion of the target in (h). Nevertheless, the space-time consistency defined by the invention over consecutive frames is still satisfied, because the posture changes occur slowly, that is, the target posture does not change much between consecutive frames. The invention therefore demonstrates the effectiveness of the tracking method under target posture changes.
Sequences (b), (e) and (f) of FIG. 5 exhibit a serious occlusion problem (for example near frame #456); by dividing the target into image blocks and exploiting the spatial information between the blocks, the invention can deal well with such target occlusion.
The sequence (d) of fig. 5 mainly shows that the present invention can effectively deal with the problem of fast motion of the target. The target position change between two consecutive frames in (d) of fig. 5 is relatively large, but the present invention can solve this problem well by adjusting the displacement parameters of affine transformation in the motion model. The target scale change problem, which also exists in fig. 5 (a), (c) and (f), can also be overcome by using the scale change parameters in the motion model.
The above-mentioned embodiments and experimental examples have described the technical solutions, implementation details and effectiveness of the methods of the present invention in detail. It should be understood that the above description is only exemplary of the present invention, and is not intended to limit the present invention, and any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A tracking method based on space-time consistency and local sparse representation of a target comprises the following steps:
step one: acquiring a positive sample set for the current frame by Gaussian disturbance according to the position of the tracking result of the previous frame;
step two: dividing each positive sample in the positive sample set into image blocks and marking, and learning a dictionary by using the image blocks obtained by division;
step three: carrying out sparse coding on image blocks obtained by dividing each positive sample in the positive sample set by using the learning dictionary, and calculating average vectors of sparse coding of the image blocks with the same marks;
step four: acquiring a candidate sample set of a current frame from a tracking result of a previous frame by using a motion model with affine transformation state change, and segmenting each candidate sample in the candidate sample set into image blocks and marking the image blocks;
step five: calculating a space-time consistency score of each candidate sample according to the average vector of the image block sparse codes corresponding to the image blocks with the same mark in the positive sample set and the image block sparse code of each candidate sample;
step six: and determining the tracking result of the current frame according to the space-time consistency score of each candidate sample in the candidate sample set based on Bayesian inference.
2. The tracking method as claimed in claim 1, wherein the image block of each candidate sample in step four is partitioned in the same way and in the same number as the image block of the positive sample.
3. The tracking method according to claim 2, wherein in step five, the space-time consistency score is the sum of a time domain confidence degree and a space domain confidence degree; the time domain confidence coefficient is in direct proportion to the average vector corresponding to the sparse coding of the image blocks with the same mark in the positive sample set and the similarity of the sparse coding of the image blocks with the same mark in the candidate samples; the spatial confidence is positively correlated with the number of neighboring image blocks of each image block in the candidate sample that are considered target image blocks.
4. The tracking method according to claim 3, characterized in that the time-domain confidence is calculated as follows:
T(X*) = (1/h²) Σ_{j=1}^{h²} sim(ᾱ_j, α*_j);
wherein T(X*) is the time-domain confidence of candidate sample X*, h² is the number of image blocks of the candidate sample, ᾱ_j is the average vector of the sparse codes of the image blocks labeled j in the positive sample set, and α*_j is the sparse code of the image block labeled j in candidate sample X*;
the spatial confidence is calculated as follows:
S(X*) = (1/h²) Σ_{j=1}^{h²} o*_j · (1/|N(j)|) Σ_{k∈N(j)} 1(o*_k, o*_j);

wherein S(X*) is the spatial confidence of candidate sample X*, N(j) is the neighbor image block set of the j-th image block of candidate sample X*, |N(j)| is the number of elements in the neighbor image block set, and 1(·,·) is an indicator function whose output is 1 if its two inputs are the same and 0 otherwise; o*_j indicates whether the j-th image block of candidate sample X* is judged to be a target image block, with o*_j = 1 if sim(ᾱ_j, α*_j) > τ and o*_j = 0 otherwise, where o*_j = 1 represents that the j-th image block of candidate sample X* is a target image block; τ is the control threshold.
5. The tracking method according to claim 1, wherein in step six, the tracking result of the current frame is determined from the candidate sample set by:
max_{ {X̂_t, s_t} ∈ Ω } f(X̂_t) Π_{i=1}^{t-1} p(s_{i+1} | s_i)
wherein X̂_t denotes a candidate sample of the current frame t, f(X̂_t) is the space-time consistency score of candidate sample X̂_t, s_t is the affine transformation state corresponding to candidate sample X̂_t of the current frame t, and Ω denotes the candidate sample set; p(s_t | s_{t-1}) = N(s_t; s_{t-1}, Σ) is the motion model of affine transformation state change, Σ is a diagonal matrix, and the elements on its diagonal are the variances of the parameters of the affine transformation states of the previous t-1 frames.
6. The tracking method according to claim 1, wherein the center of each positive sample collected in step one satisfies ||L_posS - L_pos|| < C_inner, where L_pos = [x_pos, y_pos] is the center position of the previous-frame tracking result, x_pos and y_pos are the abscissa and ordinate respectively, L_posS is the center position of the collected sample, and C_inner is the radius of the circle of the sampling area; the size of the sampled positive sample is consistent with the size of the tracking target in the previous frame.
7. The tracking method according to claim 1, wherein the sparse coding of image blocks in step three is obtained according to the following model:
min_{α_ij} (1/2) ||p_ij - D α_ij||₂² + λ ||α_ij||₁

wherein p_ij is the image block marked with the j-th position of the i-th positive sample in the positive sample set, D is the learned dictionary, α_ij is the sparse code corresponding to image block p_ij, i ∈ {1, 2, ..., N}, j ∈ {1, 2, ..., h²}, N is the number of positive samples in the positive sample set, h² is the number of image blocks of each positive sample in the positive sample set, and λ is the regularization weighting factor.
8. The tracking method according to claim 1, wherein in step two, the positive samples are divided by using a grid with a fixed size, and the divided image blocks are marked according to positions of the grid, and the divided image blocks of each positive sample do not overlap with each other.
9. The tracking method according to claim 1, wherein in the first step, a set of center positions of the positive samples is obtained by adding gaussian disturbance to the center position of the previous frame of tracking result, and a set of positive samples is acquired on the previous frame of image according to the set of center positions and is used for the current frame; the size of the positive sample obtained by sampling is the same as the target size of the previous frame.
10. A tracking apparatus based on target spatiotemporal consistency and local sparse representation, comprising:
a positive sample collection module: acquiring a positive sample set for the current frame by Gaussian disturbance according to the position of the tracking result of the previous frame;
a segmentation and labeling module: dividing each positive sample in the positive sample set into image blocks and marking, and learning a dictionary by using the image blocks obtained by division;
a sparse coding module: Carrying out sparse coding on the image blocks obtained by dividing each positive sample in the positive sample set by using the learned dictionary, and calculating the average vectors of the sparse codes of the image blocks with the same marks;
a candidate sample segmentation module: acquiring a candidate sample set of a current frame from a tracking result of a previous frame by using a motion model with affine transformation state change, and segmenting each candidate sample in the candidate sample set into image blocks and marking the image blocks;
a scoring module: calculating a space-time consistency score of each candidate sample according to the average vector of the image block sparse codes corresponding to the image blocks with the same mark in the positive sample set and the image block sparse code of each candidate sample;
a tracking module: and determining the tracking result of the current frame according to the space-time consistency score of each candidate sample in the candidate sample set based on Bayesian inference.
CN201410770556.9A 2014-12-12 2014-12-12 Tracking method and device based on target space-time consistency and local sparse representation Active CN104537686B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410770556.9A CN104537686B (en) 2014-12-12 2014-12-12 Tracking method and device based on target space-time consistency and local sparse representation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410770556.9A CN104537686B (en) 2014-12-12 2014-12-12 Tracking method and device based on target space-time consistency and local sparse representation

Publications (2)

Publication Number Publication Date
CN104537686A true CN104537686A (en) 2015-04-22
CN104537686B CN104537686B (en) 2017-10-03

Family

ID=52853205

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410770556.9A Active CN104537686B (en) 2014-12-12 2014-12-12 Tracking method and device based on target space-time consistency and local sparse representation

Country Status (1)

Country Link
CN (1) CN104537686B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104933417A (en) * 2015-06-26 2015-09-23 苏州大学 Behavior recognition method based on sparse spatial-temporal characteristics
CN105096339A (en) * 2015-07-08 2015-11-25 温州大学 Object tracking method based on posterior template dictionary learning
CN107273873A (en) * 2017-07-13 2017-10-20 武汉大学 Pedestrian based on irregular video sequence recognition methods and system again
CN107424171A (en) * 2017-07-21 2017-12-01 华中科技大学 A kind of anti-shelter target tracking based on piecemeal
CN109614933A (en) * 2018-12-11 2019-04-12 闽江学院 A kind of motion segmentation method based on certainty fitting
CN110502968A (en) * 2019-07-01 2019-11-26 西安理工大学 The detection method of infrared small dim moving target based on tracing point space-time consistency
CN113419214A (en) * 2021-06-22 2021-09-21 桂林电子科技大学 Indoor positioning method for target without carrying equipment
CN115018875A (en) * 2022-06-22 2022-09-06 浙江大华技术股份有限公司 Data augmentation method and device and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130245429A1 (en) * 2012-02-28 2013-09-19 Siemens Aktiengesellschaft Robust multi-object tracking using sparse appearance representation and online sparse appearance dictionary update
CN103985143A (en) * 2014-05-30 2014-08-13 上海交通大学 Discriminative online target tracking method based on videos in dictionary learning

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130245429A1 (en) * 2012-02-28 2013-09-19 Siemens Aktiengesellschaft Robust multi-object tracking using sparse appearance representation and online sparse appearance dictionary update
CN103985143A (en) * 2014-05-30 2014-08-13 上海交通大学 Discriminative online target tracking method based on videos in dictionary learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
孙浩 (Sun Hao): "Research on moving object detection technology in close-range video from a motion imaging platform", China Doctoral Dissertations Full-text Database *
戴平阳 (Dai Pingyang) et al.: "A discriminative object tracking algorithm based on sparse representation", Journal of Xiamen University (Natural Science Edition) *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104933417A (en) * 2015-06-26 2015-09-23 苏州大学 Behavior recognition method based on sparse spatial-temporal characteristics
CN104933417B (en) * 2015-06-26 2019-03-15 苏州大学 A kind of Activity recognition method based on sparse space-time characteristic
CN105096339A (en) * 2015-07-08 2015-11-25 温州大学 Object tracking method based on posterior template dictionary learning
CN105096339B (en) * 2015-07-08 2017-12-29 温州大学 A kind of method for tracking target based on posteriority template dictionary learning
CN107273873B (en) * 2017-07-13 2019-09-17 武汉大学 Pedestrian based on irregular video sequence recognition methods and system again
CN107273873A (en) * 2017-07-13 2017-10-20 武汉大学 Pedestrian based on irregular video sequence recognition methods and system again
CN107424171B (en) * 2017-07-21 2020-01-03 华中科技大学 Block-based anti-occlusion target tracking method
CN107424171A (en) * 2017-07-21 2017-12-01 华中科技大学 A kind of anti-shelter target tracking based on piecemeal
CN109614933A (en) * 2018-12-11 2019-04-12 闽江学院 A kind of motion segmentation method based on certainty fitting
CN110502968A (en) * 2019-07-01 2019-11-26 西安理工大学 The detection method of infrared small dim moving target based on tracing point space-time consistency
CN110502968B (en) * 2019-07-01 2022-03-25 西安理工大学 Method for detecting infrared small and weak moving target based on track point space-time consistency
CN113419214A (en) * 2021-06-22 2021-09-21 桂林电子科技大学 Indoor positioning method for target without carrying equipment
CN113419214B (en) * 2021-06-22 2022-08-30 桂林电子科技大学 Indoor positioning method for target without carrying equipment
CN115018875A (en) * 2022-06-22 2022-09-06 浙江大华技术股份有限公司 Data augmentation method and device and storage medium

Also Published As

Publication number Publication date
CN104537686B (en) 2017-10-03

Similar Documents

Publication Publication Date Title
CN104537686B (en) Tracking method and device based on target space-time consistency and local sparse representation
Feng et al. Water body extraction from very high-resolution remote sensing imagery using deep U-Net and a superpixel-based conditional random field model
CN110097568B (en) Video object detection and segmentation method based on space-time dual-branch network
US20220067335A1 (en) Method for dim and small object detection based on discriminant feature of video satellite data
CN112597941B (en) Face recognition method and device and electronic equipment
Zhou et al. BOMSC-Net: Boundary optimization and multi-scale context awareness based building extraction from high-resolution remote sensing imagery
CN112966691B (en) Multi-scale text detection method and device based on semantic segmentation and electronic equipment
Nguyen et al. Fast and robust spatially constrained Gaussian mixture model for image segmentation
EP3819859B1 (en) Sky filter method for panoramic images and portable terminal
CN108062525B (en) Deep learning hand detection method based on hand region prediction
Li et al. Adaptive deep convolutional neural networks for scene-specific object detection
CN107633226B (en) Human body motion tracking feature processing method
CN111368846B (en) Road ponding identification method based on boundary semantic segmentation
Ding et al. Adversarial shape learning for building extraction in VHR remote sensing images
CN113052873B (en) Single-target tracking method for on-line self-supervision learning scene adaptation
Zhang et al. Polar coordinate sampling-based segmentation of overlapping cervical cells using attention U-Net and random walk
CN110084201B (en) Human body action recognition method based on convolutional neural network of specific target tracking in monitoring scene
CN113920148B (en) Building boundary extraction method and equipment based on polygon and storage medium
CN106780450A (en) A kind of image significance detection method based on low-rank Multiscale Fusion
Zhou et al. Large-scale road extraction from high-resolution remote sensing images based on a weakly-supervised structural and orientational consistency constraint network
US11367206B2 (en) Edge-guided ranking loss for monocular depth prediction
Wang et al. High-resolution remote sensing image semantic segmentation based on a deep feature aggregation network
Oreski YOLO* C—Adding context improves YOLO performance
CN103065302B (en) Image significance detection method based on stray data mining
CN110751670B (en) Target tracking method based on fusion

Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant