CN103886585A - Video tracking method based on rank learning - Google Patents

Publication number: CN103886585A
Application number: CN201410054630.7A
Authority: CN (China)
Prior art keywords: target, image, sample set, training sample, video
Legal status: Pending
Inventors: 于慧敏, 曾雄
Applicant and current assignee: Zhejiang University (ZJU)
Other languages: Chinese (zh)
Priority to CN201410054630.7A
Publication of CN103886585A
Abstract

The invention discloses a video tracking method based on rank learning. The method first compresses multi-scale image features with a sparse measurement matrix based on compressed sensing theory; it then uses the Median-Flow tracking algorithm as a predictor to obtain a rough target position and to construct a training data set for the RV-SVM algorithm; finally, it ranks the training samples and uses RV-SVM as a binary classifier to separate the target from the background, achieving video tracking. Because the training process of the RV-SVM algorithm is a linear programming problem, the training time of online learning is reduced and the efficiency of the tracking system is improved. By combining multi-scale compressed feature extraction, the Median-Flow tracking algorithm and the RV-SVM algorithm, the method can effectively handle target scale change, partial occlusion, 3D rotation, pose change, fast target motion and similar problems in video tracking.

Description

Video tracking method based on rank learning
Technical field
The invention belongs to the fields of computer vision and pattern recognition, and in particular relates to a video tracking method based on rank learning.
Background art
Video tracking is a key research topic in computer vision, with wide applications in intelligent video surveillance, augmented reality, human-computer interaction, gesture recognition and autonomous driving. Although researchers at home and abroad have proposed a great many tracking algorithms over the past two decades, tracking remains a very challenging problem, because an efficient algorithm must cope with target scale change, illumination change, partial occlusion, camera rotation, object deformation and other difficulties found in real video scenes.
According to how the target appearance is modeled, tracking algorithms fall into two classes: those based on generative models and those based on discriminative models. A generative tracker first learns an appearance model of the target and then searches each frame for the region most similar to that model. A discriminative tracker treats tracking as a binary classification problem and separates the target from the background with an online-learned classifier.
At present, discriminative tracking is gradually becoming the mainstream approach in video tracking. Discriminative trackers are also known as tracking-by-detection methods, whose main steps are: 1) given the initial target position, sample positive and negative examples from the current frame and train a classifier online; 2) read the next frame and, assuming the target position changes little between consecutive frames, extract image samples around the previous target location; 3) feed the extracted samples to the trained classifier and take the highest-scoring sample as the new target position. Most tracking-by-detection algorithms can handle some of the appearance changes that occur in real scenes, but all suffer drift to varying degrees, which causes loss of the tracked target.
To handle target scale change during tracking, multi-scale image features are usually extracted. Because the dimensionality of such features is very high, compressed sensing theory can be used to reduce it. The Median-Flow tracking algorithm tracks well at low computational cost, which makes it suitable as a weak tracker that provides a rough estimate of the target position. Ranking (learning-to-rank) algorithms are widely used in machine learning, for example in text retrieval, product rating and semantic analysis; recently they have begun to be applied in computer vision. The Ranking Vector Support Vector Machine (RV-SVM) algorithm is a recent ranking algorithm.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art and to provide a video tracking method based on rank learning.
The video tracking method based on rank learning comprises the following steps:
1) representing the target O by a rectangular set of image pixels, the size of O being h × w, and generating multi-scale image features of O with multi-scale rectangle filters;
2) compressing the multi-scale image features with a sparse random measurement matrix, based on compressed sensing theory;
3) using the Median-Flow tracking algorithm as a predictor to predict the rough position of the target O in the next frame;
4) building the training sample sets X_t^1 and X_t^0 and the weakly labeled training sample set X_{t+1}^w, where X_t^1 is the set of target image patches extracted from the initial frame and several recent frames, X_t^0 is the set of background image patches extracted from recent frames, and X_{t+1}^w is the set of weakly labeled training samples extracted from the current frame;
5) using the RV-SVM algorithm as an online-learned classifier to separate the target O from the background, thereby achieving video tracking.
Step 1) is:
(1) convolve the target O with a series of multi-scale rectangle filters:

l_{i,j}(x̃, ỹ) = s(x̃, ỹ) * h_{i,j}(x̃, ỹ)

where s(x̃, ỹ) ∈ R^{w×h} represents the target O, (x̃, ỹ) are the pixel coordinates of O, R is a vector space whose dimension w × h equals the size of O, and h_{i,j} is the multi-scale rectangle filter (its defining formula appears as an image in the original document), with filter indices i = 1, …, w and j = 1, …, h;
(2) unroll each filtered image matrix l_{i,j} into a column vector l_k, k = 1, …, r, with r = w × h;
(3) concatenate the column vectors l_k into the high-dimensional multi-scale image feature h:

h = (l_1^T, …, l_r^T)^T

where h ∈ R^m, m = r², and T denotes transposition.
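As a concrete illustration, the feature construction of step 1) can be sketched as follows. The rectangle-filter definition is rendered as an image in the original document, so this sketch assumes the box filter commonly used in compressive tracking (each filter h_{i,j} sums pixels over an i × j rectangle, clipped at the patch border); the function name and boundary handling are illustrative choices, and an integral image makes each filter response O(1):

```python
import numpy as np

def multiscale_features(s):
    """Sketch of step 1): convolve a grayscale target patch s (h x w) with
    rectangle filters of every size i x j (assumed to be box sums, as in
    compressive tracking) and stack the filtered maps into one vector."""
    h, w = s.shape
    # Integral image: ii[a, b] = sum of s[:a, :b], so any rectangle sum is O(1).
    ii = np.pad(s, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    maps = []
    for i in range(1, w + 1):          # filter width index i = 1..w
        for j in range(1, h + 1):      # filter height index j = 1..h
            out = np.empty((h, w))
            for y in range(h):
                for x in range(w):
                    y2, x2 = min(y + j, h), min(x + i, w)   # clip at the border
                    out[y, x] = ii[y2, x2] - ii[y, x2] - ii[y2, x] + ii[y, x]
            maps.append(out.reshape(-1))    # unroll l_{i,j} into a column l_k
    return np.concatenate(maps)             # feature h in R^m with m = (w*h)^2

feat = multiscale_features(np.random.rand(4, 4))
print(feat.shape)
```

For a 4 × 4 patch this yields r = 16 filtered maps of 16 responses each, i.e. a feature of length m = r² = 256, matching the dimensions stated above.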
Step 2) is:
(1) generate a sparse random matrix φ ∈ R^{n×m} whose elements are

φ_{i,j} = √s × { +1 with probability p = 1/(2s); 0 with probability p = 1 − 1/s; −1 with probability p = 1/(2s) }

where s is chosen uniformly at random between 2 and 4, and p denotes probability;
(2) project the multi-scale image feature h with φ:

x = φh

obtaining the compressed multi-scale image feature x ∈ R^{n×1}.
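A minimal sketch of step 2): generating the sparse measurement matrix and projecting a high-dimensional feature. The √s scaling follows the very sparse random projection of Achlioptas and Li; s is fixed at 3 here rather than drawn at random from {2, 3, 4}, and the dimensions are illustrative:

```python
import numpy as np

def sparse_measurement_matrix(n, m, s=3, seed=0):
    """Entries are +sqrt(s) with probability 1/(2s), -sqrt(s) with
    probability 1/(2s), and 0 otherwise, so roughly a 1/s fraction of the
    matrix is non-zero while distances are nearly preserved."""
    rng = np.random.default_rng(seed)
    u = rng.random((n, m))
    phi = np.zeros((n, m))
    phi[u < 1 / (2 * s)] = np.sqrt(s)
    phi[u > 1 - 1 / (2 * s)] = -np.sqrt(s)
    return phi

h = np.random.rand(10000)               # high-dimensional multi-scale feature
phi = sparse_measurement_matrix(50, h.size)
x = phi @ h                             # compressed feature x = phi h, x in R^50
print(x.shape)
```

Because most entries of φ are zero, the projection can also be stored and applied as a sparse matrix, which is what makes the compression cheap at tracking time.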
Step 3) is:
(1) abstract the target O as a 10 × 10 grid of feature points, each feature point being represented by a 4 × 4 image patch containing it;
(2) track these feature points with the pyramidal optical-flow method, using 5 pyramid levels;
(3) compute the forward-backward (FB) error and the normalized cross-correlation (NCC) error of the feature points; the FB error is computed as

FB(T_f^k | S) = ED(T_f^k, T_b^k)

where S is a video or image sequence, ED denotes the Euclidean distance, t is the frame index of the video, T_f^k = (x_t, x_{t+1}, …, x_{t+k}) denotes feature point x_t tracked forward for k steps, and T_b^k = (x̂_t, x̂_{t+1}, …, x̂_{t+k}) denotes feature point x_{t+k} tracked backward for k steps;
(4) keep the 50% of feature points with the smaller tracking error and reject the other 50% as outliers;
(5) estimate the position of O from the median of the remaining feature points in each spatial dimension; for every pair of feature points, compute the ratio of their Euclidean distance in the current frame to their Euclidean distance in the previous frame, and take the mean of these ratios as the scale of the target.
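The per-point error and the motion/scale update of step 3) can be sketched as below. The helper names are illustrative, and the FB error is computed here as the distance between a point's starting position and the endpoint of its backward track, which is one common reading of ED(T_f^k, T_b^k):

```python
import numpy as np
from itertools import combinations

def fb_error(forward_track, backward_track):
    """FB error of one feature point: distance between where the point
    started and where the backward track brings it back to."""
    return float(np.linalg.norm(np.asarray(forward_track[0]) -
                                np.asarray(backward_track[-1])))

def median_flow_update(pts_prev, pts_curr):
    """Step 3)(5): translation = per-axis median displacement of the
    surviving points; scale = mean ratio of pairwise point distances
    between the current and the previous frame."""
    d = pts_curr - pts_prev
    dx, dy = float(np.median(d[:, 0])), float(np.median(d[:, 1]))
    ratios = [np.linalg.norm(pts_curr[a] - pts_curr[b]) /
              np.linalg.norm(pts_prev[a] - pts_prev[b])
              for a, b in combinations(range(len(pts_prev)), 2)
              if np.linalg.norm(pts_prev[a] - pts_prev[b]) > 0]
    scale = float(np.mean(ratios)) if ratios else 1.0
    return dx, dy, scale

prev = np.array([[0., 0.], [2., 0.], [0., 2.]])
curr = prev * 2 + 1                     # points translated and spread apart
print(median_flow_update(prev, curr))
```

With the surviving points translated by (1, 1) and spread to twice their mutual distances, the update recovers a unit translation and a scale of 2.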
Step 4) is:
Build the training sample set X_t^1 = {x′ : ‖l_s(x′) − l_s^*‖ ≤ α, s = 1, t − Δt, …, t};
Build the training sample set X_t^0 = {x′ : γ < ‖l_s(x′) − l_s^*‖ < β, s = t − Δt, …, t};
Build the weakly labeled training sample set X_{t+1}^w = {x′ : ‖l_s(x′) − l_s^w‖ < α, s = t + 1};
where t is the frame index of the video, Δt is the number of recent frames, l_s(x′) denotes the position of image patch x′ in frame s, l_s^* denotes the true position of the target O, l_s^w is the rough position of O obtained by the Median-Flow tracking algorithm, and α, β and γ are sampling radii.
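A sketch of the sampling rule of step 4), assuming patches are drawn uniformly and indexed by their top-left corner. The function and argument names are illustrative, and the distance bounds are taken inclusive for simplicity (the patent's background set uses a strict lower bound γ):

```python
import numpy as np

def sample_patch_locations(frame_shape, center, patch, r_min, r_max,
                           n=50, seed=0):
    """Draw n patch locations whose distance from `center` lies in
    [r_min, r_max]. With (0, alpha) this approximates the positive set
    X_t^1; with (gamma, beta) the background set X_t^0."""
    rng = np.random.default_rng(seed)
    fh, fw = frame_shape
    ph, pw = patch
    locs = []
    while len(locs) < n:
        x = int(rng.integers(0, fw - pw))
        y = int(rng.integers(0, fh - ph))
        if r_min <= np.hypot(x - center[0], y - center[1]) <= r_max:
            locs.append((x, y))
    return locs

pos = sample_patch_locations((120, 160), center=(80, 60), patch=(20, 20),
                             r_min=0, r_max=4, n=10)
neg = sample_patch_locations((120, 160), center=(80, 60), patch=(20, 20),
                             r_min=8, r_max=30, n=10)
print(len(pos), len(neg))
```

The two radii bands keep a safety gap between positive and background patches, so no training sample is ambiguous about its label.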
Step 5) is:
(1) extract the compressed multi-scale image feature x of every image patch in the training sample sets X_t^1 and X_t^0;
(2) require that features in the training sample set X_t^1 rank higher than features in X_t^0 and X_{t+1}^w, i.e. x_i^1 ≻ x_j^0 and x_i^1 ≻ x_j^w, where i and j are feature indices, i = 1, …, N_1 and j = N_1 + 1, …, N_1 + N_0; N_1 is the number of samples in the training sample set X_t^1 and N_0 is the total number of samples in X_t^0 and X_{t+1}^w;
(3) under the ordering set in step (2), use the compressed multi-scale image features x of the training sample sets to train the ranking function F_{t+1}(x), first solving the linear programming problem

min L(α, ξ) = Σ_i α_i + Σ_{ij} ξ_{ij}
s.t. Σ_i α_i (K(x_i, x_u) − K(x_i, x_v)) ≥ 1 − ξ_{uv}, α ≥ 0, ξ ≥ 0

where u = 1, …, N_1, v = N_1 + 1, …, N_1 + N_0, ξ are slack variables and K(·, ·) is a kernel function; for a linear kernel, K(x, z) = ⟨x, z⟩;
after the optimal solution α* of the linear programming problem is obtained, the ranking function F_{t+1}(x) is expressed as

F_{t+1}(x) = Σ_i α_i* K(x_i, x)

where K(·, ·) is the kernel function and i = 1, …, N_1;
(4) separate the target O from the background: the image patch that maximizes the ranking function F_{t+1}(x) is taken as the true position of O,

l_{t+1}^* = l(argmax F_{t+1}(x)), x ∈ X_{t+1}^w

where X_{t+1}^w is the weakly labeled training sample set extracted after the Median-Flow tracking algorithm gives the rough position of O in the current frame, x is the compressed multi-scale image feature of an image patch in X_{t+1}^w, l(x) denotes the position of the image patch corresponding to x, and l_{t+1}^* denotes the true position of the target in the current frame.
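The RV-SVM linear program of step 5) can be prototyped directly with `scipy.optimize.linprog`. This is an illustrative sketch rather than the patent's implementation: it uses a linear kernel, tiny synthetic data, and `train_rvsvm` is an assumed helper name:

```python
import numpy as np
from scipy.optimize import linprog

def train_rvsvm(X_pos, X_neg):
    """Solve min sum(alpha) + sum(xi) subject to
    sum_i alpha_i (K(x_i, x_u) - K(x_i, x_v)) >= 1 - xi_uv, alpha, xi >= 0,
    with u over positive (target) samples, v over negative samples, and the
    ranking vectors x_i taken as the positives (i = 1..N_1)."""
    X = np.vstack([X_pos, X_neg])
    K = X_pos @ X.T                       # K[i, j] = <x_i, x_j>, linear kernel
    n1, n = len(X_pos), len(X)
    n_pairs = n1 * (n - n1)
    c = np.ones(n1 + n_pairs)             # variables z = [alpha, xi]
    A = np.zeros((n_pairs, n1 + n_pairs))
    b = -np.ones(n_pairs)
    row = 0
    for u in range(n1):                   # positive sample index
        for v in range(n1, n):            # negative sample index
            A[row, :n1] = -(K[:, u] - K[:, v])   # flip sign for A z <= b form
            A[row, n1 + row] = -1.0              # slack xi_uv for this pair
            row += 1
    res = linprog(c, A_ub=A, b_ub=b, bounds=(0, None))
    alpha = res.x[:n1]
    # ranking function F(x) = sum_i alpha_i <x_i, x>
    return lambda x: float(alpha @ (X_pos @ x))

rng = np.random.default_rng(0)
pos = rng.normal(1.0, 0.2, (5, 8))        # well-separated toy clusters
neg = rng.normal(-1.0, 0.2, (5, 8))
F = train_rvsvm(pos, neg)
print(F(pos[0]) > F(neg[0]))
```

Because the constraints enforce a margin of at least 1 − ξ between every positive/negative pair, on separable data the learned F scores every target patch above every background patch, which is exactly the property step 5)(4) relies on.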
The beneficial effects of the invention are:
1) A video tracking method based on rank learning is proposed that compresses multi-scale image features with a sparse random matrix, preserving almost all the information in the features while avoiding the curse of dimensionality.
2) The invention uses the Median-Flow tracking algorithm as a predictor to estimate the target position in the next frame and to build the training sample sets. This step is computationally cheap and effective, can handle sudden changes of target position in the video, and makes the tracker more robust.
3) The RV-SVM algorithm is adopted for online learning; its training turns a quadratic programming problem into a linear programming problem and simplifies the kernel computation, which greatly reduces the training time of online learning and improves the efficiency of the system.
4) By combining compressed multi-scale feature extraction, the Median-Flow tracking algorithm and the RV-SVM algorithm, the invention realizes a robust and efficient video tracking system that can effectively handle target scale change, partial occlusion, 3D rotation, pose change, fast target motion and similar problems in video tracking.
Brief description of the drawings
Fig. 1 is the overall flow chart of the video tracking method based on rank learning;
Fig. 2 is a schematic diagram of the Median-Flow component of Fig. 1.
Embodiments
To make the objects, technical solutions and advantages of the present invention clearer, the invention is further described below in conjunction with the drawings and embodiments.
As shown in Fig. 1 and Fig. 2, the video tracking method based on rank learning comprises the following steps:
1) representing the target O by a rectangular set of image pixels, the size of O being h × w, and generating multi-scale image features of O with multi-scale rectangle filters;
2) compressing the multi-scale image features with a sparse random measurement matrix, based on compressed sensing theory;
3) using the Median-Flow tracking algorithm as a predictor to predict the rough position of the target O in the next frame;
4) building the training sample sets X_t^1 and X_t^0 and the weakly labeled training sample set X_{t+1}^w, where X_t^1 is the set of target image patches extracted from the initial frame and several recent frames, X_t^0 is the set of background image patches extracted from recent frames, and X_{t+1}^w is the set of weakly labeled training samples extracted from the current frame;
5) using the RV-SVM algorithm as an online-learned classifier to separate the target O from the background, thereby achieving video tracking.
Step 1) is:
(1) convolve the target O with a series of multi-scale rectangle filters:

l_{i,j}(x̃, ỹ) = s(x̃, ỹ) * h_{i,j}(x̃, ỹ)

where s(x̃, ỹ) ∈ R^{w×h} represents the target O, (x̃, ỹ) are the pixel coordinates of O, R is a vector space whose dimension w × h equals the size of O, and h_{i,j} is the multi-scale rectangle filter (its defining formula appears as an image in the original document), with filter indices i = 1, …, w and j = 1, …, h;
(2) unroll each filtered image matrix l_{i,j} into a column vector l_k, k = 1, …, r, with r = w × h;
(3) concatenate the column vectors l_k into the high-dimensional multi-scale image feature h:

h = (l_1^T, …, l_r^T)^T

where h ∈ R^m, m = r², and T denotes transposition.
Step 2) is:
(1) generate a sparse random matrix φ ∈ R^{n×m} whose elements are

φ_{i,j} = √s × { +1 with probability p = 1/(2s); 0 with probability p = 1 − 1/s; −1 with probability p = 1/(2s) }

where s is chosen uniformly at random between 2 and 4, and p denotes probability;
(2) project the multi-scale image feature h with φ:

x = φh

obtaining the compressed multi-scale image feature x ∈ R^{n×1}.
Step 3), whose detailed flow is shown in the block diagram of the Median-Flow algorithm in Fig. 2, is:
(1) abstract the target O as a 10 × 10 grid of feature points, each feature point being represented by a 4 × 4 image patch containing it;
(2) track these feature points with the pyramidal optical-flow method, using 5 pyramid levels;
(3) compute the forward-backward (FB) error and the normalized cross-correlation (NCC) error of the feature points; the FB error is computed as

FB(T_f^k | S) = ED(T_f^k, T_b^k)

where S is a video or image sequence, ED denotes the Euclidean distance, t is the frame index of the video, T_f^k = (x_t, x_{t+1}, …, x_{t+k}) denotes feature point x_t tracked forward for k steps, and T_b^k = (x̂_t, x̂_{t+1}, …, x̂_{t+k}) denotes feature point x_{t+k} tracked backward for k steps;
(4) keep the 50% of feature points with the smaller tracking error and reject the other 50% as outliers;
(5) estimate the position of O from the median of the remaining feature points in each spatial dimension; for every pair of feature points, compute the ratio of their Euclidean distance in the current frame to their Euclidean distance in the previous frame, and take the mean of these ratios as the scale of the target.
Step 4) is:
Build the training sample set X_t^1 = {x′ : ‖l_s(x′) − l_s^*‖ ≤ α, s = 1, t − Δt, …, t};
Build the training sample set X_t^0 = {x′ : γ < ‖l_s(x′) − l_s^*‖ < β, s = t − Δt, …, t};
Build the weakly labeled training sample set X_{t+1}^w = {x′ : ‖l_s(x′) − l_s^w‖ < α, s = t + 1};
where t is the frame index of the video, Δt is the number of recent frames, l_s(x′) denotes the position of image patch x′ in frame s, l_s^* denotes the true position of the target O, l_s^w is the rough position of O obtained by the Median-Flow tracking algorithm, and α, β and γ are sampling radii.
Step 5) is:
(1) extract the compressed multi-scale image feature x of every image patch in the training sample sets X_t^1 and X_t^0;
(2) require that features in the training sample set X_t^1 rank higher than features in X_t^0 and X_{t+1}^w, i.e. x_i^1 ≻ x_j^0 and x_i^1 ≻ x_j^w, where i and j are feature indices, i = 1, …, N_1 and j = N_1 + 1, …, N_1 + N_0; N_1 is the number of samples in the training sample set X_t^1 and N_0 is the total number of samples in X_t^0 and X_{t+1}^w;
(3) under the ordering set in step (2), use the compressed multi-scale image features x of the training sample sets to train the ranking function F_{t+1}(x), first solving the linear programming problem

min L(α, ξ) = Σ_i α_i + Σ_{ij} ξ_{ij}
s.t. Σ_i α_i (K(x_i, x_u) − K(x_i, x_v)) ≥ 1 − ξ_{uv}, α ≥ 0, ξ ≥ 0

where u = 1, …, N_1, v = N_1 + 1, …, N_1 + N_0, ξ are slack variables and K(·, ·) is a kernel function; for a linear kernel, K(x, z) = ⟨x, z⟩;
after the optimal solution α* of the linear programming problem is obtained, the ranking function F_{t+1}(x) is expressed as

F_{t+1}(x) = Σ_i α_i* K(x_i, x)

where K(·, ·) is the kernel function and i = 1, …, N_1;
(4) separate the target O from the background: the image patch that maximizes the ranking function F_{t+1}(x) is taken as the true position of O,

l_{t+1}^* = l(argmax F_{t+1}(x)), x ∈ X_{t+1}^w

where X_{t+1}^w is the weakly labeled training sample set extracted after the Median-Flow tracking algorithm gives the rough position of O in the current frame, x is the compressed multi-scale image feature of an image patch in X_{t+1}^w, l(x) denotes the position of the image patch corresponding to x, and l_{t+1}^* denotes the true position of the target in the current frame.
Embodiment 1
A video tracking method based on rank learning comprises the following steps:
1) Read the initial image frame and initialize the target position parameters (x, y, w, h), where (x, y) is the coordinate of the target's top-left pixel and w and h are the width and height of the target.
2) Extract the target image patch sample set X_t^1 and the background sample set X_t^0:

X_t^1 = {x′ : ‖l_s(x′) − l_s^*‖ ≤ α, s = 1, t − Δt, …, t}
X_t^0 = {x′ : γ < ‖l_s(x′) − l_s^*‖ < β, s = t − Δt, …, t}

where t is the frame index of the video, Δt = 2 is the number of recent frames, l_s(x′) denotes the position of image patch x′ in frame s, l_s^* denotes the true position of the target, and the sampling radii are α = 4, γ = 8 and β = 30.
3) Extract the compressed multi-scale image feature x of each image patch in the sample sets:
3.1) convolve the target O with a series of multi-scale rectangle filters:

l_{i,j}(x̃, ỹ) = s(x̃, ỹ) * h_{i,j}(x̃, ỹ)

where s(x̃, ỹ) ∈ R^{w×h} represents the target O, (x̃, ỹ) are the pixel coordinates of O, R is a vector space whose dimension w × h equals the size of O, and h_{i,j} is the multi-scale rectangle filter (its defining formula appears as an image in the original document), with filter indices i = 1, …, w and j = 1, …, h;
3.2) unroll each filtered image matrix l_{i,j} into a column vector l_k, k = 1, …, r, with r = w × h;
3.3) concatenate the column vectors l_k into the high-dimensional multi-scale image feature h:

h = (l_1^T, …, l_r^T)^T

where h ∈ R^m, m = r², and T denotes transposition;
3.4) generate a sparse random matrix φ ∈ R^{n×m} whose elements are

φ_{i,j} = √s × { +1 with probability p = 1/(2s); 0 with probability p = 1 − 1/s; −1 with probability p = 1/(2s) }

where s is chosen uniformly at random between 2 and 4, and p denotes probability;
3.5) project the multi-scale image feature h with φ:

x = φh

obtaining the compressed multi-scale image feature x ∈ R^{n×1}.
4) Read the next image frame.
5) Estimate the position of the target in the next frame with the Median-Flow algorithm:
5.1) abstract the target as a 10 × 10 grid of feature points, each feature point being represented by a 4 × 4 image patch containing it;
5.2) track these feature points with the pyramidal optical-flow method, using 5 pyramid levels;
5.3) compute the FB error and the NCC error of the feature points; the FB error is computed as

FB(T_f^k | S) = ED(T_f^k, T_b^k)

where S is a video or image sequence, ED denotes the Euclidean distance, t is the frame index of the video, T_f^k = (x_t, x_{t+1}, …, x_{t+k}) denotes feature point x_t tracked forward for k steps, T_b^k = (x̂_t, x̂_{t+1}, …, x̂_{t+k}) denotes feature point x_{t+k} tracked backward for k steps, and k = 5;
5.4) keep the 50% of feature points with the smaller tracking error and reject the other 50% as outliers;
5.5) estimate the target position from the median of the remaining feature points in each spatial dimension; in addition, for every pair of feature points, compute the ratio of their Euclidean distance in the current frame to their Euclidean distance in the previous frame, and take the mean of these ratios as the scale of the target.
6) From the rough target position estimated by the Median-Flow algorithm, extract the weakly labeled image patch sample set

X_{t+1}^w = {x′ : ‖l_s(x′) − l_s^w‖ < α, s = t + 1}

where t is the frame index of the video, l_s(x′) denotes the position of image patch x′ in frame s, l_s^w is the rough target position obtained by the Median-Flow tracking algorithm, and α = 4 is the sampling radius.
7) Extract the compressed multi-scale image feature x of each image patch in the weakly labeled sample set; the computation is the same as in step 3).
8) Rank the training sample features extracted in steps 3) and 7); the ordering rule is that features in the training sample set X_t^1 rank higher than features in X_t^0 and X_{t+1}^w, i.e. x_i^1 ≻ x_j^0 and x_i^1 ≻ x_j^w, where i and j are feature indices, i = 1, …, N_1 and j = N_1 + 1, …, N_1 + N_0; N_1 is the number of samples in the training sample set X_t^1 and N_0 is the total number of samples in X_t^0 and X_{t+1}^w.
9) Learn the ranking function F_{t+1}(x), separate the target from the background and obtain the true target position, as follows:
9.1) train the ranking function F_{t+1}(x) with the training sample feature vectors under the ordering rule of step 8), first solving the linear programming problem

min L(α, ξ) = Σ_i α_i + Σ_{ij} ξ_{ij}
s.t. Σ_i α_i (K(x_i, x_u) − K(x_i, x_v)) ≥ 1 − ξ_{uv}, α ≥ 0, ξ ≥ 0

where u = 1, …, N_1, v = N_1 + 1, …, N_1 + N_0, ξ are slack variables and K(·, ·) is a kernel function; for a linear kernel, K(x, z) = ⟨x, z⟩;
9.2) after the optimal solution α* of the linear programming problem is obtained, the ranking function F_{t+1}(x) is expressed as

F_{t+1}(x) = Σ_i α_i* K(x_i, x)

where K(·, ·) is the kernel function and i = 1, …, N_1;
9.3) the image patch that maximizes the ranking function F_{t+1}(x) is taken as the true target position,

l_{t+1}^* = l(argmax F_{t+1}(x)), x ∈ X_{t+1}^w

where X_{t+1}^w is the weakly labeled training sample set extracted after the Median-Flow tracking algorithm gives the rough target position in the current frame, x is the compressed multi-scale image feature of an image patch in X_{t+1}^w, l(x) denotes the position of the image patch corresponding to x, and l_{t+1}^* denotes the true position of the target in the current frame.
10) Extract the target image patch sample set and the background sample set at the true target position, and extract the compressed multi-scale feature of each image patch in them; this follows steps 2) and 3).
11) Check whether this is the last frame of the video; if so, the algorithm ends; otherwise, go to step 4).
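To show how steps 1)–11) fit together, here is a toy end-to-end loop. Every helper below is a simplified stand-in for the components described above, not the patent's implementation: `median_flow` merely nudges the box, and `score` sums pixel intensities instead of evaluating the learned ranking function.

```python
import numpy as np

# --- Dummy stand-ins for the components of steps 1-9 ---------------------
def median_flow(prev, curr, box):             # step 5: rough position
    x, y, w, h = box
    return (x + 1, y, w, h)                   # pretend the target drifts right

def candidate_boxes(box, radius):             # step 6: weak sample set
    x, y, w, h = box
    return [(x + dx, y + dy, w, h)
            for dx in range(-radius, radius + 1)
            for dy in range(-radius, radius + 1)]

def score(frame, box):                        # steps 7-9: ranking stand-in
    x, y, w, h = box
    return float(frame[y:y + h, x:x + w].sum())

def track(frames, init_box, alpha=4):
    """Skeleton of Embodiment 1: predict a rough position (step 5), extract
    weak samples around it (step 6), score them (steps 7-9), and keep the
    best-scoring box as the new target position (steps 10-11)."""
    box = init_box
    for t in range(1, len(frames)):           # loop of steps 4) and 11)
        rough = median_flow(frames[t - 1], frames[t], box)
        box = max(candidate_boxes(rough, alpha),
                  key=lambda b: score(frames[t], b))
    return box

# A bright 5x5 square drifting one pixel right per frame
frames = [np.zeros((40, 40)) for _ in range(3)]
for t, f in enumerate(frames):
    f[5:10, 10 + t:15 + t] = 1.0
print(track(frames, (10, 5, 5, 5)))
```

Even with these crude stand-ins, the loop structure (weak prediction, local candidate search, best-scoring box) matches the flow of Fig. 1.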
The above are only preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (6)

1. A video tracking method based on rank learning, characterized by comprising the following steps:
1) representing the target O by a rectangular set of image pixels, the size of O being h × w, and generating multi-scale image features of O with multi-scale rectangle filters;
2) compressing the multi-scale image features with a sparse random measurement matrix, based on compressed sensing theory;
3) using the Median-Flow tracking algorithm as a predictor to predict the rough position of the target O in the next frame;
4) building the training sample sets X_t^1 and X_t^0 and the weakly labeled training sample set X_{t+1}^w, where X_t^1 is the set of target image patches extracted from the initial frame and several recent frames, X_t^0 is the set of background image patches extracted from recent frames, and X_{t+1}^w is the set of weakly labeled training samples extracted from the current frame;
5) using the RV-SVM algorithm as an online-learned classifier to separate the target O from the background, thereby achieving video tracking.
2. The video tracking method based on rank learning according to claim 1, characterized in that step 1) is:
(1) convolving the target O with a series of multi-scale rectangle filters:

l_{i,j}(x̃, ỹ) = s(x̃, ỹ) * h_{i,j}(x̃, ỹ)

where s(x̃, ỹ) ∈ R^{w×h} represents the target O, (x̃, ỹ) are the pixel coordinates of O, R is a vector space whose dimension w × h equals the size of O, and h_{i,j} is the multi-scale rectangle filter (its defining formula appears as an image in the original document), with filter indices i = 1, …, w and j = 1, …, h;
(2) unrolling each filtered image matrix l_{i,j} into a column vector l_k, k = 1, …, r, with r = w × h;
(3) concatenating the column vectors l_k into the high-dimensional multi-scale image feature h:

h = (l_1^T, …, l_r^T)^T

where h ∈ R^m, m = r², and T denotes transposition.
3. The video tracking method based on rank learning according to claim 1, characterized in that step 2) is:
(1) generating a sparse random matrix φ ∈ R^{n×m} whose elements are

φ_{i,j} = √s × { +1 with probability p = 1/(2s); 0 with probability p = 1 − 1/s; −1 with probability p = 1/(2s) }

where s is chosen uniformly at random between 2 and 4, and p denotes probability;
(2) projecting the multi-scale image feature h with φ:

x = φh

obtaining the compressed multi-scale image feature x ∈ R^{n×1}.
4. the video tracing method based on sequence study according to claim 1, is characterized in that, described step 3) is:
(1) by abstract target O be 10 × 10 grid unique point, to each unique point, comprise with one the image block that this unique point and size are 4 × 4 and represent;
(2) follow the tracks of these unique points by pyramid optical flow method, the pyramidal number of plies used is 5 layers;
(3) compute the FB (forward-backward) error and NCC error of these feature points, where the FB error is computed as follows:
FB(T_f^k | S) = ED(T_f^k, T_b^k)
In the formula, S is a video or image sequence, ED denotes Euclidean distance, and t is the frame index of the video; T_f^k = (x_t, x_{t+1}, …, x_{t+k}) denotes tracking feature point x_t forward by k steps, and T_b^k = (x̂_t, x̂_{t+1}, …, x̂_{t+k}) denotes tracking feature point x_{t+k} backward by k steps;
(4) keep the 50% of feature points with the smaller tracking error; reject the remaining 50% as outliers;
(5) estimate the position of target O as the median of the remaining feature points' displacements in each spatial dimension; for every pair of feature points, compute the ratio of their Euclidean distance in the current frame to their Euclidean distance in the previous frame, then take the mean of these ratios; this mean is the scale of the target.
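Steps (4)–(5) of claim 4 (outlier rejection by tracking error, median displacement, and pairwise-distance scale) can be sketched as follows; the array layout and function name are assumptions:

```python
import numpy as np

def medianflow_update(pts_prev, pts_curr, fb_err):
    """Sketch of claim-4 steps (4)-(5): keep the 50% of feature points with
    the smaller forward-backward error, estimate the target shift as the
    per-dimension median displacement, and the scale as the mean ratio of
    pairwise point distances between the current and previous frames."""
    keep = fb_err <= np.median(fb_err)      # reject the worse half as outliers
    p0, p1 = pts_prev[keep], pts_curr[keep]
    shift = np.median(p1 - p0, axis=0)      # median of each spatial dimension
    ratios = []
    for a in range(len(p0)):
        for b in range(a + 1, len(p0)):
            d_prev = np.linalg.norm(p0[a] - p0[b])
            if d_prev > 0:
                ratios.append(np.linalg.norm(p1[a] - p1[b]) / d_prev)
    return shift, float(np.mean(ratios))    # (position offset, target scale)

p0 = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
shift, scale = medianflow_update(p0, p0 + [2., 3.], np.ones(4))
print(shift, scale)  # [2. 3.] 1.0 for a pure translation
```

The median over points is what makes the estimate robust: up to half of the surviving points can still drift without moving the result.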
5. The video tracking method based on rank learning according to claim 1, characterized in that said step 4) is:
build the training sample set X_t^1, X_t^1 = {x′ : ||l_s(x′) − l_s^*|| ≤ α, s = 1, t − Δt, …, t};
build the training sample set X_t^0, X_t^0 = {x′ : γ < ||l_s(x′) − l_s^*|| < β, s = t − Δt, …, t};
build the weakly labeled training sample set X_{t+1}^w, X_{t+1}^w = {x′ : ||l_s(x′) − l_s^w|| < α, s = t + 1};
where t is the frame index of the video, Δt is the number of recent frames, l_s(x′) denotes the position of image block x′ in frame s, l_s^* denotes the true position of target O, l_s^w is the rough position of target O obtained by the Median-Flow tracking algorithm, and α, β and γ are sample radii.
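For a single frame, the three sample sets of claim 5 reduce to distance thresholding around the true and rough target positions. A sketch, under the assumption that candidate image blocks are given by their (x, y) positions:

```python
import numpy as np

def build_sample_sets(block_pos, true_pos, rough_pos, alpha, beta, gamma):
    """Sketch of claim 5 for one frame s: positives X^1 lie within radius
    alpha of the true position l*, negatives X^0 lie in the annulus
    gamma < d < beta around l*, and weak samples X^w lie within alpha of
    the rough Median-Flow position l^w. block_pos is an (n, 2) array."""
    d_true = np.linalg.norm(block_pos - true_pos, axis=1)
    d_weak = np.linalg.norm(block_pos - rough_pos, axis=1)
    x1 = block_pos[d_true <= alpha]                      # X_t^1
    x0 = block_pos[(d_true > gamma) & (d_true < beta)]   # X_t^0
    xw = block_pos[d_weak < alpha]                       # X_{t+1}^w
    return x1, x0, xw

blocks = np.array([[0., 0.], [5., 0.], [20., 0.]])
x1, x0, xw = build_sample_sets(blocks, np.array([0., 0.]),
                               np.array([5., 0.]),
                               alpha=4.0, beta=30.0, gamma=8.0)
print(len(x1), len(x0), len(xw))  # 1 1 1
```

The gap γ ≤ d ≤ α … β between positives and negatives deliberately leaves ambiguous blocks out of both sets, which reduces label noise during online updates.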
6. The video tracking method based on rank learning according to claim 1, characterized in that said step 5) is:
(1) extract the multi-scale compressed image feature x of each image block in the labeled training sample sets X_t^1 and X_t^0;
(2) set the feature ranking of training sample set X_t^1 higher than the feature rankings of X_t^0 and X_{t+1}^w, i.e. x_i ≻ x_j, where i and j are feature indices, i = 1, …, N_1, j = N_1 + 1, …, N_1 + N_0; N_1 is the number of samples in training sample set X_t^1, and N_0 is the total number of samples in training sample sets X_t^0 and X_{t+1}^w;
(3) according to the condition set in step (2), use the multi-scale compressed image features x of the training sample sets X_t^1 and X_t^0 to train the ranking function F_{t+1}(x); first, solve the linear programming problem:
min L(α, ξ) = Σ_i α_i + C Σ_{ij} ξ_{ij}
s.t. Σ_i α_i (K(x_i, x_u) − K(x_i, x_v)) ≥ 1 − ξ_{uv}, α ≥ 0, ξ ≥ 0
In the formula, u = 1, …, N_1, v = N_1 + 1, …, N_1 + N_0, C is a penalty parameter, ξ is a slack variable, and K(·,·) is the kernel function; for a linear kernel, K(x, z) = ⟨x, z⟩;
After obtaining the optimal solution α^* of the linear programming problem, the ranking function F_{t+1}(x) can be expressed as:
F_{t+1}(x) = Σ_i α_i^* K(x_i, x)
In the formula, K(·,·) is the kernel function, i = 1, …, N_1;
(4) separate target O from the background; the image block that maximizes the ranking function F_{t+1}(x) is the true position of target O:
l_{t+1}^* = l(arg max F_{t+1}(x)), x ∈ X_{t+1}^w
In the formula, the rough position of target O in the current frame is first obtained with the Median-Flow tracking algorithm, and the weakly labeled training sample set X_{t+1}^w is then extracted; x is the multi-scale compressed image feature of an image block in X_{t+1}^w, l(x) denotes the position of the image block corresponding to feature x, and l_{t+1}^* denotes the true position of the target in the current frame.
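The RV-SVM training in claim 6 is a linear program over α and the slack variables. A small sketch with a linear kernel using `scipy.optimize.linprog`; the penalty C, the toy data, and all names are illustrative assumptions, not the patent's implementation:

```python
import numpy as np
from scipy.optimize import linprog

def train_rank_lp(X_pos, X_neg, C=1.0):
    """Sketch of the claim-6 RV-SVM training as a linear program with a
    linear kernel K(x, z) = <x, z>: minimize sum(alpha) + C*sum(xi)
    subject to sum_i alpha_i (K(x_i, x_u) - K(x_i, x_v)) >= 1 - xi_uv
    for every positive u / negative v pair, with alpha, xi >= 0."""
    X = np.vstack([X_pos, X_neg])
    K = X_pos @ X.T                       # K(x_i, .) for i = 1..N1
    n1, n0 = len(X_pos), len(X_neg)
    n_pairs = n1 * n0
    c = np.concatenate([np.ones(n1), C * np.ones(n_pairs)])
    A = np.zeros((n_pairs, n1 + n_pairs))
    for u in range(n1):
        for v in range(n0):
            row = u * n0 + v
            A[row, :n1] = -(K[:, u] - K[:, n1 + v])  # -sum_i a_i (K_iu - K_iv)
            A[row, n1 + row] = -1.0                  # -xi_uv
    res = linprog(c, A_ub=A, b_ub=-np.ones(n_pairs), bounds=(0, None))
    alpha = res.x[:n1]
    # ranking function F(x) = sum_i alpha_i* K(x_i, x)
    return lambda x: float(alpha @ (X_pos @ x))

F = train_rank_lp(np.array([[1.0, 0.0], [0.9, 0.1]]),
                  np.array([[0.0, 1.0], [0.1, 0.9]]))
print(F(np.array([1.0, 0.0])) > F(np.array([0.0, 1.0])))  # True
```

The point of the claim is that this is a linear program rather than the quadratic program of a standard ranking SVM, which is what keeps the online retraining cheap.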
CN201410054630.7A 2014-02-18 2014-02-18 Video tracking method based on rank learning Pending CN103886585A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410054630.7A CN103886585A (en) 2014-02-18 2014-02-18 Video tracking method based on rank learning


Publications (1)

Publication Number Publication Date
CN103886585A true CN103886585A (en) 2014-06-25

Family

ID=50955458

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410054630.7A Pending CN103886585A (en) 2014-02-18 2014-02-18 Video tracking method based on rank learning

Country Status (1)

Country Link
CN (1) CN103886585A (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101609452A (en) * 2009-07-10 2009-12-23 南方医科大学 Fuzzy SVM feedback evaluation method for medical image target recognition
CN101661559A (en) * 2009-09-16 2010-03-03 中国科学院计算技术研究所 Digital image training and detecting methods
CN102147866A (en) * 2011-04-20 2011-08-10 上海交通大学 Target identification method based on training Adaboost and support vector machine


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
HWANJO YU et al.: "An efficient method for learning nonlinear ranking SVM functions", Information Sciences *
KAIHUA ZHANG et al.: "Real-Time Compressive Tracking", Computer Vision *
YANCHENG BAI et al.: "Robust Tracking via Weakly Supervised Ranking SVM", Computer Vision and Pattern Recognition *
ZDENEK KALAL et al.: "Forward-Backward Error: Automatic Detection of Tracking Failures", Pattern Recognition *
ZHAO Lu, YU Huimin: "Vehicle detection based on prior shape information and level set method", Journal of Zhejiang University (Engineering Science) *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105989613A (en) * 2015-02-05 2016-10-05 南京市客运交通管理处 Passenger flow tracking algorithm suitable for bus scenes
CN104680143A (en) * 2015-02-28 2015-06-03 武汉烽火众智数字技术有限责任公司 Quick image search method for video investigation
CN104680143B (en) * 2015-02-28 2018-02-27 武汉烽火众智数字技术有限责任公司 Fast image retrieval method for video investigation
CN105976390A (en) * 2016-05-25 2016-09-28 南京信息职业技术学院 Steel tube counting method combining support vector machine threshold statistics and spot detection
CN105976390B (en) * 2016-05-25 2018-09-18 南京信息职业技术学院 Steel tube counting method combining support vector machine threshold statistics and spot detection
CN110461270A (en) * 2017-02-14 2019-11-15 阿特雷塞斯有限责任公司 High speed optical tracking with compression and/or CMOS windowing
CN108665479A (en) * 2017-06-08 2018-10-16 西安电子科技大学 Infrared target tracking method based on compressed-domain multi-scale feature TLD
CN110785775A (en) * 2017-07-07 2020-02-11 三星电子株式会社 System and method for optical tracking
CN110785775B (en) * 2017-07-07 2023-12-01 三星电子株式会社 System and method for optical tracking
CN107393523A (en) * 2017-07-28 2017-11-24 深圳市盛路物联通讯技术有限公司 Noise monitoring method and system
CN107393523B (en) * 2017-07-28 2020-11-13 深圳市盛路物联通讯技术有限公司 Noise monitoring method and system
CN109933715A (en) * 2019-03-18 2019-06-25 杭州电子科技大学 Online learning ranking method based on the listwise algorithm
CN109933715B (en) * 2019-03-18 2021-05-28 杭州电子科技大学 Online learning ranking method based on the listwise algorithm
CN110264492A (en) * 2019-06-03 2019-09-20 浙江大学 Efficient satellite image self-correcting multi-target tracking method
CN110264492B (en) * 2019-06-03 2021-03-23 浙江大学 Efficient satellite image self-correcting multi-target tracking method

Similar Documents

Publication Publication Date Title
CN103886585A (en) Video tracking method based on rank learning
Zhu et al. Fusing spatiotemporal features and joints for 3d action recognition
Jiang et al. Recognizing human actions by learning and matching shape-motion prototype trees
CN102034096B (en) Video event recognition method based on top-down motion attention mechanism
Malik et al. The three R’s of computer vision: Recognition, reconstruction and reorganization
CN105160310A (en) 3D (three-dimensional) convolutional neural network based human body behavior recognition method
CN103336957A (en) Network coderivative video detection method based on spatial-temporal characteristics
CN103279768A (en) Method for identifying faces in videos based on incremental learning of face partitioning visual representations
CN103886325A (en) Cyclic matrix video tracking method with partition
Lin et al. Hand-raising gesture detection in real classroom
Lin et al. Deep learning of spatio-temporal features with geometric-based moving point detection for motion segmentation
CN105574545B Multi-view semantic segmentation method and device for street environment images
Ibrahem et al. Real-time weakly supervised object detection using center-of-features localization
CN103413154A (en) Human motion identification method based on normalized class Google measurement matrix
CN104778699A (en) Adaptive object feature tracking method
Basavaiah et al. Human activity detection and action recognition in videos using convolutional neural networks
Sriram et al. Analytical review and study on object detection techniques in the image
Murtaza et al. Multi-view human action recognition using histograms of oriented gradients (HOG) description of motion history images (MHIs)
Arunnehru et al. Automatic activity recognition for video surveillance
Zheng et al. Bi-heterogeneous Convolutional Neural Network for UAV-based dynamic scene classification
Yang et al. MediaCCNY at TRECVID 2012: Surveillance Event Detection.
Liu et al. Mean shift fusion color histogram algorithm for nonrigid complex target tracking in sports video
Ji et al. A view-invariant action recognition based on multi-view space hidden markov models
Palmer et al. Scale proportionate histograms of oriented gradients for object detection in co-registered visual and range data
Wachs et al. Recognizing Human Postures and Poses in Monocular Still Images.

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20140625

RJ01 Rejection of invention patent application after publication