CN102999921A - Pixel label propagation method based on directional tracing windows - Google Patents

Pixel label propagation method based on directional tracing windows

Info

Publication number
CN102999921A
Authority
CN
China
Prior art keywords
pixel
tracking
tracking window
label
window
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012104524331A
Other languages
Chinese (zh)
Other versions
CN102999921B (en)
Inventor
钟凡
秦学英
彭群生
孟祥旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN201210452433.1A priority Critical patent/CN102999921B/en
Publication of CN102999921A publication Critical patent/CN102999921A/en
Application granted granted Critical
Publication of CN102999921B publication Critical patent/CN102999921B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a pixel label propagation method based on directional, narrow, elongated tracking windows. The specific steps are: Step 1: determine the region to be labeled in the target image; Step 2: arrange tracking windows in the target image so that they cover the region to be labeled; Step 3: taking the pixels covered by each tracking window in the input image as samples, build a Gaussian mixture model for each type of label; Step 4: compute the probability density that each pixel to be labeled covered by the tracking window belongs to each label; Step 5: compute the confidence of the probabilities estimated by the tracking window for each pixel to be labeled; Step 6: process all tracking windows in all directions; Step 7: determine the label of each pixel to be labeled from the probabilities and confidences computed by all windows covering it. The invention effectively exploits spatial context and reduces errors caused by ambiguous features.

Description

Pixel Label Propagation Method Based on Directional Tracking Windows

Technical Field

The present invention relates to a pixel label propagation method, and in particular to a pixel label propagation method based on directional tracking windows.

Background Art

Label propagation between video frames is a common problem in video processing, and especially in video editing. A label usually represents the result of some video-processing task, and label propagation can be understood as the process of solving for the results of the other frames given the known result of one frame. In region tracking and foreground segmentation, for example, the user can obtain the result for one frame interactively and then use label propagation to obtain the results for the remaining frames. Label propagation generally relies on one of the following three approaches:

1. Methods based on image matching

Methods based on image matching first register the image of the input frame against that of the target frame, and then copy the pixel labels of the input frame to the target frame according to the pixel correspondences. This class of label propagation methods is therefore equivalent to image matching. Image matching is a classic problem in computer vision and is generally performed with optical flow tracking. Because of occlusion, edge blur and similar effects, an exact image match is hard to obtain. Optical flow methods based on local features are unsuitable for flat image regions, while methods based on global optimization are sensitive to video discontinuities caused by occlusion and the like. Thus, although label propagation is in theory equivalent to image matching, in practice this type of method is rarely applied on its own and is usually only used to obtain an initial result.

2. Methods based on a global classifier

Methods based on a global classifier first extract a feature for every pixel and then propagate the labels in feature space according to the distances and adjacencies of the pixels in that space. A global classifier means that all pixels of the target frame share the same classifier, regardless of pixel position. A typical example of this class is video segmentation based on a global color distribution: using pixel color as the feature, the method first takes foreground and background pixel colors with known labels as samples to obtain the distribution functions of the foreground and background in color space, and then classifies the unknown pixels based on these distribution functions. Methods based on a global classifier ignore the spatial positions of the pixels and propagate labels directly in feature space, which makes them error-prone in regions where the features are ambiguous; for example, in regions where the foreground and background colors are similar, video segmentation based on a global color distribution produces a large number of errors. On the other hand, because the spatial positions of the pixels are ignored and samples can be drawn from a large range, methods based on a global feature distribution handle temporal discontinuities in the video (newly appearing regions caused by occlusion, topology changes, fast motion, and so on) comparatively well.

3. Methods based on local classifiers

The RotoBrush tool newly introduced in Adobe After Effects 5 uses local classifiers for label propagation. Its purpose is to overcome the drawback that a global classifier is error-prone in regions with ambiguous features. Unlike a global classifier, each local classifier covers only a local region of the target image, while the samples used to train the local classifier come from the corresponding region of the input image. This is in fact one way of exploiting the spatial positional relationships of pixels. Moreover, because a local classifier covers a much smaller area than a global classifier, its feature distribution is also relatively simple, which further reduces the chance of error.

The technical problem solved by the present invention differs from common visual tracking (a non-parametric-model visual tracking method, No. 200910080381.8, and a real-time multi-target marking and centroid computation method for video target marking, No. 200510047785.9) and from feature-point tracking (a multi-feature-point tracking method for microscopic image sequences, No. 201010516768.6). Both visual tracking and target marking reduce to labeling regions, whereas pixel label propagation must label every pixel and is therefore more closely related to video segmentation; the present invention can also be used directly for video segmentation. Feature-point tracking belongs to the image-matching family, but it processes only the small fraction of pixels in the video that are easy to track and cannot be used for pixel label propagation. The directional windows adopted in the present invention mainly serve to make better use of the color distribution and are therefore essentially different from both feature tracking and image matching.

A key step in employing local classifiers is defining the coverage area of each classifier, i.e., the tracking window. The larger the tracking window, the more complex the feature distribution within each window and the greater the chance of containing ambiguous features, which leads to problems similar to those of the global classifier. The smaller the tracking window, the better the robustness to ambiguous features, but also the greater the sensitivity to local discontinuities between video frames, making errors more likely under fast motion and in newly appearing regions. Existing local classifiers all use regularly shaped tracking windows, i.e., square or circular windows, and it is difficult for them to obtain satisfactory results when ambiguous features and inter-frame discontinuities occur at the same time. The directional tracking windows disclosed in the present invention help overcome this shortcoming of local classifiers.

Summary of the Invention

The object of the present invention is to solve the above problems by providing a pixel label propagation method based on directional tracking windows, which has the advantages of effectively exploiting spatial context and reducing errors caused by ambiguous features.

In order to achieve the above object, the present invention adopts the following technical solution:

A pixel label propagation method based on directional tracking windows, the specific steps being:

Step 1: dilate the region to be propagated in the input image by 30-70 pixels; the result serves as the region to be labeled in the target image;

Step 2: for all specified directions, arrange tracking windows along each direction in the target image so that the windows of each direction completely cover the region to be labeled;

Step 3: for each tracking window, take the pixels covered by the window in the input frame as samples and, using pixel color as the feature, build a Gaussian mixture model p(x|L) for each label L to represent its color distribution, where x is the color of a pixel to be labeled; a label marks one of the classes into which the pixels are divided, each class being marked with one label;

Step 4: for each tracking window, compute the probability that each pixel to be labeled that it covers belongs to each label;

Step 5: for each tracking window, compute the confidence of the probabilities it estimates for each pixel to be labeled;

Step 6: process all tracking windows in all directions in turn;

Step 7: for each pixel to be labeled, record the probabilities and confidences computed by all tracking windows covering the pixel, and determine the pixel's label from the probability output by the window with the highest confidence.

The tracking window has a fixed width and a variable length; the width of the tracking window is W pixels.

The specific steps of Step 2 are:

(2-1) First arrange the horizontal tracking windows. Scan from top to bottom to the first row containing the region to be labeled, denoted r0; take row r0 and row r0+W-1 as the upper and lower ends of the first tracking window. Compute the start and end columns of the pixels to be labeled within these rows, i.e., scanning from left to right, the column containing the first pixel to be labeled is the start column and the column containing the last pixel to be labeled is the end column; set them as the left and right ends of the first tracking window. Take row r0+2W/3 as the starting row of the second tracking window and row r0+2(k-1)W/3 as the starting row of the k-th tracking window (adjacent tracking windows overlap by W/3), and arrange the subsequent tracking windows in the same way until all pixels to be labeled are completely covered, k being a natural number;

(2-2) For any other direction θ, first rotate the target image clockwise by θ degrees, arrange tracking windows by the method for horizontal windows in step (2-1), and then rotate the target image counterclockwise by θ degrees to obtain the tracking windows in direction θ.
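
The sketch below illustrates the window layout of steps (2-1) and (2-2) in Python, assuming the region to be labeled is supplied as a boolean mask. The function names, the use of NumPy and SciPy, and the rotation sign convention are illustrative assumptions rather than anything fixed by the patent.

```python
import numpy as np
from scipy.ndimage import rotate


def horizontal_windows(mask, W=15):
    """Lay out horizontal windows of height W over all True pixels of `mask`.

    Returns (top, bottom, left, right) tuples; consecutive windows start
    2W/3 rows apart, so neighbours overlap by roughly W/3 rows (step 2-1).
    """
    rows = np.flatnonzero(mask.any(axis=1))
    if rows.size == 0:
        return []
    r0 = rows[0]                          # first row containing pixels to be labeled
    windows, k = [], 1
    while True:
        top = r0 + (k - 1) * 2 * W // 3
        bottom = min(top + W, mask.shape[0])
        cols = np.flatnonzero(mask[top:bottom].any(axis=0))
        if cols.size:                     # left/right ends follow the marked pixels
            windows.append((top, bottom, cols[0], cols[-1] + 1))
        if bottom >= mask.shape[0] or top > rows[-1]:
            break
        k += 1
    return windows


def directional_windows(mask, theta_deg, W=15):
    # Step (2-2): rotate the mask, lay out horizontal windows on the rotated
    # copy, and keep theta so the windows can be mapped back afterwards.
    # The sign passed to scipy's rotate is an assumption to be checked against
    # the clockwise/counterclockwise convention used for the images.
    rotated = rotate(mask.astype(np.uint8), -theta_deg, order=0, reshape=True) > 0
    return horizontal_windows(rotated, W), theta_deg
```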

The Gaussian mixture model p(x|L) of Step 3 takes the form

p(x \mid L) = \sum_{k=1}^{K} \omega_k N(x; \pi_k, \sigma_k)

where N is a normal distribution with mean \pi_k and variance \sigma_k, \omega_k is the weight of the k-th component, and K is the number of Gaussian components, generally taken between 3 and 5. The parameters \pi_k, \sigma_k and \omega_k can all be obtained with the Expectation-Maximization (EM) algorithm.
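
As an illustration of Step 3, the sketch below fits one Gaussian mixture model per label with scikit-learn's EM-based GaussianMixture; the helper names, the diagonal covariance and the default K = 5 are assumptions, not values prescribed by the text.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def fit_label_models(samples_by_label, K=5):
    """samples_by_label: dict {label: (N, 3) array of pixel colors from the input frame}."""
    models = {}
    for label, colors in samples_by_label.items():
        n_components = min(K, len(colors))          # guard against tiny sample sets
        models[label] = GaussianMixture(n_components=n_components,
                                        covariance_type="diag").fit(colors)
    return models


def density(model, colors):
    # score_samples returns log p(x); exponentiate to obtain the density p(x | L).
    return np.exp(model.score_samples(colors))
```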

The specific steps of Step 4 are:

(4-1) Denote by p(x|L=l) the color distribution of label l within the tracking window, computed from the Gaussian mixture model obtained in Step 3;

(4-2) Let the number of labels be M; the probability that a pixel i to be labeled belongs to label l is then:

p(x_i) = \frac{p(x_i \mid L = l)}{\sum_{j=1}^{M} p(x_i \mid L = j)}
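
A minimal sketch of this normalization, assuming the per-label densities of one tracking window are stored as an (N_pixels, M_labels) array (a bookkeeping choice, not something the patent fixes):

```python
import numpy as np


def label_probabilities(densities, eps=1e-12):
    """densities[i, j] = p(x_i | L = j); rows are normalized to per-label probabilities."""
    total = densities.sum(axis=1, keepdims=True) + eps   # eps guards against all-zero rows
    return densities / total
```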

The specific method of Step 5 is: the confidence of the probability, estimated for each pixel i to be labeled that the tracking window covers, of belonging to a label l within the tracking window is:

c(x_i) = \frac{p_{\max}(x_i) - p_{\min}(x_i)}{p_{\max}(x_i) + p_{\min}(x_i) + \epsilon}

where p_max(x_i) and p_min(x_i) are the maximum and minimum of p(x_i|L=j), j = 1, ..., M, respectively, j denoting a label within the tracking window, and ε is a constant, typically 1e-3. When the maximum and minimum probability densities of pixel i are both very large or both very small, the pixel is assigned a low confidence: all labels in the window having a large probability density corresponds to the case where the labels' colors are similar, while all labels having a small probability density corresponds to a temporally discontinuous region for which no related samples can be found in the input frame.
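
The confidence can be computed from the same per-window density array; the sketch below follows the formula above with ε = 1e-3 as suggested in the text (the array layout is again an assumption):

```python
import numpy as np


def window_confidence(densities, eps=1e-3):
    """densities[i, j] = p(x_i | L = j) for one tracking window."""
    p_max = densities.max(axis=1)
    p_min = densities.min(axis=1)
    # Low confidence when all densities are large (labels with similar colors)
    # or all small (temporally discontinuous region with no related samples).
    return (p_max - p_min) / (p_max + p_min + eps)
```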

The specific method of Step 7 is: each tracking window outputs, for every pixel it covers, a probability p(x_i|L=l) of belonging to each label l together with a confidence c(x_i). Let p′(x_i|L=l) denote the probability output by the window with the highest confidence among all tracking windows covering pixel i; the label of pixel i is then

l_i = \arg\max_j \, p'(x_i \mid L = j)

where j denotes a label within the tracking window, i.e., pixel i is marked with the label that attains the maximum probability.

Beneficial effects of the invention: the invention adopts rectangular tracking windows, which guarantee a sufficiently large span while keeping a relatively small coverage area. A sufficiently large span makes it possible to exploit image correlations between distant locations and thereby handle inter-frame discontinuities, while the smaller coverage area reduces ambiguous features and keeps the feature distribution relatively simple, lowering the classifier's error rate. Arranging the tracking windows along different directions allows motion in different directions and regions of different shapes to be handled effectively, exploits spatial context more effectively, and further reduces errors caused by ambiguous features.

Brief Description of the Drawings

Figure 1 is a schematic diagram of video foreground segmentation based on pixel label propagation, where the question mark denotes the segmentation result to be solved for;

Figure 2 is a schematic diagram of a traditional square tracking window;

Figure 3(a) shows the directional horizontal tracking windows provided by the invention;

Figure 3(b) shows the directional 45-degree tracking windows provided by the invention;

Figure 3(c) shows tracking windows of different directions centered on the same pixel, as provided by the invention;

Figure 4(a) is a schematic diagram of how the invention uses long-range image correlation to handle inter-frame discontinuities;

Figure 4(b) is a schematic diagram of how the invention uses directionality to handle different situations, showing the best tracking window at each position of the target image;

Figure 5(a) is a schematic diagram of merging the results of the horizontal tracking windows;

Figure 5(b) is a schematic diagram of merging the results of the 45° tracking windows;

Figure 5(c) is a schematic diagram of merging the results of the 90° tracking windows;

Figure 5(d) is a schematic diagram of merging the results of the 135° tracking windows;

Figure 5(e) is a schematic diagram of the final output obtained from the tracking windows of all directions;

Figure 5(f) is a schematic diagram of the direction selected at each pixel;

Figure 6(a) is the input frame image;

Figure 6(b) is the target frame image;

Figure 6(c) shows the result of using square windows for video matting;

Figure 6(d) shows the result of using directional tracking windows for video matting.

Detailed Description of the Embodiments

The present invention is further described below in conjunction with the accompanying drawings and embodiments.

As shown in Figure 1, video segmentation is taken as an example below, where the question mark denotes the result to be solved for; the present invention is further described with reference to the accompanying drawings.

For video segmentation based on label propagation, the key problem to be solved is to segment the current frame (the target frame) using the segmentation result of the previous frame (the input frame). The segmentation result is represented as a binary image with foreground pixel value 255 and background pixel value 0. The main difficulty in this process is overcoming the conflict between foreground/background color similarity and temporal discontinuity in the video. The method for foreground/background segmentation with directional tracking windows is as follows:

1) Dilate the segmentation result of the input frame by 30-70 pixels (adjustable according to the speed of the foreground motion), and take the dilated foreground region as the region to be labeled in the target image; the remaining regions are regarded as background;
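
A possible implementation of this dilation with OpenCV is sketched below; the elliptical structuring element and the 50-pixel radius (one value inside the 30-70 pixel range mentioned above) are illustrative choices:

```python
import cv2


def region_to_label(prev_fg_mask, radius=50):
    """prev_fg_mask: uint8 mask of the previous frame's segmentation (255 = foreground)."""
    size = 2 * radius + 1
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (size, size))
    dilated = cv2.dilate(prev_fg_mask, kernel)
    return dilated > 0       # True where the target frame still has to be labeled
```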

2) Set the width of the tracking window to W pixels (W is generally taken as 15), and first arrange the horizontal tracking windows as follows: scan from top to bottom to the first row containing the region to be labeled, denoted r0; take row r0 and row r0+W-1 as the upper and lower ends of the first tracking window; compute the start and end columns of the pixels to be labeled within these rows and set them as the left and right ends of the first tracking window; take row r0+2W/3 as the starting row of the second tracking window and row r0+2(k-1)W/3 as the starting row of the k-th tracking window (adjacent tracking windows overlap by W/3), and arrange the subsequent tracking windows in the same way until all pixels to be labeled are completely covered; the resulting horizontal tracking windows are shown in Figure 3(a);

3) Rotate the target image clockwise by 45 degrees, arrange horizontal tracking windows as in 2), and then rotate counterclockwise by 45 degrees to obtain the tracking windows in the 45-degree direction, as shown in Figure 3(b);

4) Arrange the 90-degree and 135-degree tracking windows in the manner of 3), as shown in Figure 3(c);

5) For each tracking window, take the RGB colors of the foreground and background pixels it covers in the input frame as samples, and train Gaussian mixture models (GMMs) p(x|F) and p(x|B) of the foreground and background color distributions, where x is the color of a pixel to be labeled;

6) For each tracking window, compute for every covered pixel i to be labeled its probability densities p(x_i|F) and p(x_i|B) under the foreground and background GMMs, and from them the probability of belonging to the foreground:

p(x_i) = \frac{p(x_i \mid F)}{p(x_i \mid F) + p(x_i \mid B)}

7) For each tracking window, compute the confidence of the probability it estimates for each pixel i to be labeled:

c(x_i) = \frac{|p(x_i \mid F) - p(x_i \mid B)|}{p(x_i \mid F) + p(x_i \mid B) + \epsilon}

where ε is a constant, typically 1e-3. The formula states that when the probability densities of pixel i's color are both very large or both very small under the foreground and background distributions, the pixel is assigned a low confidence: both densities being large corresponds to the case where the foreground and background colors are similar, while both being small corresponds to a temporally discontinuous region for which no related samples can be found in the input frame;

8) Process all tracking windows in all directions in turn;

9) Because each pixel is covered by multiple tracking windows, the classifier of each window outputs for the pixel a foreground probability p(x_i) and a confidence c(x_i), and one of them must be selected as the final output, as shown in Figure 5. Let p′(x_i) be the probability corresponding to the window with the highest confidence among all tracking windows covering pixel i; if p′(x_i) > 0.5, pixel i is marked as foreground, otherwise as background.
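
For this two-label case the merge of the earlier sketch reduces to a 0.5 threshold on the selected foreground probability; `best_fg_prob` below stands for the per-pixel p′(x_i) and is an assumed intermediate array:

```python
import numpy as np


def binary_segmentation(best_fg_prob, shape):
    # Pixels whose most confident foreground probability exceeds 0.5 become
    # foreground (255); everything else is background (0).
    mask = (best_fg_prob > 0.5).astype(np.uint8) * 255
    return mask.reshape(shape)
```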

Figure 4(a) shows how rectangular tracking windows use long-range image correlation to handle inter-frame discontinuities; Figure 4(b) is a schematic diagram of how directionality handles different situations, showing the best tracking window at each position.

Figures 5(a), 5(b), 5(c) and 5(d) are schematic diagrams of merging the results obtained by the tracking windows of different directions; Figure 5(e) shows the final output obtained from the tracking windows of all directions; Figure 5(f) shows the direction selected at each pixel.

Comparing rectangular and square tracking windows for video matting, the rectangular windows correctly identify the newly appearing region between the two legs, whereas the square windows wrongly identify this region as foreground, as shown in Figures 6(a), 6(b), 6(c) and 6(d).

Figure 2 shows the traditional square tracking window and the problems it has when handling discontinuities between video frames.

Although the specific embodiments of the present invention have been described above with reference to the accompanying drawings, they do not limit the scope of protection of the present invention. Those skilled in the art should understand that various modifications or variations that can be made on the basis of the technical solution of the present invention without creative effort still fall within the scope of protection of the present invention.

Claims (8)

1. A pixel label propagation method based on directional tracking windows, characterized in that the specific steps are:
Step 1: dilate the region to be propagated in the input image by 30-70 pixels; the result serves as the region to be labeled in the target image;
Step 2: for all specified directions, arrange tracking windows along each direction in the target image so that the windows of each direction completely cover the region to be labeled;
Step 3: for each tracking window, take the pixels covered by the window in the input frame as samples and, using pixel color as the feature, build a Gaussian mixture model p(x|L) for each label L to represent its color distribution;
Step 4: for each tracking window, compute the probability that each pixel to be labeled that it covers belongs to each label;
Step 5: for each tracking window, compute the confidence of the probabilities it estimates for each pixel to be labeled;
Step 6: process all tracking windows in all directions in turn;
Step 7: for each pixel to be labeled, record the probabilities and confidences computed by all tracking windows covering the pixel, and determine the pixel's label from the probability output by the window with the highest confidence.

2. The pixel label propagation method based on directional tracking windows according to claim 1, characterized in that the tracking window is a directional window with a fixed width and a variable length, the width of the tracking window being W pixels.

3. The pixel label propagation method based on directional tracking windows according to claim 1, characterized in that the specific steps of Step 2 are:
(2-1) first arrange the horizontal tracking windows: scan from top to bottom to the first row containing the region to be labeled, denoted r0; take row r0 and row r0+W-1 as the upper and lower ends of the first tracking window; compute the start and end columns of the pixels to be labeled within these rows, i.e., scanning from left to right, the column containing the first pixel to be labeled is the start column and the column containing the last pixel to be labeled is the end column, and set them as the left and right ends of the first tracking window; take row r0+2W/3 as the starting row of the second tracking window and row r0+2(k-1)W/3 as the starting row of the k-th tracking window, adjacent tracking windows having an overlap of W/3; arrange the subsequent tracking windows in the same way until all pixels to be labeled are completely covered, k being a natural number;
(2-2) for any other direction θ, first rotate the target image clockwise by θ degrees, arrange tracking windows by the method for horizontal windows in step (2-1), and then rotate the target image counterclockwise by θ degrees to obtain the tracking windows in direction θ.

3. The pixel label propagation method based on directional tracking windows according to claim 1, characterized in that the Gaussian mixture model p(x|L) of Step 3 takes the form

p(x \mid L) = \sum_{k=1}^{K} \omega_k N(x; \pi_k, \sigma_k)

where N is a normal distribution with mean \pi_k and variance \sigma_k, \omega_k is the weight of the k-th component, and K is the number of Gaussian components; the parameters \pi_k, \sigma_k and \omega_k are all obtained with the Expectation-Maximization algorithm.

4. The pixel label propagation method based on directional tracking windows according to claim 1, characterized in that the specific steps of Step 4 are:
(4-1) denote by p(x|L=l) the Gaussian mixture model of the color distribution of the pixels with label l within the tracking window;
(4-2) let the number of labels be M; the probability that a pixel i to be labeled belongs to label l is then

p(x_i) = \frac{p(x_i \mid L = l)}{\sum_{j=1}^{M} p(x_i \mid L = j)}

5. The pixel label propagation method based on directional tracking windows according to claim 1, characterized in that the specific method of Step 5 is: the confidence of the probability, estimated for each pixel i to be labeled that the tracking window covers, of belonging to a label l within the tracking window is

c(x_i) = \frac{p_{\max}(x_i) - p_{\min}(x_i)}{p_{\max}(x_i) + p_{\min}(x_i) + \epsilon}

where p_max(x_i) and p_min(x_i) are the maximum and minimum of p(x_i|L=j), j = 1, ..., M, respectively; ε is a constant, typically 1e-3; when the maximum and minimum probability densities of pixel i are both very large or both very small, the pixel is assigned a low confidence, all labels in the window having a large probability density corresponding to the case where the labels' colors are similar, and all labels having a small probability density corresponding to a temporally discontinuous region for which no related samples can be found in the input frame.

6. The pixel label propagation method based on directional tracking windows according to claim 1, characterized in that the specific method of Step 7 is: each tracking window outputs, for every pixel it covers, a probability p(x_i|L=l) of belonging to each label l together with a confidence c(x_i); let p′(x_i|L=l) denote the probability output by the window with the highest confidence among all tracking windows covering pixel i; the label of pixel i is then

l_i = \arg\max_j \, p'(x_i \mid L = j)

i.e., pixel i is marked with the label that attains the maximum probability.

7. The pixel label propagation method based on directional tracking windows according to claim 1, characterized in that the windows of the same direction are parallel to one another.

8. The pixel label propagation method based on directional tracking windows according to claim 1, characterized in that adjacent windows of the same direction have no gaps between them and have a certain overlapping area.
CN201210452433.1A 2012-11-09 2012-11-09 Pixel label propagation method based on directional tracing windows Expired - Fee Related CN102999921B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210452433.1A CN102999921B (en) 2012-11-09 2012-11-09 Pixel label propagation method based on directional tracing windows

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210452433.1A CN102999921B (en) 2012-11-09 2012-11-09 Pixel label propagation method based on directional tracing windows

Publications (2)

Publication Number Publication Date
CN102999921A true CN102999921A (en) 2013-03-27
CN102999921B CN102999921B (en) 2015-01-21

Family

ID=47928454

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210452433.1A Expired - Fee Related CN102999921B (en) 2012-11-09 2012-11-09 Pixel label propagation method based on directional tracing windows

Country Status (1)

Country Link
CN (1) CN102999921B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106874845A (en) * 2016-12-30 2017-06-20 东软集团股份有限公司 The method and apparatus of image recognition
CN111833398A (en) * 2019-04-16 2020-10-27 杭州海康威视数字技术股份有限公司 Method and device for marking pixel points in image

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101142593A (en) * 2005-03-17 2008-03-12 英国电讯有限公司 Method for tracking objects in video sequences
CN101216943A (en) * 2008-01-16 2008-07-09 湖北莲花山计算机视觉和信息科学研究院 A method for video moving object subdivision
CN101676953A (en) * 2008-08-22 2010-03-24 奥多比公司 Automatic video image segmentation

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101142593A (en) * 2005-03-17 2008-03-12 英国电讯有限公司 Method for tracking objects in video sequences
CN101216943A (en) * 2008-01-16 2008-07-09 湖北莲花山计算机视觉和信息科学研究院 A method for video moving object subdivision
CN101676953A (en) * 2008-08-22 2010-03-24 奥多比公司 Automatic video image segmentation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Xue Bai et al.: "Dynamic Color Flow: A Motion-Adaptive Color Model for Object Segmentation in Video", Proceedings of the 11th European Conference on Computer Vision (ECCV 2010) *
钟凡 et al.: "Real-time Post-processing of Online Video Segmentation", Chinese Journal of Computers *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106874845A (en) * 2016-12-30 2017-06-20 东软集团股份有限公司 The method and apparatus of image recognition
CN111833398A (en) * 2019-04-16 2020-10-27 杭州海康威视数字技术股份有限公司 Method and device for marking pixel points in image
CN111833398B (en) * 2019-04-16 2023-09-08 杭州海康威视数字技术股份有限公司 Pixel point marking method and device in image

Also Published As

Publication number Publication date
CN102999921B (en) 2015-01-21

Similar Documents

Publication Publication Date Title
CN103559237B (en) Semi-automatic image annotation sample generating method based on target tracking
CN102129695B (en) Target tracking method based on modeling of occluder under condition of having occlusion
CN103218827B (en) Contour Tracking Method Based on Shape Transfer Joint Segmentation and Graph Matching Correction
CN102129690B (en) Tracking method of human body moving object with environmental disturbance resistance
CN112712546A (en) Target tracking method based on twin neural network
CN103413324A (en) Automatic target tracking method for aerially photographed videos
CN104700415B (en) The choosing method of matching template in a kind of images match tracking
CN110276264A (en) A Crowd Density Estimation Method Based on Foreground Segmentation Map
CN103942793B (en) Detection method of uniform motion region in video based on thermal diffusion
CN113920170A (en) Pedestrian trajectory prediction method, system and storage medium combining scene context and pedestrian social relationship
CN104063880B (en) PSO based multi-cell position outline synchronous accurate tracking system
CN106447662A (en) Combined distance based FCM image segmentation algorithm
CN106780564A (en) A kind of anti-interference contour tracing method based on Model Prior
CN116503818A (en) A multi-lane vehicle speed detection method and system
CN103824305A (en) Improved Meanshift target tracking method
CN110555122A (en) Building plan wall vectorization method based on segmented rectangles
Qin et al. RSO-SLAM: A robust semantic visual SLAM with optical flow in complex dynamic environments
Liang et al. DIG-SLAM: an accurate RGB-D SLAM based on instance segmentation and geometric clustering for dynamic indoor scenes
CN103093211B (en) Based on the human body motion tracking method of deep nuclear information image feature
Wang et al. Fast vanishing point detection method based on road border region estimation
CN107292914A (en) Visual target tracking method based on small-sized single branch convolutional neural networks
Choi et al. Regression with residual neural network for vanishing point detection
Mu et al. Visual prompt multibranch fusion network for RGB-thermal crowd counting
CN102999921B (en) Pixel label propagation method based on directional tracing windows
CN102509289A (en) Characteristic matching cell division method based on Kalman frame

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20150121

Termination date: 20151109

EXPY Termination of patent right or utility model