CN103400386B - Interactive image processing method in video - Google Patents

Interactive image processing method in video

Info

Publication number
CN103400386B
CN103400386B CN201310326815.4A CN201310326815A CN103400386B CN 103400386 B CN103400386 B CN 103400386B CN 201310326815 A CN201310326815 A CN 201310326815A CN 103400386 B CN103400386 B CN 103400386B
Authority
CN
China
Prior art keywords
key frame
video
matting
image processing
processing method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310326815.4A
Other languages
Chinese (zh)
Other versions
CN103400386A (en)
Inventor
王好谦
邓博雯
张永兵
戴琼海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Graduate School Tsinghua University
Original Assignee
Shenzhen Graduate School Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Graduate School Tsinghua University filed Critical Shenzhen Graduate School Tsinghua University
Priority to CN201310326815.4A priority Critical patent/CN103400386B/en
Publication of CN103400386A publication Critical patent/CN103400386A/en
Application granted granted Critical
Publication of CN103400386B publication Critical patent/CN103400386B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides an interactive image processing method. The method extracts key frames from a video sequence and superimposes the frames between adjacent key frames, one by one, to form key-frame clusters for interactive marking; on this basis the image of each key frame is divided into a foreground region, a background region and an unknown region. Each key frame then undergoes spectral clustering and alpha estimation to obtain its matting result. Finally, the matting result of the key frames is propagated to the whole video sequence to obtain the final matting result. Because foreground and background points are marked interactively on each key-frame cluster, the spatio-temporal information contained in the video sequence is fully exploited; the key-frame-based user interface keeps the interaction natural and intuitive, matching the viewing habits of the human visual system; and replacing marks on individual key frames with marks on key-frame clusters gives stronger robustness when the foreground object moves over a large range.

Description

Interactive image processing method in video
Technical field
The present invention relates to the field of image processing, and in particular to an interactive image processing method based on fuzzy connectedness of images.
Background technology
Digital matting is the technical process of extracting a foreground object accurately from an image or video sequence with only a small amount of user interaction. It is a key basic technique in image processing fields such as photo editing and film and television production, and has attracted wide attention and study since the early days of computer image processing.
Simply put, matting divides an image completely into a foreground region (F) and a background region (B). What must be resolved is the color vector C_p of each pixel p of the unknown region, which is a linear combination of the foreground pixel color F_p, the background pixel color B_p and the transparency parameter α_p:

C_p = α_p F_p + (1 − α_p) B_p,

where α_p ∈ [0, 1]; 0 denotes pure background and 1 pure foreground. For most natural images F and B are not restricted to particular values, and α, F and B are unknown for every pixel. For a given pixel, the only known information is its three-dimensional color vector C_p, while F_p, B_p and α_p are unknown, so matting is an under-constrained problem of solving for seven unknowns from three known quantities.
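The compositing model above can be sketched in code as follows (a minimal illustration; the function name `composite` and the toy pixel values are our own, not from the patent):

```python
import numpy as np

def composite(alpha, F, B):
    # Basic matting equation: C = alpha * F + (1 - alpha) * B, channel-wise.
    return alpha * F + (1.0 - alpha) * B

F = np.array([255.0, 0.0, 0.0])   # foreground colour of the pixel (red)
B = np.array([0.0, 0.0, 255.0])   # background colour of the pixel (blue)
print(composite(1.0, F, B))       # alpha = 1: pure foreground
print(composite(0.5, F, B))       # alpha = 0.5: half-transparent edge pixel
```

Solving the matting problem means inverting this relation: given only C, recover α, F and B, which is why extra constraints (user marks, clustering, connectedness) are needed.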
Single-image matting has by now produced many different algorithms, such as the global sampling method, KNN (K-nearest-neighbour) matting, the Large-Kernel method, Nonlocal matting, PSF (point-spread function, i.e. point-spread-function-based) matting, Shared matting and so on, each bringing marked improvements in matte accuracy, MSE performance or algorithm speed. Matting a video sequence, however, is considerably more challenging than matting a single image: the large data volume of a video sequence, the temporal smoothness of edge treatment in every frame, and adaptability to large object motion are factors that single-image matting techniques never need to consider. Existing video matting algorithms include the Bayesian video matting system, Rotoscoping-based video matting, Graphcut-based video cut-and-paste systems, SnapCut and others; but in these algorithms the matte is either unsatisfactory because the motion estimation itself carries large errors, or robustness to foreground object motion is weak, or the computational cost is too high to meet the demands of large-scale video data streams, or the user input required is so complex and unintuitive that the system needs trained professional operators, limiting its practicality.
Summary of the invention
On the basis of previous work, and exploiting the spatio-temporal three-dimensional information peculiar to video matting, the present invention proposes a fast video matting method: the frames between two successive key frames are superimposed into a key-frame cluster, and the user's interactive marks are made on the cluster. This keeps the interaction natural and intuitive and gives strong robustness when local motion of the foreground object is large. Normalized spectral clustering (spectral clustering is a subspace learning algorithm built on spectral graph theory; compared with traditional clustering algorithms it can cluster sample spaces of arbitrary shape and converges to the globally optimal solution) is used to complete the clustering, and traditional fuzzy-connectedness segmentation is extended into the three-dimensional space of the video sequence to complete the alpha estimation. SURF (Speeded-Up Robust Features) detection is then used to match search windows, so that the matting result of the key frames is propagated to the whole video sequence, greatly reducing the time complexity of the algorithm. In addition, an assignment method for the video stream that adaptively selects the propagation direction is designed according to the different motion patterns of the foreground object, so that the algorithm produces good mattes for different types of foreground object motion.
The technical problem to be solved by the invention is to overcome the defects of the prior art and provide an interactive image processing method in video that is simple and intuitive for the user, relatively robust, and produces good mattes. The method extracts key frames from the video sequence and superimposes the frames between adjacent key frames, one by one, into key-frame clusters for interactive marking; on this basis the image of each key frame is divided into a foreground region, a background region and an unknown region. Each key frame then undergoes spectral clustering and alpha estimation to obtain its matting result. Finally, the matting result of the key frames is propagated to the whole video sequence to obtain the final matting result.
According to embodiments, the present invention may also adopt the following preferred technical schemes:
Propagating the matting result of the key frames to the whole video sequence comprises: a. detecting the edge line of the foreground object on a key frame; b. traversing the edge line and placing several search windows along it; c. finding the matching search windows in the marked preceding and following frames with the SURF feature-point detection method; d. assigning the matting result inside each search window of the key frame, in order, to the corresponding search windows of the other intermediate frames.
In step d, if the foreground is a single moving object or several objects moving independently, the assigned video stream is propagated in both the forward and the backward direction; if the foreground consists of multiple objects in relative motion in the video sequence, the assigned video stream is propagated forward only.
The SURF feature-point detection method comprises the steps of:
1) computing the integral image;
2) building the Hessian matrix;
3) building the scale space;
4) locating the feature points;
5) determining the dominant orientation;
6) computing the SURF feature descriptors.
The key frames are extracted by taking one frame every 10 to 20 frames as a key frame.
The spectral clustering uses the normalized spectral clustering method.
The steps of the normalized spectral clustering are as follows:
1) build the similarity matrix W from the raw data set, where w_ij represents the similarity between data;
2) sum each column of W to obtain N numbers, and build an N×N matrix D that has these N numbers on its diagonal and zeros elsewhere;
3) build the Laplacian matrix L = D − W from the similarity matrix built from the raw data set, and normalize it to obtain L′ = D^(−1/2) L D^(−1/2); solve for the first k eigenvalues λ1, …, λk of L′ and the corresponding eigenvectors u1, …, uk, and multiply each ui by D^(−1/2) to obtain the first k eigenvectors of L;
4) arrange the k eigenvectors side by side into an N×k matrix, regard each of its rows as a vector in k-dimensional space, and cluster these vectors;
where w_ij denotes the similarity between the data of two different pixels i and j, N is the number of entries on the diagonal, and k is the number of eigenvectors.
The value of k is chosen heuristically: if the eigenvalues from the 1st to the m-th are all very small and a change of an order of magnitude appears from the (m+1)-th eigenvalue on, then k is taken as m.
The clustering in step 4) uses the K-means algorithm, the Mean-shift algorithm or the GMM algorithm.
The alpha estimation uses an alpha estimation method based on three-dimensional fuzzy connectedness, which computes the fuzzy connectedness between pixels of the unknown region and of the known regions and obtains the matting result of the key-frame foreground object from the basic matting equation C = αF + (1 − α)B.
Compared with the prior art, the present invention is beneficial in that: because foreground and background points are marked interactively on each key-frame cluster, the spatio-temporal information contained in the video sequence is fully exploited; the key-frame-based user interface keeps the interaction natural and intuitive and matches the viewing habits of the human visual system; and, by combining the temporal information of the video stream, superimposing frames into key-frame clusters and replacing marks on individual key frames with marks on the clusters gives stronger robustness when the foreground object moves over a large range.
In a preferred technical scheme, the method also includes the step of propagating the matting result to the whole video sequence, comprising: a. detecting the foreground object edge on a key frame; b. placing search windows along the edge; c. finding the matching search windows in the marked preceding and following frames with the SURF feature-point detection method. Because SURF replaces the tedious repeated summations over general rectangular regions with the integral image, it greatly reduces the amount of computation while preserving matching performance, speeding up processing.
In a further preferred technical scheme, in the step of propagating the matting result to the whole video sequence, if the foreground is a single moving object or several objects moving independently, the assigned video stream is propagated in both the forward and the backward direction, which ensures that the motion information of the foreground object is transferred more completely to the whole video sequence; if the foreground consists of multiple objects in relative motion in the video sequence, the assigned video stream is propagated forward only, which avoids erroneous backward propagation of matting information caused by the objects overlapping during their relative motion.
Brief description of the drawings
Fig. 1 is a block diagram of the workflow of an embodiment of the processing method of the present invention.
Fig. 2 is a schematic diagram of key-frame extraction.
Fig. 3 is a schematic diagram of the three-dimensional spatio-temporal model built by connecting the 6 adjacent pixels of each pixel in the video sequence.
Fig. 4 is a schematic diagram of the matting result for a pixel of the unknown region in an embodiment.
Fig. 5 is a schematic diagram of an embodiment in which the assigned video stream is only allowed to propagate forward along the time direction.
Fig. 6 is a schematic diagram of another embodiment in which the assigned video stream may propagate from a key frame in both the forward and the backward direction.
Detailed description of the invention
An interactive image processing method in video that is simple and intuitive for the user, relatively robust, and produces good mattes. The method mainly extracts key frames from the video sequence and superimposes the frames between adjacent key frames, one by one, into key-frame clusters for interactive marking, on which foreground and background are separated to obtain the matting result. Its basic process and workflow are represented by the block diagram of Fig. 1: after the video sequence is input, a key-frame superposition step is performed, followed by the user interaction (input), a spectral clustering step, an alpha estimation step based on fuzzy connectedness, a step in which SURF finds the corresponding search windows, and an adaptive video-stream assignment step.
A preferred embodiment of the present invention is elaborated in detail below with reference to the accompanying drawings.
1. Key-frame superposition
The original video sequence is input and key frames are extracted from it; the frames between adjacent key frames are superimposed one by one to form groups of key-frame clusters. The user marks foreground and background points on each key-frame cluster, which fully exploits the spatio-temporal information contained in the video sequence; the key-frame-based user interface keeps the interaction natural and intuitive and matches the viewing habits of the human visual system; and, by combining the temporal information of the video stream, superimposing frames into key-frame clusters and replacing marks on individual key frames with marks on the clusters gives stronger robustness when the foreground object moves over a large range. The details are as follows:
The video sequence is input and key frames are extracted; in this algorithm one frame is taken every 10 frames as a key frame, as shown in Fig. 2. Those skilled in the art can adjust the key-frame interval according to the motion of the foreground object; any choice between 10 and 20 frames achieves the object of the invention. For example, when the foreground object moves little, one frame can be extracted every 20 frames. The frames between successive key frames are superimposed one by one to form a series of key-frame clusters, as in Fig. 2; the user marks foreground and background points on each cluster. After the interaction, each key-frame image can be divided into three regions: foreground region, background region and unknown region; the main goal of the subsequent matting is precisely to determine the color distribution of the unknown-region pixels.
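As a rough sketch of this step, the following toy code picks every 10th frame as a key frame and superimposes the frames between adjacent key frames into one cluster image (the patent does not specify how superposition combines the frames; averaging is our assumption, and all names are illustrative):

```python
import numpy as np

def keyframe_clusters(frames, step=10):
    """Pick every `step`-th frame as a key frame and superimpose (here: average)
    the frames between adjacent key frames into one key-frame-cluster image,
    on which the user can mark foreground/background points."""
    keys = list(range(0, len(frames), step))
    clusters = []
    for a, b in zip(keys, keys[1:]):
        clusters.append(np.mean(frames[a:b + 1], axis=0))
    return keys, clusters

frames = np.random.rand(31, 4, 4, 3)   # toy video: 31 frames of 4x4 RGB
keys, clusters = keyframe_clusters(frames, step=10)
print(keys)            # [0, 10, 20, 30]
print(len(clusters))   # 3 cluster images, one per key-frame interval
```

The marks the user draws on a cluster apply to every frame in that interval, which is what gives the method its robustness to large foreground motion.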
2. Normalized spectral clustering
After the key-frame superposition, normalized spectral clustering is used to cluster the key-frame pictures. Normalized spectral clustering performs an eigendecomposition of the similarity matrix of the sample data and then clusters using the resulting eigenvectors. Only the similarity matrix of the image data is needed to complete the clustering; representing the original data by eigenvector components provides a substantial dimensionality reduction, allows sample spaces of arbitrary shape to be identified with convergence to the globally optimal solution, and has lower computational complexity than general clustering algorithms, which is especially evident on high-dimensional data.
The specific steps are as follows:
1) A graph G = (V, E) is constructed from the data, where V and E are the vertex set and the edge set of G, and each pixel of G corresponds to a data point. Similar points are connected; for an edge e between two different pixels i and j, the weight w_ij represents the similarity between the data of the two pixels. According to this similarity definition, the similarity matrix W is built from the raw data set:

W_ij = w_ij for i ≠ j, and W_ij = 0 for i = j,
2) Each column of the similarity matrix W is summed to obtain N numbers, and an N×N matrix D is built with these N numbers on its diagonal and zeros elsewhere:

D_ii = Σ_j w_ij, and D_ij = 0 for i ≠ j,
3) The Laplacian matrix L = D − W is built from the similarity matrix:

L_ii = Σ_j w_ij, and L_ij = −w_ij for i ≠ j,

which is normalized to obtain

L′ = D^(−1/2) L D^(−1/2),
The first k eigenvalues λ1, …, λk of L′ (arranged in ascending order) and the corresponding eigenvectors u1, …, uk are solved for; multiplying each ui by D^(−1/2) gives the first k eigenvectors of the matrix L. For each frame image N and k are fixed values: N is the number of entries on the diagonal and k is the number of eigenvectors.
A preferred way is to choose k heuristically: if the eigenvalues from the 1st to the m-th are all very small and a change of an order of magnitude appears from the (m+1)-th eigenvalue on, then k is taken as m.
4) The k eigenvectors are arranged side by side into an N×k matrix, each of whose rows is regarded as a vector in k-dimensional space and clustered with a general clustering algorithm such as K-means; each class in the clustering result, i.e. the rows belonging to one class, gives the class of the corresponding pixels of the original graph G.
It should be understood that a general spectral clustering can also achieve the object of the invention; the normalized spectral clustering adopted in this step 2 is used for convenience of computation and/or to obtain a better technical effect, i.e. the normalization in 3) is an optional operation.
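The clustering steps above can be sketched as a toy NumPy implementation (a minimal k-means with deterministic farthest-point initialization stands in for the "general clustering algorithm"; all function names are ours, not the patent's):

```python
import numpy as np

def normalized_spectral_clustering(W, k):
    """Steps 1)-4) above: L = D - W, L' = D^(-1/2) L D^(-1/2), take the k
    eigenvectors with the smallest eigenvalues and cluster their rows."""
    d = W.sum(axis=1)                      # column/row sums -> diagonal of D
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.diag(d) - W                     # unnormalized Laplacian
    L_norm = D_inv_sqrt @ L @ D_inv_sqrt   # normalized Laplacian L'
    vals, vecs = np.linalg.eigh(L_norm)    # eigenvalues in ascending order
    U = vecs[:, :k]                        # N x k spectral embedding
    # Farthest-point initialization, then standard Lloyd (k-means) iterations.
    centers = [U[0]]
    for _ in range(k - 1):
        d2 = np.min([((U - c) ** 2).sum(-1) for c in centers], axis=0)
        centers.append(U[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(20):
        labels = np.argmin(((U[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = U[labels == c].mean(axis=0)
    return labels

# Two obvious blocks: full similarity inside each block, none across.
W = np.block([[np.ones((3, 3)), np.zeros((3, 3))],
              [np.zeros((3, 3)), np.ones((3, 3))]])
np.fill_diagonal(W, 0)                     # w_ij = 0 for i = j, as defined above
labels = normalized_spectral_clustering(W, 2)
print(labels)   # first three points land in one cluster, last three in the other
```

On key-frame pixels, W would be built from the color/position similarities w_ij of graph G; the toy block matrix simply makes the two clusters unambiguous.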
3. Alpha estimation based on three-dimensional fuzzy connectedness
In the video sequence, each pixel of each key frame has 6 neighbouring pixels: 4 spatial neighbours within the same frame and 2 temporal neighbours in the preceding and following frames. A three-dimensional spatio-temporal model of the video sequence is thus constructed, on which the fuzzy connectedness between unknown-region and known-region pixels is computed: along a point-to-point path it is the minimum similarity over the segments of the path, maximized over paths, analogous to the "shortest stave of a barrel". This design of the connectedness means that once the fuzzy connectedness from some unknown-region point to the known region has been computed, other points of the same region can reuse the earlier results when computing their connectedness to the known region, greatly reducing the amount of computation and the time the algorithm needs.
Specifically, in the video sequence the 6 neighbouring pixels of each pixel are connected to build the three-dimensional spatio-temporal model shown in Fig. 3. In this model the fuzzy connectedness FC between unknown-region and known-region pixels is computed. Let p1 be the unknown-region pixel to be computed and q1 a pixel of the known region; then the fuzzy connectedness FC between p1 and q1 is

FC(p1, q1) = max over paths of min{ μκ(p1, r), μκ(r, q1) },

where r is any point on a path from p1 to q1 and μκ is the similarity between two pixels:

μκ(x, y) = exp{ −(1/2) [I(x) − I(y)]^T Σ^(−1) [I(x) − I(y)] },
where I(x) and I(y) are the three-dimensional color vectors of pixels x and y, and T denotes matrix transposition.
The α value of a pixel p1 in the image can then be obtained quickly as

α(p1) = FC_f(p1) / (FC_f(p1) + FC_b(p1)),

where FC_f and FC_b are the fuzzy connectedness from p1 to the known foreground region and to the known background region, respectively.
After the α (opacity) values of the unknown-region pixels have been obtained, the color distribution of the unknown-region pixels follows directly from the basic matting equation C = αF + (1 − α)B, which yields the matting result of the key-frame foreground object, as in Fig. 4. Here C, F and B denote vectors composed of the three RGB values.
When computing the fuzzy connectedness FC, the "shortest stave of a barrel" design allows the computation to be simplified. As in Fig. 4, once all the fuzzy connectedness values from point p1 to the other regions have been computed, FC(p1, q1) and FC(p1, p2) are both known. By the definition of fuzzy connectedness, FC(p1, q1) is found by taking, on each path from p1 to q1, the weakest segment (the shortest stave of that barrel), comparing the shortest staves of all paths, and keeping the path whose shortest stave is largest; that largest shortest stave is the FC to be computed, i.e. the stave length of the strongest barrel. From this mathematical design it follows that in any spatial triangle whose three vertices are pixels of the three-dimensional spatio-temporal model built above (i.e. three spatially adjacent pixels of graph G) and whose edges are the pairwise fuzzy connectedness values, two edges are necessarily equal and not larger than the third. Returning to Fig. 4, with FC(q1, p1) and FC(p1, p2) known, the transitivity in the mathematical definition implies that FC(q1, p2) cannot exceed the smaller of the two. That is: if FC(p1, q1) < FC(p1, p2), then FC(p2, q1) = FC(p1, q1); if FC(p1, q1) > FC(p1, p2), then FC(p2, q1) = FC(p1, p2); only when the two are equal, i.e. FC(p1, q1) = FC(p1, p2), does FC(p2, q1) need to be recomputed. With this shortcut the amount of computation can be reduced to 1/3 of a full pixel-by-pixel traversal.
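The max-min ("shortest stave") connectedness above is an instance of the widest-path problem, which a Dijkstra-style search computes efficiently. The following toy sketch illustrates it (the graph dictionary stands in for the 3-D spatio-temporal model, and all names are ours, not the patent's):

```python
import heapq

def fuzzy_connectedness(adj, seeds):
    """Max-min path strength from any seed pixel to every node, via a
    max-heap Dijkstra variant. adj maps a node to [(neighbor, affinity), ...]
    with affinities (the similarities mu_kappa) in [0, 1]."""
    best = {s: 1.0 for s in seeds}          # a seed is fully connected to itself
    heap = [(-1.0, s) for s in seeds]
    heapq.heapify(heap)
    while heap:
        neg, u = heapq.heappop(heap)
        strength = -neg
        if strength < best.get(u, 0.0):
            continue                        # stale heap entry
        for v, aff in adj.get(u, []):
            cand = min(strength, aff)       # the path's "shortest stave"
            if cand > best.get(v, 0.0):     # keep the strongest barrel found
                best[v] = cand
                heapq.heappush(heap, (-cand, v))
    return best

# Toy 1-D chain p --0.9-- q --0.4-- r --0.8-- s, seeded at p:
adj = {'p': [('q', 0.9)], 'q': [('p', 0.9), ('r', 0.4)],
       'r': [('q', 0.4), ('s', 0.8)], 's': [('r', 0.8)]}
fc = fuzzy_connectedness(adj, ['p'])
print(fc['s'])   # 0.4 -- limited by the weakest link on the only path
```

Running this once with the foreground marks as seeds and once with the background marks gives FC_f and FC_b for every unknown pixel, from which α = FC_f / (FC_f + FC_b) follows.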
4. SURF finds the corresponding search windows
The above steps produce the matting result of the key frames of the video sequence; this result now has to be propagated to the whole video sequence. First, the edge of the foreground object is detected on a key frame and search windows are placed along this edge; then the SURF interest-point detection method is used to find the matching search windows in the marked preceding and following frames. Because SURF replaces the tedious repeated summations over general rectangular regions with the integral image, it greatly reduces the amount of computation while preserving matching performance, speeding up the algorithm. The details are as follows:
After the matting result of the key frames has been obtained by the above steps, the Sobel operator (a template commonly used in edge detection; there are two Sobel kernels, one detecting horizontal edges and one detecting vertical edges) is used to detect the edge line of the foreground object. Walking clockwise along the edge line, a point is selected every n pixels as the centre of a search window, and the whole edge line is traversed to extract several square search windows; the side length of a search window is typically taken as 1/10 of the side of the minimum bounding rectangle of the foreground object, and n as half the window side.
After the search windows have been set up, SURF is used to detect the corresponding search windows in the preceding and following frames; the subsequent video-stream assignment is carried out entirely within the search windows, the matting result in each search window of the key frame being assigned, in order, to the corresponding search windows of the other intermediate frames in the subsequent step (the corresponding windows of the intermediate frames are those matched to the key-frame search windows by the above step). The specific steps of the SURF detection are:
1) computing the integral image;
2) building the Hessian matrix;
3) building the scale space;
4) precisely locating the feature points;
5) determining the dominant orientation;
6) computing the SURF feature descriptors.
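The window-placement rule described above can be sketched as follows (assuming the ordered edge points have already been extracted, e.g. with the Sobel operator; the function name and the toy edge are illustrative, not from the patent):

```python
import numpy as np

def edge_search_windows(edge_points, side):
    """Walk the foreground edge and drop a square search window every
    n = side // 2 pixels, per the rule above (side is assumed to be about
    1/10 of the bounding-box side of the foreground object).
    Returns (x, y, w, h) windows centred on the sampled edge points."""
    n = max(side // 2, 1)               # sampling step along the edge
    centers = edge_points[::n]
    half = side // 2
    return [(int(cx) - half, int(cy) - half, side, side) for cx, cy in centers]

# Toy edge: 20 points along a horizontal line, window side 6 -> step 3.
edge = np.array([(x, 10) for x in range(20)])
wins = edge_search_windows(edge, side=6)
print(len(wins))   # 7 windows
print(wins[0])     # (-3, 7, 6, 6)
```

SURF matching (or any feature matcher) then only has to search inside these small windows in the neighbouring frames, rather than over the whole image.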
5. Adaptive video-stream assignment
When the matting result is assigned from the key frames' search windows to the other intermediate frames: if the foreground in the matting result is a single moving object or several objects moving independently, the assigned video stream should propagate in both the forward and the backward direction, as shown in Fig. 6; the bidirectional propagation ensures that the motion information of the foreground object is transferred more completely to the whole video sequence. If the foreground consists of multiple objects in relative motion in the video sequence, the assigned video stream should propagate forward only, as shown in Fig. 5; this avoids erroneous backward propagation of matting information caused by the objects overlapping during their relative motion. The intermediate frames are the frames between each pair of adjacent key frames.
First, it is determined whether the foreground consists of multiple objects in relative motion. If so, as shown in Fig. 5, the assigned video stream is only allowed to propagate forward along the time direction; if not, as shown in Fig. 6, the assigned video stream may propagate from the key frames in both the forward and the backward direction.
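The direction-selection logic can be sketched as follows (a schematic of which key frame sources each intermediate frame, not the patent's actual assignment procedure; names are ours):

```python
def propagation_frames(key_a, key_b, relative_motion):
    """Return (intermediate frame index, source key frame) pairs for one
    key-frame interval. With relative motion between foreground objects,
    only forward propagation from key_a is used; otherwise each intermediate
    frame is filled from the nearer of the two key frames (bidirectional)."""
    mids = list(range(key_a + 1, key_b))
    if relative_motion:
        return [(f, key_a) for f in mids]                       # forward only
    return [(f, key_a if f - key_a <= key_b - f else key_b)    # both directions
            for f in mids]

print(propagation_frames(0, 10, relative_motion=True)[:3])    # all sourced from frame 0
print(propagation_frames(0, 10, relative_motion=False)[-1])   # (9, 10): fed backward
```

Sourcing each frame from the nearer key frame is one simple way to realize the bidirectional case; the patent only fixes the allowed directions, not the exact schedule.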
The above content further describes the present invention in connection with specific preferred embodiments, but the concrete implementation of the invention cannot be regarded as confined to these descriptions. For those of ordinary skill in the technical field of the invention, equivalent substitutions or obvious modifications made without departing from the concept of the invention, with identical performance or use, shall all be considered to fall within the scope of protection of the present invention.

Claims (9)

1. An interactive image processing method in video, characterized in that: key frames are extracted from a video sequence, and the frames between adjacent key frames are superimposed one by one to form key-frame clusters for interactive marking, on the basis of which the image of each key frame is divided into a foreground region, a background region and an unknown region; each key frame then undergoes spectral clustering and alpha estimation to obtain the matting result of the key frame; finally, the matting result of the key frames is propagated to the whole video sequence to obtain the final matting result;
propagating the matting result of the key frames to the whole video sequence comprises: a. detecting the edge line of the foreground object on a key frame; b. traversing the whole edge line and placing several search windows along it; c. finding the matching search windows in the marked preceding and following frames with the SURF feature-point detection method; d. assigning the matting result inside each search window of the key frame, in order, to the corresponding search windows of the other intermediate frames.
2. The interactive image processing method in video according to claim 1, characterized in that: in step d, if the foreground is a single moving object or several objects moving independently, the assigned video stream is propagated in both the forward and the backward direction; if the foreground consists of multiple objects in relative motion in the video sequence, the assigned video stream is propagated forward only.
3. The interactive image processing method in video according to claim 1, characterized in that the SURF feature-point detection method comprises the steps of:
1) computing the integral image;
2) building the Hessian matrix;
3) building the scale space;
4) locating the feature points;
5) determining the dominant orientation;
6) computing the SURF feature descriptors.
4. The interactive image processing method in video according to claim 1, characterized in that: the key frames are extracted by taking one frame every 10 to 20 frames as a key frame.
5. The interactive image processing method in video according to claim 1, characterized in that: the spectral clustering uses the normalized spectral clustering method.
6. The interactive image processing method in video according to claim 5, characterized in that the steps of the normalized spectral clustering are as follows:
1) build the similarity matrix W from the raw data set;
2) sum each column of W to obtain N numbers, and build an N×N matrix D that has these N numbers on its diagonal and zeros elsewhere;
3) build the Laplacian matrix L = D − W from the similarity matrix built from the raw data set,
normalize L to obtain L′ = D^(−1/2) L D^(−1/2),
and solve for the first k eigenvalues λ1, …, λk of L′ and the corresponding eigenvectors u1, …, uk, multiplying each by D^(−1/2) to obtain the first k eigenvectors of the matrix L;
4) arrange the k eigenvectors side by side into an N×k matrix, regard each of its rows as a vector in k-dimensional space, and cluster;
wherein w_ij denotes the similarity between the data of two different pixels i and j, N is the number of entries on the diagonal, and k is the number of eigenvectors.
7. The interactive image processing method in video according to claim 6, characterized in that: the value of k is chosen heuristically, i.e. if the eigenvalues from the 1st to the m-th are all very small and a change of an order of magnitude appears from the (m+1)-th eigenvalue on, then k is taken as m.
8. The interactive image processing method in video as claimed in claim 6, characterized in that the clustering in step 4) uses the K-means algorithm, the Mean-shift algorithm, or the GMM algorithm.
9. The interactive image processing method in video as claimed in claim 1, characterized in that the α value estimation uses an α estimation method based on three-dimensional fuzzy connectedness, which computes the fuzzy connectedness between pixels in the unknown region and the known regions, and obtains the matting result for the key-frame foreground object according to the basic matting equation C = αF + (1-α)B, wherein α is the transparency parameter, C is the color vector of an unknown-region pixel, F is the foreground pixel color, and B is the background pixel color.
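The compositing equation cited in claim 9 is easy to verify numerically. This sketch shows only the equation C = αF + (1-α)B and a per-pixel least-squares inversion for α given known F and B; it does not implement the three-dimensional fuzzy-connectedness estimation itself:

```python
import numpy as np

def composite(alpha, F, B):
    """Basic matting equation C = alpha*F + (1 - alpha)*B, per pixel."""
    alpha = np.asarray(alpha)[..., None]  # broadcast over color channels
    return alpha * F + (1.0 - alpha) * B

def solve_alpha(C, F, B):
    """Least-squares alpha for an unknown-region pixel with known F and B:
    project C - B onto F - B and clip to the valid transparency range."""
    num = ((C - B) * (F - B)).sum(-1)
    den = ((F - B) ** 2).sum(-1)
    return np.clip(num / np.maximum(den, 1e-12), 0.0, 1.0)
```

Round-tripping a pixel through `composite` and `solve_alpha` recovers the original transparency, which is the consistency the matting equation demands of any α estimator.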
CN201310326815.4A 2013-07-30 2013-07-30 A kind of Interactive Image Processing method in video Active CN103400386B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310326815.4A CN103400386B (en) 2013-07-30 2013-07-30 A kind of Interactive Image Processing method in video


Publications (2)

Publication Number Publication Date
CN103400386A CN103400386A (en) 2013-11-20
CN103400386B true CN103400386B (en) 2016-08-31

Family

ID=49563998

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310326815.4A Active CN103400386B (en) 2013-07-30 2013-07-30 A kind of Interactive Image Processing method in video

Country Status (1)

Country Link
CN (1) CN103400386B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3790284A1 (en) * 2014-10-22 2021-03-10 Huawei Technologies Co., Ltd. Interactive video generation
CN108460770B (en) * 2016-12-13 2020-03-10 华为技术有限公司 Matting method and device
CN106683086B (en) * 2016-12-23 2018-02-27 深圳市大唐盛世智能科技有限公司 The background modeling method and device of a kind of intelligent video monitoring
CN108256506B (en) * 2018-02-14 2020-11-24 北京市商汤科技开发有限公司 Method and device for detecting object in video and computer storage medium
CN108764469A (en) * 2018-05-17 2018-11-06 普强信息技术(北京)有限公司 The method and apparatus of power consumption needed for a kind of reduction neural network
CN109871587A (en) * 2019-01-21 2019-06-11 南京铭越创信电气有限公司 A kind of method of the controlled off-the-line of electric system under extreme weather conditions
CN110717430A (en) * 2019-09-27 2020-01-21 聚时科技(上海)有限公司 Long object identification method and identification system based on target detection and RNN
CN111169016B (en) * 2019-12-18 2021-11-16 西北工业大学 3+2 shaft unsupported 3D printing manufacturing method for blade parts
CN111191620B (en) * 2020-01-03 2022-03-22 西安电子科技大学 Method for constructing human-object interaction detection data set
CN115398475A (en) * 2020-11-26 2022-11-25 广州视源电子科技股份有限公司 Matting realization method, device, equipment and storage medium
CN112860209A (en) * 2021-02-03 2021-05-28 合肥宏晶微电子科技股份有限公司 Video overlapping method and device, electronic equipment and computer readable storage medium
CN116095221B (en) * 2022-08-10 2023-11-21 荣耀终端有限公司 Frame rate adjusting method in game and related device
CN115375535B (en) * 2022-10-21 2023-03-31 珠海金山办公软件有限公司 Interactive matting method and device, electronic equipment and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101098475A (en) * 2007-07-10 2008-01-02 浙江大学 Interactive time-space accordant video matting method in digital video processing

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010093408A1 (en) * 2009-02-10 2010-08-19 Thomson Licensing Video matting based on foreground-background constraint propagation

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101098475A (en) * 2007-07-10 2008-01-02 浙江大学 Interactive time-space accordant video matting method in digital video processing

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Video Matting of Complex Scenes; Yung-Yu Chuang et al.; ACM Transactions on Graphics; 2004-12-31; Vol. 21, No. 3; pp. 1-6 *
Research on Interactive Matting Techniques for Images and Image Sequences; Hao Kai; China Masters' Theses Full-text Database; 2012-07-15 (No. 7); pp. 53-59 *
Semi-automatic Landslide Extraction from Remote Sensing Images Based on Spectral Matting; Wang Shibo et al.; Computer Engineering; 2012-01-31; Vol. 38, No. 2; pp. 195-197, 200 *

Also Published As

Publication number Publication date
CN103400386A (en) 2013-11-20

Similar Documents

Publication Publication Date Title
CN103400386B (en) A kind of Interactive Image Processing method in video
CN104732506B (en) A kind of portrait photographs' Color Style conversion method based on face semantic analysis
CN104834933B (en) A kind of detection method and device in saliency region
CN109993095B (en) Frame level feature aggregation method for video target detection
CN103248906B (en) Method and system for acquiring depth map of binocular stereo video sequence
CN106611427A (en) A video saliency detection method based on candidate area merging
CN104680510A (en) RADAR parallax image optimization method and stereo matching parallax image optimization method and system
CN105046653B (en) A kind of video raindrop minimizing technology and system
CN103177446A (en) Image foreground matting method based on neighbourhood and non-neighbourhood smoothness prior
CN109558815A (en) A kind of detection of real time multi-human face and tracking
CN111325165A (en) Urban remote sensing image scene classification method considering spatial relationship information
CN103595981A (en) Method for demosaicing color filtering array image based on non-local low rank
CN107194948B (en) Video significance detection method based on integrated prediction and time-space domain propagation
CN102034247A (en) Motion capture method for binocular vision image based on background modeling
CN104050685A (en) Moving target detection method based on particle filtering visual attention model
CN112434608A (en) Human behavior identification method and system based on double-current combined network
CN111046868A (en) Target significance detection method based on matrix low-rank sparse decomposition
CN109784230A (en) A kind of facial video image quality optimization method, system and equipment
CN111881915B (en) Satellite video target intelligent detection method based on multiple prior information constraints
CN113947814A (en) Cross-visual angle gait recognition method based on space-time information enhancement and multi-scale saliency feature extraction
CN105931189A (en) Video ultra-resolution method and apparatus based on improved ultra-resolution parameterized model
CN105631405A (en) Multistage blocking-based intelligent traffic video recognition background modeling method
CN103313068A (en) White balance corrected image processing method and device based on gray edge constraint gray world
CN102222321A (en) Blind reconstruction method for video sequence
CN102013101A (en) Blind detection method of permuted and tampered images subjected to fuzzy postprocessing

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant