CN104954780A - DIBR (depth image-based rendering) virtual image restoration method applicable to high-definition 2D/3D (two-dimensional/three-dimensional) conversion - Google Patents


Info

Publication number
CN104954780A
CN104954780A (Application CN201510386465.XA; granted as CN104954780B)
Authority
CN
China
Prior art keywords: dictionary, image, hole, virtual image, conversion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510386465.XA
Other languages
Chinese (zh)
Other versions
CN104954780B (en)
Inventor
刘伟 (Liu Wei)
崔明月 (Cui Mingyue)
郑扬冰 (Zheng Yangbing)
张新刚 (Zhang Xingang)
马世榜 (Ma Shibang)
刘红钊 (Liu Hongzhao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanyang Normal University
Original Assignee
Nanyang Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanyang Normal University
Priority to CN201510386465.XA
Publication of CN104954780A
Application granted
Publication of CN104954780B
Status: Expired - Fee Related
Anticipated expiration


Abstract

The invention discloses a DIBR (depth-image-based rendering) virtual-image restoration method suitable for high-definition 2D/3D (two-dimensional/three-dimensional) conversion. The method comprises the following steps: generating a general background dictionary D_B; synthesizing a sparse dictionary D_I, by constructing a sample dictionary D_S from the background region of a video frame image I_F and combining it with the generated general background dictionary D_B; performing hole estimation and classification; and performing fill-and-repair processing on each class-II hole R_HII. Through classification processing and sparse-dictionary representation, the method significantly reduces computational complexity while improving the 3D virtual-image rendering effect, making it better suited to processing the massive video data involved in high-definition 2D-to-3D conversion.

Description

A DIBR virtual-image restoration method suitable for high-definition 2D/3D conversion
Technical field
The present invention belongs to the field of three-dimensional video technology, specifically relates to 2D-to-3D video conversion, and particularly to a DIBR virtual-image restoration method suitable for high-definition 2D/3D conversion.
Background technology
At present, three-dimensional (3D) video is gradually becoming popular; China Central Television launched a pilot 3D channel on New Year's Day 2012, and 3D video has gradually become a trend of current development. However, the shortage of video sources has become the main bottleneck restricting the rise of this industry. Under these circumstances, converting 2D video to 3D video is an effective way to solve the problem.
There are generally two rendering approaches for converting 2D video to 3D video. One directly reconstructs, by some means, a left-right image pair with parallax from a single video frame. The other is depth-image-based rendering (DIBR), whose conversion result adds to the original video a depth map corresponding to each frame; a display terminal with an embedded DIBR processing module finally converts it into binocular stereoscopic video for viewing (see "2D-to-3D conversion techniques for films: a survey [J]", Liu Wei, Wu Yihong, Hu Zhanyi, Journal of Computer-Aided Design & Computer Graphics, 2012, 24(1): 14-28). Compared with the former, the latter has three inherent technical advantages: efficient compression and transmission, strong compatibility with existing 2D technology, and depth-of-field adjustment across different devices together with real-time depth generation and fast rendering synthesis. It holds an absolutely dominant share of emerging markets such as 3DTV and 3D mobile terminals and is the direction of future development of 3D rendering technology.
DIBR rendering is the key step in depth-map-based 2D-to-3D conversion: it uses depth information to render virtual stereoscopic video, finally completing the "fundamental change" from 2D to 3D. Although the technique has many advantages, it still has limitations. Because DIBR synthesizes the left- and right-eye images from the reference image according to the mapping relations of the depth map, the change of viewpoint can expose, in the new image, parts of the background that were occluded by foreground objects in the original image. These regions have no corresponding texture mapping in the conversion process, so holes appear in the target image. This problem has been a research hotspot of DIBR technology in recent years and is an important aspect of improving 3D image quality. Three classes of solutions are currently in common use, but each has limitations in application:
1) Layered depth video (LDV) format. These methods fundamentally resolve the occlusion-induced holes by adding a new data layer to the depth map. However, they require special equipment during video capture and are therefore unsuitable for 2D-to-3D conversion.
2) Pre-processing of the depth image. For example, smoothing the depth map first, so that the rendered new view contains fewer holes and subsequent filling is easier. This approach is computationally efficient and clearly effective, but the smoothing filter may introduce geometric deformation at object edges in the virtual image, especially along vertical edges.
3) Hole filling. These methods first render the new view with the DIBR algorithm and then fill the holes; they divide into three kinds. The first is based on partial differential equations and is suited to repairing small holes. The second fills the lost information by patch-based texture synthesis; it can repair large holes with high efficiency, but because block matching relies on greedy search it may cause obvious mis-repairs. The third is based on the sparse-representation theory of dictionaries; recent research shows that it achieves good repair quality, but dictionary generation usually requires iterative optimization, the computational cost is high, and it cannot fully meet the demands of 2D-to-3D video conversion.
Consequently, in 2D-to-3D video conversion the existing DIBR-based virtual-image restoration methods cannot effectively guarantee the conversion and synthesis of virtual images, especially for the increasingly popular high-definition video, which degrades the actual conversion quality of 3D video.
Summary of the invention
In view of this, the object of the invention is to address the deficiencies of the prior art by proposing a DIBR virtual-image restoration method suitable for high-definition 2D/3D conversion that, through classification processing and sparse-dictionary representation, improves the high-definition 3D virtual-image rendering effect while reducing computational complexity.
To achieve the above object, the present invention adopts the following technical solution:
A DIBR virtual-image restoration method suitable for high-definition 2D/3D conversion comprises the following steps:
A) generating a general background dictionary D_B;
B) synthesizing a sparse dictionary D_I, comprising: constructing a sample dictionary D_S from the background region of the video frame image I_F, and synthesizing the sparse dictionary D_I from the generated general background dictionary D_B and the sample dictionary D_S;
C) hole estimation and classification, comprising:
in the depth map I_D, estimating the hole region R_H that will be produced, using a hole-estimation operator based on the depth-variation rules of the left- and right-eye virtual images;
in the video frame image I_F corresponding to the depth map, classifying the predicted holes with a texture-quantification operator according to the texture near each hole region, and labelling predicted holes whose nearby texture is not rich as class-I holes R_HI;
for the class-I holes R_HI, filtering the corresponding regions of the depth map I_D by asymmetric smoothing, then performing DIBR rendering to generate the target virtual image I_V, and labelling all holes still present in I_V as class-II holes R_HII;
D) performing fill-and-repair processing on each class-II hole R_HII.
Preferably, step A) of generating the general background dictionary D_B comprises: extracting n_1 image blocks of N×N pixels from a natural-image set, and training a general background dictionary D_B of size (N·N)×M with the K-SVD and OMP algorithms.
Preferably, constructing the sample dictionary D_S from the background region of the video frame image I_F comprises:
randomly extracting image blocks of N×N pixels from the background region of the video frame image I_F to be processed, pulling each block into a column vector, and assembling these columns into an image-block matrix that serves as the sample dictionary D_S.
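As an illustration, the random patch extraction described above can be sketched as follows. This is a minimal numpy sketch; the function name, the default patch count, and the unit-norm normalisation of the atoms are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def build_sample_dictionary(background, n_atoms=125, N=5, seed=0):
    """Randomly extract N x N patches from the background region of a frame
    and stack them as columns to form the sample dictionary D_S."""
    rng = np.random.default_rng(seed)
    H, W = background.shape
    atoms = []
    for _ in range(n_atoms):
        r = rng.integers(0, H - N + 1)
        c = rng.integers(0, W - N + 1)
        patch = background[r:r + N, c:c + N]
        atoms.append(patch.reshape(-1))          # pull the patch into a column vector
    D_S = np.stack(atoms, axis=1).astype(float)  # shape (N*N, n_atoms)
    # normalise atoms to unit l2 norm, a common dictionary convention (assumed here)
    D_S /= np.linalg.norm(D_S, axis=0, keepdims=True) + 1e-12
    return D_S
```

For a 5×5 patch size the resulting matrix has 25 rows, one column per extracted block.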
Preferably, synthesizing the sparse dictionary D_I from the generated general background dictionary D_B and the sample dictionary D_S comprises:
for each column of the general background dictionary D_B, using the sample dictionary D_S and an error threshold T to solve
  min ||y_k||_0  s.t.  ||d_Bk − D_S·y_k||₂² ≤ T,
obtaining the sparse coefficient matrix Ŷ, where d_Bk is the k-th column vector of D_B, Y is the sparse coefficient matrix of D_B, and y_k is the k-th column of Y;
using the sparse coefficient matrix Ŷ with the corresponding sample dictionary D_S to obtain the sparse dictionary D_I = D_S·Ŷ, where Ŷ is the estimate of Y.
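A minimal sketch of this synthesis step, assuming a plain greedy Orthogonal Matching Pursuit with an error threshold; the helper names and the tolerance handling are illustrative, not prescribed by the patent.

```python
import numpy as np

def omp(D, x, tol, max_atoms=None):
    """Greedy OMP: sparse-code x over dictionary D until the residual
    energy drops below tol (or the atom budget is exhausted)."""
    n_atoms = D.shape[1]
    max_atoms = max_atoms or n_atoms
    support, residual = [], x.copy()
    y = np.zeros(n_atoms)
    while residual @ residual > tol and len(support) < max_atoms:
        k = int(np.argmax(np.abs(D.T @ residual)))   # most correlated atom
        if k in support:
            break
        support.append(k)
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    if support:
        y[support] = coef
    return y

def synthesize_sparse_dictionary(D_B, D_S, tol=1e-3):
    """Code every column of the general background dictionary D_B over the
    sample dictionary D_S, then synthesize D_I = D_S @ Y_hat."""
    Y = np.column_stack([omp(D_S, D_B[:, k], tol) for k in range(D_B.shape[1])])
    return D_S @ Y
```

Because D_I is assembled from sample-dictionary atoms under a sparse code learned offline, only the cheap coding step runs online, which is the efficiency argument the text makes.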
Preferably, step D) of fill-and-repair processing of each class-II hole R_HII comprises:
Da) re-identifying the hole-region edge of the class-II hole: each non-hole pixel adjacent to the current edge of the hole is labelled a contour pixel, and the currently labelled contour pixels together form the current non-hole contour of the hole; then determining the priority of each currently labelled contour pixel, where the priority P(t) of any contour pixel p_t is defined as:
  P(t) = C(t)·D(t)
C(t) is the background-confidence coefficient of p_t:
  C(t) = ( Σ_{q∈ψ_t} E(q) ) / ⟨ψ_t⟩,  with E(q) = 0 for q ∈ R_H and E(q) = 1 for q ∉ R_H,
where ψ_t is the rectangular neighbourhood of N rows and N columns centred on p_t, and ⟨ψ_t⟩ is the number of pixels of that neighbourhood;
D(t) is the texture-confidence coefficient of p_t:
  D(t) = Σ_{p∈ψ_t∩R̄_H} ( λ_v·|G(p)_x| + (1−λ_v)·|G(p)_y| ),
where |G(p)_x| is the gradient of the video frame image I_F at p in the x direction, |G(p)_y| is the gradient in the y direction, and λ_v is a preset weight factor;
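Under the definitions above, the priority of one contour pixel can be sketched as follows. This is a numpy illustration; the neighbourhood clipping at image borders and the use of `np.gradient` for |G(p)_x| and |G(p)_y| are assumptions.

```python
import numpy as np

def contour_priority(frame, hole_mask, p, N=5, lam_v=0.7):
    """Priority P(t) = C(t) * D(t) for a contour pixel p = (row, col).
    C(t): fraction of non-hole pixels in the N x N neighbourhood psi_t.
    D(t): weighted gradient magnitude over the non-hole part of psi_t."""
    r, c = p
    h = N // 2
    H, W = frame.shape
    r0, r1 = max(r - h, 0), min(r + h + 1, H)
    c0, c1 = max(c - h, 0), min(c + h + 1, W)
    neigh_hole = hole_mask[r0:r1, c0:c1]
    # E(q) = 0 inside the hole, 1 outside it
    C = np.mean(~neigh_hole)
    gy, gx = np.gradient(frame.astype(float))    # axis 0 is y, axis 1 is x
    known = ~neigh_hole
    D = np.sum((lam_v * np.abs(gx[r0:r1, c0:c1])
                + (1 - lam_v) * np.abs(gy[r0:r1, c0:c1])) * known)
    return C * D
```

Contour pixels surrounded by reliable, textured background thus get repaired first, which matches the exemplar-inpainting intuition behind the priority.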
Db) comparing the priorities of the currently labelled contour pixels: if there is exactly one contour pixel of highest priority, it is labelled the repair target p_t; if several contour pixels share the highest priority, one of them is selected and labelled the repair target p_t;
taking the image block of N rows and N columns centred on the repair target p_t in the target virtual image I_V as the block to be repaired X_d; using the sparse dictionary D_I and the error threshold T, filling the hole by solving
  min ||y_d||_0  s.t.  ||H·(x_d − D_I·y_d)||₂² ≤ T,
obtaining the sparse coefficient vector ŷ_d, where x_d is the column-vector form of X_d, y_d is its sparse coefficient vector, and H is the mask matrix of the block to be repaired; then using ŷ_d with the corresponding sparse dictionary D_I to obtain the repaired block x̂_d = D_I·ŷ_d, and putting the block back at its original location in the target virtual image;
Dc) judging whether the class-II hole has been completely filled; if not, repeating steps Da) to Dc) until every hole pixel of the class-II hole is filled, at which point its fill-and-repair processing is complete.
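The masked sparse-coding fill of step Db) can be sketched for one patch as follows (numpy only; the greedy stopping rule and the fixed atom budget are assumptions of this sketch, and the mask H is represented as a boolean array of known pixels).

```python
import numpy as np

def inpaint_patch(patch, mask, D_I, tol=1e-6, max_atoms=10):
    """Fill the missing pixels of one N x N patch by sparse coding over D_I.
    `mask` is True for known pixels (the role of the mask matrix H): the
    sparse code is fitted on known pixels only, then the full patch is
    re-synthesised and the hole pixels are replaced."""
    x = patch.reshape(-1).astype(float)
    m = mask.reshape(-1)
    Dm = D_I[m, :]                       # rows of D_I for the known pixels
    support, residual = [], x[m].copy()
    y = np.zeros(D_I.shape[1])
    coef = np.zeros(0)
    while residual @ residual > tol and len(support) < max_atoms:
        k = int(np.argmax(np.abs(Dm.T @ residual)))
        if k in support:
            break
        support.append(k)
        coef, *_ = np.linalg.lstsq(Dm[:, support], x[m], rcond=None)
        residual = x[m] - Dm[:, support] @ coef
    if support:
        y[support] = coef
    filled = (D_I @ y).reshape(patch.shape)
    out = patch.astype(float).copy()
    out[~mask] = filled[~mask]           # keep known pixels, fill only the hole
    return out
```

In the full method this routine would run inside the Da)–Dc) loop, once per repair target, with the contour re-identified after each fill.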
Preferably, in step A) the value ranges are 5 ≤ N ≤ 15, 10000 ≤ n_1 ≤ 15000, and 4·N·N ≤ M ≤ 16·N·N.
Preferably, the natural-image set in step A) is any picture set whose theme is natural scenery, street scenes, or indoor still scenes without human activity; the set contains no fewer than 20 images, and the image resolution is no less than 512×512.
Preferably, the hole-estimation operator in step C) is:
  R_H = { r(x,y) | η_i(x,y) − r(x,y) > α·λ_H·D_max / D_width }
  η_i(x,y) = r(x+1,y) if i = l;  r(x−1,y) if i = r
where R_H is the predicted hole region, r(x,y) is the depth value at coordinate (x,y) of the depth map I_D, D_max is the maximum disparity of the generated virtual image, α is a normalization factor, D_width is the width of the image, and λ_H is a preset threshold factor; if the newly synthesized virtual view is the left-eye view then i = l, otherwise i = r.
Preferably, the texture-quantification operator in step C) is:
  F(p) = Σ_{q∈S(p)∩R̄_H} ( λ_v·|G(q)_x| + (1−λ_v)·|G(q)_y| )
where p is a position coordinate in the video frame image I_F, F is the texture-quantification index of the video frame image at p, |G(q)_x| and |G(q)_y| are the image gradients at q in the x and y directions, λ_v is a preset weight factor, and S(p) is the neighbourhood of the image at p; λ_v is 0.5 to 1, and the neighbourhood of p is the pixel region of N rows and N columns centred on p.
Preferably, a hole whose nearby texture is not rich in step C) is a hole region satisfying:
  R_HI = { r | r ∈ R_H, F(r) < σ_texture·F_max(r) }
where R_HI is the hole region labelled class I, σ_texture is a preset threshold with value range 0.1 to 0.9, and F_max(r) is the maximum texture-quantification index in the video frame image I_F.
The beneficial effects of the invention are as follows:
The invention first combines, by fast synthesis, the general background dictionary obtained by offline training with a sample dictionary drawn from the video image to form a sparse dictionary; it then classifies the holes of the DIBR virtual image by texture features, eliminating class-I holes by asymmetric smoothing of the depth map and class-II holes, after priority computation, by sparse-dictionary image inpainting. On the one hand, the classification processing exploits the strengths of the two major classes of current DIBR virtual-image restoration methods, so that the generated virtual image achieves a better overall rendering effect; on the other hand, constructing the sparse dictionary greatly shortens the online dictionary-learning time, making sparse-representation inpainting better suited to the practical application of high-definition 2D-to-3D conversion.
The classification processing lets different restoration methods handle, based on texture features, the regions they suit best, markedly improving the DIBR virtual-image repair quality in 2D-to-3D conversion. Synthesizing the sparse dictionary from an offline general dictionary and a sample dictionary preserves the flexibility and adaptability of the dictionary while greatly reducing the online learning time and the computational complexity of the algorithm, making the sparse-representation restoration method better suited to the massive video data processed in high-definition 2D-to-3D conversion.
Brief description of the drawings
Fig. 1 is a flowchart of an existing depth-map-based 2D-to-3D video conversion method;
Fig. 2 is a schematic diagram of holes in a DIBR virtual image;
Fig. 3 is the flowchart of the method of the invention;
Fig. 4 is a schematic diagram of synthesizing the sparse dictionary from the general background dictionary and the image-background sample dictionary;
Fig. 5 shows the region processing of an experimental image and the corresponding depth-map filtering;
Fig. 6 compares virtual-image results after the enlarged region of Fig. 5 is repaired by several methods.
Embodiment
The invention is further described below with reference to the drawings and embodiments.
Fig. 1 shows an existing depth-map-based 2D-to-3D video conversion method. As shown in Fig. 1, the input 2D video stream is first decoded and decomposed into video frames; effective depth information is then extracted from the 2D video using various depth cues and processed to generate the depth map corresponding to each frame; finally, the depth map and the video frames undergo DIBR rendering to produce the 3D video output.
DIBR rendering is the key step of the 2D-to-3D conversion method. It describes an exact point-to-point mapping that uses depth information to render virtual stereoscopic video, finally completing the "fundamental change" from 2D to 3D. Although the technique has many advantages, it still has limitations. Because DIBR synthesizes the left- and right-eye images from the reference image according to the mapping relations of the depth map, the change of viewpoint can expose, in the new image, background regions that were occluded by foreground objects in the original image; these regions have no corresponding texture mapping in the conversion process, so holes appear in the target image. This problem has been a research hotspot of DIBR technology in recent years and is an important aspect of improving 3D image quality. As shown in Fig. 2, the black discontinuous parts of the newly generated virtual image are holes.
Three classes of solutions are currently in common use for this problem:
1) Layered depth video (LDV) format. These methods fundamentally resolve the occlusion-induced holes by adding a new data layer to the depth map. However, they require special equipment during video capture and are therefore unsuitable for 2D-to-3D conversion.
2) Pre-processing of the depth image. For example, smoothing the depth map first, so that the rendered new view contains fewer holes and subsequent filling is easier. This approach smooths the discontinuous regions of abrupt depth change and thereby reduces the holes, and increasing the Gaussian filtering strength improves the quality of the generated stereo pair. On the other hand, filtering easily distorts object edges in the vertical direction. Because the human visual system derives depth perception mainly from horizontal parallax, with vertical parallax contributing little, asymmetric smoothing is usually adopted to alleviate this problem. Asymmetric smoothing is consistent with the binocular vision system: when smoothing the depth image, the vertical strength exceeds the horizontal strength. Applying an asymmetric Gaussian filter smooths the sharp depth changes in the depth map, reducing hole generation while retaining reasonable horizontal parallax and limiting geometric distortion of the image. However, when the vertical smoothing is excessive, objects in parts of the synthesized new view may still be geometrically deformed.
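The asymmetric smoothing just described can be sketched with a separable Gaussian whose vertical sigma exceeds the horizontal one. This is a numpy-only sketch; the specific sigma values and the kernel truncation radius are illustrative choices, not values from the patent.

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    """Normalised 1-D Gaussian kernel."""
    radius = radius or int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def asymmetric_smooth(depth, sigma_v=8.0, sigma_h=2.0):
    """Asymmetric Gaussian pre-filtering of a depth map: the smoothing
    strength in the vertical direction exceeds that in the horizontal
    direction, reducing holes while limiting horizontal-parallax distortion."""
    kv = gaussian_kernel(sigma_v)
    kh = gaussian_kernel(sigma_h)
    d = depth.astype(float)
    # separable convolution: columns first (vertical), then rows (horizontal)
    d = np.apply_along_axis(lambda col: np.convolve(col, kv, mode='same'), 0, d)
    d = np.apply_along_axis(lambda row: np.convolve(row, kh, mode='same'), 1, d)
    return d
```

A horizontal depth edge (blurred by sigma_v) comes out noticeably softer than a vertical one (blurred by sigma_h), which is exactly the asymmetry the text motivates.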
3) hole-filling.These class methods are the draftings first utilizing DIBR algorithm to carry out new viewpoint, then carry out hole-filling, are specifically divided into again three kinds.The first is based on partial differential equation, and the main thought of this kind of method utilizes the thermic vibrating screen in physics by the Information Communication around region to be repaired in repairing area, and this kind of method is applicable to deriving and repairs cavity, zonule; The second patch-based texture synthesis technology fills the information of loss, the main thought of this kind of method first chooses a pixel from the border in region to be repaired, simultaneously centered by this point, according to the textural characteristics of image, choose sizeable texture block, around region to be repaired, then find Texture Matching block the most close with it carry out this texture block alternative.This kind of method can repair large area region cavity, and operational efficiency is high, but when repairing, the coupling of block, based on greedy search, may cause obvious mis repair.The third is the sparse representation theory based on dictionary, achievement in research in recent years shows that the method can obtain good repairing effect, but this kind of method needs through iteration optimization when generating dictionary usually, amount of calculation is comparatively large, can not meet the conversion requirements of video 2D/3D completely.
Therefore in 2D/3D Video Quality Metric, the existing virtual image restorative procedure based on DIBR effectively cannot ensure the conversion synthesis of the high clear video image that virtual image particularly becomes more and more popular at present, thus have impact on the actual converted effect of 3D video.
The method of the invention takes as input the video image and a depth map generated from some depth cue and, after conversion processing, outputs the repaired left- and right-eye virtual images. Fig. 3 is the flowchart of the method; the specific embodiments are described below with reference to Fig. 3.
The repair quality of sparse-representation inpainting depends to a great extent on whether the chosen dictionary is well structured and flexible. Generally speaking there are three major types of dictionary: 1) fixed dictionaries, such as the common overcomplete DCT dictionary and wavelet dictionaries, which are well structured and require no training, giving high computational efficiency, but lack adaptivity and specificity to image characteristics; 2) sample dictionaries, which draw image blocks directly at random from sample images as the atoms of the dictionary; though simple, they pack a large amount of basic image information into the dictionary and have achieved great success in sparse-representation face recognition; 3) learned dictionaries, whose elements are iteratively optimized over large sample sets by a learning scheme such as the common K-SVD algorithm, giving the best sparse representation and similarity, but whose learning is too computationally inefficient for many practical applications. To better resolve the tension between repair quality and computational efficiency, the invention adopts a sparse-dictionary framework that splits dictionary generation into offline learning of a general background dictionary and online synthesis with a sample dictionary, preserving the adaptivity and similarity of the dictionary-element characteristics while reducing the computational complexity of the algorithm. Steps A) and B) describe this process in detail:
Step A) Offline generation of the general background dictionary. Extract n_1 image blocks of N×N pixels from a natural-image set and train a general background dictionary D_B of size (N·N)×M with the K-SVD and OMP algorithms.
The holes in a DIBR virtual image are background regions exposed when the viewpoint changes, previously occluded by foreground objects, with no corresponding texture mapped during conversion; hole repair therefore mainly draws on background information. Since the background of typical video content is dominated by outdoor natural scenery, street scenes, or indoor still scenes without human activity, the natural-image set selected by the invention is likewise based on images of such themes. To ensure generality, the set contains no fewer than 20 images; to avoid overlap of sample information extracted from small-scale images during dictionary learning, each image has a resolution of at least 512×512.
In the invention the value ranges are 5 ≤ N ≤ 15, 10000 ≤ n_1 ≤ 15000, and 4·N·N ≤ M ≤ 16·N·N. In the simulation experiments N = 5, n_1 = 10000, and M = 125; a training matrix X_B of size 25×10000 and a 25×125 general background dictionary D_B are constructed, the number of iterations is set to 30, a 25×125 DCT dictionary is used to initialize the dictionary, and the block sparsity is set to S_B = 3. The general background dictionary is trained with the K-SVD and OMP algorithms as:
  min_{D_B, Φ_B} ||X_B − D_B·Φ_B||_F²  s.t.  ||φ_Bj||_0 ≤ S_B for every j,
where Φ_B is the sparse coefficient matrix of X_B and φ_Bj is its j-th column.
K-SVD (k-singular value decomposition) is a method for iteratively training an overcomplete dictionary; OMP (Orthogonal Matching Pursuit) is a method for sparsely decomposing a signal over an overcomplete dictionary. The embodiment uses an implementation based on these algorithmic principles.
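The K-SVD/OMP training loop can be sketched in miniature as follows (numpy only). Initialising the dictionary from random noise rather than a DCT dictionary, and the small problem size, are simplifications of the embodiment, not its parameters.

```python
import numpy as np

def omp_fixed(D, x, s):
    """OMP sparse coding with a fixed sparsity level s."""
    support, residual = [], x.copy()
    coef = np.zeros(0)
    for _ in range(s):
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break
        support.append(k)
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    y = np.zeros(D.shape[1])
    if support:
        y[support] = coef
    return y

def ksvd(X, n_atoms, s=3, n_iter=10, seed=0):
    """Minimal K-SVD: alternate OMP coding of every training column with
    rank-1 SVD updates of each atom and its coefficients."""
    rng = np.random.default_rng(seed)
    D = rng.normal(size=(X.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)
    Y = np.zeros((n_atoms, X.shape[1]))
    for _ in range(n_iter):
        Y = np.column_stack([omp_fixed(D, X[:, i], s) for i in range(X.shape[1])])
        for j in range(n_atoms):
            users = np.nonzero(Y[j, :])[0]       # signals that use atom j
            if users.size == 0:
                continue
            Yj = Y.copy(); Yj[j, users] = 0.0
            E = X[:, users] - D @ Yj[:, users]   # residual without atom j
            U, S, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, j] = U[:, 0]                    # rank-1 update of the atom
            Y[j, users] = S[0] * Vt[0, :]        # and of its coefficients
    return D, Y
```

At the embodiment's scale (25×10000 training matrix, 125 atoms, 30 iterations) an optimised implementation would be used instead; the structure of the alternation is the same.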
Step B) Construction of the sparse dictionary D_I. First, randomly extract image blocks of N×N pixels from the background region of the video frame image I_F to be processed, pull them into column vectors, and assemble them into an image-block matrix as the sample dictionary D_S. Then, for each column of the general background dictionary D_B generated in step A), use the sample dictionary D_S and the error threshold T to solve
  min ||y_k||_0  s.t.  ||d_Bk − D_S·y_k||₂² ≤ T,
obtaining the sparse coefficient matrix Ŷ, where d_Bk is the k-th column vector of D_B, Y is the sparse coefficient matrix of D_B, and y_k is the k-th column of Y. Finally, use the sparse coefficient matrix Ŷ with the corresponding sample dictionary D_S to obtain the sparse dictionary D_I = D_S·Ŷ. The sparse dictionary of the invention effectively combines the sample dictionary with the learned background dictionary, improving both the structure and the adaptivity of the dictionary.
The background region of the video frame image I_F to be processed is the image region whose depth value, in the depth image I_D corresponding to the frame, is smaller than the dominant depth value. The dominant depth reflects the average distance from the lens of the principal objects of the 3D scene at the current viewing angle. For example, if the depth range of a video scene is 68 to 126 metres and the principal objects lie at an average distance of 80 metres from the lens, the dominant depth is 80 metres. The dominant depth value is the embodiment of the dominant depth in the depth map: taking the maximum and minimum depths of I_D as the upper and lower bounds, divide the continuous depth range into equally spaced depth intervals, count the pixels of the depth map falling in each interval, and take the mean of the most populated interval as the dominant depth value of the scene at this viewing angle. Continuing the example above: if each pixel of the depth map I_D is stored as a UINT8 value, the depth range is [0, 255] and the dominant depth value is 255×(80−68)/(126−68) ≈ 53; if each pixel is stored as a double-precision value, the depth range is [0, 1] and the dominant depth value is 1×(80−68)/(126−68) ≈ 0.2069.
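The dominant-depth computation and the worked UINT8 example above can be sketched as follows (numpy; the bin count of 8 is an assumed choice, since the text only requires equally spaced intervals).

```python
import numpy as np

def dominant_depth_value(depth, n_bins=8):
    """Split [min, max] of the depth map into equally spaced intervals,
    count the pixels in each, and return the mean depth of the most
    populated interval (the dominant depth value)."""
    d = np.asarray(depth, dtype=float).ravel()
    edges = np.linspace(d.min(), d.max(), n_bins + 1)
    idx = np.digitize(d, edges[1:-1])            # interval index 0..n_bins-1
    k = int(np.argmax(np.bincount(idx, minlength=n_bins)))
    return d[idx == k].mean()

# Worked example from the text: scene depth 68-126 m, principal objects at 80 m,
# mapped into a UINT8 depth map: round(255 * (80 - 68) / (126 - 68)) gives 53.
uint8_dominant = round(255 * (80 - 68) / (126 - 68))
```

Pixels whose depth falls below this value are then treated as background when sampling patches for D_S.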
Step C) Hole estimation and classification. First, in the depth map I_D, estimate the hole region R_H that will be produced, using the hole-estimation operator based on the depth-variation rules of the left- and right-eye virtual images. Then, in the video frame image I_F corresponding to the depth map, classify the predicted holes with the texture-quantification operator according to the texture near each hole region, labelling predicted holes whose nearby texture is not rich as class-I holes R_HI. For the class-I holes, filter the corresponding regions of the depth map I_D with existing asymmetric smoothing, then perform DIBR rendering again to generate the target virtual image I_V; all holes still present in I_V are labelled class-II holes R_HII.
Hole-repair methods based on depth-map filtering are efficient and effective, but in texture-rich regions they easily deform straight-line textures; conversely, sparse-representation methods capture regional characteristics more easily in texture-rich regions and so repair them better. The invention classifies the holes of the virtual image according to this property to obtain better repairs. In addition, depth-map filtering has lower complexity than sparse-representation repair, so the classification also effectively improves the computational efficiency of the whole method.
The hole estimation operator is:

R_H = { r(x,y) | η_i(x,y) − r(x,y) > α·λ_H·D_max / D_width }

η_i(x,y) = r(x+1,y) if i = l;  r(x−1,y) if i = r

Wherein, R_H denotes the predicted holes, r(x,y) denotes the depth value at coordinate (x,y) in the depth map I_D, D_max is the maximum disparity of the generated virtual image, α is a normalization factor (for example, α = 255 for a typical gray-level depth map), D_width is the width of the image, and λ_H is a preset threshold factor. If the newly synthesized virtual view is the left-eye view, then i = l; otherwise i = r. λ_H ranges from 1 to 6; λ_H = 3 is used in the simulation experiments.
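A sketch of this operator, assuming a `[row, column]` array layout so that r(x, y) maps to `depth[y, x]`; the border handling and the parameter values used below are illustrative assumptions:

```python
import numpy as np

def estimate_holes(depth, d_max, width, lam_h=3.0, alpha=255.0, view='l'):
    """Predict the hole regions R_H: pixels whose horizontal neighbour is
    deeper by more than alpha * lam_h * d_max / width.  For a left-eye view
    (view='l'), eta compares with the right neighbour r(x+1, y); for a
    right-eye view, with the left neighbour r(x-1, y)."""
    d = np.asarray(depth, dtype=np.float64)
    eta = d.copy()                 # border column keeps its own depth
    if view == 'l':
        eta[:, :-1] = d[:, 1:]     # eta_l(x, y) = r(x + 1, y)
    else:
        eta[:, 1:] = d[:, :-1]     # eta_r(x, y) = r(x - 1, y)
    threshold = alpha * lam_h * d_max / width
    return (eta - d) > threshold   # boolean mask of predicted holes R_H
```

A depth step from background to foreground along a row then marks the background-side pixel next to the step as a predicted hole.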
The texture quantification operator is:

F(p) = Σ_{S(p)∩R̄_H} ( λ_v·|G(p)_x| + (1−λ_v)·|G(p)_y| )

Wherein, p denotes a position coordinate in the video frame image I_F, F denotes the texture quantification index of the video frame image at p, |G(p)_x| denotes the magnitude of the image gradient at p in the x direction, |G(p)_y| that in the y direction, λ_v is a preset weight factor, and S(p) denotes the neighborhood of the image at p, in this example the region of N rows and N columns of pixels centered at p; the sum runs only over the non-hole part of the neighborhood, S(p) ∩ R̄_H. Because the human visual system is more sensitive to horizontal parallax than to vertical parallax, the x-direction gradient is given the larger weight in the texture operator, so λ_v ranges from 0.5 to 1; λ_v = 0.7 is used in the simulation experiments.
The holes whose nearby texture is not rich refer to the hole regions satisfying:

R_HI = { r | r ∈ R_H, F(r) < σ_texture · F_max(r) }

Wherein, R_HI denotes the hole regions labeled as class I; σ_texture is a preset threshold whose range is 0.1 to 0.9, and σ_texture = 0.7 is used in the simulation experiments; F_max(r) denotes the maximum of the texture quantification index in the video frame image I_F.
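The texture index and the class-I test can be sketched as below; the gradient estimator (`np.gradient`), the plain box-sum over the neighborhood, and the test values are assumptions of this illustration:

```python
import numpy as np

def texture_index(image, hole_mask, n=7, lam_v=0.7):
    """Texture quantification index F(p): the weighted gradient magnitude
    lam_v*|G_x| + (1-lam_v)*|G_y| summed over the N x N neighbourhood S(p),
    counting only non-hole pixels (S(p) intersected with the complement
    of R_H)."""
    img = np.asarray(image, dtype=np.float64)
    gy, gx = np.gradient(img)                     # y- and x-gradients
    w = lam_v * np.abs(gx) + (1.0 - lam_v) * np.abs(gy)
    w = np.where(hole_mask, 0.0, w)               # drop hole pixels
    h = n // 2
    H, W = img.shape
    F = np.zeros_like(img)
    for r in range(H):                            # plain box sum, clipped
        for c in range(W):                        # at the image borders
            F[r, c] = w[max(r-h, 0):r+h+1, max(c-h, 0):c+h+1].sum()
    return F

def class_one_holes(F, hole_mask, sigma_texture=0.7):
    """Class-I holes: hole pixels whose texture index is below
    sigma_texture times the image-wide maximum F_max."""
    return hole_mask & (F < sigma_texture * F.max())
```

A hole pixel in a flat region then falls below the σ_texture threshold and is marked class I, while a hole pixel next to a strong edge is not.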
D) For each class-II hole R_HII, fill-and-repair processing is performed as described in Da)–Dc) below:

Da) Hole-region edge recognition is re-performed on this class-II hole: each non-hole pixel adjacent to the current edge of the class-II hole region is labeled as a contour pixel, so that the currently labeled contour pixels, connected to one another, form the current non-hole pixel contour of this class-II hole. The priority of each currently labeled contour pixel is then determined, where the priority P(t) of any currently labeled contour pixel p_t is defined as follows:
P(t)=C(t)D(t)
C(t) denotes the background confidence coefficient of p_t, and:

C(t) = ( Σ_{q∈ψ_t} E(q) ) / ⟨ψ_t⟩,  where E(q) = 0 for all q ∈ R_H and E(q) = 1 for all q ∉ R_H

Wherein, ψ_t denotes a rectangular neighborhood of N rows and N columns of pixels centered at p_t, and ⟨ψ_t⟩ denotes the number of pixels of the rectangular neighborhood;
D(t) denotes the texture confidence coefficient of p_t, and:

D(t) = (1/⟨ψ_t⟩) Σ_{q∈ψ_t} ( λ_v·|G(q)_x| + (1−λ_v)·|G(q)_y| )

Wherein, |G(q)_x| denotes the magnitude of the gradient of the video frame image I_F at q in the x direction, |G(q)_y| that in the y direction, and λ_v is a preset weight factor. Again because the human visual system is more sensitive to horizontal parallax than to vertical parallax, the x-direction gradient is given the larger weight in the texture operator, so λ_v ranges from 0.5 to 1; λ_v = 0.7 is used in the simulation experiments.
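Under the same assumptions about array layout and gradient estimation as above, the priority P(t) = C(t)·D(t) of a contour pixel can be sketched as:

```python
import numpy as np

def contour_priority(image, hole_mask, pt, n=7, lam_v=0.7):
    """Priority P(t) = C(t) * D(t) for a contour pixel pt = (row, col):
    C(t) is the fraction of non-hole pixels in the N x N neighbourhood
    psi_t (background confidence), and D(t) is the mean weighted gradient
    magnitude over psi_t (texture confidence)."""
    img = np.asarray(image, dtype=np.float64)
    gy, gx = np.gradient(img)
    r, c = pt
    h = n // 2
    win = (slice(max(r - h, 0), r + h + 1), slice(max(c - h, 0), c + h + 1))
    C = 1.0 - hole_mask[win].mean()          # share of pixels with E(q) = 1
    D = (lam_v * np.abs(gx[win]) + (1.0 - lam_v) * np.abs(gy[win])).mean()
    return C * D
```

A contour pixel on a strong texture edge with a mostly valid neighborhood thus outranks both flat-region pixels and pixels whose neighborhood is mostly hole, which is what drives the preferential diffusion of texture information.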
Db) The priorities of the currently labeled contour pixels are compared first. If only one contour pixel has the highest priority, that contour pixel is labeled as the target to be repaired p_t; if several contour pixels share the highest priority, one of them is selected and labeled as the target to be repaired p_t. Next, in the target virtual image I_V, the image block of N rows and N columns of pixels centered at the target p_t is taken as the image block to be repaired X_d. Using the sparse dictionary D_I and the error threshold T, hole filling is performed on X_d by solving

min_{y_d} |y_d|_1  s.t.  |H·x_d − H·D_I·y_d|_2 ≤ T,

obtaining the sparse coefficient representation y_d, where x_d is the column-vector representation of X_d, y_d is the sparse coefficient representation of x_d, and H is the mask matrix of the image block to be repaired. Finally, the sparse coefficient representation y_d and the corresponding sparse dictionary D_I are used to obtain the repaired image block D_I·y_d, which is put back at its original position in the target virtual image.
The mask matrix H is formed from m_d, the column-vector representation of the hole information matrix M_d of N rows and N columns centered at the target to be repaired p_t (a diagonal mask H = diag(m_d), so that H·x_d retains only the valid, non-hole entries of x_d), where M_d satisfies:

M_d(p) = 0 if p ∈ R_H;  1 if p ∉ R_H
Dc) Judge whether this class-II hole has been completely filled; if not, repeat steps Da)–Dc) until every hole pixel of the class-II hole is filled, at which point the fill-and-repair processing of this class-II hole is complete.
The repair processing of class-II holes in the present invention adopts the manner described in steps Da)–Dc): each time steps Da)–Dc) are executed and one block of the target virtual image is repaired, the remaining hole region is re-identified at the next execution. In step Da), the priority calculation considers both background and texture confidence, which ensures the preferential diffusion of texture information and makes the final hole-filling result fuse better with the background image. In step Db), under the assumption that all elements of the image block to be repaired (valid data and data to be repaired) share the same sparse representation coefficients, the mask matrix H is first used to obtain the sparse representation of the image block from the valid data alone, and the dictionary together with this sparse representation is then used to repair the whole image block.
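The masked sparse coding of step Db) can be sketched with a simple greedy (OMP-style) solver standing in for the exact l1 minimization; the solver choice, the atom budget, the diagonal form of H, and the small random test dictionary are all assumptions of this illustration, not the patented solver itself:

```python
import numpy as np

def repair_block(x_d, m_d, D_I, T=1e-8, max_atoms=None):
    """Fill a vectorised image block: fit sparse coefficients y_d to the
    valid pixels only (rows where the mask m_d is 1, i.e. H = diag(m_d)),
    then reconstruct the whole block, holes included, as D_I @ y_d."""
    H = np.diag(np.asarray(m_d, dtype=np.float64))
    xm, Dm = H @ x_d, H @ D_I                 # masked block and dictionary
    n_atoms = D_I.shape[1]
    if max_atoms is None:
        max_atoms = n_atoms
    support, coef = [], np.zeros(0)
    res = xm.copy()
    for _ in range(max_atoms):
        if np.linalg.norm(res) <= T:          # fit is within threshold T
            break
        corr = np.abs(Dm.T @ res)
        corr[support] = -1.0                  # never re-pick an atom
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(Dm[:, support], xm, rcond=None)
        res = xm - Dm[:, support] @ coef
    y_d = np.zeros(n_atoms)
    y_d[support] = coef
    return D_I @ y_d                          # repaired image block

# A block that is exactly one dictionary atom, with three pixels knocked
# out, is recovered from the valid pixels alone.
rng = np.random.default_rng(0)
D = rng.normal(size=(16, 8))
D /= np.linalg.norm(D, axis=0)                # unit-norm atoms
x_true = 2.0 * D[:, 3]
m = np.ones(16)
m[[0, 5, 9]] = 0.0                            # hole pixels
x_hat = repair_block(x_true * m, m, D)
```

Because the coefficients are estimated from the valid pixels only, the reconstruction D_I·y_d fills the hole entries "for free", which is exactly the shared-coefficient assumption stated above.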
The following is the experimental verification of the DIBR virtual image repair method of the present invention.
1) experiment condition:
The tests were run on a system with an Intel Core™2 Quad Q9400 CPU @ 2.66 GHz, 2 GB of memory, and Windows 7.
2) experiment content:
The implementation details of experiments with the method of the present invention, and its advantages over existing repair methods, are described below with reference to Fig. 4 to Fig. 6.
Fig. 4 is a schematic diagram of the synthesis of the sparse dictionary from the general background dictionary and the image background sample dictionary of the present invention. As can be seen from the figure, the general background dictionary obtained after sample training contains rather structured information, while the sample dictionary contains essential information of the pending image itself; the sparse dictionary obtained by synthesizing the two fuses this information organically, and slight changes of individual dictionary elements can be observed intuitively in the figure.
Fig. 5 is a schematic diagram of the classification of one group of experimental images and of the filtering of the corresponding depth map. In Fig. 5(a), the hole regions to the left of the sculpture's head and hand (the gray-overlaid regions) are the detected class-II hole regions, and the remaining black hole regions are the detected class-I hole regions. It can be seen that the texture detection operator marks as class II mostly the straight contours of the background buildings and steps where they meet the contour of the foreground figure: if these regions were repaired by depth-map filtering, the straight lines would easily bend, whereas sparse-dictionary repair captures the texture information more easily and achieves a good repair effect there. Fig. 5(b) is the corresponding depth map; it can be seen that the method of the present invention applies asymmetric smoothing to the depth map only in the class-I hole regions (mainly concentrated on the left edge of the foreground figure's contour, without gray overlay).
Fig. 6 shows the enlarged region marked by the gray box in Fig. 5(a), mainly presenting the image repair effect of the sparse-dictionary method of the present invention in contrast with other current repair methods. Fig. 6(a) is the unrepaired image; the hole regions clearly concentrate on the left edge of the foreground figure. Fig. 6(b) is the hole repair achieved by asymmetric smoothing filtering of the depth map; the vertical contour lines of the background building are visibly bent. Fig. 6(c) is the hole repair achieved by patch-based texture synthesis; because no well-matching similar image blocks could be found, the repaired region shows obvious texture disorder. Fig. 6(d) is an image repair method based on morphological component analysis (MCA); it avoids obvious texture disorder and achieves a good visual effect. Fig. 6(e), an image repair method based on group sparse representation and one of the best current image repair methods, achieves the best visual effect; however, the algorithm running-time statistics in Table 1 below show that its operational efficiency is low, which restricts its practical application in 2D/3D conversion. Fig. 6(f) uses the method of the present invention; its repair effect is similar to that of the existing group-sparse-representation method, but the running-time statistics in Table 1 below show that the present method holds a clear advantage in operational efficiency.
Table 1: running-time statistics of the repair methods in Fig. 6
As can be seen from Fig. 6 and Table 1, the traditional repair methods based on sparse representation (MCA and group-based sparse representation) repair the hole regions of the DIBR virtual image well, but their operational efficiency is too low; by synthesizing the off-line dictionary with the sampled dictionary, the method of the present invention reaches an equivalent repair effect while its operational efficiency improves significantly, making it better suited to processing the massive video information of high-definition 2D/3D conversion.
Therefore, the traditional repair methods based on sparse representation (MCA and group-based sparse representation) cannot be applied effectively to 2D/3D conversion because their operational efficiency is too low, whereas the present invention, by means of the sparse dictionary, significantly improves operational efficiency while maintaining the repair effect, striking a balance between repair quality and repair efficiency and making the method better suited to the more demanding, refined "high-definition 2D/3D conversion". Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solution of the present invention; other modifications or equivalent replacements made by those of ordinary skill in the art to the technical solution of the present invention, provided they do not depart from the spirit and scope of the technical solution, shall all be covered by the claims of the present invention.

Claims (10)

1. A DIBR virtual image repair method suitable for high-definition 2D/3D conversion, characterized by comprising the following steps:
A) generating a general background dictionary D_B;
B) synthesizing a sparse dictionary D_I, comprising: constructing a sample dictionary D_S in the background area of the video frame image I_F; and synthesizing the sparse dictionary D_I from the generated general background dictionary D_B and the sample dictionary D_S;
C) hole estimation and classification, comprising:
in the depth map I_D, estimating the hole regions R_H that will be produced with a hole estimation operator according to the depth-change rule of the left-eye/right-eye virtual images;
in the video frame image I_F corresponding to the depth map, classifying the predicted holes with a texture quantification operator according to the texture near each hole region, and labeling the predicted holes whose nearby texture information is not rich as class-I holes R_HI;
for the class-I holes R_HI, filtering the corresponding regions of the depth map I_D by asymmetric smoothing, then performing DIBR rendering to generate the target virtual image I_V, and labeling all holes still present in the target virtual image I_V as class-II holes R_HII;
D) performing fill-and-repair processing on each class-II hole R_HII.
2. The DIBR virtual image repair method suitable for high-definition 2D/3D conversion according to claim 1, characterized in that
said step A) of generating the general background dictionary D_B comprises: extracting n_1 image blocks of N×N pixels from a natural image set, and training a general background dictionary D_B of size (N·N)×M with the KSVD algorithm and the OMP algorithm.
3. The DIBR virtual image repair method suitable for high-definition 2D/3D conversion according to claim 2, characterized in that
said constructing a sample dictionary D_S in the background area of the video frame image I_F comprises:
extracting at random, from the background area of the pending video frame image I_F, image blocks of N rows and N columns of pixels, pulling these image blocks into column vectors, and assembling them into an image block matrix as the sample dictionary D_S.
4. The DIBR virtual image repair method suitable for high-definition 2D/3D conversion according to claim 3, characterized in that
said synthesizing the sparse dictionary D_I from the generated general background dictionary D_B and the sample dictionary D_S comprises:
according to the sample dictionary D_S and an error threshold T, obtaining for each column of the general background dictionary D_B the sparse coefficient representation matrix Y by the formula min_{y_k} |y_k|_1 s.t. |D_Bk − D_S·y_k|_2 ≤ T, wherein D_Bk is the k-th column vector of D_B, Y is the sparse coefficient representation matrix of D_B, and y_k is the k-th column of Y;
using the sparse coefficient representation matrix Y and the corresponding sample dictionary D_S, obtaining the sparse dictionary D_I = D_S·Y.
5. The DIBR virtual image repair method suitable for high-definition 2D/3D conversion according to claim 4, characterized in that
said step D) of performing fill-and-repair processing on each class-II hole R_HII comprises:
Da) re-performing hole-region edge recognition on this class-II hole: labeling each non-hole pixel adjacent to the current edge of the class-II hole region as a contour pixel, so that the currently labeled contour pixels, connected together, form the current non-hole pixel contour of this class-II hole; and determining the priority of each currently labeled contour pixel, wherein the priority P(t) of any currently labeled contour pixel p_t is defined as follows:
P(t) = C(t)·D(t)
C(t) denotes the background confidence coefficient of p_t, and:
C(t) = ( Σ_{q∈ψ_t} E(q) ) / ⟨ψ_t⟩,  where E(q) = 0 for all q ∈ R_H and E(q) = 1 for all q ∉ R_H
wherein ψ_t denotes a rectangular neighborhood of N rows and N columns of pixels centered at p_t, and ⟨ψ_t⟩ denotes the number of pixels of the rectangular neighborhood;
D(t) denotes the texture confidence coefficient of p_t, and:
D(t) = (1/⟨ψ_t⟩) Σ_{q∈ψ_t} ( λ_v·|G(q)_x| + (1−λ_v)·|G(q)_y| )
wherein |G(q)_x| denotes the magnitude of the gradient of the video frame image I_F at q in the x direction, |G(q)_y| that in the y direction, and λ_v is a preset weight factor;
Db) comparing the priorities of the currently labeled contour pixels: if only one contour pixel has the highest priority, labeling that contour pixel as the target to be repaired p_t; if several contour pixels share the highest priority, selecting one of them and labeling it as the target to be repaired p_t;
in the target virtual image I_V, taking the image block of N rows and N columns of pixels centered at the target p_t as the image block to be repaired X_d; using the sparse dictionary D_I and the error threshold T, performing hole filling on X_d by the formula min_{y_d} |y_d|_1 s.t. |H·x_d − H·D_I·y_d|_2 ≤ T, obtaining the sparse coefficient representation y_d, wherein x_d is the column-vector representation of X_d, y_d is the sparse coefficient representation of x_d, and H is the mask matrix of the image block to be repaired; then using the sparse coefficient representation y_d and the corresponding sparse dictionary D_I to obtain the repaired image block D_I·y_d, and putting the image block back at its original position in the target virtual image;
Dc) judging whether this class-II hole has been completely filled; if not, repeating steps Da)–Dc) until every hole pixel of the class-II hole is filled, at which point the fill-and-repair processing of this class-II hole is complete.
6. The DIBR virtual image repair method suitable for high-definition 2D/3D conversion according to claim 2, characterized in that the value range of N in step A) is 5 ≤ N ≤ 15; the value range of said n_1 is 10000 ≤ n_1 ≤ 15000; and the value range of said M is 4·N·N ≤ M ≤ 16·N·N.
7. The DIBR virtual image repair method suitable for high-definition 2D/3D conversion according to claim 2 or 6, characterized in that the natural image set in step A) refers to any set of pictures whose theme is natural scenery, street scenes, or indoor still scenes without human activity; the set contains no fewer than 20 images, and the image resolution is no less than 512×512.
8. The DIBR virtual image repair method suitable for high-definition 2D/3D conversion according to claim 1, characterized in that the hole estimation operator in step C) is:
R_H = { r(x,y) | η_i(x,y) − r(x,y) > α·λ_H·D_max / D_width }
η_i(x,y) = r(x+1,y) if i = l;  r(x−1,y) if i = r
wherein R_H denotes the predicted holes, r(x,y) denotes the depth value at coordinate (x,y) in the depth map I_D, D_max is the maximum disparity of the generated virtual image, α is a normalization factor, D_width is the width of the image, and λ_H is a preset threshold factor; if the newly synthesized virtual view is the left-eye view, then i = l, otherwise i = r.
9. The DIBR virtual image repair method suitable for high-definition 2D/3D conversion according to claim 1, characterized in that the texture quantification operator in step C) is:
F(p) = Σ_{S(p)∩R̄_H} ( λ_v·|G(p)_x| + (1−λ_v)·|G(p)_y| )
wherein p denotes a position coordinate in the video frame image I_F, F denotes the texture quantification index of the video frame image at p, |G(p)_x| denotes the magnitude of the image gradient at p in the x direction, |G(p)_y| that in the y direction, λ_v is a preset weight factor, and S(p) denotes the neighborhood of the image at p; said λ_v is 0.5 to 1, and the neighborhood at p is the region of N rows and N columns of pixels centered at p.
10. The DIBR virtual image repair method suitable for high-definition 2D/3D conversion according to claim 1, characterized in that the holes whose nearby texture is not rich in step C) refer to the hole regions satisfying:
R_HI = { r | r ∈ R_H, F(r) < σ_texture · F_max(r) }
wherein R_HI denotes the hole regions labeled as class I, σ_texture is a preset threshold whose value range is 0.1 to 0.9, and F_max(r) denotes the maximum of the texture quantification index in the video frame image I_F.
CN201510386465.XA 2015-07-01 2015-07-01 A kind of DIBR virtual image restorative procedure suitable for the conversion of high definition 2D/3D Expired - Fee Related CN104954780B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510386465.XA CN104954780B (en) 2015-07-01 2015-07-01 A kind of DIBR virtual image restorative procedure suitable for the conversion of high definition 2D/3D

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510386465.XA CN104954780B (en) 2015-07-01 2015-07-01 A kind of DIBR virtual image restorative procedure suitable for the conversion of high definition 2D/3D

Publications (2)

Publication Number Publication Date
CN104954780A true CN104954780A (en) 2015-09-30
CN104954780B CN104954780B (en) 2017-03-08

Family

ID=54169078

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510386465.XA Expired - Fee Related CN104954780B (en) 2015-07-01 2015-07-01 A kind of DIBR virtual image restorative procedure suitable for the conversion of high definition 2D/3D

Country Status (1)

Country Link
CN (1) CN104954780B (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105608678A (en) * 2016-01-11 2016-05-25 宁波大学 Sparse-distortion-model-representation-based depth image hole recovering and denoising method
CN106791770A (en) * 2016-12-20 2017-05-31 南阳师范学院 A kind of depth map fusion method suitable for DIBR preprocessing process
CN106780705A (en) * 2016-12-20 2017-05-31 南阳师范学院 Suitable for the depth map robust smooth filtering method of DIBR preprocessing process
CN107018400A (en) * 2017-04-07 2017-08-04 华中科技大学 It is a kind of by 2D Video Quality Metrics into 3D videos method
CN107358587A (en) * 2017-07-12 2017-11-17 宁波视睿迪光电有限公司 Image mending method and system
CN107592519A (en) * 2017-09-30 2018-01-16 南阳师范学院 Depth map preprocess method based on directional filtering under a kind of dimension transformation space
CN108062543A (en) * 2018-01-16 2018-05-22 中车工业研究院有限公司 A kind of face recognition method and device
CN108182688A (en) * 2018-01-19 2018-06-19 广州市派客朴食信息科技有限责任公司 A kind of food image divides method
CN109685732A (en) * 2018-12-18 2019-04-26 重庆邮电大学 A kind of depth image high-precision restorative procedure captured based on boundary
CN110620927A (en) * 2019-09-03 2019-12-27 上海交通大学 Scalable compression video tensor signal acquisition and reconstruction system based on structured sparsity
CN110719473A (en) * 2019-09-03 2020-01-21 上海交通大学 Scalable compression video acquisition and reconstruction system based on structured sparsity
CN111105382A (en) * 2019-12-31 2020-05-05 北京大学 Video repair method
CN111614996A (en) * 2020-04-07 2020-09-01 上海推乐信息技术服务有限公司 Video repair method and system
CN115147316A (en) * 2022-08-06 2022-10-04 南阳师范学院 Computer image high-efficiency compression method and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103310420A (en) * 2013-06-19 2013-09-18 武汉大学 Method and system for repairing color image holes on basis of texture and geometrical similarities
CN103384343A (en) * 2013-07-02 2013-11-06 南京大学 Image cavity filling method and device thereof
US20140002595A1 (en) * 2012-06-29 2014-01-02 Hong Kong Applied Science And Technology Research Institute Co., Ltd. Apparatus, system and method for foreground biased depth map refinement method for dibr view synthesis

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140002595A1 (en) * 2012-06-29 2014-01-02 Hong Kong Applied Science And Technology Research Institute Co., Ltd. Apparatus, system and method for foreground biased depth map refinement method for dibr view synthesis
CN103310420A (en) * 2013-06-19 2013-09-18 武汉大学 Method and system for repairing color image holes on basis of texture and geometrical similarities
CN103384343A (en) * 2013-07-02 2013-11-06 南京大学 Image cavity filling method and device thereof

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHING-LUNG SU: "Interframe Hole Filling for DIBR in 3D Videos", 《CONSUMER ELECTRONICS - TAIWAN (ICCE-TW), 2015 IEEE INTERNATIONAL CONFERENCE ON》 *
ZHANG QIAN: "Depth-image-based rendering using image inpainting", Journal of Optoelectronics · Laser (《光电子·激光》) *
ZENG YAOXIAN: "New view synthesis based on the DIBR algorithm and its image inpainting", Master's thesis, Shanghai Jiao Tong University (《上海交通大学硕士学位论文》) *

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105608678A (en) * 2016-01-11 2016-05-25 宁波大学 Sparse-distortion-model-representation-based depth image hole recovering and denoising method
CN105608678B (en) * 2016-01-11 2018-03-27 宁波大学 The depth image cavity represented based on sparse distortion model is repaired and denoising method
CN106791770A (en) * 2016-12-20 2017-05-31 南阳师范学院 A kind of depth map fusion method suitable for DIBR preprocessing process
CN106780705A (en) * 2016-12-20 2017-05-31 南阳师范学院 Suitable for the depth map robust smooth filtering method of DIBR preprocessing process
CN106780705B (en) * 2016-12-20 2020-10-16 南阳师范学院 Depth map robust smooth filtering method suitable for DIBR preprocessing process
CN106791770B (en) * 2016-12-20 2018-08-10 南阳师范学院 A kind of depth map fusion method suitable for DIBR preprocessing process
CN107018400A (en) * 2017-04-07 2017-08-04 华中科技大学 It is a kind of by 2D Video Quality Metrics into 3D videos method
CN107358587A (en) * 2017-07-12 2017-11-17 宁波视睿迪光电有限公司 Image mending method and system
CN107592519A (en) * 2017-09-30 2018-01-16 南阳师范学院 Depth map preprocess method based on directional filtering under a kind of dimension transformation space
CN108062543A (en) * 2018-01-16 2018-05-22 中车工业研究院有限公司 A kind of face recognition method and device
CN108182688B (en) * 2018-01-19 2019-03-19 广州市派客朴食信息科技有限责任公司 A kind of food image dividing method
CN108182688A (en) * 2018-01-19 2018-06-19 广州市派客朴食信息科技有限责任公司 A kind of food image divides method
CN109685732A (en) * 2018-12-18 2019-04-26 重庆邮电大学 A kind of depth image high-precision restorative procedure captured based on boundary
CN109685732B (en) * 2018-12-18 2023-02-17 重庆邮电大学 High-precision depth image restoration method based on boundary capture
CN110620927A (en) * 2019-09-03 2019-12-27 上海交通大学 Scalable compression video tensor signal acquisition and reconstruction system based on structured sparsity
CN110719473A (en) * 2019-09-03 2020-01-21 上海交通大学 Scalable compression video acquisition and reconstruction system based on structured sparsity
CN110719473B (en) * 2019-09-03 2021-11-23 上海交通大学 Scalable compression video acquisition and reconstruction system based on structured sparsity
CN110620927B (en) * 2019-09-03 2022-05-27 上海交通大学 Scalable compression video tensor signal acquisition and reconstruction system based on structured sparsity
CN111105382A (en) * 2019-12-31 2020-05-05 北京大学 Video repair method
CN111105382B (en) * 2019-12-31 2021-11-16 北京大学 Video repair method
CN111614996A (en) * 2020-04-07 2020-09-01 上海推乐信息技术服务有限公司 Video repair method and system
CN115147316A (en) * 2022-08-06 2022-10-04 南阳师范学院 Computer image high-efficiency compression method and system

Also Published As

Publication number Publication date
CN104954780B (en) 2017-03-08

Similar Documents

Publication Publication Date Title
CN104954780A (en) DIBR (depth image-based rendering) virtual image restoration method applicable to high-definition 2D/3D (two-dimensional/three-dimensional) conversion
CN112766160B (en) Face replacement method based on multi-stage attribute encoder and attention mechanism
CN101404091B (en) Three-dimensional human face reconstruction method and system based on two-step shape modeling
CN107767413A (en) A kind of image depth estimation method based on convolutional neural networks
CN104616286B (en) Quick semi-automatic multi views depth restorative procedure
CN107204010A (en) A kind of monocular image depth estimation method and system
CN103530907B (en) Complicated three-dimensional model drawing method based on images
CN101916454A (en) Method for reconstructing high-resolution human face based on grid deformation and continuous optimization
CN102609950B (en) Two-dimensional video depth map generation process
CN103854301A (en) 3D reconstruction method of visible shell in complex background
CN104299263A (en) Method for modeling cloud scene based on single image
CN113362422B (en) Shadow robust makeup transfer system and method based on decoupling representation
CN106127818A (en) A kind of material appearance based on single image obtains system and method
CN116109798A (en) Image data processing method, device, equipment and medium
CN106028020B (en) A kind of virtual perspective image cavity complementing method based on multi-direction prediction
CN104751508B (en) The full-automatic of new view is quickly generated and complementing method in the making of 3D three-dimensional films
CN113066171A (en) Face image generation method based on three-dimensional face deformation model
CN109218706B (en) Method for generating stereoscopic vision image from single image
CN107564097A (en) A kind of remains of the deceased three-dimensional rebuilding method based on direct picture
CN116310188B (en) Virtual city generation method and storage medium based on instance segmentation and building reconstruction
CN106780432B (en) A kind of objective evaluation method for quality of stereo images based on sparse features similarity
Yu et al. A framework for automatic and perceptually valid facial expression generation
CN116935008A (en) Display interaction method and device based on mixed reality
CN115619974A (en) Large scene three-dimensional reconstruction method, reconstruction device, equipment and storage medium based on improved PatchMatch network
CN109859306A (en) A method of extracting manikin in the slave photo based on machine learning

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170308

Termination date: 20180701

CF01 Termination of patent right due to non-payment of annual fee