CN104506872A - Method and device for converting planar video into stereoscopic video - Google Patents


Info

Publication number
CN104506872A
CN104506872A (application CN201410697508.1A; granted publication CN104506872B)
Authority
CN
China
Prior art keywords
video
module
depth map
depth
dimensional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410697508.1A
Other languages
Chinese (zh)
Other versions
CN104506872B (en)
Inventor
张新
柯家琪
廖智宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Youshu Technology Co., Ltd.
Original Assignee
SHENZHEN KAIAOSI TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHENZHEN KAIAOSI TECHNOLOGY Co Ltd
Priority to CN201410697508.1A
Publication of CN104506872A
Application granted
Publication of CN104506872B
Legal status: Active
Anticipated expiration


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106Processing image signals
    • H04N13/15Processing image signals for colour aspects of image signals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106Processing image signals
    • H04N13/111Transformation of image signals corresponding to virtual viewpoints, e.g. spatial image interpolation

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a method and a device for converting a planar video into a stereoscopic video. The method comprises the following steps, executed on each frame image of the planar video: S1, acquiring a depth map D of the current frame image, by acquiring a first depth map D1 through motion estimation based on block matching, constructing a second depth map D2 from geometric perspective relations found through an edge detection algorithm and a Hough transform algorithm, estimating a third depth map D3 through a method based on color information, and performing depth fusion on D1, D2 and D3 to obtain the depth map D; S2, generating a multi-view stereoscopic view from a reference image and the depth map D based on the DIBR (Depth Image Based Rendering) algorithm; and S3, selecting at least one pair of left-eye and right-eye views from the multi-view stereoscopic view according to the stereoscopic video format required by the user and performing stereoscopic rendering on them, to generate a color stereoscopic video of the corresponding format. The stereoscopic video generated by the method has a good stereoscopic effect, and stereoscopic videos of different formats can be generated according to user demands.

Description

Method and device for converting a planar video into a stereoscopic video
Technical field
The present invention relates to a method and a device for converting a planar video into a stereoscopic video.
Background art
Planar-to-stereoscopic video conversion, also known as 2D-to-3D conversion, applies the necessary technical means to an existing planar video to fully extract the depth information it contains and, from that depth information, simulates the scene as observed from multiple viewpoints, producing a perception of depth. In stereoscopic video technology, the depth effect is achieved through binocular stereo vision. Binocular stereo vision exploits the principle of binocular imaging: a left and a right image (or video stream) are projected by special means into the viewer's left and right eyes respectively, and the viewer's brain reconstructs the three-dimensional scene in the image or video, yielding the stereoscopic effect.
As a new way of describing the three-dimensional world, a stereoscopic image contains not only the surface information of the scene found in a conventional planar image, but also three-dimensional information tied to specific positions in the scene, i.e. depth information. Compared with traditional planar video, stereoscopic video therefore reflects concrete scenes of the objective world more faithfully.
A depth map of a scene can be obtained by a stereo vision algorithm; it records the front-to-back (distance) relationships of the scene shown in a planar image. In practice, the depth map is usually represented as an 8-bit grayscale image: a value of 0 at a point means the corresponding point of the planar image lies at the far end of the relative depth range, a value of 255 means it lies at the near end, and the other values in the range 0-255 represent intermediate relative depths.
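The 8-bit depth convention above can be illustrated with a minimal Python sketch (not part of the patent; the linear normalization scheme is an assumed illustration): relative distances are mapped into 0-255 so that the farthest point becomes 0 and the nearest becomes 255.

```python
def normalize_depth(distances):
    """Map relative distances to 8-bit depth values:
    farthest distance -> 0, nearest distance -> 255 (assumed linear scale)."""
    lo, hi = min(distances), max(distances)
    if hi == lo:
        return [128] * len(distances)  # flat scene: arbitrary mid value
    # nearer points (smaller distance) get larger gray values
    return [round(255 * (hi - d) / (hi - lo)) for d in distances]
```

For example, distances 1.0, 2.0, 3.0 map to depth values 255, 128, 0.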
3D video technology has already come a long way: a range of stereoscopic capture devices, from high end to low end, has appeared on the market, and after years of technical accumulation the price of stereoscopic display devices has gradually become affordable, with 3D televisions entering more and more ordinary homes. Behind the recent prosperity of the 3D industry, however, lie several hard problems: high-end stereoscopic capture equipment is expensive, high-quality stereoscopic content is scarce, and manual 3D video production is costly. These problems have gradually become a bottleneck for the development of 3D video.
In addition, stereoscopic display devices on the market are based on several display principles, such as grating (autostereoscopic), shutter, and polarization types. Shutter and polarization displays require special stereoscopic glasses, whereas grating displays present a stereoscopic scene without special glasses; however, the stereoscopic video formats supported by grating displays are divided into binocular and multi-view formats, while an existing stereoscopic video processing device can usually output only one format, and the stereoscopic effect of the output video is poor. This significantly limits the range of application of such devices.
Summary of the invention
The main purpose of the present invention is to propose a method and a device for converting a planar video into a stereoscopic video, so as to solve the technical problems that existing stereoscopic video processing devices output only a single stereoscopic video format and that the stereoscopic effect is unsatisfactory.
The method of converting a planar video into a stereoscopic video proposed by the present invention is as follows:
A method for converting a planar video into a stereoscopic video, comprising performing the following steps on each frame image of the planar video:
S1. Obtain the depth map D of the current frame image: obtain a first depth map D1 through motion estimation based on block matching; extract the vanishing point and vanishing lines of the current frame image with an edge detection algorithm and a Hough transform algorithm, and build a second depth map D2 from the relation between image depth and the vanishing point and vanishing lines; estimate a third depth map D3 with a method based on color information; perform depth fusion on the first depth map D1, the second depth map D2 and the third depth map D3 to obtain the depth map D of the current frame image.
S2. Based on the DIBR algorithm, generate a multi-view stereoscopic view from a reference image and the depth map D, where the reference image is the current frame image and the multi-view stereoscopic view comprises multiple pairs of left-eye and right-eye views.
S3. According to the stereoscopic video output format required by the user, choose at least one pair of left-eye and right-eye views from the multi-view stereoscopic view and perform stereoscopic rendering on them, to generate a color stereoscopic video of the corresponding format.
In the above method, different approaches are used to obtain different depth maps for each frame image; these depth maps are then fused by weighting into one final depth map per frame, and from this final depth map and the frame image a multi-view stereoscopic view is generated with the DIBR algorithm, after which stereoscopic rendering produces the stereoscopic video. Because several depth maps are computed for each single frame by distinct methods and then fused, and all subsequent processing is based on the fused depth map, the final stereoscopic video has a good stereoscopic effect. Moreover, since a multi-view stereoscopic view is available, different view pairs (each pair comprising a left-eye image and a right-eye image) can be selected from it to generate stereoscopic videos of different formats: for example, one view pair of a given viewpoint can be selected to generate a red-blue (anaglyph), binocular, or side-by-side stereoscopic video, or several pairs can be selected to generate a multi-view or row-interleaved stereoscopic video. The user can thus render, according to the display principle of the stereoscopic display device at hand, a stereoscopic video of the matching format, so that devices based on different display principles are all supported.
The device for converting a planar video into a stereoscopic video proposed by the present invention is as follows:
A device for converting a planar video into a stereoscopic video, comprising a control module, a cache module, a video conversion module and a stereoscopic rendering module. The cache module stores the RGB video to be processed and intermediate processing results. The video conversion module is connected to the cache module and to the stereoscopic rendering module; it converts the planar images of the pending RGB video into a multi-view stereoscopic view comprising multiple pairs of left-eye and right-eye views, and feeds that view to the stereoscopic rendering module. The stereoscopic rendering module chooses at least one pair of left-eye and right-eye views from the multi-view stereoscopic view according to the stereoscopic video output format required by the user, performs stereoscopic rendering on the chosen views, and generates a color stereoscopic video of the corresponding format. The control module is connected to the video conversion module and the stereoscopic rendering module and configures the device according to the user's requirements, which include the stereoscopic video output format.
Compared with the prior art, the device provided by the invention has the following advantage: according to the format the user requires for the output stereoscopic video, different selections can be made from the multiple pairs of left-eye and right-eye views of the multi-view stereoscopic view, and the selected views are rendered into a stereoscopic video of the corresponding format. The device can therefore serve stereoscopic display devices based on different display principles, giving it a very wide range of application.
Brief description of the drawings
Fig. 1 is a flowchart of a method of converting a planar video into a stereoscopic video provided by a specific embodiment of the invention;
Fig. 2 is a detailed flowchart of step 40 in Fig. 1;
Fig. 3 is a schematic diagram of the FPGA implementation of the edge detection algorithm;
Fig. 4 is a schematic diagram of the FPGA implementation of the Hough transform algorithm;
Fig. 5 is a schematic diagram of the FPGA implementation of bilateral filtering applied to the depth map D;
Fig. 6 is a block diagram of a device for converting a planar video into a stereoscopic video provided by a specific embodiment of the invention;
Fig. 7 is a functional block diagram of a specific embodiment of the video conversion module in Fig. 6;
Fig. 8 is a functional block diagram of a specific embodiment of the video input module in Fig. 6;
Fig. 9 is a functional block diagram of a specific embodiment of the video output module in Fig. 6;
Fig. 10 is a functional block diagram of a specific embodiment of the cache module in Fig. 6.
Detailed description of the embodiments
The invention is further described below with reference to the drawings and specific embodiments.
A specific embodiment of the present invention provides a method of converting a planar video into a stereoscopic video. The method takes an FPGA as the core processing device and is realized as a hardware design on the FPGA. It performs the following steps on each frame image of the pending (planar) video; refer to Fig. 1:
Step 10: start.
Step 21: obtain the first depth map D1 through motion estimation based on block matching.
Step 22: extract the vanishing point and vanishing lines of the current frame image with an edge detection algorithm and a Hough transform algorithm, and build the second depth map D2 from the relation between image depth and the vanishing point and vanishing lines.
Step 23: estimate the third depth map D3 with a method based on color information.
Step 30: perform depth fusion on the first depth map D1, the second depth map D2 and the third depth map D3, obtaining the depth map D of the current frame image.
Step 40: based on the DIBR (Depth Image Based Rendering) algorithm, generate a multi-view stereoscopic view from one reference image and one depth map D, where the reference image is the current frame image.
Step 50: according to the stereoscopic video format required by the user, choose at least some views from the multi-view stereoscopic view and perform stereoscopic rendering, to generate a color stereoscopic video of the corresponding format.
It should be noted that, among the above steps, steps 21, 22 and 23 can be performed simultaneously.
For step 21, a concrete algorithm, FSBMA (full-search block matching algorithm), can be adopted. Suppose step 21 is performed on a current frame image I1; the previous frame image I2 is also extracted as the reference frame. Motion estimation based on block matching is applied to the current frame and the reference frame to compute a first motion vector, and a predicted frame image Ipre is obtained from I1 and the first motion vector. Then, with Ipre as the reference frame and the previous frame I2 as the current frame, a second motion vector is computed by the same method, and the first depth map D1 of the current frame image I1 is obtained from the second motion vector: the gray value of each point of D1 is the modulus of the motion vector of the corresponding pixel of the previous frame image I2. Specifically, the motion estimation can use the mean absolute difference (MAD) block matching criterion, which is well suited to hardware implementation:
MAD = (1/N²) · Σ_{u=0}^{N-1} Σ_{v=0}^{N-1} |I1(u, v) − I2(u + x, v + y)|    (1)
In formula (1), I1(·) is the pixel gray value of the current frame and I2(·) that of the reference frame, N is the selected macroblock size, (u, v) indexes the pixels within the macroblock, and (x, y) is the candidate motion vector (displacement vector); the motion vector of a block is the (x, y) minimizing MAD. The first motion vector and the second motion vector described above are each obtained with formula (1).
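As an illustration only, full-search block matching with the MAD criterion of formula (1) can be sketched in pure Python on toy frames (the patent targets an FPGA datapath; the frame sizes, search range, and function names here are assumptions):

```python
def mad(cur, ref, bx, by, x, y, n):
    """Formula (1): mean absolute difference between the n-by-n block of the
    current frame at (bx, by) and the reference block displaced by (x, y)."""
    total = 0
    for u in range(n):
        for v in range(n):
            total += abs(cur[bx + u][by + v] - ref[bx + x + u][by + y + v])
    return total / (n * n)

def full_search(cur, ref, bx, by, n, search):
    """FSBMA: test every displacement in [-search, search]^2 that stays
    inside the reference frame and return the one minimizing MAD."""
    h, w = len(ref), len(ref[0])
    best_cost, best_mv = float("inf"), (0, 0)
    for x in range(-search, search + 1):
        for y in range(-search, search + 1):
            if 0 <= bx + x and bx + x + n <= h and 0 <= by + y and by + y + n <= w:
                cost = mad(cur, ref, bx, by, x, y, n)
                if cost < best_cost:
                    best_cost, best_mv = cost, (x, y)
    return best_mv
```

The modulus of the vector returned per block would then, per the text, become the gray value of the corresponding region of D1.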
For step 22, the edge detection algorithm uses the Sobel operator, with the horizontal operator

    -1  0  1
    -2  0  2
    -1  0  1

for detecting horizontal edges and the vertical operator

    -1 -2 -1
     0  0  0
     1  2  1

for detecting vertical edges. The FPGA implementation of this edge detection algorithm is shown in Fig. 3: the original image (the current frame image) is input; the horizontal operator and the vertical operator are applied to perform horizontal (transverse) and vertical (longitudinal) edge detection respectively; the transverse and longitudinal edge-detected images are combined by gradient combination; thresholding is applied; and the edge-detected image is output.
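A small pure-Python sketch of the described Sobel stage (hypothetical scalar code, not the FPGA pipeline of Fig. 3; the |gx| + |gy| gradient combination and the threshold value are assumptions):

```python
SOBEL_H = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]  # "horizontal" operator from the text
SOBEL_V = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]  # "vertical" operator from the text

def sobel_edges(img, thresh):
    """Apply both operators, combine the two gradients as |gx| + |gy|
    (a common cheap gradient-magnitude approximation), then threshold."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = gy = 0
            for i in range(3):
                for j in range(3):
                    p = img[r + i - 1][c + j - 1]
                    gx += SOBEL_H[i][j] * p
                    gy += SOBEL_V[i][j] * p
            out[r][c] = 255 if abs(gx) + abs(gy) >= thresh else 0
    return out
```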
The Hough transform algorithm represents a straight line by its polar equation:
ρ = x·cosθ + y·sinθ,  0 ≤ θ < 180°    (2)
In formula (2), ρ is the perpendicular distance from the origin to the line, θ is the angle between that perpendicular and the x-axis, and x and y are the pixel's coordinates relative to the origin. The FPGA implementation is shown in Fig. 4: the edge-detected image obtained as in Fig. 3 is the input of the Hough transform algorithm; following the computation flow of Fig. 4, the line parameters are output and the vanishing lines are obtained. The second depth map D2 of the current frame image is then built according to the geometric perspective relation: the intersection of vanishing lines is the vanishing point, the vanishing point is the point of maximum depth, and the image depth changes along a vanishing line from maximum to minimum.
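A toy sketch of the voting step of formula (2) (pure Python; the 1-degree and 1-pixel accumulator resolution is an assumption, and the subsequent vanishing-line/vanishing-point extraction from the accumulator peaks is omitted):

```python
import math

def hough_peak(edge_points, theta_steps=180):
    """Vote in (rho, theta) space with rho = x*cos(theta) + y*sin(theta)
    (formula (2)) and return the strongest cell and its vote count."""
    acc = {}
    for (x, y) in edge_points:
        for t in range(theta_steps):           # theta in whole degrees
            th = math.radians(t)
            rho = round(x * math.cos(th) + y * math.sin(th))
            acc[(rho, t)] = acc.get((rho, t), 0) + 1
    cell = max(acc, key=acc.get)
    return cell, acc[cell]
```

For ten collinear edge pixels on the vertical line x = 3, the strongest cell collects all ten votes at ρ = 3.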
Step 23 can be realized as follows: for the current frame image, compute for each pixel the difference between the blue component and the red component (the first difference) and the difference between the blue component and the green component (the second difference); the product of the first and second differences gives the value of the corresponding pixel of the third depth map D3, and doing this for every pixel yields the complete third depth map D3.
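Per pixel this amounts to (B − R)·(B − G); a sketch, with clamping to the 8-bit range added as an assumption, since the text does not state how out-of-range products are handled:

```python
def color_depth(pixel):
    """D3 heuristic from the text: (B - R) * (B - G) for one (R, G, B)
    pixel, clamped to 0..255 (the clamping is an assumption)."""
    r, g, b = pixel
    v = (b - r) * (b - g)
    return max(0, min(255, v))
```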
In the FPGA, a hardware-parallel computation structure is used: several processing units form a parallel processing array that computes the first depth map D1, the second depth map D2 and the third depth map D3 simultaneously, improving real-time performance.
For step 30, weighted fusion of the depth maps can be adopted to obtain the required final depth map D. A concrete realization is as follows:
Perform the weighted fusion D = α·D1 + β·D2 + γ·D3 on the first depth map D1, the second depth map D2 and the third depth map D3, where α + β + γ = 1 and different α, β, γ values are configured for different video scenes; in this way the stereoscopic video generated from D in the subsequent processing has high image definition and a good stereoscopic effect. For example, when the current frame image contains mostly man-made scenery, the weights are configured for man-made scenes, i.e. 0.5 < α < 1, 0.2 < β < 0.5, 0 < γ < 0.1, for instance α = 0.6875, β = 0.25, γ = 0.0625; when it contains mostly natural scenery, the weights are configured for natural scenes, i.e. 0.5 < α < 1, 0 < β < 0.1, 0.2 < γ < 0.5, for instance α = 0.6875, β = 0.0625, γ = 0.25.
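The fusion is a per-pixel weighted sum; a sketch using the man-made-scene example weights (rounding the result to integer gray values is an assumption):

```python
def fuse_depth(d1, d2, d3, alpha, beta, gamma):
    """Pixel-wise weighted fusion D = a*D1 + b*D2 + g*D3, weights summing to 1."""
    assert abs(alpha + beta + gamma - 1.0) < 1e-9
    return [[round(alpha * a + beta * b + gamma * c)
             for a, b, c in zip(r1, r2, r3)]
            for r1, r2, r3 in zip(d1, d2, d3)]
```

With α = 0.6875, β = 0.25, γ = 0.0625 and pixel values 160, 80, 0, the fused value is 110 + 20 + 0 = 130.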
For step 40, the depth-image-based rendering (DIBR) algorithm is adopted, with the current frame image as the reference image required by the algorithm and the depth map D, after bilateral filtering, as the depth map required by the algorithm; the multi-view stereoscopic view is thus generated from one reference image and one depth map. The detailed process (see also Fig. 2) is as follows: first, bilateral filtering is applied to the depth map D as shown in Fig. 5 to obtain a smoother depth map. The bilateral filtering formula is:
BF[I]_p = (1/W_p) · Σ_{q∈S} G_σs(‖p − q‖) · G_σr(|I_p − I_q|) · I_q    (3)
In formula (3), BF[I]_p is the filtered depth value at pixel p, W_p is the normalization factor (which keeps the depth values within 0-255), G_σs and G_σr are Gaussian functions with standard deviations σs and σr, I_p and I_q are the gray values of pixels p and q of the depth map D, and S is the neighborhood of pixel p. As shown in Fig. 5, the depth map D is input, and the Gaussian weight of the intensity similarity |I_p − I_q| and the Gaussian weight of the spatial distance ‖p − q‖ are computed separately. The negative exponential needed when evaluating formula (3) is computed in the FPGA with the CORDIC (Coordinate Rotation Digital Computer) algorithm: the hyperbolic cosine and hyperbolic sine of the variable are computed, and subtracting the hyperbolic sine from the hyperbolic cosine yields the negative exponential. Considering ease of FPGA implementation while preserving the effect of the converted stereoscopic video, in this example S is the eight-neighborhood of p, σs is one of 4, 8, 16, 32, and σr is one of 0.5, 0.25, 0.125, 0.0625.
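Formula (3) can be sketched in software as follows (pure Python, not the CORDIC datapath; the center pixel is included in the average, and |I_p − I_q| is scaled to 0-1 before the range Gaussian so that σr values such as 0.5 or 0.25 are meaningful — both choices are assumptions):

```python
import math

def bilateral(depth, sigma_s, sigma_r):
    """Bilateral filter over the 3x3 neighbourhood of each interior pixel
    of an 8-bit depth map; border pixels are left unchanged."""
    h, w = len(depth), len(depth[0])
    out = [row[:] for row in depth]
    for pr in range(1, h - 1):
        for pc in range(1, w - 1):
            num = den = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    q = depth[pr + dr][pc + dc]
                    gs = math.exp(-(dr * dr + dc * dc) / (2 * sigma_s ** 2))
                    diff = abs(depth[pr][pc] - q) / 255.0  # assumed scaling
                    gr = math.exp(-(diff ** 2) / (2 * sigma_r ** 2))
                    num += gs * gr * q
                    den += gs * gr
            out[pr][pc] = num / den  # 1/W_p normalization of formula (3)
    return out
```

A uniform region passes through unchanged, while an isolated spike is pulled toward its neighbours.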
After the smoothed depth map BF[I]_p is obtained through the above bilateral filtering, the multi-view stereoscopic view is generated by mapping from the current frame image (the reference image) and the depth map BF[I]_p with the image mapping formulas (see Fig. 2):
x_l = x_c + (t_x / 2) · (f / Z)    (4-1)
x_r = x_c − (t_x / 2) · (f / Z)    (4-2)
In formulas (4-1) and (4-2), x_c is the pixel abscissa of the reference image, x_l that of the left-eye view, and x_r that of the right-eye view; t_x is the baseline distance of the left-eye and right-eye views, and changing t_x changes the parallax of the view pair; f is the focal length of the virtual camera of the views (in this example f = 1); and Z is the depth value at the corresponding pixel. By varying the baseline distance t_x, multiple pairs of left-eye and right-eye views can be obtained, forming the multi-view stereoscopic view. A mean filter can then be used to perform the necessary hole filling and repair on the multi-view stereoscopic view.
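A one-scanline sketch of formulas (4-1)/(4-2) (illustrative only: Z is taken directly as the stored depth value, rounding to integer pixel shifts is assumed, and occlusion handling is omitted; unfilled pixels are left as holes for the mean-filter repair step):

```python
def dibr_views(ref_row, depth_row, t_x, f=1.0):
    """Shift each reference pixel by +/- (t_x/2) * f/Z to synthesize one
    scanline of the left and right views; None marks an unfilled hole."""
    w = len(ref_row)
    left = [None] * w
    right = [None] * w
    for x_c in range(w):
        z = max(depth_row[x_c], 1)            # avoid division by zero
        d = round((t_x / 2.0) * f / z)
        if 0 <= x_c + d < w:                  # formula (4-1)
            left[x_c + d] = ref_row[x_c]
        if 0 <= x_c - d < w:                  # formula (4-2)
            right[x_c - d] = ref_row[x_c]
    return left, right
```

Each choice of t_x yields one view pair; sweeping t_x over several values builds the multi-view stereoscopic view.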
For step 50, concretely: if the user needs a stereoscopic video in red-blue (anaglyph) or side-by-side format, one pair of left-eye and right-eye views is selected from the multi-view stereoscopic view and rendered stereoscopically; if a multi-view stereoscopic video or a row-interleaved stereoscopic video is needed, several pairs of left-eye and right-eye views are chosen and rendered. Color stereoscopic videos of multiple formats can thus be generated.
A specific embodiment of the present invention also provides a device for converting a planar video into a stereoscopic video. As shown in Fig. 6, the device builds its operational modules on an FPGA and comprises a control module, a cache module, a video conversion module and a stereoscopic rendering module. The cache module stores the RGB video to be processed and intermediate processing results. The video conversion module is connected to the cache module and to the stereoscopic rendering module; it converts the planar images of the pending RGB video into a multi-view stereoscopic view comprising multiple pairs of left-eye and right-eye views, and feeds that view to the stereoscopic rendering module. The stereoscopic rendering module chooses the corresponding left-eye and right-eye views from the multi-view stereoscopic view according to the stereoscopic video output format required by the user, performs stereoscopic rendering on the chosen views, and generates a color stereoscopic video of the corresponding format. The control module is connected to the video conversion module and the stereoscopic rendering module and configures the device according to the user's requirements, which include the stereoscopic video output format.
Referring to Fig. 7, in some specific embodiments the video conversion module can comprise a first depth estimation module, a second depth estimation module, a third depth estimation module, a depth fusion module and a multi-view stereoscopic view generation module; the first, second and third depth estimation modules are all connected to the cache module and to the depth fusion module.
The first depth estimation module performs motion estimation based on block matching on a current frame image of the pending RGB video to obtain the first depth map D1; the concrete implementation follows step 21 of the preceding method.
The second depth estimation module performs geometric perspective estimation on the current frame image to obtain the second depth map D2; the concrete implementation follows step 22 and Figs. 3 and 4.
The third depth estimation module performs estimation based on color information on the current frame image to obtain the third depth map D3; the concrete implementation follows step 23.
The depth fusion module performs the weighted fusion of the first depth map D1, the second depth map D2 and the third depth map D3 to obtain the depth map D of the current frame image; the concrete implementation follows step 30.
The multi-view stereoscopic view generation module generates the multi-view stereoscopic view from the depth map D and the current frame image. As shown in Fig. 7, the multi-view stereoscopic view produced by the video conversion module is passed to the stereoscopic rendering module, which selects the corresponding left-eye and right-eye views and renders them stereoscopically, so that stereoscopic videos of different formats can be generated; refer to the conversion method described above, which is not repeated here.
In some specific embodiments, referring to Fig. 6, the device can also comprise a video input module and a video output module. The video input module is connected to the cache module and inputs the pending RGB video to it; the video output module is connected to the stereoscopic rendering module and outputs the color stereoscopic video. The control module comprises a data configuration module and a man-machine communication module, the data configuration module being connected to the video conversion module, the stereoscopic rendering module, the video input module, the video output module and the man-machine communication module. More preferably, the man-machine communication module comprises a man-machine interface and a host computer communication module; the man-machine interface can comprise a key panel and a remote control module, and the user can configure the device through the key panel, the remote control module or the host computer communication module.
In a more preferable embodiment, referring to Fig. 8, the video input module comprises a video signal input board 100, a video input converter group 300 and an input signal selector 500, the input signal selector 500 being connected to the data configuration module. Referring to Fig. 9, the video output module comprises an output signal selector 200, a video output converter group 400 and a video signal output board 600, the output signal selector being connected to the data configuration module. As shown in Figs. 8 and 9, the video signal input board 100 and the video signal output board 600 each provide various video interfaces, such as AV, BNC and USB interfaces, and the video input converter group 300 and the video output converter group 400 each include a converter for each of these video interfaces, such as an AV converter, a BNC converter and a VGA converter.
As shown in Figs. 6, 8 and 9, the man-machine communication module inputs the user's requirements, and the data configuration module configures the input signal selector 500, the output signal selector 200, the video conversion module and the stereoscopic rendering module according to those requirements. For example, if a planar video arriving at the BNC interface is to be converted into a stereoscopic video and played out through the AV interface, the user enters the configuration data through the man-machine communication module; the data configuration module configures the input signal selector 500 to select the BNC converter from the video input converter group 300, which converts the planar video from the BNC interface into the 24-bit planar RGB video common to the video conversion module and feeds it to the cache module (or to the cache module and the video conversion module simultaneously), after which the conversion method described above is performed. According to the stereoscopic display device in use, the user configures the stereoscopic rendering module to render a color stereoscopic video of the matching format (such as red-blue, multi-view, side-by-side or row-interleaved format), and then configures the output signal selector 200 to select the AV converter from the video output converter group 400, which converts the color stereoscopic video output by the stereoscopic rendering module into a stereoscopic video matching the AV interface, ready for output and playback.
The data that the user inputs to the data configuration module through the man-machine communication module may comprise various configurations of the device; besides the foregoing, the video scene (artificial scene or natural scene) can also be selected to adjust the effect of the perspective transformation. The data configuration module is responsible for applying the configuration data input by the user to the corresponding modules.
The remote control module comprises, for example, a wireless remote controller and a remote control signal receiving module. Through the remote control module the user can perform data configuration at a distance, using the wireless remote controller to complete operations such as configuring the video signal input interface, configuring the video signal output interface, selecting the output stereoscopic video format, adjusting the stereoscopic video conversion effect, and selecting the video scene conversion mode. The key panel comprises multiple function buttons, by which the user can likewise configure the video signal input interface, configure the video signal output interface, configure the output stereoscopic video format, adjust the stereoscopic video conversion effect, and so on.
The host computer communication module can communicate with a PC host computer by a certain communication means; through this module, the current status information of the device can be transmitted to the host computer, and the configuration of the whole device can also be completed on the host computer.
The cache module comprises a storage control module inside the FPGA chip and a large-capacity SDRAM outside the chip; referring to Figure 10, it is used for caching the 24-bit RGB digital video signal input by the front-end module (such as the aforementioned video input module) and data such as the intermediate and final results of the conversion algorithm. The storage control module uses FPGA on-chip RAM, a FIFO control module and an SDRAM control module to form an asynchronous large-capacity FIFO-structured cache. In addition, random read-write access to the SDRAM is realized by the SDRAM controller in the storage control module.
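The asynchronous FIFO cache can be modelled in software as a bounded queue sitting between a producer (the video input) and a consumer (the conversion algorithm). This is only a behavioural sketch under assumed semantics (a full FIFO rejects writes, standing in for hardware back-pressure); it is not the patent's HDL design.

```python
from collections import deque

class FrameFifo:
    """Behavioural model of the large-capacity FIFO cache: frames are
    written at the input rate and read at the processing rate; writes
    beyond the capacity are rejected, standing in for the back-pressure
    a full hardware FIFO would assert."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buf = deque()

    def write(self, frame):
        if len(self.buf) >= self.capacity:
            return False  # FIFO full: producer must wait
        self.buf.append(frame)
        return True

    def read(self):
        # Returns the oldest buffered frame, or None if the FIFO is empty.
        return self.buf.popleft() if self.buf else None
```

The deque preserves first-in, first-out order, which is the essential property the SDRAM-backed cache provides between the input and conversion modules.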
The above content is a further detailed description of the present invention in conjunction with specific preferred embodiments, and it cannot be concluded that the specific implementation of the present invention is limited to these descriptions. For those skilled in the art, several equivalent substitutions or obvious modifications of identical performance or purpose can also be made without departing from the inventive concept, and all of them should be considered as belonging to the protection scope of the present invention.

Claims (14)

1. A method for converting a planar video into a stereoscopic video, comprising performing the following steps on each frame image of the planar video:
S1. Obtaining a depth map D of the current frame image: obtaining a first depth map D1 by motion estimation based on block matching; extracting the vanishing point and vanishing lines in the current frame image by an edge detection algorithm and a Hough transform algorithm, and building a second depth map D2 according to the relation between the image depth and the vanishing point and vanishing lines; estimating a third depth map D3 by a method based on color information; and performing depth fusion on the first depth map D1, the second depth map D2 and the third depth map D3, to obtain the depth map D of the current frame image;
S2. Based on a DIBR algorithm, generating a multi-viewpoint stereoscopic view from a reference image and the depth map D, wherein the reference image is the current frame image, and the multi-viewpoint stereoscopic view comprises multiple pairs of left-eye and right-eye views;
S3. According to the stereoscopic video format required by the user, choosing at least one pair of the left-eye and right-eye views from the multi-viewpoint stereoscopic view and performing stereo rendering, to generate a color stereoscopic video of the corresponding format.
2. the method for claim 1, is characterized in that: obtain described first depth map D in described step S1 1specifically comprise: the two continuous frames image extracted in buffer memory is respectively I 1, I 2, with I 1as present frame, I 2as reference frame, wherein I 2for I 1former frame, by the estimation based on Block-matching, calculate the first motion vector, according to present frame I 1with described first motion vector, obtain predictive frame I pre; Again with predictive frame I preas reference frame, I 2as present frame, by the estimation based on Block-matching, calculate the second motion vector, according to described second motion vector, obtain described first depth map D 1, wherein, described first depth map D 1in the gray value of each point be I 2in the modulus value of motion vector of each pixel.
3. The method of claim 2, wherein a parallel algorithm is adopted in an FPGA for the hardware architecture design of the motion estimation, and a parallel processing array is formed in multiple processing units of the FPGA to calculate the first depth map D1, the second depth map D2 and the third depth map D3 simultaneously.
4. the method for claim 1, it is characterized in that: picture depth described in described step S1 and the pass between end point and vanishing line are: the intersection point of vanishing line is end point, end point is the maximum point of the degree of depth, picture depth along vanishing line from maximum change to minimum.
5. the method for claim 1, is characterized in that: obtain described 3rd depth image D in described step S1 3specifically comprise: the second difference calculating the first difference of the blue component of each pixel in described current frame image and red component, blue component and green component, using the product of described first difference and described second difference as the pixel value of each pixel of described 3rd depth map, thus obtain described 3rd depth map D 3.
6. the method for claim 1, is characterized in that: carry out described degree of depth fusion in described step S1 and specifically comprise to obtain described depth map D: to described first depth map D 1, described second depth map D 2with described 3rd depth map D 3perform depth map Weighted Fusion D=α D 1+ β D 2+ γ D 3, wherein alpha+beta+γ=1.
7. The method of claim 6, wherein the video scene comprises an artificial scene and a natural scene; when the video scene is an artificial scene, 0.5 < α < 1, 0.2 < β < 0.5, 0 < γ < 0.1; when the video scene is a natural scene, 0.5 < α < 1, 0 < β < 0.1, 0.2 < γ < 0.5.
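The scene-dependent weighted fusion of claims 6 and 7 can be sketched as follows. The concrete weight triples are examples I chose inside the claimed ranges (with α + β + γ = 1); the patent does not fix specific values.

```python
# Example weight triples chosen inside the ranges of claim 7,
# each summing to 1: (alpha, beta, gamma) = (motion, perspective, color).
WEIGHTS = {
    "artificial": (0.70, 0.25, 0.05),
    "natural":    (0.70, 0.05, 0.25),
}

def fuse_depth(d1, d2, d3, scene):
    """Per-pixel weighted fusion D = alpha*D1 + beta*D2 + gamma*D3 for
    three equally sized depth maps given as 2-D lists."""
    a, b, g = WEIGHTS[scene]
    return [[a * p1 + b * p2 + g * p3
             for p1, p2, p3 in zip(r1, r2, r3)]
            for r1, r2, r3 in zip(d1, d2, d3)]
```

Switching the scene label trades influence between the perspective cue (strong in man-made scenes with clear vanishing lines) and the color cue (strong in natural scenes), while the motion cue dominates in both.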
8. the method for claim 1, is characterized in that: described step S2 specifically comprises:
S21, bilateral filtering is carried out to described depth map D;
S22, use image mapped formula complete the mapping of multi-view image, generate described multi-viewpoint three-dimensional view;
S23, described multi-viewpoint three-dimensional view carried out to cavity and fill and repair.
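Steps S22 and S23 can be illustrated on a single scan line: pixels shift horizontally by a disparity proportional to depth, and the holes the warp exposes are filled from a neighbouring background pixel. The disparity gain and the fill rule are illustrative assumptions, and the bilateral pre-filter of S21 is omitted.

```python
def render_view(row, depth, gain=0.01, direction=1):
    """1-D sketch of DIBR view synthesis: the pixel at x is mapped to
    x + direction * round(gain * depth[x]), so nearer (deeper-valued)
    pixels shift farther.  Warping in order of increasing depth lets
    foreground pixels overwrite background pixels at conflicts."""
    w = len(row)
    out = [None] * w
    for x in sorted(range(w), key=lambda i: depth[i]):  # background first
        nx = x + direction * round(gain * depth[x])
        if 0 <= nx < w:
            out[nx] = row[x]
    # S23-style hole filling: propagate the last valid pixel rightward.
    last = None
    for x in range(w):
        if out[x] is None:
            out[x] = last if last is not None else row[x]
        else:
            last = out[x]
    return out
```

Generating several views with different `gain * direction` products yields the multiple left/right view pairs that step S3 later selects from.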
9. A device for converting a planar video into a stereoscopic video, characterized by comprising a control module, a cache module, a video conversion module and a three-dimensional rendering module;
the cache module is used for storing the RGB video to be processed and intermediate results of the processing;
the video conversion module is connected with the cache module and the three-dimensional rendering module respectively, and is used for converting the planar images of the RGB video to be processed into a multi-viewpoint stereoscopic view and inputting the multi-viewpoint stereoscopic view to the three-dimensional rendering module, the multi-viewpoint stereoscopic view comprising multiple pairs of left-eye and right-eye views;
the three-dimensional rendering module is used for choosing at least one pair of the left-eye and right-eye views from the multi-viewpoint stereoscopic view according to the stereoscopic video output format required by the user, and performing stereo rendering on the chosen left-eye and right-eye views, to generate a color stereoscopic video of the corresponding format;
the control module is connected with the video conversion module and the three-dimensional rendering module respectively, and is used for configuring the device according to the user requirements, which include the requirement on the stereoscopic video output format.
10. The device of claim 9, wherein the video conversion module comprises a first depth estimation module, a second depth estimation module, a third depth estimation module, a depth fusion module and a multi-viewpoint stereoscopic view generation module; the first depth estimation module, the second depth estimation module and the third depth estimation module are all connected with the cache module and the depth fusion module;
the first depth estimation module is used for performing motion estimation based on block matching on a current frame image of the RGB video to be processed, to obtain a first depth map D1; the second depth estimation module is used for performing estimation of the geometric perspective relation on the current frame image, to obtain a second depth map D2; the third depth estimation module is used for performing estimation based on color information on the current frame image, to obtain a third depth map D3;
the depth fusion module is used for performing weighted depth-map fusion on the first depth map D1, the second depth map D2 and the third depth map D3, to obtain the depth map D of the current frame image;
the multi-viewpoint stereoscopic view generation module is used for generating the multi-viewpoint stereoscopic view based on the depth map D and the current frame image.
11. The device of claim 9, further comprising a video input module and a video output module: the video input module is connected with the cache module and is used for inputting the RGB video to be processed to the cache module; the video output module is connected with the three-dimensional rendering module and is used for outputting the converted color stereoscopic video;
the control module comprises a data configuration module and a man-machine communication module, wherein the data configuration module is connected with the video conversion module, the three-dimensional rendering module, the video input module, the video output module and the man-machine communication module respectively.
12. The device of claim 11, wherein the video input module comprises a video signal input panel, a video input converter group and an input signal selector, wherein the input signal selector is connected with the data configuration module;
the video output module comprises an output signal selector, a video output converter group and a video signal output panel, wherein the output signal selector is connected with the data configuration module;
the video signal input panel and the video signal output panel each include a variety of video interfaces, and the video input converter group and the video output converter group each include a converter corresponding to each of the video interfaces;
the man-machine communication module is used for inputting the user requirements, and the data configuration module configures the input signal selector, the output signal selector, the video conversion module and the three-dimensional rendering module respectively according to the user requirements.
13. The device of claim 12, wherein the video signal input panel is used for receiving a planar video from any of the various video interfaces; when a planar video is received at a particular video interface of the video signal input panel, the input signal selector selects, according to the user requirements, the particular converter corresponding to that video interface from the converters of the video input converter group; the particular converter is used for converting the planar video input through the particular video interface into the RGB video to be processed, and outputting it to the cache module and the video conversion module; the user requirements, input from the man-machine communication module, include data characterizing the input interface of the current planar video.
14. The device of claim 12, wherein the user requirements, input from the man-machine communication module, include data characterizing that the color stereoscopic video needs to be output from a certain particular video interface; the output signal selector selects, according to the user requirements, the particular converter corresponding to that video interface from the converters of the video output converter group; the particular converter is used for converting the color stereoscopic video from the three-dimensional rendering module into a stereoscopic video matching the particular video interface.
CN201410697508.1A 2014-11-26 2014-11-26 Method and device for converting planar video into stereoscopic video Active CN104506872B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410697508.1A CN104506872B (en) 2014-11-26 2014-11-26 A kind of method and device of converting plane video into stereoscopic video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410697508.1A CN104506872B (en) 2014-11-26 2014-11-26 A kind of method and device of converting plane video into stereoscopic video

Publications (2)

Publication Number Publication Date
CN104506872A true CN104506872A (en) 2015-04-08
CN104506872B CN104506872B (en) 2017-09-29

Family

ID=52948579

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410697508.1A Active CN104506872B (en) 2014-11-26 2014-11-26 A kind of method and device of converting plane video into stereoscopic video

Country Status (1)

Country Link
CN (1) CN104506872B (en)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101789123A (en) * 2010-01-27 2010-07-28 中国科学院半导体研究所 Method for creating distance map based on monocular camera machine vision
CN102223553A (en) * 2011-05-27 2011-10-19 山东大学 Method for converting two-dimensional video into three-dimensional video automatically
CN102750711A (en) * 2012-06-04 2012-10-24 清华大学 Binocular video depth map obtaining method based on image segmentation and motion estimation
CN103220545A (en) * 2013-04-28 2013-07-24 上海大学 Hardware implementation method of stereoscopic video real-time depth estimation system
CN103248909A (en) * 2013-05-21 2013-08-14 清华大学 Method and system of converting monocular video into stereoscopic video
JP2013172214A (en) * 2012-02-17 2013-09-02 Sony Corp Image processing device and image processing method and program
CN104052990A (en) * 2014-06-30 2014-09-17 山东大学 Method and device for fully automatically converting two-dimension into three-dimension based on depth clue fusion


Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105611271A (en) * 2015-12-18 2016-05-25 华中科技大学 Real-time stereo image generating system
CN106023299A (en) * 2016-05-04 2016-10-12 上海玮舟微电子科技有限公司 Virtual view drawing method based on depth map and system based on depth map
CN106023299B (en) * 2016-05-04 2019-01-04 上海玮舟微电子科技有限公司 A kind of virtual view method for drafting and system based on depth map
CN106060522A (en) * 2016-06-29 2016-10-26 努比亚技术有限公司 Video image processing device and method
CN106791770A (en) * 2016-12-20 2017-05-31 南阳师范学院 A kind of depth map fusion method suitable for DIBR preprocessing process
CN111066322A (en) * 2017-06-14 2020-04-24 华为技术有限公司 Intra-prediction for video coding via perspective information
US11240512B2 (en) 2017-06-14 2022-02-01 Huawei Technologies Co., Ltd. Intra-prediction for video coding using perspective information
CN111066322B (en) * 2017-06-14 2022-08-26 华为技术有限公司 Intra-prediction for video coding via perspective information
CN111033570A (en) * 2017-08-22 2020-04-17 高通股份有限公司 Rendering images from computer graphics using two rendering computing devices
CN111033570B (en) * 2017-08-22 2023-10-20 高通股份有限公司 Rendering images from computer graphics using two rendering computing devices
CN112700485A (en) * 2020-12-31 2021-04-23 重庆电子工程职业学院 Image depth information extraction method
CN112700485B (en) * 2020-12-31 2023-02-07 重庆电子工程职业学院 Image depth information extraction method

Also Published As

Publication number Publication date
CN104506872B (en) 2017-09-29

Similar Documents

Publication Publication Date Title
CN104506872A (en) Method and device for converting planar video into stereoscopic video
JP6563453B2 (en) Generation of a depth map for an input image using an exemplary approximate depth map associated with an exemplary similar image
CN101771893B Virtual viewpoint rendering method based on video sequence background modeling
CN103945208B Parallel synchronous scaling engine and method for multi-view naked-eye 3D display
US9525858B2 (en) Depth or disparity map upscaling
CN102254348B Virtual viewpoint mapping method based on adaptive disparity estimation
CN102972038B (en) Image processing apparatus, image processing method, program, integrated circuit
CN103238337B (en) Three-dimensional image acquisition system and method
CN111047709B (en) Binocular vision naked eye 3D image generation method
CN101873509B (en) Method for eliminating background and edge shake of depth map sequence
CN103248909B (en) Method and system of converting monocular video into stereoscopic video
CN103581650B Method for converting binocular 3D video into multi-view 3D video
CN102905145B (en) Stereoscopic image system, image generation method, image adjustment device and method thereof
CN102724531B Method and system for converting two-dimensional video into three-dimensional video
CN102368826A (en) Real time adaptive generation method from double-viewpoint video to multi-viewpoint video
CN103220545A (en) Hardware implementation method of stereoscopic video real-time depth estimation system
CN101557534B Method for generating disparity map from adjacent video frames
CN101662695B (en) Method and device for acquiring virtual viewport
Bleyer et al. Temporally consistent disparity maps from uncalibrated stereo videos
CN103260032B Frame rate up-conversion method for stereoscopic video depth map sequences
CN104717514A (en) Multi-viewpoint image rendering system and method
CN103945206A (en) Three-dimensional picture synthesis system based on comparison between similar frames
CN110149508A Image array generation and completion method based on one-dimensional integral imaging system
CN103634586A (en) Stereo-image acquiring method and device
CN102780900B (en) Image display method of multi-person multi-view stereoscopic display

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20190424

Address after: 310000 Room 702, 7th Floor, 15 Yinhu Innovation Center, No. 9 Fuxian Road, Yinhu Street, Fuyang District, Hangzhou City, Zhejiang Province

Patentee after: Hangzhou Youshu Technology Co., Ltd.

Address before: 518000 Shenzhen Nanshan District Shekou Street Park South Road Nanshan Internet Innovation and Creative Service Base A303

Patentee before: SHENZHEN KAIAOSI TECHNOLOGY CO., LTD.

TR01 Transfer of patent right