CN102263979B - Depth map generation method and device for plane video three-dimensional conversion


Info

Publication number
CN102263979B
Authority
CN
China
Prior art keywords
frame image
current frame
depth map
described current
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN 201110223804
Other languages
Chinese (zh)
Other versions
CN102263979A (en)
Inventor
戴琼海 (Dai Qionghai)
李唯一 (Li Weiyi)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority: CN 201110223804, filed 2011-08-05
Publication of CN102263979A
Application granted
Publication of CN102263979B
Legal status: Active
Anticipated expiration

Abstract

The invention discloses a depth map generation method for plane video three-dimensional conversion, comprising the following steps: segmenting a current frame image based on a scene static-model distribution to obtain segmented regions, labeling the pixels, and clustering them into regions to generate a static-model depth map of the current frame image; selecting feature points of the current frame image and a reference frame image, computing the motion vectors of moving objects, and performing motion analysis on those vectors to obtain a motion analysis result for the two frames; generating a dynamic-model depth map of the current frame image from its segmented regions and the motion analysis result; and adaptively fusing the static-model depth map and the dynamic-model depth map of the current frame image. The invention further discloses a depth map generation device for plane video three-dimensional conversion. By fusing a static model and a dynamic model of the scene, the method and device produce a more lifelike stereoscopic video and achieve a better stereoscopic effect.

Description

Depth map generation method and device for plane video three-dimensional conversion
Technical field
The present invention relates to the field of computer multimedia technology, and in particular to a depth map generation method and device for converting plane (2D) video to three-dimensional (3D) video.
Background technology
Three-dimensional video technology (also called 3D video technology) is a major direction in the development of multimedia: stereoscopic video is a novel video technique that provides a sense of depth. Compared with single-channel video, stereoscopic video generally carries two video channels, so its data volume is far larger than that of single-channel video. Stereoscopic video supplies the depth perception that human vision experiences. Today, 3D technology is widely applied in many fields, including communications, broadcasting, medicine, education, computer games, and animation, and the rapid development of 3D displays allows viewers to experience depth well and with maximal eye comfort.
Traditional methods for converting 2D video to stereoscopic video extract depth information from a single depth cue in an image, such as object texture, the spatial geometry of the scene, or the relative sizes of objects, and finally obtain the projection of the image into the three-dimensional world. However, the scenes in most videos change in complex ways, containing both dynamic and static content, so a single depth cue cannot apply to every shot from beginning to end.
In addition, conventional motion-parallax methods assume that objects close to the viewer move faster than distant objects, and derive a parallax depth map from the relative speeds of the objects. But the scenes of a video vary widely: it often happens that an object far from the viewer moves quickly while a nearby object moves slowly, and applying the conventional method in such cases yields an incorrect disparity map.
Summary of the invention
The purpose of the present invention is to solve at least one of the technical deficiencies described above.
A first object of the present invention is to provide a depth map generation method for plane video three-dimensional conversion that fuses a static model and a dynamic model to generate a depth map better adapted to the true state of the scene, thereby producing a more lifelike stereoscopic video.
A second object of the present invention is to provide a depth map generation device for plane video three-dimensional conversion.
To achieve the above objects, an embodiment of the first aspect of the present invention proposes a depth map generation method for plane video three-dimensional conversion, comprising the following steps: performing, on a current frame image, image segmentation based on a scene static-model distribution, which includes constructing a graph topology from the feature vectors of the pixels of the current frame image, segmenting the current frame image according to that graph topology to obtain segmented regions, and labeling the pixels belonging to the same region to generate a static-model depth map of the current frame image; selecting feature points of the current frame image and a reference frame image, tracking those feature points, computing the motion vectors of the moving objects in the two frames, and performing motion analysis on the motion vectors to obtain a motion analysis result of the current frame image and the reference frame image; generating a dynamic-model depth map of the current frame image from its segmented regions and the motion analysis result; and performing depth fusion of the static-model depth map and the dynamic-model depth map of the current frame image.
The depth map generation method according to this embodiment of the invention fuses a static model and a dynamic model to generate a depth map suitable for both moving and static scenes, thereby producing a more lifelike stereoscopic video and a better stereoscopic effect.
An embodiment of the second aspect of the present invention proposes a depth map generation device for plane video three-dimensional conversion, comprising: a static-model depth map generation module, which constructs a graph topology from the feature vectors of the pixels of the current frame image, segments the current frame image according to that graph topology to obtain segmented regions, and labels the pixels belonging to the same region to generate a static-model depth map of the current frame image; a motion analysis module, which selects feature points of the current frame image and a reference frame image, tracks them, computes the motion vectors of the moving objects in the two frames, and performs motion analysis on the motion vectors to obtain a motion analysis result of the current frame image and the reference frame image; a dynamic-model depth map generation module, connected to the static-model depth map generation module and the motion analysis module, which generates a dynamic-model depth map of the current frame image from its segmented regions and the motion analysis result; and a depth fusion module, connected to the static-model depth map generation module and the dynamic-model depth map generation module, which performs adaptive depth fusion of the static-model depth map and the dynamic-model depth map of the current frame image.
The depth map generation device according to this embodiment of the invention likewise fuses a static model and a dynamic model to generate a depth map suitable for both moving and static scenes, producing a more lifelike stereoscopic video and a better stereoscopic effect.
Additional aspects and advantages of the present invention are given in part in the description below, will in part become apparent from that description, or may be learned through practice of the invention.
Description of drawings
The above and additional aspects and advantages of the present invention will become apparent and readily understood from the following description of embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a flow chart of the depth map generation method for plane video three-dimensional conversion according to an embodiment of the invention;
Fig. 2 is the generation flow of the static-model depth map according to an embodiment of the invention;
Fig. 3 is the generation flow of the dynamic-model depth map according to an embodiment of the invention;
Fig. 4 is a schematic diagram of the motion information analysis according to an embodiment of the invention;
Fig. 5 is a schematic diagram of the depth map generation device for plane video three-dimensional conversion according to an embodiment of the invention.
Embodiment
Embodiments of the invention are described in detail below; examples of the embodiments are shown in the drawings, where identical or similar reference numerals denote identical or similar elements, or elements with identical or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the present invention and are not to be construed as limiting it.
In the depth map generation method for plane video three-dimensional conversion provided by this embodiment, a plane video is first input; the video is then split into two branches, the current frame alone and the current frame together with a reference frame; the static-model depth map and the dynamic-model depth map of the current frame image are generated respectively; and the two depth maps are then depth-fused to obtain the fused depth map of the current frame.
The depth map generation method according to an embodiment of the invention is described below with reference to Fig. 1 to Fig. 4.
As shown in Fig. 1, the depth map generation method for plane video three-dimensional conversion according to an embodiment of the invention comprises the following steps:
S101: perform, on the current frame image, image segmentation based on the scene static-model distribution. As shown in Fig. 2, the image segmentation comprises the following steps:
S1011: after the current frame image is loaded into memory, it is first converted to grayscale. The grayscale conversion maps the 24-bit, three-channel RGB color space of each pixel to a single-channel 8-bit color space, which reduces the amount of data to process in the subsequent steps. The grayscale image is then clustered. Note that this grayscale step is optional: the clustering operation can also be applied to the current frame image directly.
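For illustration, a minimal sketch of this grayscale step in Python with OpenCV; the patent names no library, so the use of OpenCV and its standard luminance weighting is an assumption:

```python
import cv2
import numpy as np

def to_gray(frame_bgr: np.ndarray) -> np.ndarray:
    """Convert a 24-bit, 3-channel frame to an 8-bit single-channel image.

    This is the data-reduction step S1011 describes: each pixel shrinks from
    three 8-bit channels to one. OpenCV applies the usual luminance weights
    (0.299 R + 0.587 G + 0.114 B).
    """
    return cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
```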
S1012: image segmentation can be treated as an element clustering problem. Concretely, the basic elements of the current frame image (for example, its pixels) or other components are grouped into chunks with similar composition, i.e., the current frame image is clustered.
Depending on the data structure used to process the image during segmentation, the clustering operation can take one of two forms:
1) In vector space: each element of the current frame image is treated as an independent vector. Part of the feature components of a pixel serve as its feature vector, and the feature vectors are grouped by a central clustering algorithm. The global features obtained this way are very effective, but the method has the following shortcoming: because it does not exploit the spatial structure of the vectors, it cannot preserve much of the image's detail and often merges regions that are actually disconnected into the same group.
2) In the spatial domain: the spatial relationships and connected boundary information of the elements are exploited by processing the current frame image in the spatial domain and converting it into a two-dimensional graph topology. Because this method uses the elements' spatial relationships and connectivity information, it preserves the image's detail.
Preferably, the second approach is used to cluster the current frame image.
The current frame image is converted into a graph topology G(V, E), where V is the set of nodes of the image and E is the set of weighted edges between adjacent nodes. A node V_i may be a single pixel, or a small subregion with uniform shape, similar texture, or other shared feature structure; for example, the image may be divided into 5 × 5 blocks, each block serving as one basic node. In one embodiment of the invention, a color vector is used as the feature vector. In feature space, X_i is the RGB color vector of node i, and the edge cost E_ij carries a positive weight W_ij that reflects the similarity between nodes i and j. The invention measures W_ij by the Euclidean distance between the two nodes:
W_ij = ||X_i - X_j||,

total = Σ W_ij,

where n is the number of nodes in the graph topology and total is the sum of the edge weights W_ij over all edges.
All the nodes of the image are linked together so that the total edge weight total is the minimum cost.
According to the graph topology of the current frame image obtained in step S1012, the current frame image is segmented into a number of effective regions. An effective region is a meaningful region, i.e., a region of pixels that the user needs in the image. Concretely, the graph topology obtained in step S1012 serves as the static model for the segmentation. In one embodiment of the invention, the minimum spanning tree method is applied to the graph topology of the current frame image to obtain the segmentation result: building the minimum spanning tree preserves the connectivity information of every node while connecting all nodes at minimum boundary cost. The graph topology can then be clustered, dividing the image into a number of effective regions, i.e., a number of meaningful image regions.
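The construction and clustering of G(V, E) can be sketched as follows; the 4-connected pixel grid and the fixed merge threshold `k` are assumptions, since the patent fixes neither the neighborhood nor the exact stopping rule of the minimum-spanning-tree grouping:

```python
import numpy as np

def segment_mst(img: np.ndarray, k: float = 10.0) -> np.ndarray:
    """Group pixels by merging minimum-weight edges first (Kruskal order).

    Nodes are pixels; edge weights are Euclidean distances between the RGB
    feature vectors of 4-connected neighbours, W_ij = ||X_i - X_j||. Edges
    are taken cheapest-first, so the merges trace a minimum spanning tree.
    """
    h, w = img.shape[:2]
    rgb = img.reshape(-1, 3).astype(np.float64)
    idx = np.arange(h * w).reshape(h, w)

    # Build the 4-connected edges (right and down neighbours).
    parts = []
    for a, b in [(idx[:, :-1], idx[:, 1:]), (idx[:-1, :], idx[1:, :])]:
        a, b = a.ravel(), b.ravel()
        wts = np.linalg.norm(rgb[a] - rgb[b], axis=1)
        parts.append(np.stack([wts, a, b], axis=1))
    edges = np.concatenate(parts)
    edges = edges[np.argsort(edges[:, 0])]          # cheapest edges first

    parent = np.arange(h * w)                       # union-find forest

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]           # path halving
            x = parent[x]
        return x

    for wt, a, b in edges:
        ra, rb = find(int(a)), find(int(b))
        if ra != rb and wt <= k:                    # merge only similar regions
            parent[rb] = ra

    labels = np.array([find(i) for i in range(h * w)])
    return labels.reshape(h, w)
```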
S102: generate the static-model depth map of the current frame image.
S1021: merge the image segmentation regions.
Building the minimum spanning tree is also a process of clustering the graph topology. Because the texture in an image can be very rich, the initial segmentation may be overly fragmented, so the preliminary segmentation needs to be optimized.
First, the minimum number of subregions that each effective region of step S1012 must contain is set; an effective region can be regarded as a larger-scope region, and the number of subregions it must contain at least is configured. The effective regions are then merged according to this setting. Merging the segmented regions effectively removes noise regions and broadly separates the moving foreground objects from the background; one possible post-pass is sketched below.
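The pixel-count threshold and the choice of absorbing neighbour in this sketch are illustrative, since the text only states that a minimum sub-region count is configured:

```python
import numpy as np

def merge_small_regions(labels: np.ndarray, min_size: int = 50) -> np.ndarray:
    """Absorb regions smaller than `min_size` pixels into an adjacent region.

    Removes the fragmentary noise regions left by the initial segmentation so
    that the moving foreground separates cleanly from the background.
    """
    out = labels.copy()
    ids, counts = np.unique(out, return_counts=True)
    small = set(ids[counts < min_size].tolist())
    h, w = out.shape
    for y in range(h):
        for x in range(w):
            if out[y, x] in small:
                # Adopt the label of the first large 4-neighbour, if any.
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and out[ny, nx] not in small:
                        out[y, x] = out[ny, nx]
                        break
    return out
```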
S1022: element labeling.
After the merging of step S1021, the image is divided into a number of meaningful regions; the pixels belonging to the same region are then given the same label. This completes the segmentation of the current frame image and determines the region to which each pixel belongs.
S1023: generate the static-model depth map of the current frame image.
The static-model depth map generated in this step is based on the following assumption: the top of a plane image is far from the viewer and the bottom is near, i.e., the top of the image is assigned the maximum distance and the bottom the minimum distance. All pixels belonging to the same segmented region belong to the interior of one object and are therefore given the same depth value. The depth value of each region varies with the region's vertical distance from the top of the image. The static-model depth map D_p of the current frame image is finally obtained as:
D_p = 255 × costY[i] / number_of_block[i],
where costY[i] is the sum of the Y coordinates of all pixels in region i, and number_of_block[i] is the number of pixels region i contains.
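The per-region computation of D_p can be sketched as follows; the division by the image height is added here so that the values fit an 8-bit range, a normalization the printed formula appears to leave implicit:

```python
import numpy as np

def static_depth_map(labels: np.ndarray) -> np.ndarray:
    """Assign every region the depth of its mean row position.

    Implements D_p = 255 * costY[i] / number_of_block[i]: costY[i] sums the
    Y coordinates of region i's pixels and number_of_block[i] counts them,
    so each region receives a value proportional to its mean distance from
    the top of the image (top = maximum distance, bottom = minimum).
    """
    h, w = labels.shape
    ys = np.repeat(np.arange(h), w).astype(np.float64)   # row index per pixel
    ids, inv, counts = np.unique(labels.ravel(),
                                 return_inverse=True, return_counts=True)
    cost_y = np.bincount(inv, weights=ys)                # costY per region
    depth = 255.0 * cost_y / (counts * max(h - 1, 1))    # normalized mean row
    return depth[inv].reshape(h, w).astype(np.uint8)
```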
S103: perform motion information analysis on the current frame and the reference frame based on the scene dynamic-model distribution. In step S103, the dynamic model of the scene is used to compute the motion vectors of the feature points in the current and reference frames and to analyze the relationship among the object motion component, the camera motion component, and the composite motion component. As shown in Fig. 3, the motion information analysis of the current and reference frames comprises the following steps:
S1031: feature point selection.
Every object has features of its own, for example sharp points, edge lines, or boundary curves; such features are called feature points, feature lines, feature curves, and so on. As long as a moving object stays within the observer's field of view, its features are reflected in the video image; in other words, the motion of an object can be observed and analyzed through the features of the moving object.
S1032: feature point tracking.
The feature points that appear in the current frame image and the reference frame image are tracked. A set of points is chosen on the moving object in the plane image; before the motion (at time t1) their coordinates are p_i(x_i, y_i), i = 1, 2, …; after the motion (at time t2) they move to p'_i(x'_i, y'_i), i = 1, 2, …. The two-dimensional displacement in the image plane is then d(x_i; t1, t2), i = 1, 2, ….
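The patent names no particular detector or tracker; the sketch below uses Shi-Tomasi corners and pyramidal Lucas-Kanade optical flow from OpenCV as one plausible realization of steps S1031 and S1032:

```python
import cv2
import numpy as np

def track_features(ref_gray: np.ndarray, curr_gray: np.ndarray):
    """Select feature points in the reference frame and track them into the
    current frame, returning matched pairs (p_i, p'_i) and displacements d."""
    p0 = cv2.goodFeaturesToTrack(ref_gray, maxCorners=500,
                                 qualityLevel=0.01, minDistance=7)
    if p0 is None:                                   # no trackable features
        empty = np.empty((0, 2), np.float32)
        return empty, empty, empty
    p1, status, _err = cv2.calcOpticalFlowPyrLK(ref_gray, curr_gray, p0, None)
    good = status.ravel() == 1                       # keep successful tracks
    p0, p1 = p0.reshape(-1, 2)[good], p1.reshape(-1, 2)[good]
    return p0, p1, p1 - p0                           # d = p'_i - p_i
```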
S1033: motion vector computation.
After the static-model segmentation of the preceding steps has divided the current frame image into a number of effective regions, the motion vector of each effective region is computed.
Let f_t and f_ref denote the current frame and the reference frame respectively. The n-th region of the current frame is denoted α_n, and the motion of a pixel x in α_n is d(x; β_n), where β_n is the motion vector of region α_n. In one embodiment of the invention, α_n may follow any of the following motion models: an affine model, a bilinear model, or a projective model.
The error function over region α_n is of the form E(β_n) = Σ_{x∈α_n} [f_t(x) - f_ref(x + d(x; β_n))]² (the exact formula survives only as an image in the source). Minimizing this error yields the motion vector β_n.
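Under the translational special case of the motion model (the patent also allows affine, bilinear, and projective models), the minimization of E(β_n) can be sketched as an exhaustive search:

```python
import numpy as np

def region_motion_vector(curr: np.ndarray, ref: np.ndarray,
                         mask: np.ndarray, radius: int = 7):
    """Estimate beta_n for one region alpha_n (given by `mask`) by minimizing
    the squared-difference error between the region in the current frame and
    its displaced counterpart in the reference frame.

    A pure translation d(x; beta_n) = beta_n is searched exhaustively over a
    (2*radius+1)^2 window; richer motion models would be fitted analogously.
    """
    ys, xs = np.nonzero(mask)
    h, w = curr.shape
    best_err, best_d = np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ty, tx = ys + dy, xs + dx
            ok = (ty >= 0) & (ty < h) & (tx >= 0) & (tx < w)
            if not ok.any():
                continue
            diff = (curr[ys[ok], xs[ok]].astype(np.float64)
                    - ref[ty[ok], tx[ok]].astype(np.float64))
            err = np.mean(diff ** 2)                 # mean over valid pixels
            if err < best_err:
                best_err, best_d = err, (dy, dx)
    return best_d                                    # (dy, dx) = beta_n
```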
S1034: motion information analysis.
As shown in Fig. 4, the current frame image and the reference frame image are each divided into two parts: a central region (region I in Fig. 4) and a border region (region II in Fig. 4). Region I, the central field of the image, roughly reflects the main area occupied by the moving foreground objects in an image. Region II, the border of the image, comprises three bands along the top, left, and right of the image and represents the moving region of the camera shooting the moving objects: if the camera moves, the entire scene moves, so the border region mainly reflects camera motion. With image height H and image width W, the right boundary of region I is separated from the right boundary of region II by W/5, and the top boundary of region I is separated from the top boundary of region II by H/5.
Based on the motion vectors from step S1033, the central region and the border region are analyzed, yielding the following motion analysis result:
cost1 represents the object motion component (its defining formula survives only as an image in the source).
cost2 represents the camera motion component (likewise given as an image).
When cost2 exceeds the camera motion threshold, camera motion is present and both foreground and background move; otherwise the camera is static and the scene consists of a still background with moving foreground objects. The camera motion threshold is a preset value for judging whether the camera moves: cost2 above the threshold indicates camera motion; otherwise the camera is judged not to move.
cost3 represents the composite motion component (also given as an image).
A region of notable difference is a region whose histogram differs between the current frame and the reference frame by more than the preset composite motion threshold; when cost3 exceeds the composite motion threshold, disordered composite motion is present.
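The Fig. 4 partition, together with stand-in estimates for cost1 and cost2, can be sketched as follows; since the defining formulas survive only as images in the source, the mean normalized feature-point displacement per region is used here purely as an illustrative proxy:

```python
import numpy as np

def region_masks(h: int, w: int):
    """Boolean masks for the central region (I) and the border region (II).

    Region I is inset W/5 from the left and right edges and H/5 from the
    top, per the reading of Fig. 4; region II is everything else (the top,
    left, and right bands). The bottom edge carries no border band.
    """
    yy, xx = np.mgrid[0:h, 0:w]
    central = (yy >= h // 5) & (xx >= w // 5) & (xx < w - w // 5)
    return central, ~central

def motion_costs(points: np.ndarray, disp: np.ndarray,
                 central: np.ndarray, border: np.ndarray):
    """Illustrative proxies for cost1 (object motion) and cost2 (camera motion).

    points: (N, 2) array of (x, y) feature locations in the current frame;
    disp:   (N, 2) array of their tracked displacements.
    """
    h, w = central.shape
    mag = np.linalg.norm(disp, axis=1) / np.hypot(h, w)  # diagonal-normalized
    xs = points[:, 0].astype(int).clip(0, w - 1)
    ys = points[:, 1].astype(int).clip(0, h - 1)
    in_central, in_border = central[ys, xs], border[ys, xs]
    cost1 = float(mag[in_central].mean()) if in_central.any() else 0.0
    cost2 = float(mag[in_border].mean()) if in_border.any() else 0.0
    return cost1, cost2
```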
S104: generate the dynamic-model depth map of the current frame image.
Using the image segmentation result obtained in the preceding steps together with the dynamic model, the dynamic-model depth map D_m is obtained as:
D_m = d_cost1 × cost1 + d_cost2 × (1 - cost2) + d_cost3 × (1 - cost3),
where d_cost1 is the motion parallax value of the feature points in region I, d_cost2 is the motion parallax value of the feature points in region II, and d_cost3 is the motion parallax value of the feature points in regions of violent change.
The d_cost2 × (1 - cost2) term is present only when cost2 exceeds the set camera motion threshold, and is zero otherwise; likewise, the d_cost3 × (1 - cost3) term is present only when cost3 exceeds the set composite motion threshold, and is zero otherwise.
When computing the motion-model depth map, the invention jointly analyzes the object motion, camera motion, and composite motion components and adjusts the proportion of each part dynamically, thereby obtaining a reasonable dynamic-model depth map.
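Combining the formula and the threshold gating above gives the following sketch; the threshold values are illustrative, as the patent gives no numbers:

```python
def dynamic_depth(d1: float, d2: float, d3: float,
                  cost1: float, cost2: float, cost3: float,
                  cam_thresh: float = 0.2, comp_thresh: float = 0.3) -> float:
    """D_m = d_cost1*cost1 + d_cost2*(1-cost2) + d_cost3*(1-cost3).

    d1..d3 are the motion parallax values of the region-I, region-II, and
    high-change feature points; the second and third terms exist only when
    their cost exceeds its threshold, exactly as the text prescribes.
    """
    dm = d1 * cost1
    if cost2 > cam_thresh:                 # camera motion present
        dm += d2 * (1.0 - cost2)
    if cost3 > comp_thresh:                # disordered composite motion present
        dm += d3 * (1.0 - cost3)
    return dm
```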
S105: perform depth fusion of the static-model depth map and the dynamic-model depth map.
The static-model depth map D_p and the dynamic-model depth map D_m obtained in the preceding steps are fused according to their respective weights to obtain the final depth map.
First, the weight w1 of the static-model depth map of the current frame image and the weight w2 of the dynamic-model depth map of the current frame image are set; the depth-fused map D_depth of the current frame image is then computed as:
D_depth = w1 × D_p + w2 × D_m,

where w1 + w2 = 1.
Using the segmentation result of the image obtained in the steps above and referring to the scene-motion measure (whose formula survives only as an image in the source), the static-model depth map and the dynamic-model depth map are depth-fused as follows: when the measure is greater than the preset movement threshold, the image is a dynamic scene, so w2 is increased and w1 decreased; when the measure is less than the preset movement threshold but greater than zero, the image is a static scene, so w2 is decreased and w1 increased; and when the measure equals zero, the scene is completely still, so w2 = 0 and w1 = 1.
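This adaptive weighting can be sketched as follows; the concrete weight values are illustrative, since the text only prescribes the direction of adjustment, and `motion` stands for the scene-motion measure whose formula survives only as an image in the source:

```python
import numpy as np

def fuse_depth(d_static: np.ndarray, d_dynamic: np.ndarray, motion: float,
               move_thresh: float = 0.1) -> np.ndarray:
    """Adaptive fusion D_depth = w1 * D_p + w2 * D_m with w1 + w2 = 1."""
    if motion == 0.0:
        w2 = 0.0          # completely still scene: static model only
    elif motion > move_thresh:
        w2 = 0.7          # dynamic scene: favour the dynamic model
    else:
        w2 = 0.3          # mostly static scene: favour the static model
    w1 = 1.0 - w2
    fused = w1 * d_static.astype(np.float64) + w2 * d_dynamic.astype(np.float64)
    return fused.astype(np.uint8)
```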
The depth map generation method according to this embodiment of the invention thus fuses a static model and a dynamic model to generate a depth map suitable for both moving and static scenes, thereby producing a more lifelike stereoscopic video and a better stereoscopic effect.
The depth map generation device 500 for plane video three-dimensional conversion according to an embodiment of the invention is described below with reference to Fig. 5.
As shown in Fig. 5, the device 500 comprises a static-model depth map generation module 510, a motion analysis module 520, a dynamic-model depth map generation module 530, and a depth fusion module 540. The static-model depth map generation module 510 and the motion analysis module 520 are each connected to the dynamic-model depth map generation module 530, and the static-model depth map generation module 510 and the dynamic-model depth map generation module 530 are each connected to the depth fusion module 540.
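The module wiring of Fig. 5 can be pictured as a simple composition; the callables below are stand-ins for the sketches given earlier in this description, not the patented implementation itself:

```python
class DepthMapGenerator:
    """Four modules wired as in Fig. 5: segmentation feeds both depth models,
    motion analysis feeds the dynamic model, and fusion consumes both maps."""

    def __init__(self, segment, analyze_motion, dynamic_depth, fuse):
        self.segment = segment                # static-model depth map module 510
        self.analyze_motion = analyze_motion  # motion analysis module 520
        self.dynamic_depth = dynamic_depth    # dynamic-model depth map module 530
        self.fuse = fuse                      # depth fusion module 540

    def __call__(self, curr_frame, ref_frame):
        labels, d_static = self.segment(curr_frame)
        motion = self.analyze_motion(curr_frame, ref_frame, labels)
        d_dynamic = self.dynamic_depth(labels, motion)
        return self.fuse(d_static, d_dynamic, motion)
```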
After the current frame image is loaded into memory, the static-model depth map generation module 510 first converts it to grayscale, mapping the 24-bit three-channel RGB color space of each pixel to a single-channel 8-bit color space and thereby reducing the amount of data to process in subsequent steps; module 510 then clusters the grayscale image. As before, the grayscale step is optional: module 510 can also cluster the current frame image directly.
Treating segmentation as an element clustering problem, module 510 groups the basic elements of the current frame image (for example, its pixels) or other components into chunks with similar composition, i.e., it clusters the current frame image.
Depending on the data structure used to process the image during segmentation, module 510 can cluster in one of two ways:
1) In vector space: each element of the current frame image is treated as an independent vector; part of the feature components of a pixel serve as its feature vector, and the feature vectors are grouped by a central clustering algorithm to build the graph topology. The global features obtained this way are very effective, but because the method does not exploit the spatial structure of the vectors, it cannot preserve much of the image's detail and often merges actually disconnected regions into the same group.
2) In the spatial domain: the spatial relationships and connected boundary information of the elements are exploited by processing the current frame image in the spatial domain and converting it into a two-dimensional graph topology, which preserves the image's detail.
Preferably, module 510 uses the second approach to cluster the current frame image.
Module 510 converts the current frame image into a graph topology G(V, E), where V is the set of image nodes and E is the set of weighted edges between adjacent nodes. A node V_i may be a single pixel, or a small subregion with uniform shape, similar texture, or other shared feature structure; for example, the image may be divided into 5 × 5 blocks, each block serving as one basic node. In one embodiment of the invention, a color vector is used as the feature vector: in feature space, X_i is the RGB color vector of node i, and each edge carries a positive weight W_ij that reflects the similarity between nodes i and j. The invention measures W_ij by the Euclidean distance between the two nodes:
W_ij = ||X_i - X_j||,

total = Σ W_ij,

where n is the number of nodes in the graph topology and total is the sum of the edge weights W_ij over all edges.
All the nodes of the image are linked together so that the total edge weight total is the minimum cost.
Module 510 then segments the current frame image into a number of effective regions according to the graph topology of the current frame image; concretely, the graph topology serves as the static model for the segmentation. In one embodiment of the invention, module 510 applies the minimum spanning tree method to the graph topology of the current frame image to obtain the segmentation result: the minimum spanning tree preserves the connectivity information of every node and connects all nodes at minimum boundary cost, so the graph topology can be clustered and the image divided into a number of effective regions, i.e., a number of meaningful image regions.
Building the minimum spanning tree is also a process of clustering the graph topology. Because the texture in an image can be very rich, the initial segmentation may be overly fragmented, so module 510 needs to optimize the preliminary segmentation.
Module 510 first sets the minimum number of subregions that each effective region must contain, an effective region being regarded as a larger-scope region; it then merges the effective regions according to that setting. Merging the segmented regions effectively removes noise regions and broadly separates the moving foreground objects from the background.
Having divided the image into a number of meaningful regions, module 510 labels the pixels belonging to the same region with the same label, completing the segmentation of the current frame image and determining the region to which each pixel belongs.
The static-model depth map generated by module 510 is based on the following assumption: the top of a plane image is far from the viewer and the bottom is near, i.e., the top of the image is assigned the maximum distance and the bottom the minimum distance. All pixels belonging to the same segmented region belong to the interior of one object and are therefore given the same depth value, and the depth value of each region varies with the region's vertical distance from the top of the image. The static-model depth map D_p of the current frame image is finally obtained as:
D_p = 255 × costY[i] / number_of_block[i],
where costY[i] is the sum of the Y coordinates of all pixels in region i, and number_of_block[i] is the number of pixels region i contains.
The motion analysis module 520 performs motion information analysis on the current frame and the reference frame based on the scene dynamic-model distribution: using the dynamic model of the scene, it computes the motion vectors of the feature points in the two frames and analyzes the relationship among the object motion component, the camera motion component, and the composite motion component.
First, module 520 selects feature points. Every object has features of its own, for example sharp points, edge lines, or boundary curves, called feature points, feature lines, feature curves, and so on. As long as a moving object stays within the observer's field of view, its features are reflected in the video image; in other words, the motion of an object can be observed and analyzed through the features of the moving object.
Module 520 then tracks the feature points that appear in the current frame image and the reference frame image. A set of points is chosen on the moving object in the plane image; before the motion (at time t1) their coordinates are p_i(x_i, y_i), i = 1, 2, …; after the motion (at time t2) they move to p'_i(x'_i, y'_i), i = 1, 2, …; the two-dimensional displacement in the image plane is then d(x_i; t1, t2), i = 1, 2, ….
In the embodiments above, once module 510 has segmented the current frame image into effective regions using the static model, module 520 computes the motion vector of each effective region.
Let f_t and f_ref denote the current frame and the reference frame respectively. The n-th region of the current frame is denoted α_n, and the motion of a pixel x in α_n is d(x; β_n), where β_n is the motion vector of region α_n. In one embodiment of the invention, the motion model of α_n may be any of the following: an affine model, a bilinear model, or a projective model.
Module 520 defines an error function over region α_n, of the form E(β_n) = Σ_{x∈α_n} [f_t(x) - f_ref(x + d(x; β_n))]² (the exact formula survives only as an image in the source); minimizing this error yields the motion vector β_n.
Module 520 divides the current frame image and the reference frame image into two parts: a central region (region I in Fig. 4) and a border region (region II in Fig. 4). Region I, the central field of the image, roughly reflects the main area occupied by the moving foreground objects; region II, the border of the image, comprises three bands along the top, left, and right and represents the moving region of the camera shooting the moving objects: if the camera moves, the whole scene moves, so the border region mainly reflects camera motion. With image height H and image width W, the right boundary of region I is separated from the right boundary of region II by W/5, and the top boundary of region I is separated from the top boundary of region II by H/5.
From the motion vectors obtained, the central region and the border region are analyzed, yielding the following motion analysis result:
cost1 represents the object motion component (its defining formula survives only as an image in the source).
cost2 represents the camera motion component (likewise given as an image). When cost2 exceeds the camera motion threshold, camera motion is present and both foreground and background move; otherwise the camera is static and the scene consists of a still background with moving foreground objects. The camera motion threshold is a preset value for judging whether the camera moves: cost2 above the threshold indicates camera motion; otherwise the camera is judged not to move.
cost3 represents the composite motion component (also given as an image). A region of notable difference is a region whose histogram differs between the current frame and the reference frame by more than the preset composite motion threshold; when cost3 exceeds that threshold, disordered composite motion is present.
The dynamic-model depth map generation module 530 combines the image segmentation result obtained above with the dynamic model to obtain the dynamic-model depth map D_m:
D_m = d_cost1 × cost1 + d_cost2 × (1 - cost2) + d_cost3 × (1 - cost3),
where d_cost1 is the motion parallax value of the region-I feature points, d_cost2 is the motion parallax value of the region-II feature points, and d_cost3 is the motion parallax value of the feature points in regions of violent change.
The d_cost2 × (1 - cost2) term is present only when cost2 exceeds the set camera motion threshold, and is zero otherwise; likewise, the d_cost3 × (1 - cost3) term is present only when cost3 exceeds the set composite motion threshold, and is zero otherwise.
When computing the motion-model depth map, the invention jointly analyzes the object motion, camera motion, and composite motion components and adjusts the proportion of each part dynamically, yielding a reasonable dynamic-model depth map.
The depth fusion module 540 fuses the static-model depth map D_p and the dynamic-model depth map D_m obtained above according to their respective weights to obtain the final fused depth map.
First, module 540 sets the weight w1 of the static-model depth map of the current frame image and the weight w2 of the dynamic-model depth map of the current frame image, then computes the depth-fused map D_depth of the current frame image:
D_depth = w1 × D_p + w2 × D_m,

where w1 + w2 = 1.
Module 540 uses the image segmentation result obtained above and refers to the scene-motion measure (whose formula survives only as an image in the source) to depth-fuse the static-model and dynamic-model depth maps: when the measure exceeds the preset movement threshold, the image is a dynamic scene, so w2 is increased and w1 decreased; when the measure is below the preset movement threshold but greater than zero, the image is a static scene, so w2 is decreased and w1 increased; and when the measure equals zero, the scene is completely still, so w2 = 0 and w1 = 1.
The depth map generation device according to this embodiment of the invention fuses a static model and a dynamic model to generate a depth map suitable for both moving and static scenes, thereby producing a more lifelike stereoscopic video and a better stereoscopic effect.
In this specification, reference to the terms "an embodiment", "some embodiments", "an example", "a specific example", or "some examples" means that a particular feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. Schematic uses of these terms do not necessarily refer to the same embodiment or example, and the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described, persons of ordinary skill in the art will appreciate that various changes, modifications, substitutions, and variations can be made to these embodiments without departing from the principles and spirit of the invention; the scope of the invention is defined by the claims and their equivalents.

Claims (17)

1. A depth map generation method for plane video three-dimensional conversion, characterized by comprising the steps of:
performing, on a current frame image, image segmentation based on a scene static-model distribution, comprising constructing a graph topology from the feature vectors of the pixels of the current frame image, segmenting the current frame image according to the graph topology to obtain segmented regions, and labeling the pixels belonging to the same segmented region to generate a static-model depth map of the current frame image, wherein the static-model depth map D_p of the current frame image is

D_p = 255 × costY[i] / number_of_block[i],

where costY[i] is the sum of the Y coordinates of all pixels of segmented region i, and number_of_block[i] is the number of pixels segmented region i contains;
selecting feature points of the current frame image and a reference frame image, tracking the feature points, computing the motion vectors of the moving objects in the current frame image and the reference frame image, and performing motion analysis on the motion vectors to obtain a motion analysis result of the current frame image and the reference frame image;
generating a dynamic-model depth map of the current frame image according to the segmented regions of the current frame image and the motion analysis result of the current frame image and the reference frame image; and
performing adaptive depth fusion of the static-model depth map and the dynamic-model depth map of the current frame image.
2. The depth map generation method of claim 1, characterized in that, before constructing the graph topology, the method further comprises:
converting the current frame image to grayscale, comprising converting the 24-bit three-channel RGB color space of the pixels of the current frame image into a single-channel 8-bit color space.
3. The depth map generation method of claim 1, characterized in that constructing the graph topology of the current frame image comprises the steps of:
treating each element of the current frame image as an independent vector, and constructing the graph topology of the current frame image with part of the feature components of the pixels of the current frame image as feature vectors.
4. The depth map generation method of claim 1, characterized in that constructing the graph topology of the current frame image comprises the steps of:
using the spatial relationships and connected boundary information of the elements of the current frame image, processing the current frame image in the spatial domain, and converting the current frame image into a two-dimensional graph topology.
5. The depth map generation method of claim 1, characterized in that segmenting the current frame image comprises the steps of:
applying the minimum spanning tree method to the graph topology of the current frame image to divide the current frame image into a plurality of effective regions.
6. The depth map generation method of claim 5, characterized in that, before labeling the pixels belonging to the same region of the plurality of effective regions, the method further comprises the steps of:
setting the minimum number of subregions each effective region must contain; and
merging the plurality of effective regions according to the set number of subregions.
7. The depth map generation method of claim 1, characterized in that performing motion analysis on the motion vectors comprises the steps of:
dividing the current frame image and the reference frame image into a central region and a border region, wherein the central region is the central field of the image and represents the area occupied by the moving foreground objects, and the border region comprises the top, left, and right border bands of the image and represents the moving region of the camera shooting the moving objects; and
analyzing the central region and the border region to obtain the motion analysis result.
8. The depth map generation method of claim 7, characterized in that the motion analysis result comprises a moving-object motion component, a camera motion component, and a composite motion component,
wherein the moving-object motion component cost1, the camera motion component cost2, and the composite motion component cost3 are each defined by a formula that survives only as an image in the source document,
and wherein a region of notable difference is a region whose histogram differs between the current frame image and the reference frame image by more than a preset composite motion threshold.
9. The depth map generation method of claim 1, characterized in that performing depth fusion of the static-model depth map and the dynamic-model depth map of the current frame image comprises the steps of:
setting the weight of the static-model depth map of the current frame image and the weight of the dynamic-model depth map of the current frame image, and computing the depth-fused map D_depth of the current frame image as:

D_depth = w1 × D_p + w2 × D_m,

where D_p is the static-model depth map of the current frame image, D_m is the dynamic-model depth map of the current frame image, w1 is the weight of the static-model depth map, w2 is the weight of the dynamic-model depth map, and w1 and w2 satisfy w1 + w2 = 1.
10. A depth map generation device for plane video three-dimensional conversion, characterized by comprising:
a static-model depth map generation module that constructs a graph topology of the current frame image from the feature vectors of its pixels, segments the current frame image according to the graph topology to obtain segmented regions, and labels the pixels belonging to the same segmented region to generate a static-model depth map of the current frame image, wherein the static-model depth map D_p of the current frame image is

D_p = 255 × costY[i] / number_of_block[i],

where costY[i] is the sum of the Y coordinates of all pixels of segmented region i, and number_of_block[i] is the number of pixels segmented region i contains;
a motion analysis module that selects feature points of the current frame image and a reference frame image, tracks the feature points, computes the motion vectors of the moving objects in the current frame image and the reference frame image, and performs motion analysis on the motion vectors to obtain a motion analysis result of the current frame image and the reference frame image;
a dynamic-model depth map generation module, connected to the static-model depth map generation module and the motion analysis module, for generating a dynamic-model depth map of the current frame image according to the segmented regions of the current frame image and the motion analysis result of the current frame image and the reference frame image; and
a depth fusion module, connected to the static-model depth map generation module and the dynamic-model depth map generation module, for performing adaptive depth fusion of the static-model depth map and the dynamic-model depth map of the current frame image.
11. The depth map generation device of claim 10, characterized in that, before constructing the graph topology, the static-model depth map generation module converts the current frame image to grayscale, comprising converting the 24-bit three-channel RGB color space of the pixels of the current frame image into a single-channel 8-bit color space.
12. The depth map generation device of claim 10, characterized in that the static-model depth map generation module segments the image graph topology in one of the following two ways:
1) treating each element of the current frame image as an independent vector, taking part of the feature components of the pixels of the current frame image as feature vectors, and grouping the feature vectors with a central clustering algorithm to segment the image graph topology; or
2) using the spatial relationships and connected boundary information of the elements of the current frame image, processing the current frame image in the spatial domain, converting the current frame image into a two-dimensional graph topology, and segmenting the image based on that graph topology.
13. The depth map generation device of claim 10, characterized in that the static-model depth map generation module applies the minimum spanning tree method to the graph topology of the current frame image to divide the current frame image into a plurality of effective regions.
14. The depth map generation device of claim 13, characterized in that the static-model depth map generation module sets the minimum number of subregions each effective region must contain and merges the plurality of effective regions according to the set number of subregions to generate the static-model depth map of the current frame image.
15. The depth map generation device of claim 10, characterized in that the dynamic-model depth map generation module divides the current frame image and the reference frame image into a central region and a border region and analyzes the central region and the border region to obtain the motion analysis result, wherein the central region is the central field of the image and represents the area occupied by the moving foreground objects, and the border region comprises the top, left, and right border bands of the image and represents the moving region of the camera shooting the moving objects.
16. The depth map generation device of claim 15, characterized in that the motion analysis result comprises a moving-object motion component, a camera motion component, and a composite motion component,
wherein the moving-object motion component cost1, the camera motion component cost2, and the composite motion component cost3 are each defined by a formula that survives only as an image in the source document,
and wherein a region of notable difference is a region whose histogram differs between the current frame image and the reference frame image by more than a composite motion threshold.
17. The depth map generation device of claim 10, characterized in that the depth fusion module sets the weight of the static-model depth map of the current frame image and the weight of the dynamic-model depth map of the current frame image, and computes the depth-fused map D_depth of the current frame image as:

D_depth = w1 × D_p + w2 × D_m,

where D_p is the static-model depth map of the current frame image, D_m is the dynamic-model depth map of the current frame image, w1 is the weight of the static-model depth map, w2 is the weight of the dynamic-model depth map, and w1 and w2 satisfy w1 + w2 = 1.
CN 201110223804, filed 2011-08-05: Depth map generation method and device for plane video three-dimensional conversion. Granted as CN102263979B (Active).

Priority Applications (1)

Application Number: CN 201110223804 (CN102263979B)
Priority Date / Filing Date: 2011-08-05
Title: Depth map generation method and device for plane video three-dimensional conversion

Publications (2)

Publication Number - Publication Date
CN102263979A - 2011-11-30
CN102263979B - 2013-10-09

Family

ID=45010407

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110223804 Active CN102263979B (en) 2011-08-05 2011-08-05 Depth map generation method and device for plane video three-dimensional conversion

Country Status (1)

Country Link
CN (1) CN102263979B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102609950B (en) * 2012-03-14 2014-04-02 浙江理工大学 Two-dimensional video depth map generation process
US9098911B2 (en) 2012-11-01 2015-08-04 Google Inc. Depth map generation from a monoscopic image based on combined depth cues
CN102999892B (en) * 2012-12-03 2015-08-12 东华大学 Based on the depth image of region mask and the intelligent method for fusing of RGB image
US9299152B2 (en) * 2012-12-20 2016-03-29 Hong Kong Applied Science And Technology Research Institute Co., Ltd. Systems and methods for image depth map generation
CN103248906B (en) * 2013-04-17 2015-02-18 清华大学深圳研究生院 Method and system for acquiring depth map of binocular stereo video sequence
CN103200417B (en) * 2013-04-23 2015-04-29 华录出版传媒有限公司 2D (Two Dimensional) to 3D (Three Dimensional) conversion method
CN103945211A (en) * 2014-03-13 2014-07-23 华中科技大学 Method for generating depth map sequence through single-visual-angle color image sequence
CN107509043B (en) * 2017-09-11 2020-06-05 Oppo广东移动通信有限公司 Image processing method, image processing apparatus, electronic apparatus, and computer-readable storage medium
CN107742296A (en) * 2017-09-11 2018-02-27 广东欧珀移动通信有限公司 Dynamic image generation method and electronic installation
CN108924408B (en) * 2018-06-15 2020-11-03 深圳奥比中光科技有限公司 Depth imaging method and system
CN110602479A (en) * 2019-09-11 2019-12-20 海林电脑科技(深圳)有限公司 Video conversion method and system
CN112954293B (en) * 2021-01-27 2023-03-24 北京达佳互联信息技术有限公司 Depth map acquisition method, reference frame generation method, encoding and decoding method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101640809A (en) * 2009-08-17 2010-02-03 浙江大学 Depth extraction method of merging motion information and geometric information
CN101765022A (en) * 2010-01-22 2010-06-30 浙江大学 Depth representing method based on light stream and image segmentation
CN101785025A (en) * 2007-07-12 2010-07-21 汤姆森特许公司 System and method for three-dimensional object reconstruction from two-dimensional images


Also Published As

Publication number Publication date
CN102263979A (en) 2011-11-30


Legal Events

Date - Code - Title - Description
C06 - Publication
PB01 - Publication
C10 - Entry into substantive examination
SE01 - Entry into force of request for substantive examination
C14 - Grant of patent or utility model
GR01 - Patent grant