CN111462030A - Multi-image fusion method for constructing and rendering novel views of a stereoscopic scene - Google Patents

Multi-image fusion method for constructing and rendering novel views of a stereoscopic scene

Info

Publication number: CN111462030A
Application number: CN202010231534.0A
Authority: CN (China)
Prior art keywords: image, superpixel, depth, pixel, visual angle
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventors: 高小翎, 何克慧
Current assignee: Individual (the listed assignee may be inaccurate; Google has not performed a legal analysis)
Original assignee: Individual
Application filed by: Individual
Priority: CN202010231534.0A
Publication: CN111462030A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20 Finite element generation, e.g. wire-frame surface description, tessellation
    • G06T17/205 Re-meshing
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/90 Determination of colour characteristics
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging

Abstract

The invention provides a multi-image fusion method for constructing and rendering novel views of a stereoscopic scene. It introduces depth sample interpolation based on superpixel blocks: known depth information is interpolated into the superpixel blocks that lack depth, yielding a three-dimensional point cloud model in which every region contains sufficient point cloud information. The novel view is then constructed from this point cloud by transforming the image content of the known views to the corresponding positions in the novel-view image. Because the method uses local warping driven by the superpixel segmentation result, the superpixel blocks are relatively independent and can be processed in parallel, which greatly improves the computation speed. The remaining hole regions are improved by sequential iteration with a block-correction method until they are filled, so that the user is finally presented with a visually realistic novel-view image. The novel view is thus constructed quickly and with a good visual effect.

Description

Multi-image fusion method for constructing and rendering novel views of a stereoscopic scene
Technical Field
The invention relates to a method for constructing and rendering novel views, in particular to a multi-image fusion method for constructing and rendering novel views of a stereoscopic scene, and belongs to the technical field of scene view construction.
Background
Stereo reconstruction plays a very important role in many fields of real life, such as virtual reality, animation, medical imaging, and virtual tourism. Because the technology is so widely applied, many methods exist for realizing it, and the techniques in common use can be divided into three types: the first reconstructs a scene model with computer software, the second reconstructs a three-dimensional model of the scene from data acquired by scanning equipment, and the third reconstructs the three-dimensional scene from a set of two-dimensional images. Three-dimensional modelling software runs on a computer system together with animation rendering software; for example, 3D Studio Max is commonly used in animation production, industrial modelling and decoration design, while Softimage and Maya are widely adopted in the film industry for advertisements and special effects, and all of them can produce and render good three-dimensional models. However, such software struggles with real scenes and large-scale scenes, and it requires a relatively professional user to operate.
Among three-dimensional reconstruction methods based on scanning equipment, the Kinect device developed by Microsoft is currently in common use: the device scans the scene step by step, the depth images are filtered, and registration is then performed with ICP and PCL to reconstruct the three-dimensional model of the scene.
Both of the above approaches consume considerable manpower and material resources when modelling a scene, and the resulting three-dimensional models are unsatisfactory for real or large-scale complex scenes. Vision-based three-dimensional reconstruction opens a new road for the field: the scene is reconstructed from an image set, two-dimensional images of the scene are captured with a digital camera, and three-dimensional measurement is performed by combining the principles of computer vision and image processing, without any on-site dimensional measurement; computer processing then yields the final three-dimensional model of the scene. Vision-based stereo reconstruction is highly applicable to real scenes, is fast, and the whole process can even be completed without any additional interaction. Stereo reconstruction based on visual methods is therefore an increasingly promising direction for research and application.
According to the number of cameras, computer vision methods can be further divided into monocular vision methods, stereoscopic vision methods, and so on. The monocular vision method performs stereo reconstruction of the scene with a single camera; the input may be a single image, an image set from a single viewpoint, or an image set from multiple viewpoints. With a single viewpoint, the depth information must be inferred from the two-dimensional features of the input image; the process is relatively simple but yields only a rough three-dimensional model. With a multi-view image set, the method suits large-scale and complex scenes, and the larger the image set and the higher the image resolution, the better the final reconstruction, although the computation and processing time grow accordingly.
The other visual stereo reconstruction method is the stereo vision method, also called the binocular vision method because it uses two cameras, aligned in the horizontal or vertical direction, to acquire the image data of a scene. Since parallax exists between the two cameras, depth information can be computed from it, and aligning the cameras horizontally or vertically reduces the cost of camera rectification. The collected images are preprocessed, the cameras are calibrated and the related camera parameters computed, feature points of the two images are extracted and matched, the fundamental matrix of the cameras is computed from the matched feature data, the images are rectified using the epipolar geometry principle, stereo matching between the two rectified images yields a disparity map, the depth of each corresponding point is then computed from the disparity map by triangulation to obtain a three-dimensional point cloud model of the scene, and the point cloud is meshed to obtain a complete three-dimensional model.
In summary, many stereo reconstruction methods exist, but each specific direction still has problems to be solved, and improving on the prior art remains a great challenge. The prior-art stereo reconstruction techniques mainly have the following defects. First, owing to the operational complexity and specialization of three-dimensional modelling software, only a few professionals can control the reconstruction process; the software is extremely difficult to operate and handles real scenes and large-scale scenes poorly. Second, owing to the limitations of the equipment, scanning-based reconstruction restricts the scenes that can be processed to overly simple, monotonous and narrow ones; data acquisition is cumbersome, the whole scene must be scanned with a hand-held device, reconstruction is very expensive, and the results are unsatisfactory and cannot display a visually real effect. Third, by comparison, vision-based stereo reconstruction can simplify the whole process, conveniently acquires image information of a scene, and can handle large-scale complex scenes, but most prior-art visual reconstruction handles only regular man-made scenes: in complex scenes, regions with complex textures do not have enough matched feature points in the initially reconstructed three-dimensional point cloud, so no spatial three-dimensional point information is obtained there, yet this point information is essential as a constraint for the subsequent view construction; a novel view obtained by direct warping is then an image with incomplete information, containing noise and holes, and the final visual effect is seriously degraded. Fourth, the prior art does not repair the incomplete initial point cloud model: it does not interpolate depth samples by searching for the region block, possessing three-dimensional information, that is most similar to the region lacking it, and it cannot warp each segmented superpixel block as an independent unit; objects at different depths of the same scene therefore interfere strongly with each other, warping is very slow, large hole regions and considerable noise remain in the novel view, and the final visual effect suffers badly.
Disclosure of Invention
Aiming at the defects of the prior art, the multi-image fusion method for constructing and rendering novel views of a stereoscopic scene provided by the invention performs depth sample interpolation based on superpixel blocks: known depth information is interpolated into the superpixel blocks that lack depth, yielding a scene point cloud model in which every region contains sufficient three-dimensional point cloud information, and the novel view is then constructed from that point cloud. For small hole regions, or hole regions whose filling looks unrealistic, the filling priority is computed from the content and structure of the image by a block-correction method, and the hole region is improved by sequential iteration until it is filled, so that the user is finally presented with a visually realistic novel-view image; construction of the novel view is fast and the visual effect is good.
In order to achieve the technical effects, the technical scheme adopted by the invention is as follows:
The multi-image fusion method for constructing and rendering novel views of a stereoscopic scene comprises multi-image depth fusion, locally-warped novel-view construction, and novel-view processing and rendering. Multi-image depth fusion comprises a real-time topology-preserving superpixel segmentation algorithm, computation of the similar superpixel set, computation of the most similar superpixels, and depth sample interpolation. Locally-warped novel-view construction comprises point-cloud-constrained image warping and superpixel-driven local warping. Novel-view processing and rendering comprises processing and fusion of the novel view and filling and correction of hole regions.
Further, the method addresses monocular stereo reconstruction based on an image set. The first step of the reconstruction process obtains the three-dimensional point cloud model of the scene; with the point cloud as a constraint, a superpixel-based depth sample interpolation method is provided: for each superpixel block that was not reconstructed, the superpixel block that is best in spatial distance and color and that contains depth information is found within the image, and the known depth information is interpolated into the missing superpixel block, yielding a scene point cloud model in which every region contains sufficient three-dimensional point cloud information. The novel view is then constructed from the point cloud information. The key step of creating the novel view is image warping: the image content of a known view is transformed to the corresponding positions in the novel-view image. A local warping method based on the superpixel segmentation result keeps the warps of the individual superpixel blocks from affecting one another and allows parallel processing, producing most of the image content of the novel view. For the remaining fine hole regions, or holes whose filling looks unrealistic, the filling priority is computed from the content and structure of the image by a block-correction method, and the hole region is improved by sequential iteration until it is filled, finally presenting a novel-view image of strong realism.
Further, the invention provides a multi-image depth fusion construction method, which constructs enough valid three-dimensional points. First, the image is segmented into superpixels, and the segmentation result together with the existing depth information determines the superpixels of poor reconstruction quality, called object regions. Then, the superpixel blocks that are most similar in color and closest in spatial distance to each object region, among those with existing depth information, are found to fill the regions with missing depth. A complete three-dimensional model of the scene is finally obtained, satisfying the requirements of novel-view construction;
the invention provides a real-time topology-preserved superpixel segmentation algorithm, which realizes real-time superpixel segmentation from rough to fine, can preserve topology, adopts a rough to fine updating method, and achieves good effect in a minimization improvement process, wherein the detailed process comprises the following two steps: single image superpixel estimation and refinement from coarse to fine.
The method is further characterized in that, in single-image superpixel estimation, for an image, a_c ∈ {1, ..., F} denotes the superpixel to which each pixel c belongs, and a = (a_1, ..., a_N) represents the set of all random variables of the segmentation, where N is the image size and F the number of superpixels. The segmentation problem is formulated as an objective function satisfying appearance consistency and regular shape, with an added constraint on the superpixel size;
Define d_i as the average position of the i-th superpixel and e_i as its average color; e = (e_1, ..., e_F) and d = (d_1, ..., d_F) represent the sets of average colors and average positions of all superpixels, respectively, and N_8 denotes the eight-neighborhood of pixel c. Following the Markov random field energy formulation, the single-image superpixel estimation objective comprises the following parts: a boundary length term, a topology preservation term, a minimum size term, a shape regularization term, and an appearance consistency term;
Boundary length term: keeps a superpixel regular by ensuring that it has a small boundary length;
Topology preservation term: forces the superpixels to remain connected; a disconnected configuration has infinite energy;
Minimum size term: forces the size of a superpixel to be at least 1/4 of its original size;
Shape regularization term: keeps the superpixels regular in shape;
Appearance consistency term: keeps the color of each superpixel uniform.
The method is further characterized by the coarse-to-fine refinement: the superpixels are first initialized to a regular grid, and the average color and position of each superpixel are computed. Each level is then refined iteratively from coarse to fine so that the objective function reaches a good local minimum: the list is initialized with all boundary blocks, and each boundary block is checked in turn for whether changing its label would violate connectivity. If connectivity is not violated, the assignment of the block is refined; if the assignment of the block changes, the average position and color of the two affected superpixels are updated with the incremental mean equation:

b_n = b_{n-1} + (a_n - b_{n-1}) / N

where b_{n-1} is the previous estimate, a_n is the new element, and N is the size of the k-th superpixel.
If a block at the end of the priority queue lies on a boundary, its neighborhood is added to the queue, and the process repeats until the queue is empty, after which the next level of refinement starts.
The method is further characterized in that, in the computation of the similar superpixel set, all superpixels in an image produced by the real-time topology-preserving segmentation algorithm are represented as a set A = {A_i}, i ∈ {0, ..., n-1}, where n is the number of superpixels in the image. The reconstructed three-dimensional point cloud is then projected onto the image to obtain the depth value of each pixel, denoted g[p(x, y)];
The set of depth samples contained in each superpixel is denoted g[A_i] = {c(x, y) ∈ A_i | g[c(x, y)] > 0}. A superpixel block in which fewer than 0.58 percent of the pixels carry depth information is set as a target superpixel; the remaining superpixels are reliable superpixels;
The invention converts the image to the LAB color space and builds a separate histogram for each superpixel, dividing each of the L, A and B subspaces into 24 bins, which forms a 72-dimensional descriptor for each superpixel block, denoted R_Lab[A_i], A_i ∈ A. The χ² distance between the target superpixel and every superpixel with reliable depth is then computed:

χ²(R_1, R_2) = Σ_i (R_1(i) - R_2(i))² / (R_1(i) + R_2(i))

where R(i) is the value of the i-th bin of the histogram.
The 32 most similar superpixel blocks, those with the smallest distance, are selected to form a set, denoted N[A_i]; the number of most similar superpixels is determined by the total number of superpixels.
The method is further characterized in that, in the computation of the most similar superpixels, the most similar superpixel set is selected and N[A_i] is reduced further, generally to 3 to 6 elements, according to which superpixel blocks are spatially closest to the target superpixel in Euclidean distance;
The computation of the most similar superpixels uses a graph traversal algorithm to create a two-dimensional superpixel graph structure: whenever two superpixels share a boundary, an edge is added between the two corresponding nodes on the graph, weighted by the χ² distance of the LAB histograms of the two superpixels. For the target superpixel A_i^T and each similar superpixel A_j ∈ N[A_i^T], the path value is computed by minimizing over all possible paths from A_i^T to A_j; a shortest-path algorithm is then applied to the obtained paths, and the three superpixels with the shortest paths form the set N_3[A_i^T].
After the three shortest-path superpixels N_3[A_i^T] are obtained, a histogram of their depth samples is drawn. If the histogram has a single peak, or two consecutive peaks, the depth values of the three superpixels are similar; since the three superpixels come from the most similar superpixels, their colors are very close to the target superpixel and they are also very close spatially, so these superpixels belong to the same object. If some target superpixels still cannot find three such superpixels through these two steps, they are marked as holes.
The method is further characterized in that, in depth sample interpolation, once the superpixel set N_3[A_i^T] closest to the target superpixel block in spatial distance and color has been obtained, the valid depth information it contains is interpolated into the target superpixel block: 8 to 12 pixel points are randomly selected inside the target superpixel block for depth interpolation, the number of interpolated points being determined by the size of the superpixel block so as to satisfy the subsequent constraint requirements. The depth of each point is computed from the spatial distance between the image points carrying original valid depth information and the interpolated point, and the depth interpolation algorithm is executed for the target superpixels of every image. Regions whose depth information could not be obtained during reconstruction are thus supplemented on the basis of the original three-dimensional point cloud; applying the depth interpolation to every image of the set finally yields a scene point cloud model whose three-dimensional information suffices for subsequent processing.
The method is further characterized in that, in point-cloud-constrained image warping, given a novel view with camera projection matrix B_n, let the known images near the novel view be D_1, D_2, ..., D_N, and let input image D_i have camera matrix B_i. For each point B(x, y) ∈ D_i, with Z_i denoting the part of the scene point cloud that projects into the input image range, and q(X, Y, Z) ∈ Z_i a three-dimensional point of that cloud, there exists a mapping F_i from two-dimensional points on the image to three-dimensional points in space such that:

B_i(q) = B_i(F_i(B)) = B

The region to be warped is first divided into an n × m grid. For a point B with a depth sample on the two-dimensional image, the three vertices of the triangle containing it are written (U_1, U_2, U_3); the initial triangles on the input image are right triangles. By the barycentric coordinates of point B within the triangle, B is expressed as (b_1(B), b_2(B), b_3(B)). If the vertices of the deformed triangle are defined as (U_1', U_2', U_3'), two conditions must be met during warping: a reprojection energy factor condition and a similarity transform factor condition.
The method is further characterized in that, in novel-view processing and fusion, once a camera matrix is given for each novel view, the two input images spatially closest to the novel view, comprising a left image and a right image, are determined from the camera parameters of the input images. Each of the two input images is then warped according to the locally-warped novel-view construction, giving the two warped images at the novel view, and the two warped results are used to complement each other's missing information and hole regions. During processing and fusion, the information of the input image closest to the novel view is preserved with a larger weight, and the slightly farther view serves as supplementary information, giving a novel-view image with more complete information. More input views closest to the novel view can also be selected, warped and then fused;
Pixel values are selected from the nearest positions on the warped images, and the multiple images are fused onto the novel-view image by weighting. If no valid pixel value can be found in any of the images, the pixel is marked with the value (0, 255), indicating a hole, and the marked pixels are saved as a mask for the subsequent hole filling operation; filtering then removes the obvious noise.
Compared with the prior art, the invention has the advantages that:
1. The invention provides a multi-image fusion method for constructing and rendering novel views of a stereoscopic scene. Depth sample interpolation based on superpixel blocks finds, within the image, the superpixel block that is best in spatial distance and color and that contains depth information, and interpolates the known depth information into the missing superpixel block, finally yielding a scene point cloud model in which every region contains sufficient three-dimensional point cloud information; the novel view is then constructed from this point cloud information. The key step of creating the novel view is image warping, which transforms the image content of the known view to the corresponding positions in the novel-view image. Based on the superpixel segmentation result, a local warping method is adopted: the superpixel blocks are relatively independent, so their warps do not affect one another and can be processed in parallel, greatly improving the computation speed. For small hole regions, or hole regions whose filling looks unrealistic, the invention computes the filling priority from the content and structure of the image by a block-correction method and improves the hole region by sequential iteration until it is filled, finally presenting the user with a visually realistic novel-view image.
2. Aiming at the problem that the available views of a scene are limited, the method provides fast novel-view construction with a good visual effect: from the existing views it can construct more views at different positions, allowing the scene to be understood and displayed more fully, and the constructed novel views have strong realism. First, based on the spatial warping theory of stereo reconstruction, one view is transformed into another; to improve accuracy and reduce noise, the image is divided into superpixels whose information is relatively independent, and the superpixels are warped individually, which speeds up processing. A post-processing step is applied to the reconstructed view, and a locally adaptive method additionally fuses the artifact regions, finally producing a novel-view image of strong visual realism.
3. The invention uses a scene reconstruction method based on superpixel blocks. Each image of the input set is divided into superpixel blocks of similar size whose colors are essentially uniform, so that each block belongs to only one object in the scene. This property is used to fill the incomplete initial point cloud model: depth samples are interpolated by searching for the region block that is most similar to the region lacking three-dimensional information and that possesses such information, and during image warping the superpixel blocks are warped individually, per segmented block. This both reduces the interference between objects at different depths of the same scene during warping and greatly improves the speed of the warping process.
4. The invention provides a novel-view construction method based on three-dimensional point constraints. The two original input views spatially closest to the novel view are found, the initially reconstructed three-dimensional point cloud serves as the constraint condition, the two original views are warped to the novel view, and the two warped results are fused to obtain most of the image information of the novel view, finally presenting a novel-view image of strong realism.
Drawings
Fig. 1 is a structural diagram of the multi-image fusion novel-view construction and rendering method of the present invention.
FIG. 2 is a schematic diagram of the point-cloud-constrained image warping of the present invention.
FIG. 3 is a schematic diagram of the superpixel-driven local warping of the present invention.
FIG. 4 is a schematic view of the novel-view fusion effect of the present invention.
Detailed Description
The technical scheme of the multi-image fusion method for constructing and rendering novel views of a stereoscopic scene is further described below with reference to the accompanying drawings, so that a person skilled in the art can better understand and implement the invention.
The invention provides a multi-image fusion method for constructing and rendering novel views of a stereoscopic scene, comprising multi-image depth fusion, locally-warped novel-view construction, and novel-view processing and rendering. Multi-image depth fusion comprises a real-time topology-preserving superpixel segmentation algorithm, computation of the similar superpixel set, computation of the most similar superpixels, and depth sample interpolation; the segmentation algorithm itself comprises single-image superpixel estimation and coarse-to-fine refinement. Locally-warped novel-view construction comprises point-cloud-constrained image warping and superpixel-driven local warping. Novel-view processing and rendering comprises processing and fusion of the novel view and filling and correction of hole regions.
As shown in fig. 1, the invention addresses monocular stereo reconstruction based on an image set. Since the reconstructed object is a real-world scene, the final reconstruction must be consistent with human visual perception to appear realistic. The first step of reconstruction obtains the three-dimensional point cloud model of the scene; because the point cloud is sparse, it can only serve as a constraint condition, and enough, but not excessive, constraint point information must be available everywhere in the scene. For this problem the invention provides a superpixel-based depth sample interpolation method: the image is divided into small superpixel blocks of consistent color, the superpixel blocks that were not reconstructed are found using this property, and the superpixel blocks that are best in spatial distance and color and that contain depth information are found within the image range. The missing superpixel blocks are then depth-interpolated from this known depth information, finally giving a scene point cloud model in which every region contains sufficient three-dimensional point cloud information. The novel view is then constructed from the point cloud information; the key step of creating the novel view is image warping, which transforms the image content of a known view to the corresponding positions in the novel-view image. Global warping would cause severe distortion in the novel-view image, particularly between objects with obvious parallax and depth differences; based on the superpixel segmentation result, a local warping method avoids this problem, and the relative independence of the superpixel blocks keeps their warps from affecting one another, so they can be processed in parallel, greatly improving the computation speed. Although most of the image content of the novel view is obtained this way, small hole regions, or holes whose filling looks unrealistic, remain to be filled at the end.
1. Multi-image depth fusion
The first step of multi-image fusion stereo reconstruction obtains the three-dimensional point cloud from the existing image set. The point cloud is an important constraint for the subsequent novel-view construction: if the three-dimensional points of some region are missing, no information for the corresponding region can be obtained in the novel view. It is therefore essential to obtain three-dimensional information for as much of the scene as possible.
Even the best point cloud reconstruction methods fail to reconstruct the depth of some important regions of the scene: in complex scenes with complex textures, the feature points cannot be matched correctly across the images during point cloud reconstruction, so they never become valid three-dimensional points and are discarded.
Although prior-art methods cannot directly reconstruct the three-dimensional points of such complex scene regions, the depth information of those regions can still be constructed from the existing image information. The invention therefore provides a multi-image depth fusion construction method that builds enough valid three-dimensional points. First, the image is divided into superpixels, and the segmentation result together with the existing depth information determines the superpixel blocks of poor reconstruction quality, called object regions. Then, the superpixel blocks that are most similar in color and closest in spatial distance to each object region, among those with existing depth information, are found to fill the regions with missing depth. A complete three-dimensional model of the scene is finally obtained, satisfying the requirements of novel-view construction.
(I) Real-time topology-preserving superpixel segmentation algorithm
The invention requires a superpixel algorithm with high running speed, strong real-time performance, good reliability and regularity, and good topological consistency of the image segmentation; the prior art does not meet these requirements. Addressing these shortcomings and the invention's real-time requirement for superpixels, the invention provides a real-time topology-preserving superpixel segmentation algorithm that performs coarse-to-fine superpixel segmentation in real time; its coarse-to-fine update scheme makes the minimization converge well. The detailed process comprises the following two steps:
(1) Single-image superpixel estimation
For an image, a_c ∈ {1, ..., F} denotes the superpixel to which each pixel c belongs, and a = (a_1, ..., a_N) represents the set of all random variables of the segmentation, where N is the image size and F the number of superpixels. The segmentation problem is formulated as an objective function, similar to k-means clustering, satisfying appearance consistency and regular shape. A constraint on the superpixel size is added to prevent superpixels from becoming too small.
Define d_i as the average position of the i-th superpixel and e_i as its average color; e = (e_1, ..., e_F) and d = (d_1, ..., d_F) represent the sets of average colors and average positions of all superpixels, respectively, and N_8 denotes the eight-neighborhood of pixel c. Following the Markov random field energy formulation, the single-image superpixel estimation objective comprises the following parts: a boundary length term, a topology preservation term, a minimum size term, a shape regularization term, and an appearance consistency term.
Boundary length term: keeps a superpixel regular by ensuring that it has a small boundary length.
Topology preservation term: forces the superpixels to remain connected; a disconnected configuration has infinite energy.
Minimum size term: forces the size of a superpixel to be at least 1/4 of its original size.
Shape regularization term: keeps the superpixels regular in shape.
Appearance consistency term: keeps the color of each superpixel uniform.
(2) Coarse-to-fine refinement
The invention provides a coarse-to-fine algorithm over the assignment of pixels with a FIFO priority queue strategy. The superpixels are first initialized to a regular grid, and the average color and position of each superpixel are computed. Each level is then refined iteratively from coarse to fine so that the objective function reaches a good local minimum: the list is initialized with all boundary blocks, and each boundary block is checked in turn for whether changing its label would violate connectivity. If connectivity is not violated, the assignment of the block is refined; if the assignment of the block changes, the average position and color of the two affected superpixel blocks are updated with the incremental mean equation:

b_n = b_{n-1} + (a_n - b_{n-1}) / N

where b_{n-1} is the previous estimate, a_n is the new element, and N is the size of the k-th superpixel.
If a block at the end of the priority queue lies on a boundary, its neighborhood is added to the queue, and the process repeats until the queue is empty, after which the next level of refinement starts.
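As a concreteness aid, the following minimal Python sketch (not from the patent) shows the O(1) mean updates applied when a boundary block moves between two superpixels; the function names, the numpy representation, and the removal form of the update are assumptions of this sketch.

```python
import numpy as np

def incremental_mean_add(mean, new_element, new_size):
    """O(1) update of a running mean (color or position) after one element joins.

    new_size is the superpixel size N after the addition, matching
    b_n = b_{n-1} + (a_n - b_{n-1}) / N.
    """
    return mean + (new_element - mean) / new_size

def incremental_mean_remove(mean, old_element, new_size):
    """O(1) update after one element leaves (new_size is the size after removal).
    This removal form is an assumption; the patent states only the addition form."""
    return mean + (mean - old_element) / new_size

# Example: a boundary block moving into superpixel k updates its mean color.
mean_color_k = np.array([120.0, 64.0, 200.0])
block_color = np.array([130.0, 60.0, 190.0])
mean_color_k = incremental_mean_add(mean_color_k, block_color, new_size=50)
```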
(II) Computation of the similar superpixel set
According to the real-time topology-preserving superpixel segmentation algorithm, all superpixels in an image are represented as a set A = {A_i}, i ∈ {0, ..., n-1}, where n is the number of superpixels in the image. The reconstructed three-dimensional point cloud is then projected onto the image to obtain the depth value of each pixel, denoted g[p(x, y)].
The set of depth samples contained in each superpixel is denoted g[A_i] = {c(x, y) ∈ A_i | g[c(x, y)] > 0}. To distinguish the regions lacking three-dimensional depth information, a superpixel block in which fewer than 0.58 percent of the pixels carry depth information is set as a target superpixel; the remaining superpixels are reliable superpixels.
The present invention converts the image to the LAB color space and builds a separate histogram for each superpixel, dividing each of the L, A and B subspaces into 24 bins, which forms a 72-dimensional descriptor for each superpixel block, denoted R_Lab[A_i], A_i ∈ A. The χ² distance between the target superpixel and every superpixel with reliable depth is then computed:

χ²(R_1, R_2) = Σ_i (R_1(i) - R_2(i))² / (R_1(i) + R_2(i))

where R(i) is the value of the i-th bin of the histogram.
The 32 most similar superpixel blocks, those with the smallest distance, are selected to form a set, denoted N[A_i]. The number of most similar superpixels is determined by the total number of superpixels, generally 34 to 76; selecting noticeably more would increase the computational complexity considerably.
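To make the descriptor and distance concrete, the following Python sketch builds the 72-dimensional LAB histogram descriptor and the χ² distance described above. The use of OpenCV for the color conversion, the normalization step, and all names are assumptions of this sketch, not the patent's implementation.

```python
import numpy as np
import cv2  # assumption: OpenCV is available for the RGB -> LAB conversion

def lab_descriptor(image_bgr, superpixel_mask, bins=24):
    """72-dim descriptor: one 24-bin histogram per L, A, B channel of a superpixel."""
    lab = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2LAB)
    pixels = lab[superpixel_mask]            # (n_pixels, 3) LAB values of the block
    hists = [np.histogram(pixels[:, c], bins=bins, range=(0, 255))[0]
             for c in range(3)]
    desc = np.concatenate(hists).astype(np.float64)
    return desc / max(desc.sum(), 1.0)       # normalize so block sizes are comparable

def chi2_distance(r1, r2, eps=1e-10):
    """Chi-squared distance between two histogram descriptors R1, R2."""
    return np.sum((r1 - r2) ** 2 / (r1 + r2 + eps))
```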
(III) Computation of the most similar superpixels
From the similar superpixel sets, the most similar superpixel set is selected, and N[A_i] is reduced further, generally to 3 to 6 elements, according to which superpixel blocks are spatially closest to the target superpixel in Euclidean distance. However, because superpixel blocks vary in shape and size, their irregular and highly non-convex shapes make the Euclidean distances between superpixels, and between spatially adjacent objects, quite ambiguous.
To solve these problems, the computation of the most similar superpixels uses a graph traversal algorithm to create a two-dimensional superpixel graph structure: whenever two superpixels share a boundary, an edge is added between the two corresponding nodes on the graph, weighted by the χ² distance of the LAB histograms of the two superpixels. For the target superpixel A_i^T and each similar superpixel A_j ∈ N[A_i^T], the path value is computed by minimizing over all possible paths from A_i^T to A_j; a shortest-path algorithm is then applied to the obtained paths, and the three superpixels with the shortest paths form the set N_3[A_i^T].
After the three shortest-path superpixels are obtained, a histogram of their depth samples is drawn. If the histogram has a single peak, or two consecutive peaks, the depth values of the three superpixels are similar; since the three superpixels come from the most similar superpixels, their colors are very close to the target superpixel and they are also very close spatially, so these superpixels belong to the same object. If some target superpixels still cannot find three such superpixels through these two steps, they are marked as holes.
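The shortest-path computation over the superpixel adjacency graph described above could look like the following Python sketch using Dijkstra's algorithm; the adjacency-list representation and the names are assumptions.

```python
import heapq

def shortest_path_costs(adjacency, source):
    """Dijkstra over the superpixel graph.

    adjacency: {node: [(neighbor, weight), ...]}, built by adding an edge whenever
    two superpixels share a boundary, weighted by the chi^2 distance of their
    LAB histograms. Returns the minimal path cost from `source` to each node.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in adjacency.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Keep the three candidates of N[A_i] with the smallest path cost to form N3[A_i]:
# n3 = sorted(candidates, key=lambda j: dist.get(j, float("inf")))[:3]
```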
(IV) Depth sample interpolation
Once the superpixel set N_3[A_i^T] closest to the target superpixel block in spatial distance and color has been obtained, the valid depth information it contains is interpolated into the target superpixel block: 8 to 12 pixel points are randomly selected inside the target superpixel block for depth interpolation, the number of interpolated points being determined by the size of the superpixel block so as to satisfy the subsequent constraint requirements. The depth of each point is computed from the spatial distance between the image points carrying original valid depth information and the interpolated point, and the depth interpolation algorithm is executed for the target superpixels of every image. Regions whose depth information could not be obtained during reconstruction are thus supplemented on the basis of the original three-dimensional point cloud; applying the depth interpolation to every image of the set finally yields a scene point cloud model whose three-dimensional information suffices for subsequent processing.
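The patent specifies only that the interpolated depth is computed from the spatial distances to the points with valid depth; the Python sketch below assumes inverse-distance weighting as one plausible realization, and all names are illustrative.

```python
import numpy as np

def interpolate_depth(target_points, source_points, source_depths, eps=1e-6):
    """Assign depths to sampled points of a target superpixel from known samples.

    target_points: (m, 2) pixel coordinates chosen inside the target superpixel
                   (the patent randomly samples 8 to 12 of them)
    source_points: (k, 2) coordinates of valid depth samples in N3[A_i]
    source_depths: (k,)   their depth values
    Inverse-distance weighting is an assumption of this sketch.
    """
    depths = np.empty(len(target_points))
    for i, p in enumerate(target_points):
        d = np.linalg.norm(source_points - p, axis=1)  # spatial distances
        w = 1.0 / (d + eps)                            # closer samples weigh more
        depths[i] = np.dot(w, source_depths) / w.sum()
    return depths
```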
2. Locally-warped novel-view construction
Given a set of images of a scene, the user can see the scene only from the limited views determined by the number of images. With the existing image set, the invention enables dynamic browsing of the scene from many views, making the scene observed at a novel view look as real as the original, with a good visual effect. Prior-art methods only handle transitions between certain views, or only small view changes, and struggle with complex scenes. The scene image set can be processed with VisualSFM to obtain the camera parameters; the original camera positions are therefore used to set the camera matrix of the novel view, two or four images closest to the novel view are selected as reference images, the obtained scene point cloud serves as the constraint condition, and the image closest to the novel view is warped to it, giving the basic image information of the novel view. The warping inevitably leaves some gaps or holes in the resulting novel-view image, which affect the final visual effect, so the novel view must then be fused and corrected to look more real in appearance. Repeating this processing for different positions yields the effect of the whole scene, so that more information about the scene can be observed and understood from more views.
(I) Point-cloud-constrained image warping
Given a novel view with camera projection matrix B_n, let the known images near the novel view be D_1, D_2, ..., D_N, and let input image D_i have camera matrix B_i. For each point B(x, y) ∈ D_i, with Z_i denoting the part of the scene point cloud that projects into the input image range, and q(X, Y, Z) ∈ Z_i a three-dimensional point of that cloud, there exists a mapping F_i from two-dimensional points on the image to three-dimensional points in space such that:

B_i(q) = B_i(F_i(B)) = B

The region to be warped is first divided into an n × m grid. For a point B with a depth sample on the two-dimensional image, the three vertices of the triangle containing it are written (U_1, U_2, U_3); the initial triangles on the input image are right triangles. By the barycentric coordinates of point B within the triangle, B is expressed as (b_1(B), b_2(B), b_3(B)). If the vertices of the deformed triangle are defined as (U_1', U_2', U_3'), two conditions must be met during warping: a reprojection energy factor condition and a similarity transform factor condition.
(1) Reprojection energy factor condition:
So that, after warping, the depth sample points and the three vertices of the deformed triangle still satisfy the barycentric coordinate relationship in the novel view, a least-squares energy equation is formed from the constraint of the three-dimensional point cloud.
(2) Similarity transform factor condition
The image has been divided into an n × m grid, and each grid cell can be divided into two triangles; the warping is then performed triangle by triangle, and the similarity transform factor measures the deviation of the two corresponding grid cells after deformation. For a triangle (U_1, U_2, U_3), take one vertex U_2 as the origin, the line through the edge <U_2, U_3> as the x-axis, and that line rotated by 90 degrees as the y-axis, forming a local coordinate system in which one vertex is expressed by the other two. U_1 can then be expressed by U_2 and U_3 in the following form:

U_1 = U_2 + u (U_3 - U_2) + v R_90 (U_3 - U_2)

where R_90 is the 90-degree rotation matrix. In this local coordinate system, u and v are both known coordinates, computed by the following formulas:

u = (U_1 - U_2)^T (U_3 - U_2) / ||U_3 - U_2||²
v = (U_1 - U_2)^T R_90 (U_3 - U_2) / ||U_3 - U_2||²
Therefore, by reducing the variation of these coordinates among the three vertices after transformation, the shape of the triangle is kept from changing abnormally, which gives the shape-preserving term. For a region that needs warping, its reprojection energy factor must be minimized while its shape is preserved, as shown in fig. 2; minimizing this objective function over the region to be warped yields just a sparse linear system, from which the deformed vertex coordinates of every triangle are obtained. Once the triangle vertex coordinates are determined, the warped image is obtained by interpolating the novel-view image according to the barycentric coordinates of the pixels within the triangles of the input image.
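As an illustration of the local coordinate system used by the similarity transform factor, the following Python sketch computes (u, v) for a triangle and rebuilds the dependent vertex after the other two have moved; the sign convention of R_90 and the names are assumptions.

```python
import numpy as np

R90 = np.array([[0.0, 1.0],
                [-1.0, 0.0]])  # 90-degree rotation matrix (sign convention assumed)

def local_coords(U1, U2, U3):
    """Express U1 in the frame anchored at U2 with x-axis along U2->U3, i.e.
    U1 = U2 + u*(U3 - U2) + v*R90*(U3 - U2)."""
    e = U3 - U2
    n2 = float(e @ e)                      # ||U3 - U2||^2
    u = float((U1 - U2) @ e) / n2
    v = float((U1 - U2) @ (R90 @ e)) / n2
    return u, v

def reconstruct_vertex(U2p, U3p, u, v):
    """Rebuild U1' from the deformed vertices U2', U3' so the triangle keeps
    its shape (the similarity transform factor penalizes deviation from this)."""
    e = U3p - U2p
    return U2p + u * e + v * (R90 @ e)
```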
(II) Superpixel-driven local warping
The stereo-reconstructed point cloud is not fully accurate and may contain noise, especially in boundary and contour regions between objects. Even after multi-image depth fusion, the result merely provides reasonable and sufficient constraint points for the hole regions; the constraints are not perfectly consistent with the whole image. If the region to be warped were treated directly as a whole, the inaccurate constraint terms would cause warping artifacts, and the global treatment would form a large multi-dimensional sparse linear system, increasing the complexity and the time of the solution.
To address these problems of global warping, local warping driven by superpixel segmentation is adopted; unlike global warping, the whole region is no longer deformed at once. Since the image was already segmented into superpixels in the multi-image depth fusion step, each superpixel has an essentially uniform color, all its pixels belong to the same object, and their depths are close. These superpixel blocks are therefore warped individually without affecting other regions. The local warping approach greatly reduces the errors caused by partially unreliable depth sample constraints; the warps of the superpixel blocks are relatively independent and small, so the dimension of the linear system to be solved drops sharply and the computational complexity falls; most importantly, the superpixel blocks can be processed in parallel, greatly reducing the processing time.
As shown in FIG. 3, a superpixel block is irregular in shape. Based on the positions of the pixels in the superpixel block, a rectangle containing the whole block is found: if B(x, y) ∈ A_k, the coordinates of the four rectangle vertices are obtained from x_min = min(B_x), x_max = max(B_x), y_min = min(B_y), y_max = max(B_y), and the four vertices are U_1(x_min, y_min), U_2(x_min, y_max), U_3(x_max, y_min), U_4(x_max, y_max). The rectangle is then divided along its diagonal into two triangular regions, and the two triangles are deformed by the point-cloud-constrained image warping. The figure shows the warping process; the operation of each superpixel block is relatively independent.
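A small Python sketch of the bounding-rectangle construction and diagonal split described above (the names and the choice of diagonal are assumptions):

```python
import numpy as np

def superpixel_bounding_triangles(coords):
    """coords: (n, 2) integer (x, y) positions of the pixels in superpixel A_k.
    Returns the two triangles of the bounding rectangle, split along a diagonal;
    each triangle is then warped independently."""
    x_min, y_min = coords.min(axis=0)
    x_max, y_max = coords.max(axis=0)
    U1 = (x_min, y_min)
    U2 = (x_min, y_max)
    U3 = (x_max, y_min)
    U4 = (x_max, y_max)
    # Split along the U2-U3 diagonal (an assumption; either diagonal works).
    return (U1, U2, U3), (U2, U4, U3)
```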
3. Novel-view processing and rendering
(I) Novel-view processing and fusion
Once a camera matrix is given for each novel view, the two input images spatially closest to the novel view, comprising a left image and a right image, can be determined from the camera parameters of the input images. Each of the two input images is then warped according to the locally-warped novel-view construction, giving the two warped images at the novel view. Because the relative positions and parallax of the objects in the scene change as the view changes, some holes remain in the warped novel view; since the input images lie spatially to the left and right of the novel view and the scene looks different from different views, the two warped results can complement each other's missing information and hole regions. Meanwhile, during processing and fusion, the information of the input image closest to the novel view is preserved as much as possible with a larger weight, and the slightly farther view serves as supplementary information, giving a novel-view image with more complete information. To compensate more of the novel view's image information after warping, more of the input views closest to the novel view can be selected, warped and then fused; however, the more input views are selected, the more the computation grows, so the number of input views must be balanced against the information they can supplement.
Pixel values are selected from the nearest positions on the warped images, and the multiple images are fused onto the novel-view image by weighting. If no valid pixel value can be found in any of the images, the pixel is marked with the value (0, 255), indicating a hole, and the marked pixels are saved as a mask for the subsequent hole filling operation.
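The weighted fusion and hole-mask bookkeeping might be sketched as follows in Python; the fixed scalar weight for the closer view is an assumption, since the patent states only that the closer view receives a larger weight.

```python
import numpy as np

def fuse_views(warped_left, warped_right, valid_left, valid_right, w_near=0.7):
    """Fuse two warped input views into the novel view.

    warped_*: (H, W, 3) images already warped to the novel view
    valid_*:  (H, W) boolean masks of pixels that received a value
    w_near:   weight of the spatially closer view (assumed scalar)
    Pixels valid in neither view are marked as holes for the filling stage.
    """
    h, w, _ = warped_left.shape
    out = np.zeros((h, w, 3), np.float64)
    both = valid_left & valid_right
    out[both] = w_near * warped_left[both] + (1 - w_near) * warped_right[both]
    only_l = valid_left & ~valid_right
    only_r = valid_right & ~valid_left
    out[only_l] = warped_left[only_l]
    out[only_r] = warped_right[only_r]
    hole_mask = ~(valid_left | valid_right)  # saved for subsequent hole filling
    return out.astype(np.uint8), hole_mask
```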
As shown in fig. 4, the upper two images are the left and right original images closest to the new visual angle, each deformed to the new visual angle, and the remaining image is the result of fusing them. After fusion, the image contains the information of most of the scene, but some regular noise remains. The noise arises because deformation is performed triangle by triangle: when pixels are interpolated from an input visual angle to the new visual angle, the triangle vertices are not handled, so those pixels appear as null points without RGB values. Such obvious noise can be removed by filtering, for example as sketched below. Large hole regions, of course, still require the subsequent hole-filling algorithm.
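A small filtering sketch of the kind the text suggests; the use of SciPy's `median_filter` and the 3×3 window are assumptions:

```python
# Remove isolated null-point noise at triangle vertices with a per-channel
# median filter; large hole regions are deliberately left for the
# subsequent hole-filling algorithm.
import numpy as np
from scipy.ndimage import median_filter

def remove_point_noise(img: np.ndarray, size: int = 3) -> np.ndarray:
    return np.stack([median_filter(img[..., c], size=size)
                     for c in range(img.shape[-1])], axis=-1)
```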
(II) Filling and correction of hollow areas
After the above processing, the image content of most of the new visual angle is obtained, but some void areas remain and affect the final visual effect. These void areas have two causes: first, as the visual angle changes, the area of each deformed superpixel block changes; if a superpixel shrinks, gaps are left between superpixels, especially in boundary regions. Second, during depth sample interpolation, some target superpixels cannot be matched to a consistent source superpixel, so their depth is never successfully interpolated; such superpixels cannot be deformed to the new visual angle, leaving void areas.
To fill these hollow areas, the invention provides an image-block-based filling and correction method for hollow areas, comprising the following two steps:
step one, filling the cavity area based on sample blocks;
step two, automatically detecting artifacts and eliminating them by correction.
The algorithm first takes the hole area as a mask, then finds, for each point of the mask area, the sample block that is optimal in geometry and color, and copies that sample block onto the mask area. Such copy-paste, however, causes local blocking and image inconsistency that make the scene look unnatural. On the corrected image, artifacts appear where there are significant color and photometric differences. These artifacts can be detected automatically from their characteristics, after which the relevant correction parameters of the artifact areas are modified, finally producing a visually satisfactory filling and correction of the hollow area.
(1) Sample block based void region filling
The filling and correction method for the hollow area has the following two advantages: first, the artifact detection and elimination post-processing steps incur no extra cost; second, compared with other methods that adopt multi-core or GPU-accelerated algorithms, this method achieves the same speed without any acceleration.
The method for filling the void area based on the sample block comprises the following specific steps:
First, the object area S to be filled is selected and assigned a single color; the block size is then specified. Once these parameters are defined, the object area can be filled automatically.
For each pixel in the image, a color value and a reliability value are defined. The color value is null inside the object region, and the reliability value represents the confidence of the pixel; once a point is filled, its reliability value is modified and then fixed. On the boundary T of the object region, because the size and structure of the void differ at each point's block, each pixel is temporarily given a priority that determines the filling order, and the following three steps are then iterated until the object region set is empty.
Step 1, calculation of block priority
The filling order of the void region is important: each block of a point on the boundary of the void region is processed in order of highest priority. The priority is calculated from the continuity of the stronger edges of a block and the higher reliability values around it.
The reliability term measures the amount of reliable information around a pixel. For a block in which a large proportion of the pixels already exist or have been filled, the existing part may belong to the source region rather than the object region; such blocks are given higher priority, and the filling operation addresses them first.
This method fills hole areas of particular shapes well: hole areas containing corner points or having narrow shapes are filled preferentially, because those blocks are surrounded by more pixels of the source area, and the blocks that best match them provide more reliable information. By contrast, object regions with a single texture, unchanged structure, or less reliable information wait until more pixels around them have been filled before being processed.
In the initial iterations, the reliability term imposes an approximately concentric filling order around the center of the hollow area: as the filling program executes, points on the periphery of the hollow area, i.e. on its boundary, have larger reliability values, while pixels at the center of the hollow area have smaller ones.
The data term is a function of the isophotes and of the strength with which they hit the boundary T at each iteration: if an isophote passes through a block, the priority of that block is increased. The data term plays a very important role in the filling algorithm, because the linear structures of the source area are maintained and propagated into the hole area, restoring the linear structures the hole destroyed and producing a visually realistic effect. A balance exists between the reliability term and the data term: the data term quickly propagates the isophotes into the cavity area, while the reliability term restrains the propagation speed so that no obvious artifacts are produced.
The priority function automatically determines the filling order of the hole areas; compared with any predefined filling order, both accuracy and visual quality improve markedly. The hole-filling order thus becomes a function of the image itself: damaged structures are restored, blocking effects are reduced, blur and smoothing stay relatively simple, and a good, realistic visual effect is achieved, as in the sketch below.
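The reliability term and data term combine into a per-block priority. A sketch in the standard exemplar-based form P(p) = C(p)·D(p); the patch half-width, the normalization constant `alpha`, and all names are illustrative assumptions rather than the patent's exact formulation:

```python
# Priority of the patch centered at boundary point p = (y, x):
#   C(p): fraction of confident (already-known) area in the patch,
#   D(p): isophote strength at p projected on the hole-boundary normal.
import numpy as np

def priority(C, gray, mask, p, half=4, alpha=255.0):
    y, x = p
    patch = (slice(max(y - half, 0), y + half + 1),
             slice(max(x - half, 0), x + half + 1))
    c_term = C[patch][~mask[patch]].sum() / C[patch].size
    gy, gx = np.gradient(gray)
    isophote = np.array([-gx[y, x], gy[y, x]])   # gradient rotated 90 degrees
    my, mx = np.gradient(mask.astype(float))
    n = np.array([my[y, x], mx[y, x]])
    n /= np.linalg.norm(n) + 1e-8                # boundary normal from the mask
    d_term = abs(isophote @ n) / alpha
    return c_term * d_term                       # P(p) = C(p) * D(p)
```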
Step 2, propagation of texture and structure information
The priorities of the blocks are calculated according to step 1, the optimal matching block for the void region to be filled is found according to the priority, and data is then extracted from the source region according to the optimal block to fill the void region.
In prior-art filling, the color value of a pixel is obtained by diffusion, and the hollow area is filled with a blur; the filled areas remain noticeable and the visual result is poor. The present method instead fills the hollow area by directly sampling image information from the source area: among the blocks of the source area, the block most similar to the object-area block is found, and every pixel in the set is then filled from it, as sketched below. Such filling largely preserves the geometric structures and texture information of the source region as they propagate into the object region.
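A brute-force sketch of this best-match search, scoring candidate source blocks by the sum of squared differences over the already-known pixels of the target block; the stride, patch size, and names are assumptions, and the target is assumed to lie away from the image border:

```python
# Scan fully-known source patches and keep the one closest to the target
# patch on its known pixels; `img` is a float array, `mask` marks holes.
import numpy as np

def best_source_patch(img, mask, target, half=4, stride=2):
    ty, tx = target
    tpatch = img[ty - half:ty + half + 1, tx - half:tx + half + 1]
    known = ~mask[ty - half:ty + half + 1, tx - half:tx + half + 1]
    best, best_cost = None, np.inf
    h, w = mask.shape
    for y in range(half, h - half, stride):
        for x in range(half, w - half, stride):
            if mask[y - half:y + half + 1, x - half:x + half + 1].any():
                continue               # source patches must be fully known
            cand = img[y - half:y + half + 1, x - half:x + half + 1]
            cost = ((cand - tpatch)[known] ** 2).sum()
            if cost < best_cost:
                best, best_cost = (y, x), cost
    return best                        # center of the best-matching block
```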
Step 3, updating the reliability value
Through the operations of step 1 and step 2, only the part of the cavity area along its boundary is filled; the remaining cavity area still needs to be filled by further iterations. The reliability values of the newly filled points change accordingly, so the reliability value of every point in the filled area is updated.
The update rule assigns new reliability values to the pixels of the filled object area within the boundary block; as filling proceeds toward the center of the hollow area, these reliability values keep decreasing. Based on this copying scheme, the cavity area is filled by repeated iteration and global processing, as sketched below.
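A sketch of this update under the common convention that freshly filled pixels inherit the confidence of the patch center, so reliability decays toward the center of the hole; the exact rule is an assumption:

```python
# After pasting the best block over the patch at `target`, give the newly
# filled pixels the center confidence and mark them as known.
import numpy as np

def update_confidence(C, mask, target, half=4):
    ty, tx = target
    patch = (slice(ty - half, ty + half + 1), slice(tx - half, tx + half + 1))
    newly_filled = mask[patch].copy()   # pixels that were holes before this step
    C[patch][newly_filled] = C[ty, tx]  # inherit the center confidence
    mask[patch] = False                 # the whole patch is now known
    return C, mask
```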
(2) Automatic artifact detection and elimination
Artifacts left on an image processed by the block correction method are visually obvious, so automatically detecting and eliminating them is particularly important.
To judge whether a point p ∈ S is an artifact, the following two conditions are verified:
condition one: the larger the gradient value at the point, the more spatially discontinuous the point is;
condition two: if the blocks pasted into the neighborhood of p come from different source locations, they cause discontinuities in the image.
The artifact point set is found according to the characteristics of conditions one and two, as in the sketch below.
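A sketch of the two tests, assuming a `src_label` map that records which source location each filled pixel was copied from; the threshold factor and the 4-neighbor comparison are illustrative:

```python
# Condition one: unusually large gradient magnitude (spatial discontinuity).
# Condition two: adjacent filled pixels copied from different source blocks.
import numpy as np

def detect_artifacts(gray, src_label, filled_mask, tau=3.0):
    gy, gx = np.gradient(gray)
    mag = np.hypot(gx, gy)
    cond1 = mag > tau * mag[filled_mask].mean()
    cond2 = np.zeros_like(cond1)
    cond2[:, 1:] |= src_label[:, 1:] != src_label[:, :-1]
    cond2[1:, :] |= src_label[1:, :] != src_label[:-1, :]
    return filled_mask & cond1 & cond2     # the artifact point set p in S
```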
Fourthly, summary of the invention
For an image set of a scene, the three-dimensional point cloud obtained by plain visual SFM (structure-from-motion) reconstruction shows void areas in regions with complex texture or few matched feature points. If there are too many such void areas, too much information is missing and the subsequent processing is seriously affected. To make the three-dimensional information of the scene more complete, the invention fills the information-missing void areas with a sample-depth-interpolation method that interpolates correct depths for different scenes and has good robustness. A good superpixel segmentation method first divides the image into superpixel blocks that are consistent in color and each belong to only one object. For a superpixel block without three-dimensional information, the superpixel block most similar in color and closest in spatial distance is then selected, and the valid sample depth values it contains are used to interpolate depth into the target superpixel block. The result is a real and reliable scene point cloud model with complete three-dimensional information, providing sufficient constraints for the subsequent work.
To address the limited viewing angles of a set, the invention provides a construction and drawing method for new visual angles that is fast and visually effective: more new visual angles at different positions can be constructed from the existing visual angles, the set can be understood and displayed more fully, and the constructed new visual angles have a strong sense of reality. First, based on the spatial deformation theory of stereo reconstruction, one visual angle is transformed to another. To improve accuracy and reduce noise, the invention segments the image into superpixels and deforms the relatively independent superpixel blocks, which speeds up processing. Because the initially constructed new visual angle still leaves some cavity areas that the image deformation technique cannot produce, the invention fills them with the image-block-correction-based method. To make the filling more natural and realistic, the reconstructed visual angle is post-processed: visually obvious artifact areas are detected, and those areas are further fused by a locally adaptive method. The final new visual angle image is visually highly realistic, and the construction method adopted by the invention is fast with a good visual effect.

Claims (10)

1. The method for constructing and drawing a new visual angle of a multi-image fused stereoscopic scene is characterized by comprising multi-image depth fusion, new visual angle construction guided by local deformation, and new visual angle processing and rendering, wherein the multi-image depth fusion comprises a real-time topology-preserving superpixel segmentation algorithm, calculation of the similar superpixel set, calculation of the most similar superpixel, and depth sample interpolation; the new visual angle construction guided by local deformation comprises image deformation transformation constrained by the three-dimensional point cloud and local deformation driven by superpixel segmentation; and the new visual angle processing and rendering comprises new visual angle processing and fusion, and filling and correction of the cavity region.
2. The method for constructing and drawing a new visual angle of a multi-image fused stereoscopic scene according to claim 1, wherein, for monocular stereoscopic reconstruction based on an image set, a three-dimensional point cloud model of the scene is obtained as the first step of the reconstruction process; taking the point cloud as a constraint, a depth sample interpolation method based on superpixel blocks is proposed: the superpixel blocks that have not been reconstructed are found, the superpixel blocks optimal in spatial distance and color that contain depth information are found within the image, and the known depth information is then used to interpolate depth into the missing superpixel blocks, so as to obtain a scene three-dimensional point cloud model with sufficient three-dimensional point cloud information in every area; the new visual angle is then constructed from the three-dimensional point cloud information, the key step of which is image deformation, transforming the image content of a known visual angle to the corresponding positions on the new visual angle image; a local deformation method based on the superpixel segmentation result is adopted, so that the deformations of the superpixel blocks do not affect one another and can be processed in parallel to obtain most of the image content of the new visual angle; and for the fine gaps or unfilled cavity areas, a block-correction-based method calculates the filling priority from the content and structure of the image and iteratively improves the cavity area until it is filled, finally presenting a new visual angle image with a strong sense of reality.
3. The method for constructing and drawing a new visual angle of a multi-image fused stereoscopic scene according to claim 1, wherein the invention provides a multi-image depth fusion construction method for building valid and sufficient three-dimensional points: the image is first segmented into superpixels; the segmentation result and the existing depth information are used to determine the superpixel blocks of poor reconstruction quality, named object areas; the superpixel blocks most similar in color and closest in spatial distance to the object areas are then found among those with existing depth information to fill the depth-missing object areas; and a complete three-dimensional model of the scene is finally obtained, meeting the requirements for constructing the new visual angle;
the invention provides a real-time topology-preserving superpixel segmentation algorithm that performs coarse-to-fine superpixel segmentation in real time while preserving topology; the coarse-to-fine updating scheme achieves a good result in the minimization improvement process, and its detailed procedure comprises the following two steps: single-image superpixel estimation and coarse-to-fine refinement.
4. The method for constructing and drawing a new visual angle of a multi-image fused stereoscopic scene according to claim 3, wherein, in the single-image superpixel estimation, for one image, a_c ∈ {1, ..., F} denotes the superpixel to which pixel c belongs, and a = (a_1, ..., a_N) represents the set of all segmentation random variables, where N denotes the image size; the segmentation problem is formed into an objective function satisfying appearance consistency and regular shape, with a constraint added on the superpixel size;
d_i is defined as the average position of the i-th superpixel and e_i as the average color of the i-th superpixel; e = (e_1, ..., e_F) and d = (d_1, ..., d_F) respectively represent the sets of average colors and average positions of all superpixels; N_8 represents the eight-neighborhood of pixel c; according to the Markov random field energy formula, the single-image superpixel estimation comprises the following terms: a boundary length term, a topology preservation term, a minimum size term, a shape regularization term, and an appearance consistency term;
boundary length term: keeps a superpixel regular by ensuring that it has a small boundary length;
topology preservation term: forces the superpixels to remain connected, with disconnected configurations assigned infinite energy;
minimum size term: forces the size of a superpixel to be at least 1/4 of its original size;
shape regularization term: keeps the superpixels regular in shape;
appearance consistency term: keeps the color of each superpixel uniform; a sketch of two of these terms follows.
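As an illustration of this energy formulation (a sketch only, not part of the claim language; the weights and exact definitions are not specified here, so the two terms below are assumptions in the spirit of the text):

```python
# A minimal sketch of two of the five energy terms: the boundary length
# term (count of 8-neighbor pixel pairs with different labels) and the
# appearance consistency term (color variance within each superpixel).
# The topology, size, and shape terms and all weights are omitted.
import numpy as np

def boundary_length(labels: np.ndarray) -> int:
    diff = np.zeros(labels.shape, dtype=bool)
    diff[:, 1:] |= labels[:, 1:] != labels[:, :-1]        # horizontal pairs
    diff[1:, :] |= labels[1:, :] != labels[:-1, :]        # vertical pairs
    diff[1:, 1:] |= labels[1:, 1:] != labels[:-1, :-1]    # diagonal pairs
    diff[1:, :-1] |= labels[1:, :-1] != labels[:-1, 1:]
    return int(diff.sum())

def appearance_consistency(image: np.ndarray, labels: np.ndarray) -> float:
    cost = 0.0
    for k in np.unique(labels):
        pix = image[labels == k].astype(float)
        cost += ((pix - pix.mean(axis=0)) ** 2).sum()     # within-superpixel variance
    return cost
```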
5. The method for constructing and drawing a new visual angle of a multi-image fused stereoscopic scene according to claim 3, wherein, in the coarse-to-fine refinement, the superpixels are initialized to a regular grid, and the average color and position of each superpixel are then calculated; each layer of the coarse-to-fine refinement process is then iterated to achieve a locally good refinement of the objective function: the list is initialized to all boundary blocks, and each boundary block is checked in turn for whether changing its label violates connectivity; if connectivity is not violated, the assignment of the block is refined, and if the assignment of the block is changed, the average position and color of the two affected superpixel blocks are updated accordingly using the incremental mean equation:
b_n = b_{n−1} + (a_n − b_{n−1}) / n
where b_{n−1} is the previous estimate, a_n is the new element, and n is the size of the k-th superpixel.
If the block at the end of the priority queue lies on a boundary, the neighborhood of the block is added to the queue, and the process is repeated until the queue is empty, after which the next level of refinement begins.
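For illustration only (not part of the claim language), the incremental mean equation reduces to a one-line update; NumPy arrays for the position and color vectors are an assumption:

```python
# A sketch of the incremental mean: fold one new element a_n into the
# previous estimate b_{n-1} without re-summing the whole superpixel.
import numpy as np

def incremental_mean(b_prev: np.ndarray, a_new: np.ndarray, n: int) -> np.ndarray:
    """b_n = b_{n-1} + (a_n - b_{n-1}) / n, with n the superpixel size."""
    return b_prev + (a_new - b_prev) / n
```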
6. The method for constructing and drawing a new visual angle of a multi-image fused stereoscopic scene according to claim 1, wherein, in the calculation of the similar superpixel set, all superpixels in an image are represented, according to the real-time topology-preserving superpixel segmentation algorithm, as the set A = {A_i}, i ∈ {0, ..., n−1}, where n is the number of superpixels in the image; the reconstructed three-dimensional point cloud is then projected onto the image to obtain the depth value of each pixel x on the image, expressed as g[p(x, y)];
the set of depth samples contained in each superpixel is denoted g[A_i] = {c(x, y) ∈ A_i | g[c(x, y)] > 0}; a superpixel block in which fewer than 0.58 percent of the pixel points carry depth information is set as a target superpixel, and the other superpixels are set as reliable superpixels;
the present invention employs converting the image to L AB color space and separately creating a histogram for each superpixel that divides each subspace of L, A, B into 24 bins, respectively, forming a 72-dimensional descriptor, denoted R, for each superpixel blockLab[Ai],Ai∈ A, then, calculating χ of the target superpixel and all superpixels with reliable depth2The distance between the first and second electrodes,
χ²(R_1, R_2) = Σ_i (R_1(i) − R_2(i))² / (R_1(i) + R_2(i))
where R(i) is the value of the i-th dimension of a histogram.
The 32 most similar superpixel blocks, i.e. those with the smallest distances, are selected to form a set, denoted N[A_i]; the number of most similar superpixels is determined by the total number of superpixels.
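For illustration only, a sketch of the 72-dimensional LAB descriptor and the χ² distance; the (0, 256) value range, the normalization, and the 1/2 factor in the χ² form are assumptions not fixed by the claim:

```python
# A sketch of the per-superpixel LAB histogram descriptor (24 bins per
# channel, 72 dimensions) and the chi-squared histogram distance.
import numpy as np

def lab_descriptor(lab: np.ndarray, labels: np.ndarray, k: int, bins: int = 24):
    pix = lab[labels == k]                        # pixels of superpixel A_k
    hists = [np.histogram(pix[:, c], bins=bins, range=(0, 256))[0]
             for c in range(3)]                   # 24 bins per L, A, B channel
    h = np.concatenate(hists).astype(float)       # 72-dimensional descriptor
    return h / (h.sum() + 1e-8)

def chi2_distance(r1: np.ndarray, r2: np.ndarray) -> float:
    # chi^2(R1, R2) = 1/2 * sum_i (R1(i) - R2(i))^2 / (R1(i) + R2(i))
    return 0.5 * float(np.sum((r1 - r2) ** 2 / (r1 + r2 + 1e-8)))
```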
7. The method for constructing and drawing a new visual angle of a multi-image fused stereoscopic scene according to claim 1, wherein, in the calculation of the most similar superpixel, the set of most similar superpixels is selected, and the size of N[A_i] is further reduced according to the superpixel blocks with the smallest Euclidean spatial distance to the target superpixel, the number of elements generally being reduced to 3 to 6;
the calculation of the most similar superpixel of the invention adopts a graph traversal algorithm to create a two-dimensional superpixel graph structure, if any two superpixels share a boundary, an edge is added between two corresponding nodes on the graph, and the weight of the edge is the χ of the histogram of L AB of the two superpixels2Distance, then calculating the target superpixel Ai TAnd each similar super pixel
Figure FDA0002429434660000032
By minimizing all possible A' si TTo AjCalculating the path value, then adopting shortest path algorithm to the obtained path, selecting three superpixels with shortest path to form a set
Figure FDA0002429434660000033
After obtaining the superpixel of the three shortest paths, i.e.
Figure FDA0002429434660000034
A histogram of the depth samples of the three superpixels is drawn. If the histogram has a single peak or two consecutive peaks, the depth values of the three superpixels are similar, since the three superpixels are from the most similar superpixel block, their colors and the target superpixel are ten times largerIf the target superpixels can not find the three superpixels through the two steps, the superpixels are marked as holes.
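For illustration only, a sketch of the shortest-path selection using Dijkstra's algorithm over the superpixel adjacency graph; the dict-of-dicts graph representation and function names are assumptions:

```python
# A sketch: superpixels are nodes, adjacent superpixels share an edge
# weighted by the chi-squared distance of their LAB histograms; keep the
# three candidate superpixels with the cheapest paths from the target.
import heapq

def shortest_path_costs(graph: dict, source) -> dict:
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                    # stale queue entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def three_shortest(graph: dict, target, candidates):
    dist = shortest_path_costs(graph, target)
    ranked = sorted(candidates, key=lambda a: dist.get(a, float("inf")))
    return ranked[:3]                   # the three shortest-path superpixels
```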
8. The method for constructing and drawing a new visual angle of a multi-image fused stereoscopic scene according to claim 1, wherein, in the depth sample interpolation, the superpixel set closest to the target superpixel block in spatial distance and color is obtained, and the valid depth information contained in it is then interpolated into the target superpixel block; 8-12 pixel points are randomly selected in the target superpixel block for depth interpolation, the number of depth-interpolated pixel points being determined by the size of the superpixel block so as to meet the subsequent constraint requirements; the depth of each such point is calculated from the spatial distances between the interpolated point and the image points carrying the original valid depth information, and the depth interpolation algorithm is executed on the target superpixels of each image; the areas whose depth information could not be obtained during reconstruction are thus supplemented on the basis of the original three-dimensional point cloud, and after depth interpolation is performed on each image of the image set, a scene three-dimensional point cloud model whose three-dimensional information is sufficient for subsequent processing is finally obtained; a sketch of the distance-based interpolation follows.
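For illustration only, a sketch of the distance-based depth computation; inverse-distance weighting is an assumption, since the claim states only that the depth is calculated from spatial distances:

```python
# A sketch: depth at each randomly chosen target pixel is a spatial-
# distance-weighted combination of the known depth samples.
import numpy as np

def interpolate_depth(samples_xy: np.ndarray, samples_d: np.ndarray,
                      targets_xy: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    out = np.empty(len(targets_xy))
    for i, t in enumerate(targets_xy):
        d = np.linalg.norm(samples_xy - t, axis=1)   # distances to known samples
        w = 1.0 / (d + eps)                          # closer samples dominate
        out[i] = float((w * samples_d).sum() / w.sum())
    return out

# e.g. pick 8-12 random pixels of the target superpixel as targets_xy and
# use the valid depth samples of the matched superpixels as samples_xy/_d.
```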
9. The method for constructing and drawing a new visual angle of a multi-image fused stereoscopic scene according to claim 1, wherein, in the image deformation transformation constrained by the three-dimensional point cloud, a new visual angle is given with camera projection matrix B_n; the known images near the new visual angle are D_1, D_2, ..., D_N, and an input image D_i has camera matrix B_i; for each point B(x, y) ∈ D_i on the image, the three-dimensional point cloud that can be projected into the range of the input image is denoted Z_i, and for a three-dimensional point q(X, Y, Z) ∈ Z_i in the scene point cloud there exists a mapping F_i from two-dimensional points on the image to three-dimensional points in space, with the formula:

B_i(q) = B_i(F_i(B)) = B

the area to be deformed is first divided into an n × m grid; for a point B with a depth sample on the two-dimensional image, the three vertices of the triangle containing it are expressed as (U_1, U_2, U_3), the initial triangles on the input image being right triangles; according to the barycentric coordinates of point B within the triangle, B is represented as (b_1(B), b_2(B), b_3(B)); if the three vertices of the deformed triangle are defined as (U_1′, U_2′, U_3′), two conditions need to be met during deformation: a reprojection energy factor condition and a similarity transform factor condition; a sketch of the barycentric representation follows.
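For illustration only, a sketch of the barycentric bookkeeping behind this deformation: a point B is expressed in the coordinates of its triangle (U_1, U_2, U_3) and reprojected with the deformed vertices (U_1′, U_2′, U_3′); function names are assumptions:

```python
# A sketch: compute barycentric coordinates (b1(B), b2(B), b3(B)) of a
# point in its triangle, then reapply them to the deformed vertices.
import numpy as np

def barycentric(B, U1, U2, U3) -> np.ndarray:
    T = np.array([[U1[0] - U3[0], U2[0] - U3[0]],
                  [U1[1] - U3[1], U2[1] - U3[1]]], dtype=float)
    b1, b2 = np.linalg.solve(T, np.asarray(B, dtype=float) - np.asarray(U3, dtype=float))
    return np.array([b1, b2, 1.0 - b1 - b2])

def apply_warp(bary: np.ndarray, U1p, U2p, U3p) -> np.ndarray:
    V = np.array([U1p, U2p, U3p], dtype=float)
    return bary @ V            # position of B inside the deformed triangle
```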
10. The method for constructing and drawing a new visual angle of a multi-image fused stereoscopic scene according to claim 1, wherein, in the new visual angle processing and fusion, after a camera matrix is given for each new visual angle, the two images spatially closest to the new visual angle, comprising a left image and a right image, are determined from the camera parameters of the input images; the two input images are then each deformed according to the new visual angle construction guided by local deformation to obtain the new visual angle images of the two input images after deformation, and the deformed results of the two input images are used to complement the missing information and cavity regions; meanwhile, during processing and fusion, the input image information closest to the new visual angle is preserved with a correspondingly increased weight, while the visual angle image information slightly farther away serves as supplementary information, so as to obtain a new visual angle image with more complete information; more of the input visual angle images closest to the new visual angle are selected for the deformation operation and then processed and fused;
pixel values are selected from the nearby images and fused onto the new visual angle image by weighting; if no valid pixel value can be found in any of the images, the pixel value of that point is marked (0, 255), indicating a hole, and the (0, 255) marking is stored as a mask for the subsequent hole-filling operation; obvious noise can be removed by filtering.
CN202010231534.0A 2020-03-27 2020-03-27 Multi-image fused stereoscopic set vision new angle construction drawing method Pending CN111462030A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010231534.0A CN111462030A (en) 2020-03-27 2020-03-27 Multi-image fused stereoscopic set vision new angle construction drawing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010231534.0A CN111462030A (en) 2020-03-27 2020-03-27 Multi-image fused stereoscopic set vision new angle construction drawing method

Publications (1)

Publication Number Publication Date
CN111462030A true CN111462030A (en) 2020-07-28

Family

ID=71685715

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010231534.0A Pending CN111462030A (en) 2020-03-27 2020-03-27 Multi-image fused stereoscopic set vision new angle construction drawing method

Country Status (1)

Country Link
CN (1) CN111462030A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080137989A1 (en) * 2006-11-22 2008-06-12 Ng Andrew Y Arrangement and method for three-dimensional depth image construction
US20150213640A1 (en) * 2014-01-24 2015-07-30 Nvidia Corporation Hybrid virtual 3d rendering approach to stereovision
CN108038905A (en) * 2017-12-25 2018-05-15 北京航空航天大学 A kind of Object reconstruction method based on super-pixel
CN109712067A (en) * 2018-12-03 2019-05-03 北京航空航天大学 A kind of virtual viewpoint rendering method based on depth image

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DIONICIO VASQUEZ et al.: "An iterative approach for obtaining multi-scale superpixels based on stochastic graph contraction operations", EXPERT SYSTEMS WITH APPLICATIONS, vol. 102, 15 July 2018 (2018-07-15), pages 57-69 *
ZENG Yiming et al.: "Semi-supervised monocular image depth estimation using the partial order relations of sparse point clouds", Journal of Computer-Aided Design & Computer Graphics (《计算机辅助设计与图形学学报》), vol. 31, no. 11, 15 November 2019 (2019-11-15), pages 2038-2046 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111948658A (en) * 2020-08-22 2020-11-17 高小翎 Deep water area positioning method for identifying and matching underwater landform images
CN112233018A (en) * 2020-09-22 2021-01-15 天津大学 Reference image guided face super-resolution method based on three-dimensional deformation model
CN112215871A (en) * 2020-09-29 2021-01-12 武汉联影智融医疗科技有限公司 Moving target tracking method and device based on robot vision
CN112215871B (en) * 2020-09-29 2023-04-21 武汉联影智融医疗科技有限公司 Moving target tracking method and device based on robot vision
CN113298709A (en) * 2021-04-06 2021-08-24 广东省科学院智能制造研究所 Image visual angle transformation method based on geometric transformation principle
CN113409457A (en) * 2021-08-20 2021-09-17 宁波博海深衡科技有限公司武汉分公司 Three-dimensional reconstruction and visualization method and equipment for stereo image
CN113409457B (en) * 2021-08-20 2023-06-16 宁波博海深衡科技有限公司武汉分公司 Three-dimensional reconstruction and visualization method and equipment for stereoscopic image
CN114998338A (en) * 2022-08-03 2022-09-02 山西阳光三极科技股份有限公司 Mining quantity calculation method based on laser radar point cloud
CN117350926A (en) * 2023-12-04 2024-01-05 北京航空航天大学合肥创新研究院 Multi-mode data enhancement method based on target weight
CN117350926B (en) * 2023-12-04 2024-02-13 北京航空航天大学合肥创新研究院 Multi-mode data enhancement method based on target weight

Similar Documents

Publication Publication Date Title
CN111462030A (en) Multi-image fused stereoscopic set vision new angle construction drawing method
JP7181977B2 (en) Method and system for detecting and combining structural features in 3D reconstruction
CN109872397B (en) Three-dimensional reconstruction method of airplane parts based on multi-view stereo vision
CN108335352B (en) Texture mapping method for multi-view large-scale three-dimensional reconstruction scene
CN107945267B (en) Method and equipment for fusing textures of three-dimensional model of human face
US8791941B2 (en) Systems and methods for 2-D to 3-D image conversion using mask to model, or model to mask, conversion
EP1303839B1 (en) System and method for median fusion of depth maps
US9098930B2 (en) Stereo-aware image editing
CN111243071A (en) Texture rendering method, system, chip, device and medium for real-time three-dimensional human body reconstruction
CN106709947A (en) RGBD camera-based three-dimensional human body rapid modeling system
Lee et al. Silhouette segmentation in multiple views
US20050140670A1 (en) Photogrammetric reconstruction of free-form objects with curvilinear structures
CN111882668B (en) Multi-view three-dimensional object reconstruction method and system
CN108665530B (en) Three-dimensional modeling implementation method based on single picture
WO2008112802A2 (en) System and method for 2-d to 3-d image conversion using mask to model, or model to mask, conversion
CN113178009B (en) Indoor three-dimensional reconstruction method utilizing point cloud segmentation and grid repair
EP1063614A2 (en) Apparatus for using a plurality of facial images from different viewpoints to generate a facial image from a new viewpoint, method thereof, application apparatus and storage medium
WO2020187339A1 (en) Naked eye 3d virtual viewpoint image generation method and portable terminal
CN113781621A (en) Three-dimensional reconstruction processing method, device, equipment and storage medium
CN115222889A (en) 3D reconstruction method and device based on multi-view image and related equipment
CN110517348B (en) Target object three-dimensional point cloud reconstruction method based on image foreground segmentation
Tylecek et al. Depth map fusion with camera position refinement
CN115082640A (en) Single image-based 3D face model texture reconstruction method and equipment
CN114049423A (en) Automatic realistic three-dimensional model texture mapping method
CN117501313A (en) Hair rendering system based on deep neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination