CN111462030A - Multi-image fusion method for constructing and rendering novel views of a stereoscopic scene - Google Patents

Multi-image fusion method for constructing and rendering novel views of a stereoscopic scene

Info

Publication number: CN111462030A
Application number: CN202010231534.0A
Authority: CN (China)
Prior art keywords: image, superpixel, depth, pixel, visual angle
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventors: 高小翎, 何克慧
Current assignee: Individual (the listed assignee may be inaccurate; Google has not performed a legal analysis)
Original assignee: Individual
Application filed by: Individual
Priority: CN202010231534.0A
Publication: CN111462030A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20 Finite element generation, e.g. wire-frame surface description, tessellation
    • G06T17/205 Re-meshing
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/90 Determination of colour characteristics
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging

Abstract

The invention provides a multi-image fusion method for constructing and rendering novel views of a stereoscopic scene. It introduces depth sample interpolation based on superpixel blocks: known depth information is interpolated into the superpixel blocks that lack depth, yielding a three-dimensional point cloud model in which every region contains sufficient point cloud information. The novel view is then constructed from this point cloud by transforming the image content of the known views to the corresponding positions in the novel-view image. Because the method uses local warping driven by the superpixel segmentation result, the superpixel blocks are relatively independent and can be processed in parallel, which greatly improves the computation speed. The remaining hole regions are improved by sequential iteration with a block-correction method until they are filled, so that the user is finally presented with a visually realistic novel-view image. The novel view is thus constructed quickly and with a good visual effect.

Description

Multi-image fusion method for constructing and rendering novel views of a stereoscopic scene
Technical Field
The invention relates to a method for constructing and rendering novel views, in particular to a multi-image fusion method for constructing and rendering novel views of a stereoscopic scene, and belongs to the technical field of scene view construction.
Background
Stereo reconstruction plays a very important role in many fields of real life, such as virtual reality, animation, medical imaging, and virtual tourism. Because the technology is so widely applied, many methods exist for realizing it, and the techniques in common use can be divided into three types: the first reconstructs a scene model with computer software, the second reconstructs a three-dimensional model of the scene from data acquired by scanning equipment, and the third reconstructs the three-dimensional scene from a set of two-dimensional images. Three-dimensional modelling software runs on a computer system together with animation rendering software; for example, 3D Studio Max is commonly used in animation production, industrial modelling and decoration design, while Softimage and Maya are widely adopted in the film industry for advertisements and special effects, and all of them can produce and render good three-dimensional models. However, such software struggles with real scenes and large-scale scenes, and it requires a relatively professional user to operate.
Among three-dimensional reconstruction methods based on scanning equipment, the Kinect device developed by Microsoft is currently in common use: the device scans the scene step by step, the depth images are filtered, and registration is then performed with ICP and PCL to reconstruct the three-dimensional model of the scene.
Both of the above approaches consume considerable manpower and material resources when modelling a scene, and the resulting three-dimensional models are unsatisfactory for real or large-scale complex scenes. Vision-based three-dimensional reconstruction opens a new road for the field: the scene is reconstructed from an image set, two-dimensional images of the scene are captured with a digital camera, and three-dimensional measurement is performed by combining the principles of computer vision and image processing, without any on-site dimensional measurement; computer processing then yields the final three-dimensional model of the scene. Vision-based stereo reconstruction is highly applicable to real scenes, is fast, and the whole process can even be completed without any additional interaction. Stereo reconstruction based on visual methods is therefore an increasingly promising direction for research and application.
According to the number of cameras, computer vision methods can be further divided into monocular vision methods, stereoscopic vision methods, and so on. The monocular vision method performs stereo reconstruction of the scene with a single camera; the input may be a single image, an image set from a single viewpoint, or an image set from multiple viewpoints. With a single viewpoint, the depth information must be inferred from the two-dimensional features of the input image; the process is relatively simple but yields only a rough three-dimensional model. With a multi-view image set, the method suits large-scale and complex scenes, and the larger the image set and the higher the image resolution, the better the final reconstruction, although the computation and processing time grow accordingly.
The other visual stereo reconstruction method is the stereo vision method, also called the binocular vision method because it uses two cameras, aligned in the horizontal or vertical direction, to acquire the image data of a scene. Since parallax exists between the two cameras, depth information can be computed from it, and aligning the cameras horizontally or vertically reduces the cost of camera rectification. The collected images are preprocessed, the cameras are calibrated and the related camera parameters computed, feature points of the two images are extracted and matched, the fundamental matrix of the cameras is computed from the matched feature data, the images are rectified using the epipolar geometry principle, stereo matching between the two rectified images yields a disparity map, the depth of each corresponding point is then computed from the disparity map by triangulation to obtain a three-dimensional point cloud model of the scene, and the point cloud is meshed to obtain a complete three-dimensional model.
In summary, many stereo reconstruction methods exist, but each specific direction still has problems to be solved, and improving on the prior art remains a great challenge. The prior-art stereo reconstruction techniques mainly have the following defects. First, owing to the operational complexity and specialization of three-dimensional modelling software, only a few professionals can control the reconstruction process; the software is extremely difficult to operate and handles real scenes and large-scale scenes poorly. Second, owing to the limitations of the equipment, scanning-based reconstruction restricts the scenes that can be processed to overly simple, monotonous and narrow ones; data acquisition is cumbersome, the whole scene must be scanned with a hand-held device, reconstruction is very expensive, and the results are unsatisfactory and cannot display a visually real effect. Third, by comparison, vision-based stereo reconstruction can simplify the whole process, conveniently acquires image information of a scene, and can handle large-scale complex scenes, but most prior-art visual reconstruction handles only regular man-made scenes: in complex scenes, regions with complex textures do not have enough matched feature points in the initially reconstructed three-dimensional point cloud, so no spatial three-dimensional point information is obtained there, yet this point information is essential as a constraint for the subsequent view construction; a novel view obtained by direct warping is then an image with incomplete information, containing noise and holes, and the final visual effect is seriously degraded. Fourth, the prior art does not repair the incomplete initial point cloud model: it does not interpolate depth samples by searching for the region block, possessing three-dimensional information, that is most similar to the region lacking it, and it cannot warp each segmented superpixel block as an independent unit; objects at different depths of the same scene therefore interfere strongly with each other, warping is very slow, large hole regions and considerable noise remain in the novel view, and the final visual effect suffers badly.
Disclosure of Invention
Aiming at the defects of the prior art, the multi-image fusion method for constructing and rendering novel views of a stereoscopic scene provided by the invention performs depth sample interpolation based on superpixel blocks: known depth information is interpolated into the superpixel blocks that lack depth, yielding a scene point cloud model in which every region contains sufficient three-dimensional point cloud information, and the novel view is then constructed from that point cloud. For small hole regions, or hole regions whose filling looks unrealistic, the filling priority is computed from the content and structure of the image by a block-correction method, and the hole region is improved by sequential iteration until it is filled, so that the user is finally presented with a visually realistic novel-view image; construction of the novel view is fast and the visual effect is good.
In order to achieve the technical effects, the technical scheme adopted by the invention is as follows:
The multi-image fusion method for constructing and rendering novel views of a stereoscopic scene comprises multi-image depth fusion, locally-warped novel-view construction, and novel-view processing and rendering. Multi-image depth fusion comprises a real-time topology-preserving superpixel segmentation algorithm, computation of the similar superpixel set, computation of the most similar superpixels, and depth sample interpolation. Locally-warped novel-view construction comprises point-cloud-constrained image warping and superpixel-driven local warping. Novel-view processing and rendering comprises processing and fusion of the novel view and filling and correction of hole regions.
Further, the method addresses monocular stereo reconstruction based on an image set. The first step of the reconstruction process obtains the three-dimensional point cloud model of the scene; with the point cloud as a constraint, a superpixel-based depth sample interpolation method is provided: for each superpixel block that was not reconstructed, the superpixel block that is best in spatial distance and color and that contains depth information is found within the image, and the known depth information is interpolated into the missing superpixel block, yielding a scene point cloud model in which every region contains sufficient three-dimensional point cloud information. The novel view is then constructed from the point cloud information. The key step of creating the novel view is image warping: the image content of a known view is transformed to the corresponding positions in the novel-view image. A local warping method based on the superpixel segmentation result keeps the warps of the individual superpixel blocks from affecting one another and allows parallel processing, producing most of the image content of the novel view. For the remaining fine hole regions, or holes whose filling looks unrealistic, the filling priority is computed from the content and structure of the image by a block-correction method, and the hole region is improved by sequential iteration until it is filled, finally presenting a novel-view image of strong realism.
Further, the invention provides a multi-image depth fusion construction method, which constructs enough valid three-dimensional points. First, the image is segmented into superpixels, and the segmentation result together with the existing depth information determines the superpixels of poor reconstruction quality, called object regions. Then, the superpixel blocks that are most similar in color and closest in spatial distance to each object region, among those with existing depth information, are found to fill the regions with missing depth. A complete three-dimensional model of the scene is finally obtained, satisfying the requirements of novel-view construction;
the invention provides a real-time topology-preserved superpixel segmentation algorithm, which realizes real-time superpixel segmentation from rough to fine, can preserve topology, adopts a rough to fine updating method, and achieves good effect in a minimization improvement process, wherein the detailed process comprises the following two steps: single image superpixel estimation and refinement from coarse to fine.
The method is further characterized in that, in single-image superpixel estimation, for an image, a_c ∈ {1, ..., F} denotes the superpixel to which each pixel c belongs, and a = (a_1, ..., a_N) represents the set of all random variables of the segmentation, where N is the image size and F the number of superpixels. The segmentation problem is formulated as an objective function satisfying appearance consistency and regular shape, with an added constraint on the superpixel size;
Define d_i as the average position of the i-th superpixel and e_i as its average color; e = (e_1, ..., e_F) and d = (d_1, ..., d_F) represent the sets of average colors and average positions of all superpixels, respectively, and N_8 denotes the eight-neighborhood of pixel c. Following the Markov random field energy formulation, the single-image superpixel estimation objective comprises the following parts: a boundary length term, a topology preservation term, a minimum size term, a shape regularization term, and an appearance consistency term;
Boundary length term: keeps a superpixel regular by ensuring that it has a small boundary length;
Topology preservation term: forces the superpixels to remain connected; a disconnected configuration has infinite energy;
Minimum size term: forces the size of a superpixel to be at least 1/4 of its original size;
Shape regularization term: keeps the superpixels regular in shape;
Appearance consistency term: keeps the color of each superpixel uniform.
The method is further characterized by the coarse-to-fine refinement: the superpixels are first initialized to a regular grid, and the average color and position of each superpixel are computed. Each level is then refined iteratively from coarse to fine so that the objective function reaches a good local minimum: the list is initialized with all boundary blocks, and each boundary block is checked in turn for whether changing its label would violate connectivity. If connectivity is not violated, the assignment of the block is refined; if the assignment of the block changes, the average position and color of the two affected superpixels are updated with the incremental mean equation:

b_n = b_{n-1} + (a_n - b_{n-1}) / N

where b_{n-1} is the previous estimate, a_n is the new element, and N is the size of the k-th superpixel.
If a block at the end of the priority queue lies on a boundary, its neighborhood is added to the queue, and the process repeats until the queue is empty, after which the next level of refinement starts.
The method is further characterized in that, in the computation of the similar superpixel set, all superpixels in an image produced by the real-time topology-preserving segmentation algorithm are represented as a set A = {A_i}, i ∈ {0, ..., n-1}, where n is the number of superpixels in the image. The reconstructed three-dimensional point cloud is then projected onto the image to obtain the depth value of each pixel, denoted g[p(x, y)];
The set of depth samples contained in each superpixel is denoted g[A_i] = {c(x, y) ∈ A_i | g[c(x, y)] > 0}. A superpixel block in which fewer than 0.58 percent of the pixels carry depth information is set as a target superpixel; the remaining superpixels are reliable superpixels;
The invention converts the image to the LAB color space and builds a separate histogram for each superpixel, dividing each of the L, A and B subspaces into 24 bins, which forms a 72-dimensional descriptor for each superpixel block, denoted R_Lab[A_i], A_i ∈ A. The χ² distance between the target superpixel and every superpixel with reliable depth is then computed:

χ²(R_1, R_2) = Σ_i (R_1(i) - R_2(i))² / (R_1(i) + R_2(i))

where R(i) is the value of the i-th bin of the histogram.
The 32 most similar superpixel blocks, those with the smallest distance, are selected to form a set, denoted N[A_i]; the number of most similar superpixels is determined by the total number of superpixels.
The method is further characterized in that, in the computation of the most similar superpixels, the most similar superpixel set is selected and N[A_i] is reduced further, generally to 3 to 6 elements, according to which superpixel blocks are spatially closest to the target superpixel in Euclidean distance;
The computation of the most similar superpixels uses a graph traversal algorithm to create a two-dimensional superpixel graph structure: whenever two superpixels share a boundary, an edge is added between the two corresponding nodes on the graph, weighted by the χ² distance of the LAB histograms of the two superpixels. For the target superpixel A_i^T and each similar superpixel A_j ∈ N[A_i^T], the path value is computed by minimizing over all possible paths from A_i^T to A_j; a shortest-path algorithm is then applied to the obtained paths, and the three superpixels with the shortest paths form the set N_3[A_i^T].
After the three shortest-path superpixels N_3[A_i^T] are obtained, a histogram of their depth samples is drawn. If the histogram has a single peak, or two consecutive peaks, the depth values of the three superpixels are similar; since the three superpixels come from the most similar superpixels, their colors are very close to the target superpixel and they are also very close spatially, so these superpixels belong to the same object. If some target superpixels still cannot find three such superpixels through these two steps, they are marked as holes.
The method is further characterized in that, in depth sample interpolation, once the superpixel set N_3[A_i^T] closest to the target superpixel block in spatial distance and color has been obtained, the valid depth information it contains is interpolated into the target superpixel block: 8 to 12 pixel points are randomly selected inside the target superpixel block for depth interpolation, the number of interpolated points being determined by the size of the superpixel block so as to satisfy the subsequent constraint requirements. The depth of each point is computed from the spatial distance between the image points carrying original valid depth information and the interpolated point, and the depth interpolation algorithm is executed for the target superpixels of every image. Regions whose depth information could not be obtained during reconstruction are thus supplemented on the basis of the original three-dimensional point cloud; applying the depth interpolation to every image of the set finally yields a scene point cloud model whose three-dimensional information suffices for subsequent processing.
The method is further characterized in that, in point-cloud-constrained image warping, given a novel view with camera projection matrix B_n, let the known images near the novel view be D_1, D_2, ..., D_N, and let input image D_i have camera matrix B_i. For each point B(x, y) ∈ D_i, with Z_i denoting the part of the scene point cloud that projects into the input image range, and q(X, Y, Z) ∈ Z_i a three-dimensional point of that cloud, there exists a mapping F_i from two-dimensional points on the image to three-dimensional points in space such that:

B_i(q) = B_i(F_i(B)) = B

The region to be warped is first divided into an n × m grid. For a point B with a depth sample on the two-dimensional image, the three vertices of the triangle containing it are written (U_1, U_2, U_3); the initial triangles on the input image are right triangles. By the barycentric coordinates of point B within the triangle, B is expressed as (b_1(B), b_2(B), b_3(B)). If the vertices of the deformed triangle are defined as (U_1', U_2', U_3'), two conditions must be met during warping: a reprojection energy factor condition and a similarity transform factor condition.
The method is further characterized in that, in novel-view processing and fusion, once a camera matrix is given for each novel view, the two input images spatially closest to the novel view, comprising a left image and a right image, are determined from the camera parameters of the input images. Each of the two input images is then warped according to the locally-warped novel-view construction, giving the two warped images at the novel view, and the two warped results are used to complement each other's missing information and hole regions. During processing and fusion, the information of the input image closest to the novel view is preserved with a larger weight, and the slightly farther view serves as supplementary information, giving a novel-view image with more complete information. More input views closest to the novel view can also be selected, warped and then fused;
Pixel values are selected from the nearest positions on the warped images, and the multiple images are fused onto the novel-view image by weighting. If no valid pixel value can be found in any of the images, the pixel is marked with the value (0, 255), indicating a hole, and the marked pixels are saved as a mask for the subsequent hole filling operation; filtering then removes the obvious noise.
Compared with the prior art, the invention has the advantages that:
1. The invention provides a multi-image fusion method for constructing and rendering novel views of a stereoscopic scene. Depth sample interpolation based on superpixel blocks finds, within the image, the superpixel block that is best in spatial distance and color and that contains depth information, and interpolates the known depth information into the missing superpixel block, finally yielding a scene point cloud model in which every region contains sufficient three-dimensional point cloud information; the novel view is then constructed from this point cloud information. The key step of creating the novel view is image warping, which transforms the image content of the known view to the corresponding positions in the novel-view image. Based on the superpixel segmentation result, a local warping method is adopted: the superpixel blocks are relatively independent, so their warps do not affect one another and can be processed in parallel, greatly improving the computation speed. For small hole regions, or hole regions whose filling looks unrealistic, the invention computes the filling priority from the content and structure of the image by a block-correction method and improves the hole region by sequential iteration until it is filled, finally presenting the user with a visually realistic novel-view image.
2. Aiming at the problem that the available views of a scene are limited, the method provides fast novel-view construction with a good visual effect: from the existing views it can construct more views at different positions, allowing the scene to be understood and displayed more fully, and the constructed novel views have strong realism. First, based on the spatial warping theory of stereo reconstruction, one view is transformed into another; to improve accuracy and reduce noise, the image is divided into superpixels whose information is relatively independent, and the superpixels are warped individually, which speeds up processing. A post-processing step is applied to the reconstructed view, and a locally adaptive method additionally fuses the artifact regions, finally producing a novel-view image of strong visual realism.
3. The invention uses a scene reconstruction method based on superpixel blocks. Each image of the input set is divided into superpixel blocks of similar size whose colors are essentially uniform, so that each block belongs to only one object in the scene. This property is used to fill the incomplete initial point cloud model: depth samples are interpolated by searching for the region block that is most similar to the region lacking three-dimensional information and that possesses such information, and during image warping the superpixel blocks are warped individually, per segmented block. This both reduces the interference between objects at different depths of the same scene during warping and greatly improves the speed of the warping process.
4. The invention provides a novel-view construction method based on three-dimensional point constraints. The two original input views spatially closest to the novel view are found, the initially reconstructed three-dimensional point cloud serves as the constraint condition, the two original views are warped to the novel view, and the two warped results are fused to obtain most of the image information of the novel view, finally presenting a novel-view image of strong realism.
Drawings
Fig. 1 is a structural diagram of the multi-image fusion novel-view construction and rendering method of the present invention.
FIG. 2 is a schematic diagram of the point-cloud-constrained image warping of the present invention.
FIG. 3 is a schematic diagram of the superpixel-driven local warping of the present invention.
FIG. 4 is a schematic view of the novel-view fusion effect of the present invention.
Detailed Description
The technical scheme of the multi-image fusion method for constructing and rendering novel views of a stereoscopic scene is further described below with reference to the accompanying drawings, so that a person skilled in the art can better understand and implement the invention.
The invention provides a multi-image fusion method for constructing and rendering novel views of a stereoscopic scene, comprising multi-image depth fusion, locally-warped novel-view construction, and novel-view processing and rendering. Multi-image depth fusion comprises a real-time topology-preserving superpixel segmentation algorithm, computation of the similar superpixel set, computation of the most similar superpixels, and depth sample interpolation; the segmentation algorithm itself comprises single-image superpixel estimation and coarse-to-fine refinement. Locally-warped novel-view construction comprises point-cloud-constrained image warping and superpixel-driven local warping. Novel-view processing and rendering comprises processing and fusion of the novel view and filling and correction of hole regions.
As shown in fig. 1, the invention addresses monocular stereo reconstruction based on an image set. Since the reconstructed object is a real-world scene, the final reconstruction must be consistent with human visual perception to appear realistic. The first step of reconstruction obtains the three-dimensional point cloud model of the scene; because the point cloud is sparse, it can only serve as a constraint condition, and enough, but not excessive, constraint point information must be available everywhere in the scene. For this problem the invention provides a superpixel-based depth sample interpolation method: the image is divided into small superpixel blocks of consistent color, the superpixel blocks that were not reconstructed are found using this property, and the superpixel blocks that are best in spatial distance and color and that contain depth information are found within the image range. The missing superpixel blocks are then depth-interpolated from this known depth information, finally giving a scene point cloud model in which every region contains sufficient three-dimensional point cloud information. The novel view is then constructed from the point cloud information; the key step of creating the novel view is image warping, which transforms the image content of a known view to the corresponding positions in the novel-view image. Global warping would cause severe distortion in the novel-view image, particularly between objects with obvious parallax and depth differences; based on the superpixel segmentation result, a local warping method avoids this problem, and the relative independence of the superpixel blocks keeps their warps from affecting one another, so they can be processed in parallel, greatly improving the computation speed. Although most of the image content of the novel view is obtained this way, small hole regions, or holes whose filling looks unrealistic, remain to be filled at the end.
1. Multi-image depth fusion
The first step of multi-image fusion stereo reconstruction obtains the three-dimensional point cloud from the existing image set. The point cloud is an important constraint for the subsequent novel-view construction: if the three-dimensional points of some region are missing, no information for the corresponding region can be obtained in the novel view. It is therefore essential to obtain three-dimensional information for as much of the scene as possible.
Even the best point cloud reconstruction methods fail to reconstruct the depth of some important regions of the scene: in complex scenes with complex textures, the feature points cannot be matched correctly across the images during point cloud reconstruction, so they never become valid three-dimensional points and are discarded.
Although prior-art methods cannot directly reconstruct the three-dimensional points of such complex scene regions, the depth information of those regions can still be constructed from the existing image information. The invention therefore provides a multi-image depth fusion construction method that builds enough valid three-dimensional points. First, the image is divided into superpixels, and the segmentation result together with the existing depth information determines the superpixel blocks of poor reconstruction quality, called object regions. Then, the superpixel blocks that are most similar in color and closest in spatial distance to each object region, among those with existing depth information, are found to fill the regions with missing depth. A complete three-dimensional model of the scene is finally obtained, satisfying the requirements of novel-view construction.
(I) Real-time topology-preserving superpixel segmentation algorithm
The invention requires a superpixel algorithm with high running speed, strong real-time performance, good reliability and regularity, and good topological consistency of the image segmentation; the prior art does not meet these requirements. Addressing these shortcomings and the invention's real-time requirement for superpixels, the invention provides a real-time topology-preserving superpixel segmentation algorithm that performs coarse-to-fine superpixel segmentation in real time; its coarse-to-fine update scheme makes the minimization converge well. The detailed process comprises the following two steps:
(1) Single-image superpixel estimation
For an image, a_c ∈ {1, ..., F} denotes the superpixel to which each pixel c belongs, and a = (a_1, ..., a_N) represents the set of all random variables of the segmentation, where N is the image size and F the number of superpixels. The segmentation problem is formulated as an objective function, similar to k-means clustering, satisfying appearance consistency and regular shape. A constraint on the superpixel size is added to prevent superpixels from becoming too small.
Define d_i as the average position of the i-th superpixel and e_i as its average color; e = (e_1, ..., e_F) and d = (d_1, ..., d_F) represent the sets of average colors and average positions of all superpixels, respectively, and N_8 denotes the eight-neighborhood of pixel c. Following the Markov random field energy formulation, the single-image superpixel estimation objective comprises the following parts: a boundary length term, a topology preservation term, a minimum size term, a shape regularization term, and an appearance consistency term.
Boundary length term: keeps a superpixel regular by ensuring that it has a small boundary length.
Topology preservation term: forces the superpixels to remain connected; a disconnected configuration has infinite energy.
Minimum size term: forces the size of a superpixel to be at least 1/4 of its original size.
Shape regularization term: keeps the superpixels regular in shape.
Appearance consistency term: keeps the color of each superpixel uniform.
(2) Coarse-to-fine refinement
The invention provides a coarse-to-fine algorithm over the assignment of pixels with a FIFO priority queue strategy. The superpixels are first initialized to a regular grid, and the average color and position of each superpixel are computed. Each level is then refined iteratively from coarse to fine so that the objective function reaches a good local minimum: the list is initialized with all boundary blocks, and each boundary block is checked in turn for whether changing its label would violate connectivity. If connectivity is not violated, the assignment of the block is refined; if the assignment of the block changes, the average position and color of the two affected superpixel blocks are updated with the incremental mean equation:

b_n = b_{n-1} + (a_n - b_{n-1}) / N

where b_{n-1} is the previous estimate, a_n is the new element, and N is the size of the k-th superpixel.
If a block at the end of the priority queue lies on a boundary, its neighborhood is added to the queue, and the process repeats until the queue is empty, after which the next level of refinement starts.
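As a concreteness aid, the following minimal Python sketch (not from the patent) shows the O(1) mean updates applied when a boundary block moves between two superpixels; the function names, the numpy representation, and the removal form of the update are assumptions of this sketch.

```python
import numpy as np

def incremental_mean_add(mean, new_element, new_size):
    """O(1) update of a running mean (color or position) after one element joins.

    new_size is the superpixel size N after the addition, matching
    b_n = b_{n-1} + (a_n - b_{n-1}) / N.
    """
    return mean + (new_element - mean) / new_size

def incremental_mean_remove(mean, old_element, new_size):
    """O(1) update after one element leaves (new_size is the size after removal).
    This removal form is an assumption; the patent states only the addition form."""
    return mean + (mean - old_element) / new_size

# Example: a boundary block moving into superpixel k updates its mean color.
mean_color_k = np.array([120.0, 64.0, 200.0])
block_color = np.array([130.0, 60.0, 190.0])
mean_color_k = incremental_mean_add(mean_color_k, block_color, new_size=50)
```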
(II) Computation of the similar superpixel set
According to the real-time topology-preserving superpixel segmentation algorithm, all superpixels in an image are represented as a set A = {A_i}, i ∈ {0, ..., n-1}, where n is the number of superpixels in the image. The reconstructed three-dimensional point cloud is then projected onto the image to obtain the depth value of each pixel, denoted g[p(x, y)].
The set of depth samples contained in each superpixel is denoted g[A_i] = {c(x, y) ∈ A_i | g[c(x, y)] > 0}. To distinguish the regions lacking three-dimensional depth information, a superpixel block in which fewer than 0.58 percent of the pixels carry depth information is set as a target superpixel; the remaining superpixels are reliable superpixels.
The present invention converts the image to the LAB color space and builds a separate histogram for each superpixel, dividing each of the L, A and B subspaces into 24 bins, which forms a 72-dimensional descriptor for each superpixel block, denoted R_Lab[A_i], A_i ∈ A. The χ² distance between the target superpixel and every superpixel with reliable depth is then computed:

χ²(R_1, R_2) = Σ_i (R_1(i) - R_2(i))² / (R_1(i) + R_2(i))

where R(i) is the value of the i-th bin of the histogram.
The 32 most similar superpixel blocks, those with the smallest distance, are selected to form a set, denoted N[A_i]. The number of most similar superpixels is determined by the total number of superpixels, generally 34 to 76; selecting noticeably more would increase the computational complexity considerably.
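To make the descriptor and distance concrete, the following Python sketch builds the 72-dimensional LAB histogram descriptor and the χ² distance described above. The use of OpenCV for the color conversion, the normalization step, and all names are assumptions of this sketch, not the patent's implementation.

```python
import numpy as np
import cv2  # assumption: OpenCV is available for the RGB -> LAB conversion

def lab_descriptor(image_bgr, superpixel_mask, bins=24):
    """72-dim descriptor: one 24-bin histogram per L, A, B channel of a superpixel."""
    lab = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2LAB)
    pixels = lab[superpixel_mask]            # (n_pixels, 3) LAB values of the block
    hists = [np.histogram(pixels[:, c], bins=bins, range=(0, 255))[0]
             for c in range(3)]
    desc = np.concatenate(hists).astype(np.float64)
    return desc / max(desc.sum(), 1.0)       # normalize so block sizes are comparable

def chi2_distance(r1, r2, eps=1e-10):
    """Chi-squared distance between two histogram descriptors R1, R2."""
    return np.sum((r1 - r2) ** 2 / (r1 + r2 + eps))
```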
(III) Computation of the most similar superpixels
From the similar superpixel sets, the most similar superpixel set is selected, and N[A_i] is reduced further, generally to 3 to 6 elements, according to which superpixel blocks are spatially closest to the target superpixel in Euclidean distance. However, because superpixel blocks vary in shape and size, their irregular and highly non-convex shapes make the Euclidean distances between superpixels, and between spatially adjacent objects, quite ambiguous.
To solve these problems, the computation of the most similar superpixels uses a graph traversal algorithm to create a two-dimensional superpixel graph structure: whenever two superpixels share a boundary, an edge is added between the two corresponding nodes on the graph, weighted by the χ² distance of the LAB histograms of the two superpixels. For the target superpixel A_i^T and each similar superpixel A_j ∈ N[A_i^T], the path value is computed by minimizing over all possible paths from A_i^T to A_j; a shortest-path algorithm is then applied to the obtained paths, and the three superpixels with the shortest paths form the set N_3[A_i^T].
After the three shortest-path superpixels are obtained, a histogram of their depth samples is drawn. If the histogram has a single peak, or two consecutive peaks, the depth values of the three superpixels are similar; since the three superpixels come from the most similar superpixels, their colors are very close to the target superpixel and they are also very close spatially, so these superpixels belong to the same object. If some target superpixels still cannot find three such superpixels through these two steps, they are marked as holes.
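The shortest-path computation over the superpixel adjacency graph described above could look like the following Python sketch using Dijkstra's algorithm; the adjacency-list representation and the names are assumptions.

```python
import heapq

def shortest_path_costs(adjacency, source):
    """Dijkstra over the superpixel graph.

    adjacency: {node: [(neighbor, weight), ...]}, built by adding an edge whenever
    two superpixels share a boundary, weighted by the chi^2 distance of their
    LAB histograms. Returns the minimal path cost from `source` to each node.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in adjacency.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Keep the three candidates of N[A_i] with the smallest path cost to form N3[A_i]:
# n3 = sorted(candidates, key=lambda j: dist.get(j, float("inf")))[:3]
```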
(IV) Depth sample interpolation
Once the superpixel set N_3[A_i^T] closest to the target superpixel block in spatial distance and color has been obtained, the valid depth information it contains is interpolated into the target superpixel block: 8 to 12 pixel points are randomly selected inside the target superpixel block for depth interpolation, the number of interpolated points being determined by the size of the superpixel block so as to satisfy the subsequent constraint requirements. The depth of each point is computed from the spatial distance between the image points carrying original valid depth information and the interpolated point, and the depth interpolation algorithm is executed for the target superpixels of every image. Regions whose depth information could not be obtained during reconstruction are thus supplemented on the basis of the original three-dimensional point cloud; applying the depth interpolation to every image of the set finally yields a scene point cloud model whose three-dimensional information suffices for subsequent processing.
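The patent specifies only that the interpolated depth is computed from the spatial distances to the points with valid depth; the Python sketch below assumes inverse-distance weighting as one plausible realization, and all names are illustrative.

```python
import numpy as np

def interpolate_depth(target_points, source_points, source_depths, eps=1e-6):
    """Assign depths to sampled points of a target superpixel from known samples.

    target_points: (m, 2) pixel coordinates chosen inside the target superpixel
                   (the patent randomly samples 8 to 12 of them)
    source_points: (k, 2) coordinates of valid depth samples in N3[A_i]
    source_depths: (k,)   their depth values
    Inverse-distance weighting is an assumption of this sketch.
    """
    depths = np.empty(len(target_points))
    for i, p in enumerate(target_points):
        d = np.linalg.norm(source_points - p, axis=1)  # spatial distances
        w = 1.0 / (d + eps)                            # closer samples weigh more
        depths[i] = np.dot(w, source_depths) / w.sum()
    return depths
```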
2. Locally-warped novel-view construction
Given a set of images of a scene, the user can see the scene only from the limited views determined by the number of images. With the existing image set, the invention enables dynamic browsing of the scene from many views, making the scene observed at a novel view look as real as the original, with a good visual effect. Prior-art methods only handle transitions between certain views, or only small view changes, and struggle with complex scenes. The scene image set can be processed with VisualSFM to obtain the camera parameters; the original camera positions are therefore used to set the camera matrix of the novel view, two or four images closest to the novel view are selected as reference images, the obtained scene point cloud serves as the constraint condition, and the image closest to the novel view is warped to it, giving the basic image information of the novel view. The warping inevitably leaves some gaps or holes in the resulting novel-view image, which affect the final visual effect, so the novel view must then be fused and corrected to look more real in appearance. Repeating this processing for different positions yields the effect of the whole scene, so that more information about the scene can be observed and understood from more views.
(I) Point-cloud-constrained image warping
Given a novel view with camera projection matrix B_n, let the known images near the novel view be D_1, D_2, ..., D_N, and let input image D_i have camera matrix B_i. For each point B(x, y) ∈ D_i, with Z_i denoting the part of the scene point cloud that projects into the input image range, and q(X, Y, Z) ∈ Z_i a three-dimensional point of that cloud, there exists a mapping F_i from two-dimensional points on the image to three-dimensional points in space such that:

B_i(q) = B_i(F_i(B)) = B

The region to be warped is first divided into an n × m grid. For a point B with a depth sample on the two-dimensional image, the three vertices of the triangle containing it are written (U_1, U_2, U_3); the initial triangles on the input image are right triangles. By the barycentric coordinates of point B within the triangle, B is expressed as (b_1(B), b_2(B), b_3(B)). If the vertices of the deformed triangle are defined as (U_1', U_2', U_3'), two conditions must be met during warping: a reprojection energy factor condition and a similarity transform factor condition.
(1) Reprojection energy factor condition:
So that, after warping, the depth sample points and the three vertices of the deformed triangle still satisfy the barycentric coordinate relationship in the novel view, a least-squares energy equation is formed from the constraint of the three-dimensional point cloud.
(2) Similarity transform factor condition
The image has been divided into an n × m grid, and each grid cell can be divided into two triangles; the warping is then performed triangle by triangle, and the similarity transform factor measures the deviation of the two corresponding grid cells after deformation. For a triangle (U_1, U_2, U_3), take one vertex U_2 as the origin, the line through the edge <U_2, U_3> as the x-axis, and that line rotated by 90 degrees as the y-axis, forming a local coordinate system in which one vertex is expressed by the other two. U_1 can then be expressed by U_2 and U_3 in the following form:

U_1 = U_2 + u (U_3 - U_2) + v R_90 (U_3 - U_2)

where R_90 is the 90-degree rotation matrix. In this local coordinate system, u and v are both known coordinates, computed by the following formulas:

u = (U_1 - U_2)^T (U_3 - U_2) / ||U_3 - U_2||²
v = (U_1 - U_2)^T R_90 (U_3 - U_2) / ||U_3 - U_2||²
Therefore, by reducing the variation of these coordinates among the three vertices after transformation, the shape of the triangle is kept from changing abnormally, which gives the shape-preserving term. For a region that needs warping, its reprojection energy factor must be minimized while its shape is preserved, as shown in fig. 2; minimizing this objective function over the region to be warped yields just a sparse linear system, from which the deformed vertex coordinates of every triangle are obtained. Once the triangle vertex coordinates are determined, the warped image is obtained by interpolating the novel-view image according to the barycentric coordinates of the pixels within the triangles of the input image.
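As an illustration of the local coordinate system used by the similarity transform factor, the following Python sketch computes (u, v) for a triangle and rebuilds the dependent vertex after the other two have moved; the sign convention of R_90 and the names are assumptions.

```python
import numpy as np

R90 = np.array([[0.0, 1.0],
                [-1.0, 0.0]])  # 90-degree rotation matrix (sign convention assumed)

def local_coords(U1, U2, U3):
    """Express U1 in the frame anchored at U2 with x-axis along U2->U3, i.e.
    U1 = U2 + u*(U3 - U2) + v*R90*(U3 - U2)."""
    e = U3 - U2
    n2 = float(e @ e)                      # ||U3 - U2||^2
    u = float((U1 - U2) @ e) / n2
    v = float((U1 - U2) @ (R90 @ e)) / n2
    return u, v

def reconstruct_vertex(U2p, U3p, u, v):
    """Rebuild U1' from the deformed vertices U2', U3' so the triangle keeps
    its shape (the similarity transform factor penalizes deviation from this)."""
    e = U3p - U2p
    return U2p + u * e + v * (R90 @ e)
```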
(II) Superpixel-driven local warping
The stereo-reconstructed point cloud is not fully accurate and may contain noise, especially in boundary and contour regions between objects. Even after multi-image depth fusion, the result merely provides reasonable and sufficient constraint points for the hole regions; the constraints are not perfectly consistent with the whole image. If the region to be warped were treated directly as a whole, the inaccurate constraint terms would cause warping artifacts, and the global treatment would form a large multi-dimensional sparse linear system, increasing the complexity and the time of the solution.
To address these problems of global warping, local warping driven by superpixel segmentation is adopted; unlike global warping, the whole region is no longer deformed at once. Since the image was already segmented into superpixels in the multi-image depth fusion step, each superpixel has an essentially uniform color, all its pixels belong to the same object, and their depths are close. These superpixel blocks are therefore warped individually without affecting other regions. The local warping approach greatly reduces the errors caused by partially unreliable depth sample constraints; the warps of the superpixel blocks are relatively independent and small, so the dimension of the linear system to be solved drops sharply and the computational complexity falls; most importantly, the superpixel blocks can be processed in parallel, greatly reducing the processing time.
As shown in FIG. 3, a superpixel block is irregular in shape. Based on the positions of the pixels in the superpixel block, a rectangle containing the whole block is found: if B(x, y) ∈ A_k, the coordinates of the four rectangle vertices are obtained from x_min = min(B_x), x_max = max(B_x), y_min = min(B_y), y_max = max(B_y), and the four vertices are U_1(x_min, y_min), U_2(x_min, y_max), U_3(x_max, y_min), U_4(x_max, y_max). The rectangle is then divided along its diagonal into two triangular regions, and the two triangles are deformed by the point-cloud-constrained image warping. The figure shows the warping process; the operation of each superpixel block is relatively independent.
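A small Python sketch of the bounding-rectangle construction and diagonal split described above (the names and the choice of diagonal are assumptions):

```python
import numpy as np

def superpixel_bounding_triangles(coords):
    """coords: (n, 2) integer (x, y) positions of the pixels in superpixel A_k.
    Returns the two triangles of the bounding rectangle, split along a diagonal;
    each triangle is then warped independently."""
    x_min, y_min = coords.min(axis=0)
    x_max, y_max = coords.max(axis=0)
    U1 = (x_min, y_min)
    U2 = (x_min, y_max)
    U3 = (x_max, y_min)
    U4 = (x_max, y_max)
    # Split along the U2-U3 diagonal (an assumption; either diagonal works).
    return (U1, U2, U3), (U2, U4, U3)
```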
3. Novel-view processing and rendering
(I) Novel-view processing and fusion
Once a camera matrix is given for each novel view, the two input images spatially closest to the novel view, comprising a left image and a right image, can be determined from the camera parameters of the input images. Each of the two input images is then warped according to the locally-warped novel-view construction, giving the two warped images at the novel view. Because the relative positions and parallax of the objects in the scene change as the view changes, some holes remain in the warped novel view; since the input images lie spatially to the left and right of the novel view and the scene looks different from different views, the two warped results can complement each other's missing information and hole regions. Meanwhile, during processing and fusion, the information of the input image closest to the novel view is preserved as much as possible with a larger weight, and the slightly farther view serves as supplementary information, giving a novel-view image with more complete information. To compensate more of the novel view's image information after warping, more of the input views closest to the novel view can be selected, warped and then fused; however, the more input views are selected, the more the computation grows, so the number of input views must be balanced against the information they can supplement.
Pixel values are selected from the nearest positions on the warped images, and the multiple images are fused onto the novel-view image by weighting. If no valid pixel value can be found in any of the images, the pixel is marked with the value (0, 255), indicating a hole, and the marked pixels are saved as a mask for the subsequent hole filling operation.
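The weighted fusion and hole-mask bookkeeping might be sketched as follows in Python; the fixed scalar weight for the closer view is an assumption, since the patent states only that the closer view receives a larger weight.

```python
import numpy as np

def fuse_views(warped_left, warped_right, valid_left, valid_right, w_near=0.7):
    """Fuse two warped input views into the novel view.

    warped_*: (H, W, 3) images already warped to the novel view
    valid_*:  (H, W) boolean masks of pixels that received a value
    w_near:   weight of the spatially closer view (assumed scalar)
    Pixels valid in neither view are marked as holes for the filling stage.
    """
    h, w, _ = warped_left.shape
    out = np.zeros((h, w, 3), np.float64)
    both = valid_left & valid_right
    out[both] = w_near * warped_left[both] + (1 - w_near) * warped_right[both]
    only_l = valid_left & ~valid_right
    only_r = valid_right & ~valid_left
    out[only_l] = warped_left[only_l]
    out[only_r] = warped_right[only_r]
    hole_mask = ~(valid_left | valid_right)  # saved for subsequent hole filling
    return out.astype(np.uint8), hole_mask
```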
As shown in fig. 4, the upper two images are the left and right original images closest to the new visual angle, each deformed to the new visual angle, and the remaining image is the result of fusing them. After fusion, the image contains the information of most of the scene, but some regular noise remains. The noise arises because deformation is performed triangle by triangle: when pixels are interpolated from an input visual angle to the new visual angle, the triangle vertices are not handled, so those pixels appear as null points without RGB values. Such obvious noise can be removed by filtering, for example as sketched below. Large hole regions, of course, still require the subsequent hole-filling algorithm.
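A small filtering sketch of the kind the text suggests; the use of SciPy's `median_filter` and the 3×3 window are assumptions:

```python
# Remove isolated null-point noise at triangle vertices with a per-channel
# median filter; large hole regions are deliberately left for the
# subsequent hole-filling algorithm.
import numpy as np
from scipy.ndimage import median_filter

def remove_point_noise(img: np.ndarray, size: int = 3) -> np.ndarray:
    return np.stack([median_filter(img[..., c], size=size)
                     for c in range(img.shape[-1])], axis=-1)
```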
(II) Filling and correction of hollow areas
After the above processing, the image content of most of the new visual angle is obtained, but some void areas remain and affect the final visual effect. These void areas have two causes: first, as the visual angle changes, the area of each deformed superpixel block changes; if a superpixel shrinks, gaps are left between superpixels, especially in boundary regions. Second, during depth sample interpolation, some target superpixels cannot be matched to a consistent source superpixel, so their depth is never successfully interpolated; such superpixels cannot be deformed to the new visual angle, leaving void areas.
To fill these hollow areas, the invention provides an image-block-based filling and correction method for hollow areas, comprising the following two steps:
step one, filling the cavity area based on sample blocks;
step two, automatically detecting artifacts and eliminating them by correction.
The algorithm first takes the hole area as a mask, then finds, for each point of the mask area, the sample block that is optimal in geometry and color, and copies that sample block onto the mask area. Such copy-paste, however, causes local blocking and image inconsistency that make the scene look unnatural. On the corrected image, artifacts appear where there are significant color and photometric differences. These artifacts can be detected automatically from their characteristics, after which the relevant correction parameters of the artifact areas are modified, finally producing a visually satisfactory filling and correction of the hollow area.
(1) Sample block based void region filling
The filling and correction method for the hollow area has the following two advantages: first, the artifact detection and elimination post-processing steps incur no extra cost; second, compared with other methods that adopt multi-core or GPU-accelerated algorithms, this method achieves the same speed without any acceleration.
The method for filling the void area based on the sample block comprises the following specific steps:
First, the object area S to be filled is selected and assigned a single color; the block size is then specified. Once these parameters are defined, the object area can be filled automatically.
For each pixel in the image, a color value and a reliability value are defined. The color value is null inside the object region, and the reliability value represents the confidence of the pixel; once a point is filled, its reliability value is modified and then fixed. On the boundary T of the object region, because the size and structure of the void differ at each point's block, each pixel is temporarily given a priority that determines the filling order, and the following three steps are then iterated until the object region set is empty.
Step 1, calculation of block priority
The filling order of the void region is important: each block of a point on the boundary of the void region is processed in order of highest priority. The priority is calculated from the continuity of the stronger edges of a block and the higher reliability values around it.
The reliability term measures the amount of reliable information around a pixel. For a block in which a large proportion of the pixels already exist or have been filled, the existing part may belong to the source region rather than the object region; such blocks are given higher priority, and the filling operation addresses them first.
This method fills hole areas of particular shapes well: hole areas containing corner points or having narrow shapes are filled preferentially, because those blocks are surrounded by more pixels of the source area, and the blocks that best match them provide more reliable information. By contrast, object regions with a single texture, unchanged structure, or less reliable information wait until more pixels around them have been filled before being processed.
In the initial iterations, the reliability term imposes an approximately concentric filling order around the center of the hollow area: as the filling program executes, points on the periphery of the hollow area, i.e. on its boundary, have larger reliability values, while pixels at the center of the hollow area have smaller ones.
The data term is a function of the isophotes and of the strength with which they hit the boundary T at each iteration: if an isophote passes through a block, the priority of that block is increased. The data term plays a very important role in the filling algorithm, because the linear structures of the source area are maintained and propagated into the hole area, restoring the linear structures the hole destroyed and producing a visually realistic effect. A balance exists between the reliability term and the data term: the data term quickly propagates the isophotes into the cavity area, while the reliability term restrains the propagation speed so that no obvious artifacts are produced.
The priority function automatically determines the filling order of the hole areas; compared with any predefined filling order, both accuracy and visual quality improve markedly. The hole-filling order thus becomes a function of the image itself: damaged structures are restored, blocking effects are reduced, blur and smoothing stay relatively simple, and a good, realistic visual effect is achieved, as in the sketch below.
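The reliability term and data term combine into a per-block priority. A sketch in the standard exemplar-based form P(p) = C(p)·D(p); the patch half-width, the normalization constant `alpha`, and all names are illustrative assumptions rather than the patent's exact formulation:

```python
# Priority of the patch centered at boundary point p = (y, x):
#   C(p): fraction of confident (already-known) area in the patch,
#   D(p): isophote strength at p projected on the hole-boundary normal.
import numpy as np

def priority(C, gray, mask, p, half=4, alpha=255.0):
    y, x = p
    patch = (slice(max(y - half, 0), y + half + 1),
             slice(max(x - half, 0), x + half + 1))
    c_term = C[patch][~mask[patch]].sum() / C[patch].size
    gy, gx = np.gradient(gray)
    isophote = np.array([-gx[y, x], gy[y, x]])   # gradient rotated 90 degrees
    my, mx = np.gradient(mask.astype(float))
    n = np.array([my[y, x], mx[y, x]])
    n /= np.linalg.norm(n) + 1e-8                # boundary normal from the mask
    d_term = abs(isophote @ n) / alpha
    return c_term * d_term                       # P(p) = C(p) * D(p)
```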
Step 2, propagation of texture and structure information
The priorities of the blocks are calculated according to step 1, the optimal matching block for the void region to be filled is found according to the priority, and data is then extracted from the source region according to the optimal block to fill the void region.
In prior-art filling, the color value of a pixel is obtained by diffusion, and the hollow area is filled with a blur; the filled areas remain noticeable and the visual result is poor. The present method instead fills the hollow area by directly sampling image information from the source area: among the blocks of the source area, the block most similar to the object-area block is found, and every pixel in the set is then filled from it, as sketched below. Such filling largely preserves the geometric structures and texture information of the source region as they propagate into the object region.
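A brute-force sketch of this best-match search, scoring candidate source blocks by the sum of squared differences over the already-known pixels of the target block; the stride, patch size, and names are assumptions, and the target is assumed to lie away from the image border:

```python
# Scan fully-known source patches and keep the one closest to the target
# patch on its known pixels; `img` is a float array, `mask` marks holes.
import numpy as np

def best_source_patch(img, mask, target, half=4, stride=2):
    ty, tx = target
    tpatch = img[ty - half:ty + half + 1, tx - half:tx + half + 1]
    known = ~mask[ty - half:ty + half + 1, tx - half:tx + half + 1]
    best, best_cost = None, np.inf
    h, w = mask.shape
    for y in range(half, h - half, stride):
        for x in range(half, w - half, stride):
            if mask[y - half:y + half + 1, x - half:x + half + 1].any():
                continue               # source patches must be fully known
            cand = img[y - half:y + half + 1, x - half:x + half + 1]
            cost = ((cand - tpatch)[known] ** 2).sum()
            if cost < best_cost:
                best, best_cost = (y, x), cost
    return best                        # center of the best-matching block
```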
Step 3, updating the reliability value
Through the operations of step 1 and step 2, only the part of the cavity area along its boundary is filled; the remaining cavity area still needs to be filled by further iterations. The reliability values of the newly filled points change accordingly, so the reliability value of every point in the filled area is updated.
The update rule assigns new reliability values to the pixels of the filled object area within the boundary block; as filling proceeds toward the center of the hollow area, these reliability values keep decreasing. Based on this copying scheme, the cavity area is filled by repeated iteration and global processing, as sketched below.
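A sketch of this update under the common convention that freshly filled pixels inherit the confidence of the patch center, so reliability decays toward the center of the hole; the exact rule is an assumption:

```python
# After pasting the best block over the patch at `target`, give the newly
# filled pixels the center confidence and mark them as known.
import numpy as np

def update_confidence(C, mask, target, half=4):
    ty, tx = target
    patch = (slice(ty - half, ty + half + 1), slice(tx - half, tx + half + 1))
    newly_filled = mask[patch].copy()   # pixels that were holes before this step
    C[patch][newly_filled] = C[ty, tx]  # inherit the center confidence
    mask[patch] = False                 # the whole patch is now known
    return C, mask
```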
(2) Automatic artifact detection and elimination
Artifacts left on an image processed by the block correction method are visually obvious, so automatically detecting and eliminating them is particularly important.
To judge whether a point p ∈ S is an artifact, the following two conditions are verified:
condition one: the larger the gradient value at the point, the more spatially discontinuous the point is;
condition two: if the blocks pasted into the neighborhood of p come from different source locations, they cause discontinuities in the image.
The artifact point set is found according to the characteristics of conditions one and two, as in the sketch below.
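A sketch of the two tests, assuming a `src_label` map that records which source location each filled pixel was copied from; the threshold factor and the 4-neighbor comparison are illustrative:

```python
# Condition one: unusually large gradient magnitude (spatial discontinuity).
# Condition two: adjacent filled pixels copied from different source blocks.
import numpy as np

def detect_artifacts(gray, src_label, filled_mask, tau=3.0):
    gy, gx = np.gradient(gray)
    mag = np.hypot(gx, gy)
    cond1 = mag > tau * mag[filled_mask].mean()
    cond2 = np.zeros_like(cond1)
    cond2[:, 1:] |= src_label[:, 1:] != src_label[:, :-1]
    cond2[1:, :] |= src_label[1:, :] != src_label[:-1, :]
    return filled_mask & cond1 & cond2     # the artifact point set p in S
```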
Fourthly, summary of the invention
For an image set of a scene, the three-dimensional point cloud obtained by plain visual SFM (structure-from-motion) reconstruction shows void areas in regions with complex texture or few matched feature points. If there are too many such void areas, too much information is missing and the subsequent processing is seriously affected. To make the three-dimensional information of the scene more complete, the invention fills the information-missing void areas with a sample-depth-interpolation method that interpolates correct depths for different scenes and has good robustness. A good superpixel segmentation method first divides the image into superpixel blocks that are consistent in color and each belong to only one object. For a superpixel block without three-dimensional information, the superpixel block most similar in color and closest in spatial distance is then selected, and the valid sample depth values it contains are used to interpolate depth into the target superpixel block. The result is a real and reliable scene point cloud model with complete three-dimensional information, providing sufficient constraints for the subsequent work.
To address the limited viewing angles of a set, the invention provides a construction and drawing method for new visual angles that is fast and visually effective: more new visual angles at different positions can be constructed from the existing visual angles, the set can be understood and displayed more fully, and the constructed new visual angles have a strong sense of reality. First, based on the spatial deformation theory of stereo reconstruction, one visual angle is transformed to another. To improve accuracy and reduce noise, the invention segments the image into superpixels and deforms the relatively independent superpixel blocks, which speeds up processing. Because the initially constructed new visual angle still leaves some cavity areas that the image deformation technique cannot produce, the invention fills them with the image-block-correction-based method. To make the filling more natural and realistic, the reconstructed visual angle is post-processed: visually obvious artifact areas are detected, and those areas are further fused by a locally adaptive method. The final new visual angle image is visually highly realistic, and the construction method adopted by the invention is fast with a good visual effect.

Claims (10)

1. The method for constructing and drawing a new visual angle of a multi-image fused stereoscopic scene is characterized by comprising multi-image depth fusion, new visual angle construction guided by local deformation, and new visual angle processing and rendering, wherein the multi-image depth fusion comprises a real-time topology-preserving superpixel segmentation algorithm, calculation of the similar superpixel set, calculation of the most similar superpixel, and depth sample interpolation; the new visual angle construction guided by local deformation comprises image deformation transformation constrained by the three-dimensional point cloud and local deformation driven by superpixel segmentation; and the new visual angle processing and rendering comprises new visual angle processing and fusion, and filling and correction of the cavity region.
2. The method for constructing and drawing a new visual angle of a multi-image fused stereoscopic scene according to claim 1, wherein, for monocular stereoscopic reconstruction based on an image set, a three-dimensional point cloud model of the scene is obtained as the first step of the reconstruction process; taking the point cloud as a constraint, a depth sample interpolation method based on superpixel blocks is proposed: the superpixel blocks that have not been reconstructed are found, the superpixel blocks optimal in spatial distance and color that contain depth information are found within the image, and the known depth information is then used to interpolate depth into the missing superpixel blocks, so as to obtain a scene three-dimensional point cloud model with sufficient three-dimensional point cloud information in every area; the new visual angle is then constructed from the three-dimensional point cloud information, the key step of which is image deformation, transforming the image content of a known visual angle to the corresponding positions on the new visual angle image; a local deformation method based on the superpixel segmentation result is adopted, so that the deformations of the superpixel blocks do not affect one another and can be processed in parallel to obtain most of the image content of the new visual angle; and for the fine gaps or unfilled cavity areas, a block-correction-based method calculates the filling priority from the content and structure of the image and iteratively improves the cavity area until it is filled, finally presenting a new visual angle image with a strong sense of reality.
3. The method for constructing and drawing a new visual angle of a multi-image fused stereoscopic scene according to claim 1, wherein the invention provides a multi-image depth fusion construction method for building valid and sufficient three-dimensional points: the image is first segmented into superpixels; the segmentation result and the existing depth information are used to determine the superpixel blocks of poor reconstruction quality, named object areas; the superpixel blocks most similar in color and closest in spatial distance to the object areas are then found among those with existing depth information to fill the depth-missing object areas; and a complete three-dimensional model of the scene is finally obtained, meeting the requirements for constructing the new visual angle;
the invention provides a real-time topology-preserving superpixel segmentation algorithm that performs coarse-to-fine superpixel segmentation in real time while preserving topology; the coarse-to-fine updating scheme achieves a good result in the minimization improvement process, and its detailed procedure comprises the following two steps: single-image superpixel estimation and coarse-to-fine refinement.
4. The method for constructing and drawing a new visual angle of a multi-image fused stereoscopic scene according to claim 3, wherein, in the single-image superpixel estimation, for one image, a_c ∈ {1, ..., F} denotes the superpixel to which pixel c belongs, and a = (a_1, ..., a_N) represents the set of all segmentation random variables, where N denotes the image size; the segmentation problem is formed into an objective function satisfying appearance consistency and regular shape, with a constraint added on the superpixel size;
d_i is defined as the average position of the i-th superpixel and e_i as the average color of the i-th superpixel; e = (e_1, ..., e_F) and d = (d_1, ..., d_F) respectively represent the sets of average colors and average positions of all superpixels; N_8 represents the eight-neighborhood of pixel c; according to the Markov random field energy formula, the single-image superpixel estimation comprises the following terms: a boundary length term, a topology preservation term, a minimum size term, a shape regularization term, and an appearance consistency term;
boundary length term: keeps a superpixel regular by ensuring that it has a small boundary length;
topology preservation term: forces the superpixels to remain connected, with disconnected configurations assigned infinite energy;
minimum size term: forces the size of a superpixel to be at least 1/4 of its original size;
shape regularization term: keeps the superpixels regular in shape;
appearance consistency term: keeps the color of each superpixel uniform; a sketch of two of these terms follows.
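As an illustration of this energy formulation (a sketch only, not part of the claim language; the weights and exact definitions are not specified here, so the two terms below are assumptions in the spirit of the text):

```python
# A minimal sketch of two of the five energy terms: the boundary length
# term (count of 8-neighbor pixel pairs with different labels) and the
# appearance consistency term (color variance within each superpixel).
# The topology, size, and shape terms and all weights are omitted.
import numpy as np

def boundary_length(labels: np.ndarray) -> int:
    diff = np.zeros(labels.shape, dtype=bool)
    diff[:, 1:] |= labels[:, 1:] != labels[:, :-1]        # horizontal pairs
    diff[1:, :] |= labels[1:, :] != labels[:-1, :]        # vertical pairs
    diff[1:, 1:] |= labels[1:, 1:] != labels[:-1, :-1]    # diagonal pairs
    diff[1:, :-1] |= labels[1:, :-1] != labels[:-1, 1:]
    return int(diff.sum())

def appearance_consistency(image: np.ndarray, labels: np.ndarray) -> float:
    cost = 0.0
    for k in np.unique(labels):
        pix = image[labels == k].astype(float)
        cost += ((pix - pix.mean(axis=0)) ** 2).sum()     # within-superpixel variance
    return cost
```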
5. The method for constructing and drawing a new visual angle of a multi-image fused stereoscopic scene according to claim 3, wherein, in the coarse-to-fine refinement, the superpixels are initialized to a regular grid, and the average color and position of each superpixel are then calculated; each layer of the coarse-to-fine refinement process is then iterated to achieve a locally good refinement of the objective function: the list is initialized to all boundary blocks, and each boundary block is checked in turn for whether changing its label violates connectivity; if connectivity is not violated, the assignment of the block is refined, and if the assignment of the block is changed, the average position and color of the two affected superpixel blocks are updated accordingly using the incremental mean equation:
b_n = b_{n−1} + (a_n − b_{n−1}) / n
where b_{n−1} is the previous estimate, a_n is the new element, and n is the size of the k-th superpixel.
If the block at the end of the priority queue lies on a boundary, the neighborhood of the block is added to the queue, and the process is repeated until the queue is empty, after which the next level of refinement begins.
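For illustration only (not part of the claim language), the incremental mean equation reduces to a one-line update; NumPy arrays for the position and color vectors are an assumption:

```python
# A sketch of the incremental mean: fold one new element a_n into the
# previous estimate b_{n-1} without re-summing the whole superpixel.
import numpy as np

def incremental_mean(b_prev: np.ndarray, a_new: np.ndarray, n: int) -> np.ndarray:
    """b_n = b_{n-1} + (a_n - b_{n-1}) / n, with n the superpixel size."""
    return b_prev + (a_new - b_prev) / n
```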
6. The method for constructing and drawing a new visual angle of a multi-image fused stereoscopic scene according to claim 1, wherein, in the calculation of the similar superpixel set, all superpixels in an image are represented, according to the real-time topology-preserving superpixel segmentation algorithm, as the set A = {A_i}, i ∈ {0, ..., n−1}, where n is the number of superpixels in the image; the reconstructed three-dimensional point cloud is then projected onto the image to obtain the depth value of each pixel x on the image, expressed as g[p(x, y)];
the set of depth samples contained in each superpixel is denoted g[A_i] = {c(x, y) ∈ A_i | g[c(x, y)] > 0}; a superpixel block in which fewer than 0.58 percent of the pixel points carry depth information is set as a target superpixel, and the other superpixels are set as reliable superpixels;
the present invention employs converting the image to L AB color space and separately creating a histogram for each superpixel that divides each subspace of L, A, B into 24 bins, respectively, forming a 72-dimensional descriptor, denoted R, for each superpixel blockLab[Ai],Ai∈ A, then, calculating χ of the target superpixel and all superpixels with reliable depth2The distance between the first and second electrodes,
χ²(R_1, R_2) = Σ_i (R_1(i) − R_2(i))² / (R_1(i) + R_2(i))
where R(i) is the value of the i-th dimension of a histogram.
The 32 most similar superpixel blocks, i.e. those with the smallest distances, are selected to form a set, denoted N[A_i]; the number of most similar superpixels is determined by the total number of superpixels.
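For illustration only, a sketch of the 72-dimensional LAB descriptor and the χ² distance; the (0, 256) value range, the normalization, and the 1/2 factor in the χ² form are assumptions not fixed by the claim:

```python
# A sketch of the per-superpixel LAB histogram descriptor (24 bins per
# channel, 72 dimensions) and the chi-squared histogram distance.
import numpy as np

def lab_descriptor(lab: np.ndarray, labels: np.ndarray, k: int, bins: int = 24):
    pix = lab[labels == k]                        # pixels of superpixel A_k
    hists = [np.histogram(pix[:, c], bins=bins, range=(0, 256))[0]
             for c in range(3)]                   # 24 bins per L, A, B channel
    h = np.concatenate(hists).astype(float)       # 72-dimensional descriptor
    return h / (h.sum() + 1e-8)

def chi2_distance(r1: np.ndarray, r2: np.ndarray) -> float:
    # chi^2(R1, R2) = 1/2 * sum_i (R1(i) - R2(i))^2 / (R1(i) + R2(i))
    return 0.5 * float(np.sum((r1 - r2) ** 2 / (r1 + r2 + 1e-8)))
```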
7. The method for constructing and drawing a new visual angle of a multi-image fused stereoscopic scene according to claim 1, wherein, in the calculation of the most similar superpixel, the set of most similar superpixels is selected, and the size of N[A_i] is further reduced according to the superpixel blocks with the smallest Euclidean spatial distance to the target superpixel, the number of elements generally being reduced to 3 to 6;
the calculation of the most similar superpixel of the invention adopts a graph traversal algorithm to create a two-dimensional superpixel graph structure, if any two superpixels share a boundary, an edge is added between two corresponding nodes on the graph, and the weight of the edge is the χ of the histogram of L AB of the two superpixels2Distance, then calculating the target superpixel Ai TAnd each similar super pixel
Figure FDA0002429434660000032
By minimizing all possible A' si TTo AjCalculating the path value, then adopting shortest path algorithm to the obtained path, selecting three superpixels with shortest path to form a set
Figure FDA0002429434660000033
After obtaining the superpixel of the three shortest paths, i.e.
Figure FDA0002429434660000034
A histogram of the depth samples of the three superpixels is drawn. If the histogram has a single peak or two consecutive peaks, the depth values of the three superpixels are similar, since the three superpixels are from the most similar superpixel block, their colors and the target superpixel are ten times largerIf the target superpixels can not find the three superpixels through the two steps, the superpixels are marked as holes.
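For illustration only, a sketch of the shortest-path selection using Dijkstra's algorithm over the superpixel adjacency graph; the dict-of-dicts graph representation and function names are assumptions:

```python
# A sketch: superpixels are nodes, adjacent superpixels share an edge
# weighted by the chi-squared distance of their LAB histograms; keep the
# three candidate superpixels with the cheapest paths from the target.
import heapq

def shortest_path_costs(graph: dict, source) -> dict:
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                    # stale queue entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def three_shortest(graph: dict, target, candidates):
    dist = shortest_path_costs(graph, target)
    ranked = sorted(candidates, key=lambda a: dist.get(a, float("inf")))
    return ranked[:3]                   # the three shortest-path superpixels
```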
8. The method for constructing and drawing a new visual angle of a multi-image fused stereoscopic scene according to claim 1, wherein, in the depth sample interpolation, the superpixel set closest to the target superpixel block in spatial distance and color is obtained, and the valid depth information contained in it is then interpolated into the target superpixel block; 8-12 pixel points are randomly selected in the target superpixel block for depth interpolation, the number of depth-interpolated pixel points being determined by the size of the superpixel block so as to meet the subsequent constraint requirements; the depth of each such point is calculated from the spatial distances between the interpolated point and the image points carrying the original valid depth information, and the depth interpolation algorithm is executed on the target superpixels of each image; the areas whose depth information could not be obtained during reconstruction are thus supplemented on the basis of the original three-dimensional point cloud, and after depth interpolation is performed on each image of the image set, a scene three-dimensional point cloud model whose three-dimensional information is sufficient for subsequent processing is finally obtained; a sketch of the distance-based interpolation follows.
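For illustration only, a sketch of the distance-based depth computation; inverse-distance weighting is an assumption, since the claim states only that the depth is calculated from spatial distances:

```python
# A sketch: depth at each randomly chosen target pixel is a spatial-
# distance-weighted combination of the known depth samples.
import numpy as np

def interpolate_depth(samples_xy: np.ndarray, samples_d: np.ndarray,
                      targets_xy: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    out = np.empty(len(targets_xy))
    for i, t in enumerate(targets_xy):
        d = np.linalg.norm(samples_xy - t, axis=1)   # distances to known samples
        w = 1.0 / (d + eps)                          # closer samples dominate
        out[i] = float((w * samples_d).sum() / w.sum())
    return out

# e.g. pick 8-12 random pixels of the target superpixel as targets_xy and
# use the valid depth samples of the matched superpixels as samples_xy/_d.
```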
9. The method for constructing and drawing a new visual angle of a multi-image fused stereoscopic scene according to claim 1, wherein, in the image deformation transformation constrained by the three-dimensional point cloud, a new visual angle is given with camera projection matrix B_n; the known images near the new visual angle are D_1, D_2, ..., D_N, and an input image D_i has camera matrix B_i; for each point B(x, y) ∈ D_i on the image, the three-dimensional point cloud that can be projected into the range of the input image is denoted Z_i, and for a three-dimensional point q(X, Y, Z) ∈ Z_i in the scene point cloud there exists a mapping F_i from two-dimensional points on the image to three-dimensional points in space, with the formula:

B_i(q) = B_i(F_i(B)) = B

the area to be deformed is first divided into an n × m grid; for a point B with a depth sample on the two-dimensional image, the three vertices of the triangle containing it are expressed as (U_1, U_2, U_3), the initial triangles on the input image being right triangles; according to the barycentric coordinates of point B within the triangle, B is represented as (b_1(B), b_2(B), b_3(B)); if the three vertices of the deformed triangle are defined as (U_1′, U_2′, U_3′), two conditions need to be met during deformation: a reprojection energy factor condition and a similarity transform factor condition; a sketch of the barycentric representation follows.
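For illustration only, a sketch of the barycentric bookkeeping behind this deformation: a point B is expressed in the coordinates of its triangle (U_1, U_2, U_3) and reprojected with the deformed vertices (U_1′, U_2′, U_3′); function names are assumptions:

```python
# A sketch: compute barycentric coordinates (b1(B), b2(B), b3(B)) of a
# point in its triangle, then reapply them to the deformed vertices.
import numpy as np

def barycentric(B, U1, U2, U3) -> np.ndarray:
    T = np.array([[U1[0] - U3[0], U2[0] - U3[0]],
                  [U1[1] - U3[1], U2[1] - U3[1]]], dtype=float)
    b1, b2 = np.linalg.solve(T, np.asarray(B, dtype=float) - np.asarray(U3, dtype=float))
    return np.array([b1, b2, 1.0 - b1 - b2])

def apply_warp(bary: np.ndarray, U1p, U2p, U3p) -> np.ndarray:
    V = np.array([U1p, U2p, U3p], dtype=float)
    return bary @ V            # position of B inside the deformed triangle
```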
10. The method for constructing and drawing a new visual angle of a multi-image fused stereoscopic scene according to claim 1, wherein, in the new visual angle processing and fusion, after a camera matrix is given for each new visual angle, the two images spatially closest to the new visual angle, comprising a left image and a right image, are determined from the camera parameters of the input images; the two input images are then each deformed according to the new visual angle construction guided by local deformation to obtain the new visual angle images of the two input images after deformation, and the deformed results of the two input images are used to complement the missing information and cavity regions; meanwhile, during processing and fusion, the input image information closest to the new visual angle is preserved with a correspondingly increased weight, while the visual angle image information slightly farther away serves as supplementary information, so as to obtain a new visual angle image with more complete information; more of the input visual angle images closest to the new visual angle are selected for the deformation operation and then processed and fused;
pixel values are selected from the nearby images and fused onto the new visual angle image by weighting; if no valid pixel value can be found in any of the images, the pixel value of that point is marked (0, 255), indicating a hole, and the (0, 255) marking is stored as a mask for the subsequent hole-filling operation; obvious noise can be removed by filtering.
CN202010231534.0A 2020-03-27 2020-03-27 Multi-image fused stereoscopic set vision new angle construction drawing method Pending CN111462030A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010231534.0A CN111462030A (en) 2020-03-27 2020-03-27 Multi-image fused stereoscopic set vision new angle construction drawing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010231534.0A CN111462030A (en) 2020-03-27 2020-03-27 Multi-image fused stereoscopic set vision new angle construction drawing method

Publications (1)

Publication Number Publication Date
CN111462030A true CN111462030A (en) 2020-07-28

Family

ID=71685715

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010231534.0A Pending CN111462030A (en) 2020-03-27 2020-03-27 Multi-image fused stereoscopic set vision new angle construction drawing method

Country Status (1)

Country Link
CN (1) CN111462030A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080137989A1 (en) * 2006-11-22 2008-06-12 Ng Andrew Y Arrangement and method for three-dimensional depth image construction
US20150213640A1 (en) * 2014-01-24 2015-07-30 Nvidia Corporation Hybrid virtual 3d rendering approach to stereovision
CN108038905A (en) * 2017-12-25 2018-05-15 北京航空航天大学 A kind of Object reconstruction method based on super-pixel
CN109712067A (en) * 2018-12-03 2019-05-03 北京航空航天大学 A kind of virtual viewpoint rendering method based on depth image

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DIONICIO VASQUEZ et al.: "An iterative approach for obtaining multi-scale superpixels based on stochastic graph contraction operations", EXPERT SYSTEMS WITH APPLICATIONS, vol. 102, 15 July 2018 (2018-07-15), pages 57-69 *
ZENG Yiming et al.: "Semi-supervised monocular image depth estimation using the partial order relations of sparse point clouds", Journal of Computer-Aided Design & Computer Graphics (《计算机辅助设计与图形学学报》), vol. 31, no. 11, 15 November 2019 (2019-11-15), pages 2038-2046 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111948658A (en) * 2020-08-22 2020-11-17 高小翎 Deep water area positioning method for identifying and matching underwater landform images
CN112233018A (en) * 2020-09-22 2021-01-15 天津大学 Reference image guided face super-resolution method based on three-dimensional deformation model
CN112215871A (en) * 2020-09-29 2021-01-12 武汉联影智融医疗科技有限公司 Moving target tracking method and device based on robot vision
CN112215871B (en) * 2020-09-29 2023-04-21 武汉联影智融医疗科技有限公司 Moving target tracking method and device based on robot vision
CN113298709A (en) * 2021-04-06 2021-08-24 广东省科学院智能制造研究所 Image visual angle transformation method based on geometric transformation principle
CN113409457A (en) * 2021-08-20 2021-09-17 宁波博海深衡科技有限公司武汉分公司 Three-dimensional reconstruction and visualization method and equipment for stereo image
CN113409457B (en) * 2021-08-20 2023-06-16 宁波博海深衡科技有限公司武汉分公司 Three-dimensional reconstruction and visualization method and equipment for stereoscopic image
CN114998338A (en) * 2022-08-03 2022-09-02 山西阳光三极科技股份有限公司 Mining quantity calculation method based on laser radar point cloud
CN117350926A (en) * 2023-12-04 2024-01-05 北京航空航天大学合肥创新研究院 Multi-mode data enhancement method based on target weight
CN117350926B (en) * 2023-12-04 2024-02-13 北京航空航天大学合肥创新研究院 Multi-mode data enhancement method based on target weight

Similar Documents

Publication Publication Date Title
CN111462030A (en) Multi-image fused stereoscopic set vision new angle construction drawing method
JP7181977B2 (en) Method and system for detecting and combining structural features in 3D reconstruction
CN109872397B (en) Three-dimensional reconstruction method of airplane parts based on multi-view stereo vision
CN108335352B (en) Texture mapping method for multi-view large-scale three-dimensional reconstruction scene
CN107945267B (en) Method and equipment for fusing textures of three-dimensional model of human face
US8791941B2 (en) Systems and methods for 2-D to 3-D image conversion using mask to model, or model to mask, conversion
EP1303839B1 (en) System and method for median fusion of depth maps
US9098930B2 (en) Stereo-aware image editing
CN111243071A (en) Texture rendering method, system, chip, device and medium for real-time three-dimensional human body reconstruction
CN106709947A (en) RGBD camera-based three-dimensional human body rapid modeling system
Lee et al. Silhouette segmentation in multiple views
US20050140670A1 (en) Photogrammetric reconstruction of free-form objects with curvilinear structures
CN111882668B (en) Multi-view three-dimensional object reconstruction method and system
CN108665530B (en) Three-dimensional modeling implementation method based on single picture
WO2008112802A2 (en) System and method for 2-d to 3-d image conversion using mask to model, or model to mask, conversion
CN113178009B (en) Indoor three-dimensional reconstruction method utilizing point cloud segmentation and grid repair
EP1063614A2 (en) Apparatus for using a plurality of facial images from different viewpoints to generate a facial image from a new viewpoint, method thereof, application apparatus and storage medium
WO2020187339A1 (en) Naked eye 3d virtual viewpoint image generation method and portable terminal
CN113781621A (en) Three-dimensional reconstruction processing method, device, equipment and storage medium
CN115222889A (en) 3D reconstruction method and device based on multi-view image and related equipment
CN110517348B (en) Target object three-dimensional point cloud reconstruction method based on image foreground segmentation
Tylecek et al. Depth map fusion with camera position refinement
CN115082640A (en) Single image-based 3D face model texture reconstruction method and equipment
CN114049423A (en) Automatic realistic three-dimensional model texture mapping method
CN117501313A (en) Hair rendering system based on deep neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination