WO2022222077A1 - Indoor scene virtual roaming method based on reflection decomposition - Google Patents
Indoor scene virtual roaming method based on reflection decomposition
- Publication number: WO2022222077A1
- Application number: PCT/CN2021/088788 (CN2021088788W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- reflection
- pixel
- image
- depth
- plane
- Prior art date
Classifications
- G06T17/205: Three dimensional [3D] modelling; finite element generation; re-meshing
- G06N3/04: Computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology
- G06T15/00: 3D [Three Dimensional] image rendering
- G06T15/04: Texture mapping
- G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T17/20: Finite element generation, e.g. wire-frame surface description, tesselation
- G06T5/92: Dynamic range modification of images or parts thereof based on global image properties
- G06T7/13: Image analysis; segmentation; edge detection
- G06T7/40: Image analysis; analysis of texture
- G06T7/80: Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
- G06V10/56: Extraction of image or video features relating to colour
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Graphics (AREA)
- Software Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Geometry (AREA)
- Multimedia (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Image Generation (AREA)
Abstract
The invention discloses an indoor scene virtual roaming method based on reflection decomposition. First, a rough global triangular mesh model obtained by 3D reconstruction is projected to form initial depth maps; the depth edges are aligned to the color edges, and the aligned depth maps are converted into simplified triangular meshes. Planes in the global triangular mesh model are detected; if a plane is a reflective plane, a two-layer representation is built over the reflection region of every picture in which that plane is visible, so that the reflection effect on object surfaces is rendered correctly. Given a virtual viewpoint, the virtual-view picture is drawn from neighborhood pictures and their triangular meshes; reflection regions are drawn using the foreground/background pictures and the foreground/background triangular meshes. The method enables large-degree-of-freedom virtual roaming in large indoor scenes containing reflection effects with small storage requirements. It renders well, offers large roaming freedom, reproduces partial reflections and highlights, and is robust.
Description
The invention relates to the technical field of image-based rendering and virtual-viewpoint synthesis, and in particular to a method for virtual roaming of indoor scenes that combines image-based rendering with reflection decomposition.

The purpose of indoor-scene virtual roaming is to build a system that, given the intrinsic and extrinsic parameters of a virtual camera, outputs a rendered picture for that virtual viewpoint. Existing mature virtual-roaming applications are mainly based on series of panoramic images: pure-rotation roaming is possible around each panorama, while movement between panoramas is handled by simple interpolation in most systems, with relatively large visual error. For roaming with large degrees of freedom, many existing methods achieve object-level observation or viewpoint movement within part of a scene, including explicitly capturing the light field around a target object with a light-field camera (see Gortler, Steven J., et al. "The Lumigraph." Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, 1996) or representing and interpolating the scene with a neural network from ordinary photographs (see Mildenhall, Ben, et al. "NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis." Proceedings of the European Conference on Computer Vision, 2020). For larger indoor scenes, the most recent methods achieve relatively free-viewpoint rendering, but rendering quality remains unsatisfactory (see Riegler and Koltun. "Free View Synthesis." Proceedings of the European Conference on Computer Vision, 2020). In particular, for the various reflection types present in larger indoor scenes (floors, tables, mirrors, etc.), no existing system handles indoor roaming with such complex materials well.
SUMMARY OF THE INVENTION
Addressing the deficiencies of the prior art, the present invention provides an indoor scene virtual roaming method based on reflection decomposition, which enables large-degree-of-freedom virtual roaming in large indoor scenes containing reflection effects while requiring little storage.

To achieve the above objective, the present invention adopts the following technical solution: an indoor scene virtual roaming method based on reflection decomposition, comprising the following steps:

S1: capture enough pictures to cover the target indoor scene, perform 3D reconstruction of the indoor scene from the captured pictures, and obtain the camera intrinsic and extrinsic parameters and a rough global triangular mesh model of the indoor scene;

S2: for each picture, project the global triangular mesh model into a corresponding depth map, align the depth edges to the color edges, convert the aligned depth map into a triangular mesh, and simplify the mesh;

S3: detect planes in the global triangular mesh model and use the color consistency between adjacent images to test whether each plane is a reflective plane; if it is, build a two-layer representation over the reflection region of every picture in which the reflective plane is visible, so that the reflection effect on the object surface can be rendered correctly. The two-layer representation comprises foreground and background triangular meshes and two decomposed pictures (foreground and background): the foreground mesh expresses the object's surface geometry, the background mesh expresses the mirror image of the scene geometry about the reflection plane, the foreground picture expresses the object's surface texture with the reflection component removed, and the background picture expresses the reflection component of the scene on the object surface. S3 comprises the following sub-steps:
S31: detect planes in the global triangular mesh model and keep those whose area exceeds an area threshold. Project each plane onto the pictures in which it is visible, and record the set of such pictures. For each picture I_k in this set, compute its K-nearest-neighbor picture set, where the K nearest neighbors are ranked by the overlap rate of the vertices of the global triangular mesh model after mirroring about the plane. Use this neighbor set to build a matching cost volume and test whether the plane has a sufficient reflection component in picture I_k, as follows: for each pixel, mirror the global triangular mesh model according to the plane equation, look up the cost corresponding to the mirrored depth value in the matching cost volume, and check whether that cost position is a local minimum. If the number of pixels at which the cost is a local minimum exceeds a pixel-count threshold, the plane is considered to have a reflection component in that picture; if the number of visible pictures in which a plane has a reflection component exceeds a picture-count threshold, the plane is considered a reflective plane;
S32: for each reflective plane, compute its two-dimensional reflection region β_k in each visible picture, as follows: project the reflective plane onto the visible picture to obtain a projected depth map, dilate the projected depth map, and compare the dilated projected depth map with the aligned depth map to obtain an accurate two-dimensional reflection region. For each pixel with a depth value in the projected depth map, filter using the 3D point distance and the normal angle, and take the filtered pixel region as the reflection region β_k of the reflective plane in that picture (sketched below);
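By way of illustration, the filtering in S32 can be sketched in Python as follows; the `unproject` helper, the use of OpenCV grayscale dilation, and the (H, W)/(H, W, 3) array layout are assumptions of this sketch, not part of the claimed method:

```python
import numpy as np
import cv2

def unproject(depth, K):
    """Back-project a depth map to camera-space 3D points, shape (H, W, 3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    return np.stack([x, y, depth], axis=-1)

def reflection_region(proj_depth, aligned_depth, normals_proj, normals_aligned,
                      K, dist_thresh=0.03, angle_thresh_deg=60.0):
    """Compute the 2D reflection region beta_k of one reflective plane in one picture."""
    # Dilate the projected plane depth map (the embodiment suggests a 9x9 window).
    dilated = cv2.dilate(proj_depth.astype(np.float32), np.ones((9, 9), np.uint8))

    # 3D distance between the back-projected points of the two depth maps.
    dist = np.linalg.norm(unproject(dilated, K) - unproject(aligned_depth, K), axis=-1)

    # Angle between the surface normals of the two depth maps at each pixel.
    cos = np.clip(np.sum(normals_proj * normals_aligned, axis=-1), -1.0, 1.0)
    angle = np.degrees(np.arccos(cos))

    # Keep pixels passing both the distance and the normal-angle test.
    return (dilated > 0) & (dist < dist_thresh) & (angle < angle_thresh_deg)
```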
S33: for every picture in which the reflective plane is visible, build a two-layer representation over the reflection region, as follows: take the projected depth map as the initial foreground depth map; mirror the picture's camera intrinsics and extrinsics about the plane equation to form a virtual camera; render the initial background depth map in that virtual camera using the global triangular mesh model; and convert the initial foreground and background depth maps into two simplified triangular mesh layers. The optimization objective is to minimize an energy that combines a data term E_d, a smoothness term E_s, and a prior term E_p with weights λ_s and λ_p. The optimized variables include the rigid transformation of the reflection-layer mesh, initialized to the identity matrix and zero, and the foreground/background meshes, for which only the 3D vertex positions are optimized and the topology is left unchanged; u denotes a pixel in the reflection region. Specifically: given the initial values, optimize with the nonlinear conjugate gradient method; afterwards, fix the rigid transformation and optimize the two meshes, again with the conjugate gradient method. One alternation constitutes one round of optimization, and the whole process runs for two rounds. After the first round, the consistency constraint on the foreground pictures across multiple views is used to denoise the first-round foreground results, yielding the denoised images, where λ_g is the prior-term weight that constrains the second round;
S4: given a virtual viewpoint, draw the virtual-view picture from the neighborhood pictures and their triangular meshes. Reflection regions are drawn from the foreground/background pictures and the foreground/background triangular meshes, as follows: draw the reflection regions β_k of the neighborhood pictures into the current virtual viewpoint to obtain the reflection region β_n of the current virtual viewpoint. Pixels inside the reflection region must be drawn from the two picture layers (foreground and background) and the simplified triangular meshes, computing a depth map and blending colors for each layer separately. Because the two picture layers were decomposed after inverse gamma correction, at the rendering stage the two blended layers are added and one gamma correction is applied to obtain the correct picture with the reflection effect.
Further, in S2, the depth edges of the depth map are aligned to the color edges of the original picture to obtain the aligned depth map, as follows:

First compute the normal map corresponding to the depth map. Then, for each pixel i in the depth map, convert its depth value d_i into a 3D point v_i in the local coordinate system using the camera intrinsics, and compute the plane distance between adjacent pixels i, j as dt_ij = max(|(v_i − v_j)·n_i|, |(v_i − v_j)·n_j|), where n_i, n_j are the normal vectors of points i and j. If dt_ij is greater than λ·max(1, min(d_i, d_j)), mark the pixel as a depth-edge pixel, where λ is the edge-detection threshold; a sketch of this test follows this passage.

For each picture, after all depth-edge pixels are found, compute the local 2D gradient of the depth edges with a Sobel convolution. Then, starting from each depth-edge pixel, traverse pixel by pixel simultaneously along the edge's 2D gradient direction and its opposite direction until one of the two sides reaches a color-edge pixel. Once a color-edge pixel is reached, delete the depth values of all pixels on the path from the starting pixel to that color-edge pixel. Pixels whose depth values were deleted are defined as unaligned pixels and pixels whose depth values were kept as aligned pixels; each deleted depth value is filled in by interpolating from the surrounding non-deleted depth values.
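The depth-edge test above can be sketched as follows; the (H, W, 3) layout for the back-projected points and normals, and marking both endpoints of a violating edge, are assumptions of this sketch:

```python
import numpy as np

def depth_edge_mask(depth, points, normals, lam=0.01):
    """Mark depth-edge pixels: pixel pair (i, j) is an edge when
    dt_ij = max(|(v_i - v_j)·n_i|, |(v_i - v_j)·n_j|) exceeds
    lam * max(1, min(d_i, d_j)). `points`, `normals` have shape (H, W, 3)."""
    h, w = depth.shape
    edge = np.zeros((h, w), dtype=bool)
    for dy, dx in ((0, 1), (1, 0)):                  # right and bottom neighbours
        vi, vj = points[: h - dy, : w - dx], points[dy:, dx:]
        ni, nj = normals[: h - dy, : w - dx], normals[dy:, dx:]
        diff = vi - vj
        dt = np.maximum(np.abs(np.sum(diff * ni, -1)),
                        np.abs(np.sum(diff * nj, -1)))
        thresh = lam * np.maximum(1.0, np.minimum(depth[: h - dy, : w - dx],
                                                  depth[dy:, dx:]))
        hit = dt > thresh
        edge[: h - dy, : w - dx] |= hit              # both endpoints are marked here
        edge[dy:, dx:] |= hit
    return edge
```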
Further, each deleted depth value is interpolated from the surrounding non-deleted depth values, as follows: for each unaligned pixel p_i to be interpolated, compute its geodesic distance d_g(p_i, p_j) to all other aligned pixels, use the geodesic distances to find the m nearest aligned pixels, and compute the interpolated depth value as the weighted average, over the nearest-neighbor aligned-pixel set of p_i, of the depths obtained by projecting p_i onto the local plane equation of each neighbor p_j, with weights w_g(i, j) = exp(−d_g(p_i, p_j)); the local plane equation of p_j is computed from v_j and n_j.
Further, in S4, the neighborhood picture set is computed from the virtual camera's intrinsic and extrinsic parameters. The local coordinate system of the current virtual camera is split into 8 quadrants by the coordinate-axis planes. Within each quadrant, a series of neighborhood pictures is further selected using the angle between a picture's optical-center direction and the virtual camera's optical-center direction, and the distance ‖t_k − t_n‖ between the picture's optical center t_k and the virtual camera's optical center t_n, splitting each quadrant again into several regions. Then, in each region, the single picture with the smallest similarity score d_k is added to the neighborhood picture set, where λ is the distance-proportion weight.
After the neighborhood picture set is obtained, each picture in the set is drawn into the virtual viewpoint using its corresponding simplified triangular mesh, as follows:

a) Compute a robust depth map. For each pixel of the fragment shader, compute the rendering cost
c(t_k, t_n, x) = ∠(t_k − x, t_n − x)·π/180 + max(0, 1 − ‖t_n − x‖ / ‖t_k − x‖),
where t_k and t_n are the 3D coordinates of the optical centers of the picture and the virtual camera, and x is the 3D coordinate of the 3D point corresponding to the pixel. A series of triangle patches renders into each pixel; here "point" denotes the intersection of a patch with the ray determined by the pixel. If a point's rendering cost is greater than the minimum rendering cost over all points in that pixel plus the range threshold λ, the point does not participate in the depth-map computation; the depths of all participating points are compared and the minimum is taken as the pixel's depth value;

b) After the virtual camera's depth map has been computed, the pictures are drawn as texture maps on the triangular meshes. For each pixel of the virtual camera picture, the colors of the points near the depth map are blended with set weights w_k to obtain the final rendered color.
Further, in S4, to reduce the storage footprint, all pictures are downsampled to 1/n for storage, n ≥ 1, and the virtual window is set to the original picture size at rendering time.
Further, a super-resolution neural network is trained to compensate for the sharpness loss caused by downsampling the stored pictures and, at the same time, to reduce possible rendering errors, as follows: after each new virtual view is rendered to obtain a depth picture and a color picture, a deep neural network is used to reduce rendering errors and improve sharpness. The network takes the current frame's color and depth pictures plus the previous frame's color and depth pictures as input. First, a three-layer convolutional network extracts features from the current frame's depth/color pictures and from the previous frame's depth/color pictures, respectively. Next, the previous frame's features are warped to the current frame; the initial correspondence is computed from the depth map. Because the depth map is not fully accurate, an alignment module fits an additional local 2D offset to further align the features of the two frames. The aligned features of the two frames are concatenated and fed into a super-resolution module implemented with a U-Net convolutional neural network, which outputs the high-definition picture of the current frame.
The beneficial effects of the present invention are:

1. A complete pipeline that can process large amounts of captured data and achieves large-degree-of-freedom virtual-viewpoint roaming in relatively large indoor scenes;

2. Reflective surfaces in the indoor scene and reflection regions in the pictures are detected, and a two-layer representation is built over the reflection regions, so that reflection effects are rendered well during roaming, greatly improving rendering realism;

3. By appending a dedicated super-resolution neural network, rendering errors are reduced while the picture resolution needed to support roaming in a single scene is lowered, reducing storage and memory consumption.
FIG. 1 is a flowchart of the indoor scene virtual roaming method based on reflection decomposition provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of the global triangular mesh model provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of the construction result of the two-layer representation in a reflection region provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of a rendering result of a virtual viewpoint with reflection provided by an embodiment of the present invention;
FIG. 5 is a comparison of results with and without the super-resolution neural network provided by an embodiment of the present invention;
FIG. 6 is a structural diagram of the super-resolution neural network provided by an embodiment of the present invention.
The present invention is described in further detail below with reference to the drawings and specific embodiments. It should be understood that the specific embodiments described here are intended only to explain the present invention, not to limit it.
As shown in FIG. 1, an embodiment of the present invention provides an indoor scene virtual roaming method based on reflection decomposition, comprising the following steps:

(1) Capture enough pictures to cover the target indoor scene and perform 3D reconstruction of the indoor scene from the captured pictures, as shown in FIG. 2, obtaining the camera intrinsic and extrinsic parameters and a rough global triangular mesh model of the indoor scene.

Specifically, the 3D reconstruction software COLMAP or RealityCapture can be used to obtain the camera intrinsic/extrinsic parameters and the global triangular mesh model.
(2) For each picture, project the global triangular mesh model into a corresponding depth map, align the depth edges to the color edges, convert the aligned depth map into a triangular mesh, and simplify the mesh.

Specifically, because the global triangular mesh model contains some errors, the depth edges of the projected depth map are aligned to the color edges of the original picture to obtain the aligned depth map, as follows:

First compute the normal map corresponding to the depth map. Then, for each pixel i in the depth map, convert its depth value d_i into a 3D point v_i in the local coordinate system using the camera intrinsics, and compute the plane distance between adjacent pixels i, j as dt_ij = max(|(v_i − v_j)·n_i|, |(v_i − v_j)·n_j|), where n_i, n_j are the normal vectors of points i and j. If dt_ij is greater than λ·max(1, min(d_i, d_j)), mark the pixel as a depth-edge pixel, where λ is the edge-detection threshold; λ = 0.01 in this embodiment.

For each picture, after all depth-edge pixels are found, compute the local 2D gradient of the depth edges with a Sobel convolution. Then, starting from each depth-edge pixel, traverse pixel by pixel simultaneously along the edge's 2D gradient direction and its opposite direction until one of the two sides reaches a color-edge pixel, where color-edge pixels are obtained with the Canny edge detector. Once a color-edge pixel is reached, delete the depth values of all pixels on the path from the starting pixel to that color-edge pixel. Pixels whose depth values were deleted are defined as unaligned pixels and pixels whose depth values were kept as aligned pixels. Each deleted depth value is interpolated from the surrounding non-deleted values: for each unaligned pixel p_i to be interpolated, compute its geodesic distance d_g(p_i, p_j) to all other aligned pixels (see Revaud, Jerome, et al. "EpicFlow: Edge-Preserving Interpolation of Correspondences for Optical Flow." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015), use the geodesic distances to find the m nearest aligned pixels (m = 4 in this embodiment), and compute the interpolated depth as the w_g-weighted average, over the nearest-neighbor aligned-pixel set of p_i, of the depths obtained by projecting p_i onto each neighbor p_j's local plane equation, where w_g(i, j) = exp(−d_g(p_i, p_j)) and the local plane equation is computed from v_j and n_j. A sketch of this interpolation follows.
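A minimal Python sketch of the geodesic interpolation, assuming flattened per-pixel arrays and a `geodesic_dist` helper (for example, built on an edge-aware cost map in the spirit of EpicFlow) that is not part of the patent text:

```python
import numpy as np

def fill_deleted_depths(rays, points, normals, aligned_idx, unaligned_idx,
                        geodesic_dist, m=4):
    """`rays[i]` is the unit viewing ray of pixel i; `points[j]`/`normals[j]`
    define the local plane of aligned pixel j; `geodesic_dist(i, J)` is an
    assumed helper returning geodesic distances from pixel i to pixels J."""
    filled = {}
    for i in unaligned_idx:
        d_g = geodesic_dist(i, aligned_idx)
        nearest = np.argsort(d_g)[:m]            # m nearest aligned pixels
        w = np.exp(-d_g[nearest])                # w_g(i, j) = exp(-d_g(p_i, p_j))
        depths = []
        for j_local in nearest:
            j = aligned_idx[j_local]
            n, v = normals[j], points[j]
            # Depth of pixel i's ray intersected with pixel j's local plane.
            depths.append(np.dot(n, v) / max(np.dot(n, rays[i]), 1e-6))
        filled[i] = float(np.dot(w, depths) / w.sum())
    return filled
```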
Specifically, after depth-map alignment, the aligned depth map is converted into a triangular mesh as follows: convert the depth values into 3D coordinates, connect all horizontal and vertical edges plus one diagonal per quad, and disconnect the corresponding edges wherever a depth edge from the previous steps is encountered, yielding the triangular mesh (a sketch of this conversion is given below).

Specifically, a mesh-simplification algorithm is then applied to the generated triangular mesh; see Garland, Michael, and Paul S. Heckbert. "Surface Simplification Using Quadric Error Metrics." Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques, 1997.
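A minimal sketch of the depth-map-to-mesh conversion, assuming the back-projected points and the depth-edge mask computed in the previous steps:

```python
import numpy as np

def depth_to_mesh(points, edge_mask):
    """Connect each 2x2 block of back-projected pixels with two triangles
    (horizontal, vertical, and one shared diagonal edge), skipping quads
    that touch a depth edge. `points`: (H, W, 3); `edge_mask`: (H, W)."""
    h, w, _ = points.shape
    verts = points.reshape(-1, 3)
    idx = np.arange(h * w).reshape(h, w)
    faces = []
    for y in range(h - 1):
        for x in range(w - 1):
            quad = (y, x), (y, x + 1), (y + 1, x), (y + 1, x + 1)
            if any(edge_mask[p] for p in quad):
                continue  # a depth edge disconnects the corresponding faces
            a, b, c, d = (idx[p] for p in quad)
            faces.append((a, b, c))   # upper-left triangle
            faces.append((b, d, c))   # lower-right triangle (shared hypotenuse)
    return verts, np.array(faces, dtype=np.int64)
```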
(3) Detect planes in the global triangular mesh model and use the color consistency between adjacent images to test whether each plane is a reflective plane; if it is, build a two-layer representation over the reflection region of every picture in which the reflective plane is visible, so that the reflection effect on the object surface can be rendered correctly. FIG. 3 shows the construction result of the two-layer representation in a reflection region provided by an embodiment of the present invention.

The two-layer representation comprises foreground and background triangular meshes and two decomposed pictures (foreground and background): the foreground mesh expresses the object's surface geometry, the background mesh expresses the mirror image of the scene geometry about the reflection plane, the foreground picture expresses the object's surface texture with the reflection component removed, and the background picture expresses the reflection component of the scene on the object surface.

Specifically, first detect planes in the global triangular mesh model and keep planes whose area exceeds the area threshold (0.09 m² in this embodiment). Project each plane onto the pictures in which it is visible and record the set of such pictures. For each picture I_k in the set, compute its K-nearest-neighbor picture set (K = 6 in this embodiment); the K nearest neighbors are ranked by the overlap rate of the vertices of the global triangular mesh model after mirroring about the plane, and the set includes the picture I_k itself, whose overlap rate is necessarily the highest. Then use this neighbor set to build a matching cost volume (see Sinha, Sudipta N., et al. "Image-Based Rendering for Scenes with Reflections." ACM Transactions on Graphics (TOG) 31.4 (2012): 1-10) and test whether the plane has a sufficient reflection component in picture I_k. Specifically, for each pixel, mirror the global triangular mesh model according to the plane equation, look up the cost corresponding to the mirrored depth value in the matching cost volume, and check whether that cost position is a local minimum. If the number of pixels at which the cost is a local minimum exceeds the pixel-count threshold (50 in this embodiment), the plane is considered to have a reflection component in that picture; if the number of visible pictures in which a plane has a reflection component exceeds the picture-count threshold (5 in this embodiment), the plane is considered a reflective plane. A sketch of this test follows.
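The local-minimum test on the matching cost volume can be sketched as follows; the discretization of depths into cost-volume slices and the handling of border slices are assumptions of this sketch:

```python
import numpy as np

def plane_has_reflection(cost_volume, mirrored_depth_idx, valid, min_pixels=50):
    """`cost_volume`: (D, H, W) matching costs over depth hypotheses;
    `mirrored_depth_idx`: (H, W) depth-slice index of each pixel after
    mirroring the global mesh by the plane equation; `valid` marks pixels
    where the mirrored depth projects into the volume. The plane has a
    reflection component in this picture if enough pixels sit at a local
    minimum of the cost along the depth axis."""
    D = cost_volume.shape[0]
    count = 0
    for y, x in zip(*np.nonzero(valid)):
        d = int(mirrored_depth_idx[y, x])
        if d <= 0 or d >= D - 1:
            continue                              # assumed: skip border slices
        c = cost_volume[:, y, x]
        if c[d] < c[d - 1] and c[d] < c[d + 1]:   # local minimum at mirrored depth
            count += 1
    return count > min_pixels

# A plane is finally declared reflective if this test succeeds in more
# visible pictures than the picture-count threshold (5 in the embodiment).
```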
Specifically, for each reflective plane, compute its two-dimensional reflection region β_k in each visible picture: project the reflective plane (which has a 3D boundary) onto the visible picture to obtain a projected depth map, dilate the projected depth map (a 9×9 window can be used), and compare the dilated projected depth map with the depth map aligned in the previous steps to obtain an accurate two-dimensional reflection region. For each pixel with a depth value in the projected depth map, filter using the 3D point distance and the normal angle (keep pixels whose 3D point distance is less than 0.03 m and whose normal angle is less than 60 degrees), and take the filtered pixel region as the reflection region β_k of the reflective plane in that picture. At the same time, use the plane's equation to obtain the initial two-layer depth maps: take the projected depth map as the initial foreground depth map, mirror the picture's camera intrinsics/extrinsics about the plane equation to form a virtual camera, and render the initial background depth map in that virtual camera using the global triangular mesh model; note that the near clipping plane of this rendering must be set to the reflection plane. Then convert the initial foreground and background depth maps into two simplified triangular mesh layers following the method of step (2).

The optimization objective is to minimize an energy combining a data term E_d, a smoothness term E_s, and a prior term E_p, with weights λ_s and λ_p (0.04 and 0.01, respectively, in this embodiment). The optimized variables include the rigid transformation of the reflection-layer mesh, initialized to the identity matrix and zero, and the foreground/background meshes, for which only the 3D vertex positions are optimized while the topology is left unchanged; u denotes a pixel in the reflection region. Specifically: given the initial values, optimize with the nonlinear conjugate gradient method for 30 iterations; afterwards, fix the rigid transformation and optimize the two meshes, again with the conjugate gradient method for 30 iterations. One alternation constitutes one round, and the whole optimization runs for two rounds in total. After the first round, the consistency constraint on the foreground pictures (surface colors) across multiple views is used to denoise the first-round foreground results, yielding the denoised images, where the prior-term weight λ_g equals 0.05 and constrains the second round. A structural sketch of this alternating scheme follows.
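Since the full energy expression appears only as an equation image in the source, the following sketch shows just the structure of the alternating two-round scheme, treating the energy and the multi-view denoising as black boxes; SciPy's nonlinear conjugate gradient stands in for the solver:

```python
import numpy as np
from scipy.optimize import minimize

def alternate_optimize(energy, transform0, layers0, denoise_fg, iters=30, rounds=2):
    """`energy(transform, layers)` stands in for E_d + weighted E_s + E_p
    (weights 0.04 and 0.01 in the embodiment; the full expression is an
    equation in the source), with `layers` bundling the foreground/background
    vertex positions and decomposed pictures. `denoise_fg` stands in for the
    multi-view foreground-consistency denoising applied after round one,
    whose result constrains round two with prior weight lambda_g = 0.05."""
    transform = np.asarray(transform0, dtype=float)
    layers = np.asarray(layers0, dtype=float)
    for r in range(rounds):
        # Solve for the reflection-layer rigid transform with nonlinear CG.
        transform = minimize(lambda t: energy(t, layers), transform,
                             method='CG', options={'maxiter': iters}).x
        # Fix the transform; solve for the layer variables, also with CG.
        layers = minimize(lambda l: energy(transform, l), layers,
                          method='CG', options={'maxiter': iters}).x
        if r == 0:
            layers = denoise_fg(layers)  # denoise foreground colours between rounds
    return transform, layers
```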
(4) Given a virtual viewpoint, draw the virtual-view picture from the neighborhood pictures and their triangular meshes; reflection regions are drawn from the foreground/background pictures and meshes. FIG. 4 shows a rendering result of a virtual viewpoint with reflection provided by an embodiment of the present invention.

Specifically, the goal of the online rendering process is: given the virtual camera's intrinsic and extrinsic parameters, output the virtual picture corresponding to that camera. The neighborhood picture set is computed from the virtual camera's intrinsics/extrinsics: split the local coordinate system of the current virtual camera into 8 quadrants by the coordinate-axis planes, and within each quadrant further select a series of neighborhood pictures using the angle between a picture's optical-center direction and the virtual camera's optical-center direction, and the distance ‖t_k − t_n‖ between the picture's optical center t_k and the virtual camera's optical center t_n, splitting each quadrant again into several regions. Preferably, each quadrant is split into 9 regions, namely the combinations of the three angle intervals [0°, 10°), [10°, 20°), [20°, ∞) with the three distance intervals [0, 0.6), [0.6, 1.2), [1.2, 1.8) for ‖t_k − t_n‖. Then, in each region, the single picture with the smallest similarity score d_k is added to the neighborhood picture set, where the distance-proportion weight λ equals 0.1. A sketch of this selection follows.
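A sketch of the region-based neighbor selection; the exact similarity formula d_k is given only as an equation image in the source, so the form angle + λ·distance used here is an assumption, as is the minimal camera record:

```python
import numpy as np
from typing import NamedTuple, Sequence

class Cam(NamedTuple):          # hypothetical minimal camera record
    center: np.ndarray          # optical centre t, shape (3,)
    view_dir: np.ndarray        # unit optical-axis direction
    R: np.ndarray               # world-to-local rotation of this camera, (3, 3)

def select_neighbors(virtual: Cam, pictures: Sequence[Cam], lam=0.1):
    """Bin each picture by the octant of its optical centre in the virtual
    camera's local frame, by direction angle ([0,10), [10,20), [20,inf) deg)
    and by optical-centre distance ([0,0.6), [0.6,1.2), [1.2,1.8) m); keep
    the picture with the smallest similarity d_k in each bin."""
    best = {}
    for k, cam in enumerate(pictures):
        local = virtual.R @ (cam.center - virtual.center)
        octant = tuple(local > 0)                        # one of the 8 quadrants
        ang = np.degrees(np.arccos(np.clip(
            np.dot(cam.view_dir, virtual.view_dir), -1.0, 1.0)))
        dist = np.linalg.norm(cam.center - virtual.center)
        if dist >= 1.8:
            continue                                     # assumed: outside all regions
        key = (octant, int(np.digitize(ang, (10.0, 20.0))),
               int(np.digitize(dist, (0.6, 1.2))))
        d_k = ang + lam * dist                           # assumed similarity score
        if key not in best or d_k < best[key][0]:
            best[key] = (d_k, k)
    return sorted(k for _, k in best.values())
```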
After the neighborhood picture set is obtained, each picture in the set is drawn into the virtual viewpoint using its corresponding simplified triangular mesh, as follows:

a) Compute a robust depth map. For each pixel of the fragment shader, compute the rendering cost
c(t_k, t_n, x) = ∠(t_k − x, t_n − x)·π/180 + max(0, 1 − ‖t_n − x‖ / ‖t_k − x‖),
where t_k and t_n are the 3D coordinates of the optical centers of the picture and the virtual camera, and x is the 3D coordinate of the 3D point corresponding to the pixel. A series of triangle patches renders into each pixel; here "point" denotes the intersection of a patch with the ray determined by the pixel. If a point's rendering cost is too large, namely greater than the minimum rendering cost of all points in that pixel plus the range threshold λ (0.17 in this embodiment), the point does not participate in the depth-map computation; the depths of all participating points are compared and the minimum is taken as the pixel's depth value.

b) After the virtual camera's depth map has been computed, the pictures are drawn as texture maps on the triangular meshes. For each pixel of the virtual camera picture, the colors of points near the depth map (distance less than 3 cm) are blended with the set weights w_k (w_k = exp(−d_k/0.033)) to obtain the final rendered color. A sketch of this per-pixel fusion follows.
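The rendering cost and the per-pixel depth/color fusion can be sketched as follows; interpreting d_k in the blending weight as the point's depth distance to the fused depth is an assumption of this sketch:

```python
import numpy as np

def render_cost(t_k, t_n, x):
    """c(t_k,t_n,x) = angle(t_k-x, t_n-x)*pi/180 + max(0, 1 - |t_n-x|/|t_k-x|);
    the angle in degrees times pi/180 is simply the angle in radians."""
    a, b = t_k - x, t_n - x
    ang = np.arccos(np.clip(np.dot(a, b) /
                            (np.linalg.norm(a) * np.linalg.norm(b)), -1.0, 1.0))
    return ang + max(0.0, 1.0 - np.linalg.norm(b) / np.linalg.norm(a))

def fuse_pixel(depths, costs, colors, lam=0.17, near=0.03, sigma=0.033):
    """Per-pixel fusion over all 'points' (patch/ray intersections): discard
    points whose cost exceeds the per-pixel minimum by more than lam, take
    the minimum surviving depth, then blend the colours of points within
    3 cm of it with weights w_k = exp(-d_k / 0.033)."""
    keep = costs <= costs.min() + lam
    depth = depths[keep].min()
    sel = keep & (np.abs(depths - depth) < near)
    w = np.exp(-np.abs(depths[sel] - depth) / sigma)
    color = (w[:, None] * colors[sel]).sum(axis=0) / w.sum()
    return depth, color
```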
Specifically, the reflection regions β_k of the neighborhood pictures are also drawn into the current virtual viewpoint, yielding the reflection region β_n of the current virtual viewpoint. Pixels inside the reflection region are drawn from the two picture layers (foreground and background) and the simplified triangular meshes, computing a depth map and blending colors for each layer separately following the steps above. Because the two picture layers were decomposed after inverse gamma correction, at the rendering stage the two blended layers are added and then one gamma correction is applied to obtain the correct picture with the reflection effect.
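A minimal sketch of this layer recomposition; the gamma value 2.2 is an assumption, since the embodiment only specifies "gamma correction":

```python
import numpy as np

GAMMA = 2.2  # assumed display gamma

def degamma(img):
    """Inverse gamma correction: gamma-encoded image to linear radiance,
    the space in which the foreground/background decomposition is stored."""
    return np.power(np.clip(img, 0.0, 1.0), GAMMA)

def compose_with_reflection(fg_linear, bg_linear):
    """Add the two blended layers in linear space, then apply one gamma
    correction to obtain the final picture with the reflection effect."""
    return np.power(np.clip(fg_linear + bg_linear, 0.0, 1.0), 1.0 / GAMMA)
```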
Specifically, in the rendering steps above, to reduce storage, all pictures are downsampled to 1/n for storage (n ≥ 1; n = 4 in this embodiment), and the virtual window is set to the original picture size at rendering time. The rendered virtual-viewpoint picture then keeps its resolution but is somewhat blurry; sharpness is restored in the next step by the super-resolution neural network.
(5) Train a super-resolution neural network to compensate for the sharpness loss caused by downsampling the stored pictures and, at the same time, to reduce possible rendering errors. FIG. 5 compares results with and without the super-resolution neural network, and FIG. 6 shows the structure of the super-resolution neural network provided by an embodiment of the present invention.

Specifically, after each new virtual view is rendered to obtain a depth picture and a color picture, a deep neural network is used to reduce rendering errors and improve sharpness. The network takes the current frame's color and depth pictures plus the previous frame's color and depth pictures as input; using two consecutive frames adds more useful information and improves temporal stability. First, a three-layer convolutional network extracts features from the current frame's depth/color pictures and from the previous frame's depth/color pictures, respectively. Next, the previous frame's features are warped to the current frame; the initial correspondence is computed from the depth map. Because the depth map is not fully accurate, an alignment module (implemented as a convolutional neural network with three convolutional layers) fits an additional local 2D offset to further align the features of the two frames. The aligned features of the two frames are concatenated and fed into the super-resolution module (implemented with a U-Net convolutional neural network), which outputs the high-definition picture of the current frame.
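A structural PyTorch sketch of the network in FIG. 6; the channel widths, the residual-offset design of the alignment module, and the small convolutional stand-in for the U-Net head are assumptions for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(c_in, c_mid, c_out, n=3):
    """A small n-layer convolutional block (ReLU between layers, none after the last)."""
    layers = [nn.Conv2d(c_in, c_mid, 3, padding=1), nn.ReLU(inplace=True)]
    for _ in range(n - 2):
        layers += [nn.Conv2d(c_mid, c_mid, 3, padding=1), nn.ReLU(inplace=True)]
    layers.append(nn.Conv2d(c_mid, c_out, 3, padding=1))
    return nn.Sequential(*layers)

class SuperResNet(nn.Module):
    def __init__(self, feat=32):
        super().__init__()
        self.extract = conv_block(4, feat, feat)       # RGB + depth = 4 channels
        self.align = conv_block(2 * feat, feat, 2)     # residual local 2D offset
        self.head = conv_block(2 * feat, feat, 3)      # stand-in for the U-Net module

    def forward(self, cur_rgbd, prev_rgbd, flow_from_depth):
        """cur_rgbd/prev_rgbd: (B, 4, H, W); flow_from_depth: (B, 2, H, W)
        pixel offsets derived from the rendered depth maps."""
        f_cur, f_prev = self.extract(cur_rgbd), self.extract(prev_rgbd)
        b, _, h, w = f_prev.shape
        dev = f_prev.device
        ys, xs = torch.meshgrid(torch.arange(h, device=dev),
                                torch.arange(w, device=dev), indexing='ij')
        base = torch.stack([xs, ys], -1).float()              # (H, W, 2), x first
        scale = torch.tensor([(w - 1) / 2.0, (h - 1) / 2.0], device=dev)

        def warp(feat, flow):
            # Normalize pixel coordinates to [-1, 1] for grid_sample.
            grid = (base + flow.permute(0, 2, 3, 1)) / scale - 1.0
            return F.grid_sample(feat, grid, align_corners=True)

        # Initial correspondence: warp previous features by the depth-induced flow.
        f_warp = warp(f_prev, flow_from_depth)
        # Alignment module: regress a residual local 2D offset, then warp again.
        offset = self.align(torch.cat([f_cur, f_warp], 1))
        f_warp = warp(f_prev, flow_from_depth + offset)
        # Concatenate the aligned features and super-resolve the current frame.
        return self.head(torch.cat([f_cur, f_warp], 1))
```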
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the indoor scene virtual roaming method based on reflection decomposition in the embodiments above.

In one embodiment, a storage medium storing computer-readable instructions is provided; when the computer-readable instructions are executed by one or more processors, the one or more processors perform the steps of the indoor scene virtual roaming method based on reflection decomposition in the embodiments above. The storage medium may be a non-volatile storage medium.

Those of ordinary skill in the art will understand that all or part of the steps of the methods in the embodiments above can be completed by instructing the relevant hardware through a program, which may be stored in a computer-readable storage medium, such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The above are merely preferred embodiments of one or more embodiments of this specification and are not intended to limit them; any modification, equivalent replacement, or improvement made within the spirit and principles of one or more embodiments of this specification shall fall within their scope of protection.
Claims (6)
- An indoor scene virtual roaming method based on reflection decomposition, characterized by comprising the following steps: S1: capturing enough pictures to cover the target indoor scene, performing 3D reconstruction of the indoor scene from the captured pictures, and obtaining the camera intrinsic and extrinsic parameters and a rough global triangular mesh model of the indoor scene; S2: for each picture, projecting the global triangular mesh model into a corresponding depth map, aligning the depth edges to the color edges, converting the aligned depth map into a triangular mesh, and simplifying the mesh; S3: detecting planes in the global triangular mesh model and using the color consistency between adjacent images to test whether each plane is a reflective plane, and if so, building a two-layer representation over the reflection region of every picture in which the reflective plane is visible, for correctly rendering the reflection effect on the object surface, the two-layer representation comprising foreground and background triangular meshes and two decomposed foreground/background pictures, wherein the foreground mesh expresses the object's surface geometry, the background mesh expresses the mirror image of the scene geometry about the reflection plane, the foreground picture expresses the object's surface texture with the reflection component removed, and the background picture expresses the reflection component of the scene on the object surface, comprising the sub-steps: S31: detecting planes in the global triangular mesh model, keeping planes whose area exceeds an area threshold, projecting each plane onto the pictures in which it is visible, and recording the set of such pictures; for each picture I_k in the set, computing its K-nearest-neighbor picture set, the K nearest neighbors being ranked by the vertex overlap rate of the global triangular mesh model after mirroring about the plane; building a matching cost volume from the neighbor set and testing whether the plane has a sufficient reflection component in picture I_k by: for each pixel, mirroring the global triangular mesh model according to the plane equation, looking up the cost corresponding to the mirrored depth value in the matching cost volume, and checking whether that cost position is a local minimum; if the number of local-minimum pixels in the picture exceeds a pixel-count threshold, the plane is deemed to have a reflection component in the picture, and if the number of visible pictures in which a plane has a reflection component exceeds a picture-count threshold, the plane is deemed a reflective plane; S32: for each reflective plane, computing its two-dimensional reflection region β_k in each visible picture by projecting the reflective plane onto the visible picture to obtain a projected depth map, dilating the projected depth map, comparing the dilated projected depth map with the aligned depth map to obtain an accurate two-dimensional reflection region, and, for each pixel with a depth value in the projected depth map, filtering by 3D point distance and normal angle and taking the filtered pixel region as the reflection region β_k of the reflective plane in the picture; S33: for every picture in which the reflective plane is visible, building a two-layer representation over the reflection region by taking the projected depth map as the initial foreground depth map, mirroring the picture's camera intrinsics/extrinsics about the plane equation into a virtual camera, rendering the initial background depth map in the virtual camera with the global triangular mesh model, and converting the initial foreground/background depth maps into two simplified triangular mesh layers, the optimization objective being to minimize an energy combining a data term E_d, a smoothness term E_s, and a prior term E_p with weights λ_s and λ_p, the optimized variables including the rigid transformation of the reflection-layer mesh, initialized to the identity matrix and zero, and the foreground/background meshes, of which only the 3D vertex positions are optimized without changing the topology, u denoting a pixel in the reflection region; specifically: given the initial values, optimizing with the nonlinear conjugate gradient method; then fixing the rigid transformation and optimizing the two meshes, again with the conjugate gradient method; one alternation being one round of optimization, the whole optimization running for two rounds, and after the first round, using the multi-view consistency constraint on the foreground pictures to denoise the first-round foreground results and obtain the denoised images, where λ_g is the prior-term weight constraining the second round; S4: given a virtual viewpoint, drawing the virtual-view picture from the neighborhood pictures and their triangular meshes, and drawing reflection regions from the foreground/background pictures and the foreground/background triangular meshes, specifically: drawing the reflection regions β_k of the neighborhood pictures into the current virtual viewpoint to obtain the reflection region β_n of the current virtual viewpoint; for pixels inside the reflection region, drawing from the two picture layers and the simplified triangular meshes, computing a depth map and blending colors for each layer separately; and, because the two picture layers were decomposed after inverse gamma correction, at the rendering stage adding the two blended layers and applying one gamma correction to obtain the correct picture with the reflection effect.
- The indoor scene virtual roaming method based on reflection decomposition according to claim 1, characterized in that in S2, the depth edges of the depth map are aligned to the color edges of the original picture to obtain the aligned depth map, as follows: first computing the normal map corresponding to the depth map; then, for each pixel i in the depth map, converting its depth value d_i into a 3D point v_i in the local coordinate system using the camera intrinsics, and computing the plane distance between adjacent pixels i, j as dt_ij = max(|(v_i − v_j)·n_i|, |(v_i − v_j)·n_j|), where n_i, n_j are the normal vectors of points i and j; if dt_ij is greater than λ·max(1, min(d_i, d_j)), marking the pixel as a depth-edge pixel, λ being the edge-detection threshold; for each picture, after all depth-edge pixels are obtained, computing the local 2D gradient of the depth edges with a Sobel convolution, then starting from each depth-edge pixel and traversing pixel by pixel simultaneously along the edge's 2D gradient direction and its opposite direction until one of the two sides reaches a color-edge pixel; after reaching a color-edge pixel, deleting the depth values of all pixels on the path from the starting pixel to that color-edge pixel; defining pixels with deleted depth values as unaligned pixels and pixels with retained depth values as aligned pixels; and, for each deleted depth value, interpolating from the surrounding non-deleted depth values.
- The indoor scene virtual roaming method based on reflection decomposition according to claim 1, characterized in that in S4, the neighborhood picture set is computed from the virtual camera's intrinsic and extrinsic parameters: the local coordinate system of the current virtual camera is split into 8 quadrants by the coordinate-axis planes; in each quadrant, a series of neighborhood pictures is further selected using the angle between the picture's optical-center direction and the virtual camera's optical-center direction, and the distance ‖t_k − t_n‖ between the picture's optical center t_k and the virtual camera's optical center t_n, splitting each quadrant again into several regions; then, in each region, the single picture with the smallest similarity d_k is added to the neighborhood picture set, where λ is the distance-proportion weight; after the neighborhood picture set is obtained, each picture in the set is drawn into the virtual viewpoint according to its corresponding simplified triangular mesh, as follows: a) computing a robust depth map: for each pixel of the fragment shader, computing its rendering cost c(t_k, t_n, x) = ∠(t_k − x, t_n − x)·π/180 + max(0, 1 − ‖t_n − x‖/‖t_k − x‖), where t_k and t_n are the 3D coordinates of the optical centers of the picture and the virtual camera and x is the 3D coordinate of the 3D point corresponding to the pixel; a series of triangle patches renders into each pixel, "point" denoting the intersection of a patch with the ray determined by the pixel; if a point's rendering cost is greater than the minimum rendering cost of all points in the pixel plus the range threshold λ, the point does not participate in the depth-map computation, and the depths of all participating points are compared, taking the minimum as the pixel's depth value; b) after the virtual camera's depth map is computed, drawing the pictures as texture maps on the triangular meshes, and for each pixel of the virtual camera picture, blending the colors of points near the depth map with the set weights w_k to obtain the final rendered color.
- The indoor scene virtual roaming method based on reflection decomposition according to claim 1, characterized in that in S4, to reduce the storage footprint, all pictures are downsampled to 1/n for storage, n ≥ 1, and the virtual window is set to the original picture size at rendering time.
- The indoor scene virtual roaming method based on reflection decomposition according to any one of claims 1-5, characterized in that a super-resolution neural network is trained to compensate for the sharpness loss caused by downsampling the stored pictures and to reduce possible rendering errors, as follows: after each new virtual view is rendered to obtain a depth picture and a color picture, a deep neural network reduces rendering errors and improves sharpness; the network takes the current frame's color and depth pictures plus the previous frame's color and depth pictures as input; first, a three-layer convolutional network extracts features from the current frame's depth/color pictures and the previous frame's depth/color pictures, respectively; next, the previous frame's features are warped to the current frame, the initial correspondence being computed from the depth map; because the depth map is not fully accurate, an alignment module fits an additional local 2D offset to further align the features of the two frames; and the aligned features of the two frames are concatenated and fed into a super-resolution module implemented with a U-Net convolutional neural network, which outputs the high-definition picture of the current frame.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2021/088788 WO2022222077A1 (zh) | 2021-04-21 | 2021-04-21 | Indoor scene virtual roaming method based on reflection decomposition |
US18/490,790 US20240169674A1 (en) | 2021-04-21 | 2023-10-20 | Indoor scene virtual roaming method based on reflection decomposition |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2021/088788 WO2022222077A1 (zh) | 2021-04-21 | 2021-04-21 | Indoor scene virtual roaming method based on reflection decomposition |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US18/490,790 Continuation US20240169674A1 (en) | 2021-04-21 | 2023-10-20 | Indoor scene virtual roaming method based on reflection decomposition |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022222077A1 (zh) | 2022-10-27 |
Family
ID=83723623
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/088788 WO2022222077A1 (zh) | 2021-04-21 | 2021-04-21 | Indoor scene virtual roaming method based on reflection decomposition |
Country Status (2)
Country | Link |
---|---|
US (1) | US20240169674A1 (en) |
WO (1) | WO2022222077A1 (zh) |
Application events:
- 2021-04-21: WO PCT/CN2021/088788 filed as WO2022222077A1 (active, application filing)
- 2023-10-20: US application 18/490,790 published as US20240169674A1 (active, pending)
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150178988A1 (en) * | 2012-05-22 | 2015-06-25 | Telefonica, S.A. | Method and a system for generating a realistic 3d reconstruction model for an object or being |
CN109064533A (zh) | 2018-07-05 | 2018-12-21 | 深圳奥比中光科技有限公司 | A 3D roaming method and system |
CN110288712A (zh) | 2019-03-30 | 2019-09-27 | 天津大学 | Sparse multi-view 3D reconstruction method for indoor scenes |
CN110458939A (zh) | 2019-07-24 | 2019-11-15 | 大连理工大学 | Indoor scene modeling method based on view generation |
CN111652963A (zh) | 2020-05-07 | 2020-09-11 | 浙江大学 | A neural-network-based augmented reality rendering method |
Non-Patent Citations (2)
- JIANG Hangqing; ZHAO Changfei; ZHANG Guofeng; WANG Huiyan; BAO Hujun. "Multi-View Depth Map Sampling for 3D Reconstruction of Natural Scene." Journal of Computer-Aided Design & Computer Graphics, vol. 27, no. 10, 15 October 2015, pp. 1805-1815. XP055979893.
- REVAUD Jerome; WEINZAEPFEL Philippe; HARCHAOUI Zaid; SCHMID Cordelia. "EpicFlow: Edge-Preserving Interpolation of Correspondences for Optical Flow." 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 7 June 2015, pp. 1164-1172. XP032793523. DOI: 10.1109/CVPR.2015.7298720.
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116546183B (zh) | 2023-04-06 | 2024-03-22 | 华中科技大学 | Dynamic image generation method and system with parallax effect based on a single-frame image |
CN116546183A (zh) | 2023-04-06 | 2023-08-04 | 华中科技大学 | A 3D dynamic video generation method based on a single-frame image |
CN116758136A (zh) | 2023-08-21 | 2023-09-15 | 杭州蓝芯科技有限公司 | A real-time online cargo volume recognition method, system, device and medium |
CN116758136B (zh) | 2023-08-21 | 2023-11-10 | 杭州蓝芯科技有限公司 | A real-time online cargo volume recognition method, system, device and medium |
CN117011446A (zh) | 2023-08-23 | 2023-11-07 | 苏州深捷信息科技有限公司 | A real-time rendering method for dynamic ambient lighting |
CN117011446B (zh) | 2023-08-23 | 2024-03-08 | 苏州深捷信息科技有限公司 | A real-time rendering method for dynamic ambient lighting |
CN116883607A (zh) | 2023-09-06 | 2023-10-13 | 四川物通科技有限公司 | Virtual reality scene generation system based on radiative transfer |
CN116883607B (zh) | 2023-09-06 | 2023-12-05 | 四川物通科技有限公司 | Virtual reality scene generation system based on radiative transfer |
CN116958449A (zh) | 2023-09-12 | 2023-10-27 | 北京邮电大学 | Urban scene 3D modeling method, apparatus and electronic device |
CN116958449B (zh) | 2023-09-12 | 2024-04-30 | 北京邮电大学 | Urban scene 3D modeling method, apparatus and electronic device |
CN117934700A (zh) | 2023-11-15 | 2024-04-26 | 广州极点三维信息科技有限公司 | Neural-rendering-based 3D home roaming scene reconstruction method, system and medium |
CN117994444A (zh) | 2024-04-03 | 2024-05-07 | 浙江华创视讯科技有限公司 | Reconstruction method, device and storage medium for complex scenes |
CN118135079A (zh) | 2024-05-07 | 2024-06-04 | 中国人民解放军国防科技大学 | Cloud-fusion-based 3D scene roaming rendering method, apparatus and device |
CN118135079B (zh) | 2024-05-07 | 2024-07-09 | 中国人民解放军国防科技大学 | Cloud-fusion-based 3D scene roaming rendering method, apparatus and device |
Also Published As
Publication number | Publication date |
---|---|
US20240169674A1 (en) | 2024-05-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022222077A1 (zh) | Indoor scene virtual roaming method based on reflection decomposition | |
US11727587B2 | Method and system for scene image modification | |
Penner et al. | Soft 3D reconstruction for view synthesis | |
CN105245841B (zh) | A CUDA-based panoramic video surveillance system | |
CN106651938B (zh) | A depth map enhancement method fusing high-resolution color images | |
Lei et al. | Depth map super-resolution considering view synthesis quality | |
CN106570507B (zh) | Multi-view-consistent plane detection and analysis method for the 3D structure of monocular video scenes | |
Li et al. | Detail-preserving and content-aware variational multi-view stereo reconstruction | |
Bradley et al. | Accurate multi-view reconstruction using robust binocular stereo and surface meshing | |
Dolson et al. | Upsampling range data in dynamic environments | |
US5963664A | Method and system for image combination using a parallax-based technique | |
CN111243071A (zh) | Texture rendering method, system, chip, device and medium for real-time 3D human body reconstruction | |
CN113223132B (zh) | Indoor scene virtual roaming method based on reflection decomposition | |
WO2016110239A1 (zh) | Image processing method and apparatus | |
CN112434709A (zh) | Aerial survey method and system based on UAV real-time dense 3D point clouds and DSM | |
Ma et al. | An operational superresolution approach for multi-temporal and multi-angle remotely sensed imagery | |
CN111553841A (zh) | A real-time video stitching algorithm based on optimal seam updating | |
CN116958437A (zh) | Multi-view reconstruction method and system fusing an attention mechanism | |
Xu et al. | Hybrid mesh-neural representation for 3D transparent object reconstruction | |
Chen et al. | Kinect depth recovery using a color-guided, region-adaptive, and depth-selective framework | |
Pan et al. | Depth map completion by jointly exploiting blurry color images and sparse depth maps | |
Xu et al. | Scalable image-based indoor scene rendering with reflections | |
Galea et al. | Denoising of 3D point clouds constructed from light fields | |
CN112132971A (zh) | 3D human body modeling method and apparatus, electronic device and storage medium | |
Coorg | Pose imagery and automated three-dimensional modeling of urban environments |
Legal Events
- 121: EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 21937320; country of ref document: EP; kind code of ref document: A1.
- NENP: Non-entry into the national phase. Ref country code: DE.
- 122: EP: PCT application non-entry in European phase. Ref document number: 21937320; country of ref document: EP; kind code of ref document: A1.