WO2022222077A1 - Indoor scene virtual roaming method based on reflection decomposition - Google Patents

Indoor scene virtual roaming method based on reflection decomposition

Info

Publication number
WO2022222077A1
Authority
WO
WIPO (PCT)
Prior art keywords
reflection
pixel
image
depth
plane
Prior art date
Application number
PCT/CN2021/088788
Other languages
English (en)
French (fr)
Inventor
许威威
许佳敏
吴秀超
朱紫涵
鲍虎军
Original Assignee
浙江大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浙江大学 filed Critical 浙江大学
Priority to PCT/CN2021/088788 priority Critical patent/WO2022222077A1/zh
Publication of WO2022222077A1 publication Critical patent/WO2022222077A1/zh
Priority to US18/490,790 priority patent/US20240169674A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20Finite element generation, e.g. wire-frame surface description, tesselation
    • G06T17/205Re-meshing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/04Texture mapping
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20Finite element generation, e.g. wire-frame surface description, tesselation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/90Dynamic range modification of images or parts thereof
    • G06T5/92Dynamic range modification of images or parts thereof based on global image properties
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/13Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/40Analysis of texture
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/56Extraction of image or video features relating to colour

Definitions

  • The invention relates to the technical field of image-based rendering and virtual viewpoint synthesis, and in particular to a method for virtual roaming of indoor scenes that combines image-based rendering with reflection decomposition.
  • The purpose of an indoor-scene virtual tour is to build a system that, given the intrinsic and extrinsic parameters of a virtual camera, outputs a rendered image of the corresponding virtual viewpoint.
  • Existing, relatively mature virtual-roaming applications are mainly based on a series of panoramic images: pure-rotation roaming is possible around each panorama, while movement between panoramas is handled by simple interpolation in most systems, which produces large visual errors.
  • For virtual roaming with large degrees of freedom, many existing methods achieve object-level observation or viewpoint movement within part of a scene, for example by explicitly capturing the light field around a target object with a light-field camera; see Gortler, Steven J., et al., "The Lumigraph," Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, 1996.
  • The present invention provides an indoor scene virtual roaming method based on reflection decomposition, which enables virtual tours with large degrees of freedom in large indoor scenes containing reflection effects while requiring little storage.
  • A method for virtual roaming of indoor scenes based on reflection decomposition comprises the following steps:
  • S1: Capture enough pictures in the target indoor scene to cover it, perform 3D reconstruction of the indoor scene from the captured pictures, and obtain the camera intrinsic and extrinsic parameters and a rough global triangular mesh model of the indoor scene;
  • S3: Detect planes in the global triangular mesh model and use the color consistency between adjacent images to determine whether each plane is a reflective plane. If it is, construct a two-layer representation on the reflection region of every picture in which the reflective plane is visible, which is used to render the reflection on the object surface correctly; the two-layer representation consists of foreground and background triangular meshes and decomposed foreground and background images.
  • The foreground triangular mesh expresses the surface geometry of the object, and the background triangular mesh expresses the mirror image of the scene geometry about the reflection plane; the foreground image expresses the surface texture of the object with the reflection component removed, and the background image expresses the reflection component of the scene on the object surface. Specifically, this includes the following sub-steps:
  • The judgment method is: for each pixel, after mirroring the global triangular mesh model according to the plane equation, look up the cost corresponding to the mirrored depth value in the matching cost volume and check whether that cost is a local minimum. If the number of pixels whose cost is a local minimum exceeds a pixel-count threshold, the plane is considered to have a reflection component in that image; if the number of visible images in which a plane has a reflection component exceeds an image-count threshold, the plane is considered a reflective plane.
  • For each reflective plane, compute its two-dimensional reflection region β k in every visible picture. Specifically: project the reflective plane onto the visible picture to obtain a projected depth map, dilate the projected depth map, and then compare the dilated projected depth map with the aligned depth map to obtain an accurate two-dimensional reflection region.
  • Each pixel with a depth value in the projected depth map is filtered using the three-dimensional point distance and the normal angle, and the filtered pixel region is taken as the reflection region β k of the reflective plane in that picture.
  • H is the Laplacian matrix
  • the function ⁇ -1 returns two-dimensional coordinates, and projects the point u in the image I k ' to the image I k according to the depth value and the camera's internal and external parameters; express The depth map obtained by projection; v represents vertices in;
  • ⁇ g is the weight of the prior item, which is used to constrain the second round of optimization
  • the depth edge of the depth map is aligned to the color edge of the original image, and the aligned depth map is obtained, specifically:
  • the neighborhood picture set is calculated according to the internal and external parameters of the virtual camera, the local coordinate system of the current virtual camera is divided into 8 quadrants according to the coordinate axis plane, and in each quadrant, a series of neighborhood pictures are further selected. , using the direction of the optical center of the picture and virtual camera optical center direction angle The distance ⁇ t k -t n ⁇ from the optical center t k of the picture and the optical center t n of the virtual camera, and each quadrant is divided into several regions again; after that, in each region, the smallest similarity d k is selected. 1 image is added to the neighborhood image collection, where ⁇ is the distance proportion weight;
  • t k and t n represent the three-dimensional coordinates of the optical center of the picture and the virtual camera
  • x represents the three-dimensional coordinates of the corresponding three-dimensional point of the pixel
  • A series of triangular patches is rendered onto each pixel.
  • A "point" denotes the intersection of a patch with the ray determined by the pixel. If the rendering cost of a point is greater than the minimum rendering cost of all points in the pixel plus the range threshold λ, the point does not participate in the depth-map computation; the depths of all points participating in the depth-map computation are compared and the minimum is taken as the depth value of the pixel;
  • A super-resolution neural network is trained to compensate for the loss of sharpness caused by downsampling the stored images and, at the same time, to reduce possible rendering errors. Specifically, after each new virtual view is rendered to obtain a depth image and a color image, a deep neural network is applied.
  • The neural network reduces rendering errors and improves sharpness; it takes the current-frame color and depth images plus the previous-frame color and depth images as input. First, a three-layer convolutional network extracts features from the current-frame depth/color images and the previous-frame depth/color images separately; the next step warps the previous-frame features to the current frame, with the initial correspondence computed from the depth map.
  • Because the depth map is not completely accurate, an alignment module is used to further fit a local two-dimensional offset.
  • The features of the two frames are thereby further aligned; the aligned features of both frames are concatenated and fed into the super-resolution module, implemented as a U-Net convolutional neural network, which outputs the high-definition picture of the current frame.
  • FIG. 1 is a flowchart of an indoor scene virtual roaming method based on reflection decomposition provided by an embodiment of the present invention
  • FIG. 2 is a schematic diagram of a global triangular mesh model provided by an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of a construction result of a double-layer expression in a reflection area provided by an embodiment of the present invention
  • FIG. 4 is a schematic diagram of a rendering result of a virtual viewpoint with reflection provided by an embodiment of the present invention.
  • FIG. 5 is a comparison diagram of whether to use a super-resolution neural network according to an embodiment of the present invention.
  • FIG. 6 is a structural diagram of a super-resolution neural network provided by an embodiment of the present invention.
  • a method for virtual roaming of indoor scenes based on reflection decomposition includes the following steps:
  • The 3D reconstruction software COLMAP or RealityCapture can be used to obtain the camera intrinsic and extrinsic parameters and the global triangular mesh model.
  • The aligned depth map is converted into a triangular mesh. Specifically: convert the depth values into three-dimensional coordinates, connect all horizontal and vertical edges plus one diagonal edge per cell, and disconnect the corresponding edges wherever one of the depth edges described in the previous steps is encountered, to obtain the triangular mesh.
  • FIG. 3 is a schematic diagram of the construction result of the double-layer expression in the reflection area provided by the embodiment of the present invention.
  • the two-layer representation includes two decomposed pictures of foreground and background double-layer triangular meshes and foreground and background.
  • The foreground image expresses the surface texture of the object with the reflection component removed, and the background image expresses the reflection component of the scene on the object surface.
  • The K nearest neighbors are computed by ranking the overlap rate of the vertices of the global triangular mesh model after reflection about the plane; the set includes the image I k itself, whose overlap rate is necessarily the highest.
  • This set is used to construct a matching cost volume; see Sinha, Sudipta N., et al., "Image-based rendering for scenes with reflections," ACM Transactions on Graphics (TOG) 31.4 (2012): 1-10.
  • For each reflective plane, compute its two-dimensional reflection region β k in each visible picture. Specifically, project the reflective plane (which has a three-dimensional boundary) onto the visible picture to obtain a projected depth map, dilate the projected depth map (a 9x9 window can be used), and then compare the dilated projected depth map with the depth map aligned in the previous step to obtain an accurate two-dimensional reflection region. Each pixel with a depth value in the projected depth map is filtered using the three-dimensional point distance and the normal angle (pixels whose 3D point distance is less than 0.03 meters and whose normal angle is less than 60 degrees are kept), and the filtered pixel region is taken as the reflection region of the reflective plane in that picture.
  • Meanwhile, the plane equation of the plane is used to obtain the initial two-layer depth maps: take the projected depth map as the initial foreground depth map, mirror the camera intrinsic and extrinsic parameters of the picture about the plane equation to form a virtual camera, and then render the initial background depth map in this virtual camera using the global triangular mesh model. The near clipping plane of this rendering must be set to the reflection plane; the initial foreground and background depth maps are then converted into the simplified two-layer triangular meshes according to the method in step 2).
  • H is the Laplacian matrix
  • the function ⁇ -1 returns two-dimensional coordinates, and projects the point u in the image I k ' to the image I k according to the depth value and the camera's internal and external parameters; express The depth map obtained by projection; v represents vertex in .
  • the nonlinear conjugate gradient method is used for optimization, and the number of iterations is 30; after that, fixed optimization and The conjugate gradient method is also used, with 30 iterations; one round of optimization is alternated, and the entire optimization process is carried out for a total of two rounds of optimization, and after the first round of optimization, the consistency constraints of foreground images (surface colors) between multiple viewing angles are used.
  • the first round of optimization Perform denoising, specifically, it is known that after the first round of optimization and Use the following formula to obtain the denoised image and
  • weight ⁇ g is equal to 0.05, which is used to constrain the second round of optimization.
  • FIG. 4 is a schematic diagram of a rendering result of a virtual viewpoint with reflection provided by an embodiment of the present invention.
  • The goal of the online rendering process is, given the intrinsic and extrinsic parameters of a virtual camera, to output the virtual picture corresponding to that camera.
  • Compute the neighborhood picture set from the intrinsic and extrinsic parameters of the virtual camera: divide the local coordinate system of the current virtual camera into 8 octants by the coordinate-axis planes, further select a series of neighborhood pictures in each octant, and use the angle between the optical-axis direction of the picture and that of the virtual camera
  • and the distance ‖t k -t n‖ between the optical center t k of the picture and the optical center t n of the virtual camera to divide each octant again into several regions; preferably there are 9 regions, namely the combinations of the angle lying in [0°, 10°), [10°, 20°) or [20°, ∞) and ‖t k -t n‖ lying in [0, 0.6), [0.6, 1.2) or [1.2, 1.8). Then, in each region, the single image with the smallest similarity value d k is selected and added to the neighborhood picture set.
  • The distance weight λ equals 0.1.
  • t k and t n represent the three-dimensional coordinates of the optical center of the picture and the virtual camera
  • x represents the three-dimensional coordinates of the corresponding three-dimensional point of the pixel
  • A series of triangular patches is rendered onto each pixel, where a "point" denotes the intersection of a patch with the ray determined by the pixel. If the rendering cost of a point is too large, i.e., greater than the minimum rendering cost of all points in the pixel plus the range threshold λ (λ is 0.17 in this embodiment), the point does not participate in the depth-map computation; the depths of all points participating in the depth-map computation are compared and the minimum is taken as the depth value of the pixel.
  • The reflection regions β k of the neighborhood pictures are also rendered into the current virtual viewpoint, giving the reflection region β n of the current virtual viewpoint.
  • Depth-map computation and color blending are performed separately on the images of the two layers, which were obtained by decomposition after inverse gamma correction.
  • The blended images of the two layers are added, and a gamma correction is then applied to obtain the correct image with the reflection effect.
  • a deep neural network is used to reduce rendering errors and improve clarity.
  • the network uses the color picture and depth picture of the current frame plus the color picture and depth picture of the previous frame as input.
  • The purpose of using the two consecutive frames is to add more useful information and to improve temporal stability. First, a three-layer convolutional network extracts features from the current-frame depth/color images and the previous-frame depth/color images, respectively.
  • The next step is to warp the features of the previous frame to the current frame; the initial correspondence is computed from the depth map, which is not completely accurate.
  • An alignment module (implemented as a convolutional neural network with three convolutional layers) is therefore used to further fit a local two-dimensional offset that further aligns the features of the two frames; the aligned features of the two frames are concatenated and fed into the super-resolution module (implemented with a U-Net convolutional neural network), which outputs the high-definition picture of the current frame.
  • A computer device comprises a memory and a processor, wherein computer-readable instructions are stored in the memory; when the computer-readable instructions are executed by the processor, the processor is caused to perform the steps of the reflection-decomposition-based indoor scene virtual roaming method in the foregoing embodiments.
  • a storage medium storing computer-readable instructions.
  • when executed by one or more processors, the instructions cause the one or more processors to perform the steps of the reflection-decomposition-based indoor scene virtual roaming method in the above-mentioned embodiments.
  • the storage medium may be a non-volatile storage medium.
  • ROM Read Only Memory
  • RAM Random Access Memory
  • magnetic disk or CD etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Graphics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Geometry (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Generation (AREA)

Abstract

The invention discloses an indoor scene virtual roaming method based on reflection decomposition. First, a rough global triangular mesh model obtained by 3D reconstruction is projected to form initial depth maps; the depth edges are aligned to the color edges, and the aligned depth maps are converted into simplified triangular meshes. Planes in the global triangular mesh model are then detected; if a plane is a reflective plane, a two-layer representation is constructed on the reflection region of every picture in which the plane is visible, which is used to render the reflection on the object surface correctly. Given a virtual viewpoint, the virtual-view picture is rendered from the neighborhood pictures and triangular meshes; for the reflection region, rendering uses the foreground/background images and the foreground/background triangular meshes. The method can perform virtual roaming with large degrees of freedom in large indoor scenes containing reflection effects while requiring little storage. It renders well, offers large roaming freedom, can reproduce partial reflections, highlights and similar effects, and is robust.

Description

Indoor scene virtual roaming method based on reflection decomposition
Technical Field
The invention relates to the technical field of image-based rendering and virtual viewpoint synthesis, and in particular to a method for virtual roaming of indoor scenes that combines image-based rendering with reflection decomposition.
Background Art
The purpose of indoor-scene virtual roaming is to build a system that, given the intrinsic and extrinsic parameters of a virtual camera, outputs a rendered image of the virtual viewpoint. Existing, relatively mature virtual-roaming applications are mainly based on a series of panoramic images: pure-rotation roaming is possible around each panorama, while movement between panoramas is handled by simple interpolation in most systems, which produces large visual errors. For virtual roaming with large degrees of freedom, many existing methods achieve object-level observation or viewpoint movement within part of a scene, for example by explicitly capturing the light field around a target object with a light-field camera (see Gortler, Steven J., et al., "The Lumigraph," Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, 1996), or by representing and interpolating the scene with a neural network from pictures taken by an ordinary camera (see Mildenhall, Ben, et al., "NeRF: Representing scenes as neural radiance fields for view synthesis," Proceedings of the European Conference on Computer Vision, 2020). For larger indoor scenes, the latest methods can render relatively free viewpoints, but the rendering quality is still not good enough (see Riegler and Koltun, "Free View Synthesis," Proceedings of the European Conference on Computer Vision, 2020). In particular, for the various types of reflection present in larger indoor scenes (floors, tables, mirrors, etc.), there is still no system that can handle roaming in indoor scenes with such complex materials well.
Summary of the Invention
Aiming at the deficiencies of the prior art, the present invention provides an indoor scene virtual roaming method based on reflection decomposition, which can perform virtual roaming with large degrees of freedom in large indoor scenes containing reflection effects while requiring little storage.
In order to achieve the above objective, the present invention adopts the following technical solution: a method for virtual roaming of indoor scenes based on reflection decomposition, comprising the following steps:
S1: Capture enough pictures in the target indoor scene to cover the scene, perform 3D reconstruction of the indoor scene from the captured pictures, and obtain the camera intrinsic and extrinsic parameters and a rough global triangular mesh model of the indoor scene;
S2: For each picture, project the global triangular mesh model into the corresponding depth map, align the depth edges to the color edges, convert the aligned depth map into a triangular mesh, and simplify the triangular mesh;
S3: Detect planes in the global triangular mesh model and use the color consistency between adjacent images to determine whether each plane is a reflective plane. If it is, construct a two-layer representation on the reflection region of every picture in which the reflective plane is visible, which is used to render the reflection on the object surface correctly. The two-layer representation consists of foreground and background triangular meshes and decomposed foreground and background images: the foreground triangular mesh expresses the surface geometry of the object, the background triangular mesh expresses the mirror image of the scene geometry about the reflection plane, the foreground image expresses the surface texture of the object with the reflection component removed, and the background image expresses the reflection component of the scene on the object surface. This specifically comprises the following sub-steps:
S31: Detect planes in the global triangular mesh model and keep the planes whose area exceeds an area threshold. Project each plane onto the pictures in which it is visible and denote the set of such pictures as the visible-picture set. For each picture I k in this set, compute its K-nearest-neighbor picture set; the K nearest neighbors are obtained by ranking the overlap rate of the vertices of the global triangular mesh model after reflection about the plane. Use the K-nearest-neighbor set to construct a matching cost volume and judge whether the plane has a sufficient reflection component in picture I k. The judgment method is: for each pixel, after mirroring the global triangular mesh model according to the plane equation, look up the cost corresponding to the mirrored depth value in the matching cost volume and check whether that cost is a local minimum; if the number of pixels whose cost is a local minimum exceeds a pixel-count threshold, the plane is considered to have a reflection component in that picture, and if the number of visible pictures in which a plane has a reflection component exceeds a picture-count threshold, the plane is considered a reflective plane;
S32: For each reflective plane, compute its two-dimensional reflection region β k in each visible picture. Specifically: project the reflective plane onto the visible picture to obtain a projected depth map, dilate the projected depth map, and then compare the dilated projected depth map with the aligned depth map to obtain an accurate two-dimensional reflection region; for each pixel with a depth value in the projected depth map, filter using the three-dimensional point distance and the normal angle, and take the filtered pixel region as the reflection region β k of the reflective plane in that picture;
S33: For each picture in which the reflective plane is visible, construct a two-layer representation on the reflection region. Specifically: take the projected depth map as the initial foreground depth map, mirror the camera intrinsic and extrinsic parameters of the picture about the plane equation to obtain a virtual camera, render the initial background depth map in this virtual camera using the global triangular mesh model, and convert the initial foreground and background depth maps into simplified two-layer triangular meshes. Using an iterative optimization algorithm, compute the two-layer foreground and background images and further optimize the two layer meshes; before the optimization, all the original pictures involved are inverse-gamma-corrected in advance for the subsequent decomposition.
The goal of the optimization is to minimize an energy function (given as a formula image in the original publication). Among the optimization variables is a rigid transform of the reflection-layer triangular mesh, whose initial values are the identity matrix and zero translation; the two layer meshes are optimized only in the three-dimensional positions of their vertices, without changing the topology. E d, E s and E p are the data term, smoothness term and prior term, λ s and λ p are their weights, and u denotes a pixel of the reflection region β k. The individual terms are likewise given as formula images in the original, where H is the Laplacian matrix; the function ω -1 returns two-dimensional coordinates and projects a point u of image I k′ into image I k according to the depth value and the camera intrinsic and extrinsic parameters; the remaining symbols denote the depth map obtained by projection and the vertices v of the layer meshes;
To minimize the above energy function, an alternating optimization scheme is used. In each round of optimization, part of the unknowns are first fixed while the others are optimized, their initial values being computed with the formulas given in the original; given these initial values, the nonlinear conjugate gradient method is used. Afterwards, the roles are swapped and the remaining unknowns are optimized, again with the conjugate gradient method. One alternation constitutes one round, and the whole optimization performs two rounds in total. After the first round, the consistency constraint of the foreground images across multiple views is used to denoise the foreground images obtained in the first round; specifically, given the results of the first round, the denoised images are obtained with the formula given in the original and are used as the initial values to continue the second round of optimization. Furthermore, in the second round an additional prior term is added to the total energy equation, where λ g is the prior weight used to constrain the second round of optimization;
After the two rounds of optimization, the rigid transform is applied to the reflection-layer mesh to obtain the final two simplified layer triangular meshes and the decomposed foreground and background images, which are used to render the reflection on the object surface correctly;
S4: Given a virtual viewpoint, render the virtual-view picture using the neighborhood pictures and triangular meshes; for the reflection region, render using the foreground/background images and the foreground/background triangular meshes. Specifically: render the reflection regions β k of the neighborhood pictures into the current virtual viewpoint to obtain the reflection region β n of the current virtual viewpoint; pixels inside the reflection region must be rendered with the two-layer foreground/background images and the simplified triangular meshes, with depth-map computation and color blending performed separately for the two layers. Because the two layer images were obtained by decomposition after inverse gamma correction, in the rendering stage the blended images of the two layers are added and one gamma correction is applied to obtain the correct picture with the reflection effect.
Further, in S2, the depth edges of the depth map are aligned to the color edges of the original picture to obtain the aligned depth map. Specifically:
First compute the normal map corresponding to the depth map. Then, for each pixel i of the depth map, convert its depth value d i into a three-dimensional point v i in the local coordinate system according to the camera intrinsics, and compute the plane distance between adjacent points i, j as dt ij=max(|(v i-v j)·n i|,|(v i-v j)·n j|), where n i and n j are the normal vectors of points i and j; if dt ij is greater than λ*max(1,min(d i,d j)), mark the pixel as a depth-edge pixel, where λ is the edge-detection threshold;
For each picture, after all depth-edge pixels are obtained, compute the local two-dimensional gradient of the depth edge with a Sobel convolution. Then, starting from each depth-edge pixel, traverse pixel by pixel simultaneously along the two-dimensional edge-gradient direction and its opposite direction until one of the two sides reaches a color-edge pixel; after a color-edge pixel is reached, delete the depth values of all pixels on the path between the starting pixel and that color-edge pixel. Define the pixels whose depth values were deleted as unaligned pixels and those whose depth values were not deleted as aligned pixels; each deleted depth value is filled by interpolation from the surrounding undeleted depth values.
Further, each deleted depth value is filled by interpolation from the surrounding undeleted depth values as follows: for each unaligned pixel p i to be interpolated, compute its geodesic distance d g(p i,p j) to all other aligned pixels, use the geodesic distances to find the m nearest aligned pixels, and compute the interpolated depth value as a weighted combination (the formula is given as an image in the original), where the neighbor set denotes the nearest aligned pixels of pixel p i, w g(i,j)=exp(-d g(p i,p j)), and the projection term denotes projecting pixel p i onto the local plane equation of p j, which is computed from v j and n j.
Further, in S4, the neighborhood picture set is computed from the intrinsic and extrinsic parameters of the virtual camera. The local coordinate system of the current virtual camera is divided into 8 octants by the coordinate-axis planes; within each octant, a series of neighborhood pictures is further selected, and each octant is divided again into several regions using the angle between the optical-axis direction of the picture and that of the virtual camera and the distance ‖t k-t n‖ between the optical center t k of the picture and the optical center t n of the virtual camera. Then, within each region, the single picture with the smallest similarity value d k (computed with the formula given in the original) is added to the neighborhood picture set, where λ is the distance weight;
After the neighborhood picture set is obtained, each picture in the set is rendered into the virtual viewpoint according to its corresponding simplified triangular mesh. Specifically:
a) Compute a robust depth map. For each pixel in the fragment shader, compute its rendering cost c(t k,t n,x):
c(t k,t n,x)=∠(t k-x,t n-x)*π/180+max(0,1-‖t n-x‖/‖t k-x‖)
where t k and t n are the three-dimensional coordinates of the optical centers of the picture and the virtual camera and x is the three-dimensional coordinate of the 3D point corresponding to the pixel. A series of triangular patches is rendered onto each pixel; here a "point" denotes the intersection of a patch with the ray determined by the pixel. If the rendering cost of a point is greater than the minimum rendering cost of all points in the pixel plus the range threshold λ, the point does not participate in the depth-map computation; the depths of all points participating in the depth-map computation are compared and the minimum is taken as the depth value of the pixel;
b) After the depth map of the virtual camera has been computed, the pictures are used as texture maps on the triangular meshes for rendering. For each pixel of the virtual camera image, the colors of the points near the depth map are blended according to the set weights w k to obtain the final rendering color.
Further, in S4, to reduce the storage footprint, all pictures are downsampled to 1/n of their size for storage, n ≥ 1, and the virtual window is set to the original image size during rendering.
Further, a super-resolution neural network is trained to compensate for the loss of sharpness caused by downsampling the stored pictures and, at the same time, to reduce possible rendering errors. Specifically: after each new virtual view is rendered to obtain a depth image and a color image, a deep neural network is used to reduce rendering errors and improve sharpness. The network takes the current-frame color and depth images plus the previous-frame color and depth images as input. First, a three-layer convolutional network extracts features from the current-frame depth/color images and the previous-frame depth/color images separately; the next step warps the previous-frame features to the current frame, with the initial correspondence computed from the depth map. Because the depth map is not completely accurate, an alignment module further fits a local two-dimensional offset to further align the features of the two frames, and the aligned features of the two frames are concatenated and fed into a super-resolution module implemented with a U-Net convolutional neural network, which outputs the high-definition picture of the current frame.
The beneficial effects of the present invention are:
1. A complete pipeline is built that can process a large amount of captured data and achieve virtual-viewpoint roaming with large degrees of freedom in relatively large indoor scenes;
2. Reflective surfaces in the indoor scene and reflection regions in the pictures are detected, and a two-layer representation is built for the reflection regions, so that reflections can be rendered well during roaming, greatly improving rendering realism;
3. By appending a dedicated super-resolution neural network, rendering errors are reduced and, at the same time, the image resolution required to support roaming in a single scene is lowered, reducing storage and memory consumption.
Brief Description of the Drawings
FIG. 1 is a flowchart of the indoor scene virtual roaming method based on reflection decomposition provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of the global triangular mesh model provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of the construction result of the two-layer representation of a reflection region provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of a rendering result of a virtual viewpoint with reflection provided by an embodiment of the present invention;
FIG. 5 is a comparison of results obtained with and without the super-resolution neural network provided by an embodiment of the present invention;
FIG. 6 is a structural diagram of the super-resolution neural network provided by an embodiment of the present invention.
Detailed Description of the Embodiments
The present invention is described in further detail below with reference to the drawings and specific embodiments. It should be understood that the specific embodiments described here are only intended to explain the present invention and not to limit it.
As shown in FIG. 1, an embodiment of the present invention provides an indoor scene virtual roaming method based on reflection decomposition, which comprises the following steps:
(1) Capture enough pictures in the target indoor scene to cover the scene and perform 3D reconstruction of the indoor scene from the captured pictures, as shown in FIG. 2, obtaining the camera intrinsic and extrinsic parameters and a rough global triangular mesh model of the indoor scene.
Specifically, the 3D reconstruction software COLMAP or RealityCapture can be used to obtain the camera intrinsic and extrinsic parameters and the global triangular mesh model.
(2) For each picture, project the global triangular mesh model into the corresponding depth map, align the depth edges to the color edges, convert the aligned depth map into a triangular mesh, and simplify the triangular mesh.
Specifically, because the global triangular mesh model contains some errors, the depth edges of the projected depth map are aligned to the color edges of the original picture to obtain the aligned depth map. This step is specifically:
First compute the normal map corresponding to the depth map. Then, for each pixel i of the depth map, convert its depth value d i into a three-dimensional point v i in the local coordinate system according to the camera intrinsics, and compute the plane distance between adjacent points i, j as dt ij=max(|(v i-v j)·n i|,|(v i-v j)·n j|), where n i and n j are the normal vectors of points i and j; if dt ij is greater than λ*max(1,min(d i,d j)), mark the pixel as a depth-edge pixel, where λ is the edge-detection threshold (λ = 0.01 in this embodiment).
For each picture, after all depth-edge pixels are obtained, compute the local two-dimensional gradient of the depth edge with a Sobel convolution. Then, starting from each depth-edge pixel, traverse pixel by pixel simultaneously along the two-dimensional edge-gradient direction and its opposite direction until one of the two sides reaches a color-edge pixel, where the color-edge pixels are obtained with the Canny edge detector. After a color-edge pixel is reached, delete the depth values of all pixels on the path between the starting pixel and that color-edge pixel. Define the pixels whose depth values were deleted as unaligned pixels and those whose depth values were not deleted as aligned pixels, and fill each deleted depth value by interpolation from the surrounding undeleted depth values. Specifically, for each unaligned pixel p i to be interpolated, compute its geodesic distance d g(p i,p j) to all other aligned pixels (see Revaud, Jerome, et al., "EpicFlow: Edge-preserving interpolation of correspondences for optical flow," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015), use the geodesic distances to find the m nearest aligned pixels (m = 4 in this embodiment), and compute the interpolated depth value as a weighted combination (the formula is given as an image in the original), where the neighbor set denotes the nearest aligned pixels of pixel p i, w g(i,j)=exp(-d g(p i,p j)), and the projection term denotes projecting pixel p i onto the local plane equation of p j, which is computed from v j and n j.
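The edge-alignment step can be sketched as follows. The sketch assumes the depth map, per-pixel 3D points and normals are already available as NumPy arrays; the function names are illustrative, only right/down neighbour pairs are tested, and plain Euclidean pixel distance stands in for the geodesic distance of the original method, so this is a simplified sketch rather than the patented procedure.

```python
import numpy as np

def detect_depth_edges(depth, points, normals, lam=0.01):
    """Mark depth-edge pixels: dt_ij = max(|(v_i-v_j)·n_i|, |(v_i-v_j)·n_j|)
    compared against lam * max(1, min(d_i, d_j)) for right/down neighbours."""
    h, w = depth.shape
    edges = np.zeros((h, w), dtype=bool)
    for di, dj in ((0, 1), (1, 0)):                      # right and down neighbours
        vi, vj = points[:h - di, :w - dj], points[di:, dj:]
        ni, nj = normals[:h - di, :w - dj], normals[di:, dj:]
        dt = np.maximum(np.abs(np.sum((vi - vj) * ni, axis=-1)),
                        np.abs(np.sum((vi - vj) * nj, axis=-1)))
        thr = lam * np.maximum(1.0, np.minimum(depth[:h - di, :w - dj], depth[di:, dj:]))
        mask = dt > thr
        edges[:h - di, :w - dj] |= mask
        edges[di:, dj:] |= mask
    return edges

def fill_deleted_depths(depth, deleted, m=4):
    """Fill deleted (unaligned) pixels from the m nearest aligned pixels with
    weights w = exp(-d); Euclidean pixel distance replaces the geodesic distance,
    and the projection onto each neighbour's local plane is approximated by the
    neighbour's own depth value."""
    filled = depth.copy()
    aligned = np.argwhere(~deleted & (depth > 0))
    for (i, j) in np.argwhere(deleted):
        d = np.linalg.norm(aligned - np.array([i, j]), axis=1)
        idx = np.argsort(d)[:m]
        w = np.exp(-d[idx])
        zs = depth[aligned[idx, 0], aligned[idx, 1]]
        filled[i, j] = np.sum(w * zs) / np.sum(w)
    return filled
```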
Specifically, after the depth map has been aligned, the aligned depth map is converted into a triangular mesh. Specifically: convert the depth values into three-dimensional coordinates, connect all horizontal and vertical edges plus one diagonal edge per cell, and disconnect the corresponding edges wherever one of the depth edges described in the previous steps is encountered, obtaining the triangular mesh.
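A minimal sketch of this depth-to-mesh conversion, under the assumption that every 2×2 pixel cell is split with one diagonal and that cells touching a depth-edge pixel are simply dropped (one concrete way of "disconnecting" the corresponding edges):

```python
import numpy as np

def depth_to_mesh(depth, edges, K):
    """Back-project a depth map into a triangle mesh.
    depth: (H, W) depth values, edges: (H, W) bool depth-edge mask, K: 3x3 intrinsics."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T
    verts = (np.linalg.inv(K) @ pix * depth.reshape(1, -1)).T.reshape(h, w, 3)

    def vid(i, j):
        return i * w + j

    faces = []
    for i in range(h - 1):
        for j in range(w - 1):
            quad = [(i, j), (i, j + 1), (i + 1, j), (i + 1, j + 1)]
            # skip the cell if any corner is invalid or lies on a depth edge
            if any(depth[a, b] <= 0 or edges[a, b] for a, b in quad):
                continue
            faces.append([vid(i, j), vid(i + 1, j), vid(i, j + 1)])           # upper-left triangle
            faces.append([vid(i, j + 1), vid(i + 1, j), vid(i + 1, j + 1)])   # lower-right triangle
    return verts.reshape(-1, 3), np.array(faces, dtype=np.int64)
```

The resulting mesh would then be fed to the quadric-error simplification referenced in the next paragraph.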
Specifically, a mesh simplification algorithm is invoked to simplify the generated triangular mesh; see Garland, Michael, and Paul S. Heckbert, "Surface simplification using quadric error metrics," Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques, 1997.
(3) Detect planes in the global triangular mesh model and use the color consistency between adjacent images to determine whether each plane is a reflective plane; if it is, construct a two-layer representation on the reflection region of every picture in which the reflective plane is visible, which is used to render the reflection on the object surface correctly. FIG. 3 is a schematic diagram of the construction result of the two-layer representation of a reflection region provided by an embodiment of the present invention.
The two-layer representation consists of foreground and background triangular meshes and decomposed foreground and background images: the foreground triangular mesh expresses the surface geometry of the object, the background triangular mesh expresses the mirror image of the scene geometry about the reflection plane, the foreground image expresses the surface texture of the object with the reflection component removed, and the background image expresses the reflection component of the scene on the object surface.
Specifically, first detect the planes in the global triangular mesh model and keep the planes whose area exceeds an area threshold (0.09 m² in this embodiment). Project each plane onto the pictures in which it is visible and denote the set of such pictures as the visible-picture set. For each picture I k in this set, compute its K-nearest-neighbor picture set (K = 6 in this embodiment); the K nearest neighbors are obtained by ranking the overlap rate of the vertices of the global triangular mesh model after reflection about the plane, and the set includes the picture I k itself, whose overlap rate is necessarily the highest. Then use the K-nearest-neighbor set to construct a matching cost volume (see Sinha, Sudipta N., et al., "Image-based rendering for scenes with reflections," ACM Transactions on Graphics (TOG) 31.4 (2012): 1-10) and judge whether the plane has a sufficient reflection component in picture I k. Specifically, for each pixel, after mirroring the global triangular mesh model according to the plane equation, look up the cost corresponding to the mirrored depth value in the matching cost volume and check whether that cost is a local minimum. If the number of pixels whose cost is a local minimum exceeds a pixel-count threshold (50 in this embodiment), the plane is considered to have a reflection component in that picture; if the number of visible pictures in which a plane has a reflection component exceeds a picture-count threshold (5 in this embodiment), the plane is considered a reflective plane.
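The reflection test can be sketched as follows: the 4×4 mirror matrix of a plane n·x + d = 0 is standard, the matching cost volume is assumed to be a (D, H, W) array sampled at known depth hypotheses, and the local-minimum test simply compares each looked-up cost against its two neighbouring depth bins. The 50-pixel threshold follows this embodiment; a plane would then be flagged as reflective when the test succeeds in more than 5 of its visible pictures.

```python
import numpy as np

def mirror_matrix(n, d):
    """4x4 reflection about the plane n·x + d = 0 (n assumed to be unit length)."""
    n = np.asarray(n, dtype=float)
    M = np.eye(4)
    M[:3, :3] = np.eye(3) - 2.0 * np.outer(n, n)
    M[:3, 3] = -2.0 * d * n
    return M

def has_reflection_component(cost_volume, depth_bins, mirrored_depth, min_pixels=50):
    """cost_volume: (D, H, W) matching costs; depth_bins: (D,) hypothesis depths;
    mirrored_depth: (H, W) depth of the mirrored global mesh rendered in this view.
    Returns True when enough pixels have a local cost minimum at the mirrored depth."""
    D, H, W = cost_volume.shape
    # depth bin closest to the mirrored depth, per pixel
    idx = np.abs(depth_bins[:, None, None] - mirrored_depth[None]).argmin(axis=0)
    idx = np.clip(idx, 1, D - 2)
    rows, cols = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    c = cost_volume[idx, rows, cols]
    is_local_min = (c <= cost_volume[idx - 1, rows, cols]) & \
                   (c <= cost_volume[idx + 1, rows, cols]) & \
                   (mirrored_depth > 0)
    return int(is_local_min.sum()) > min_pixels
```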
Specifically, for each reflective plane, compute its two-dimensional reflection region β k in each visible picture. Specifically, project the reflective plane (which has a three-dimensional boundary) onto the visible picture to obtain a projected depth map, dilate the projected depth map (a 9x9 window can be used), and then compare the dilated projected depth map with the depth map aligned in the previous steps to obtain an accurate two-dimensional reflection region; for each pixel with a depth value in the projected depth map, filter using the three-dimensional point distance and the normal angle (keep the pixels whose 3D point distance is less than 0.03 meters and whose normal angle is less than 60 degrees), and take the filtered pixel region as the reflection region β k of the reflective plane in that picture. At the same time, use the plane equation of the plane to obtain the initial two-layer depth maps: take the projected depth map as the initial foreground depth map, mirror the camera intrinsic and extrinsic parameters of the picture about the plane equation to obtain a virtual camera, and then render the initial background depth map in the virtual camera using the global triangular mesh model; note that the near clipping plane of this rendering must be set to the reflection plane. Then convert the initial foreground and background depth maps into the simplified two-layer triangular meshes according to the method of step 2).
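A sketch of the reflection-region extraction under the thresholds of this embodiment (9×9 dilation window, 0.03 m, 60°). SciPy's grey_dilation stands in for whatever dilation the original implementation uses, and the per-pixel depth difference along the viewing ray is used as a cheap stand-in for the 3D point distance, so the helper below is only an approximation of the described filter.

```python
import numpy as np
from scipy.ndimage import grey_dilation

def reflection_region(proj_depth, aligned_depth, aligned_nrm, plane_n,
                      dist_thr=0.03, angle_thr=60.0):
    """Boolean mask beta_k of the reflection region of one reflective plane in one picture.
    proj_depth: depth of the projected plane, aligned_depth / aligned_nrm: the aligned
    depth and normal maps of the picture, plane_n: plane normal in this camera's frame."""
    dilated = grey_dilation(proj_depth, size=(9, 9))           # 9x9 dilation window
    candidate = (dilated > 0) & (aligned_depth > 0)
    # depth difference along the viewing ray approximates the 3D point distance
    dist_ok = np.abs(dilated - aligned_depth) < dist_thr
    cosang = np.clip(aligned_nrm @ np.asarray(plane_n, dtype=float), -1.0, 1.0)
    angle_ok = np.degrees(np.arccos(cosang)) < angle_thr
    return candidate & dist_ok & angle_ok
```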
Next, using an iterative optimization algorithm, compute the two-layer foreground and background images and further optimize the two layer meshes. Before the optimization, all the original pictures involved are inverse-gamma-corrected in advance for the subsequent decomposition.
The goal of the optimization is to minimize an energy function (given as a formula image in the original publication). Among the optimization variables is a rigid transform of the reflection-layer triangular mesh, whose initial values are the identity matrix and zero translation; the two layer meshes are optimized only in the three-dimensional positions of their vertices, without changing the topology. E d, E s and E p are the data term, smoothness term and prior term, λ s and λ p are their weights, set to 0.04 and 0.01 respectively, and u denotes a pixel of the reflection region β k. The individual terms are likewise given as formula images in the original, where H is the Laplacian matrix; the function ω -1 returns two-dimensional coordinates and projects a point u of image I k′ into image I k according to the depth value and the camera intrinsic and extrinsic parameters; the remaining symbols denote the depth map obtained by projection and the vertices v of the layer meshes.
To minimize the above energy function, an alternating optimization scheme is used. In each round, part of the unknowns are first fixed while the others are optimized, their initial values being computed with the formulas given in the original; given these initial values, the nonlinear conjugate gradient method is used, with 30 iterations. Afterwards, the roles are swapped and the remaining unknowns are optimized, again with the conjugate gradient method and 30 iterations. One alternation constitutes one round, and the whole optimization performs two rounds in total. After the first round, the consistency constraint of the foreground images (surface colors) across multiple views is used to denoise the foreground images obtained in the first round; specifically, given the results of the first round, the denoised images are obtained with the formula given in the original and are used as the initial values to continue the second round of optimization. Furthermore, in the second round an additional prior term is added to the total energy equation, with the prior weight λ g equal to 0.05, which is used to constrain the second round of optimization.
After the two rounds of optimization, the rigid transform is applied to the reflection-layer mesh to obtain the final two simplified layer triangular meshes and the decomposed foreground and background images, which are used to render the reflection on the object surface correctly.
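Since the individual energy terms and update formulas are only given as formula images in the original publication, the following is no more than a structural skeleton of the two-round alternating scheme (fix one group of unknowns, run 30 nonlinear conjugate-gradient iterations on the other, swap, denoise between the rounds, add the λ g = 0.05 prior in round two). The energy callbacks, and the order in which the groups are fixed, are placeholders, not the patented formulation.

```python
import numpy as np
from scipy.optimize import minimize

def alternating_decomposition(images0, mesh_params0, energy_images, energy_mesh,
                              denoise, prior=None, rounds=2, iters=30):
    """Skeleton of the two-round alternating optimization.
    images0 / mesh_params0: initial layer images and mesh/rigid-transform parameters,
    flattened to 1-D vectors; energy_images(x, mesh) and energy_mesh(x, images) are
    scalar placeholders for E_d + lambda_s*E_s + lambda_p*E_p; denoise(images) is the
    multi-view consistency denoising applied after round one; prior(x) is the extra
    term (weight lambda_g = 0.05) used only in round two."""
    images, mesh = np.asarray(images0, float).copy(), np.asarray(mesh_params0, float).copy()
    for r in range(rounds):
        def f_img(x):
            e = energy_images(x, mesh)
            return e + 0.05 * prior(x) if (r == 1 and prior is not None) else e
        # one block of unknowns held fixed, the other optimized with 30 CG iterations
        images = minimize(f_img, images, method="CG", options={"maxiter": iters}).x
        mesh = minimize(lambda x: energy_mesh(x, images), mesh, method="CG",
                        options={"maxiter": iters}).x
        if r == 0:
            images = denoise(images)        # consistency-based denoising between rounds
    return images, mesh
```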
(4) Given a virtual viewpoint, render the virtual-view picture using the neighborhood pictures and triangular meshes; for the reflection region, render using the foreground/background images and the foreground/background triangular meshes. FIG. 4 is a schematic diagram of a rendering result of a virtual viewpoint with reflection provided by an embodiment of the present invention.
Specifically, the goal of the online rendering process is, given the intrinsic and extrinsic parameters of a virtual camera, to output the virtual picture corresponding to that camera. Specifically: compute the neighborhood picture set from the intrinsic and extrinsic parameters of the virtual camera; divide the local coordinate system of the current virtual camera into 8 octants by the coordinate-axis planes; within each octant, further select a series of neighborhood pictures, dividing each octant again into several regions using the angle between the optical-axis direction of the picture and that of the virtual camera and the distance ‖t k-t n‖ between the optical center t k of the picture and the optical center t n of the virtual camera. Preferably, each octant is divided into 9 regions, namely the combinations of the angle lying in [0°, 10°), [10°, 20°) or [20°, ∞) and ‖t k-t n‖ lying in [0, 0.6), [0.6, 1.2) or [1.2, 1.8). Then, within each region, the single picture with the smallest similarity value d k (computed with the formula given in the original) is added to the neighborhood picture set, where the distance weight λ equals 0.1.
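A sketch of the view-selection logic described above. The exact form of d k is given only as a formula image in the original; here it is assumed to combine the optical-axis angle (in degrees) with the optical-center distance weighted by λ = 0.1, which matches the quantities named in the text but is not guaranteed to be the patented formula.

```python
import numpy as np

def select_neighborhood(cams, virt_c, virt_dir, lam=0.1,
                        angle_bins=(10.0, 20.0), dist_bins=(0.6, 1.2, 1.8)):
    """cams: list of (t_k, dir_k) optical centers and optical-axis directions.
    Returns indices of at most one picture per (octant, angle bin, distance bin)."""
    best = {}
    for k, (t_k, dir_k) in enumerate(cams):
        rel = np.asarray(t_k, float) - np.asarray(virt_c, float)
        octant = tuple(rel > 0)                                   # one of the 8 octants
        cosang = np.clip(np.dot(dir_k, virt_dir) /
                         (np.linalg.norm(dir_k) * np.linalg.norm(virt_dir)), -1.0, 1.0)
        ang = np.degrees(np.arccos(cosang))
        dist = np.linalg.norm(rel)
        a_bin = int(np.digitize(ang, angle_bins))                 # [0,10), [10,20), [20,inf)
        d_bin = int(np.digitize(dist, dist_bins))                 # [0,0.6), [0.6,1.2), [1.2,1.8), ...
        d_k = ang + lam * dist          # assumed form of the similarity measure d_k
        key = (octant, a_bin, d_bin)
        if key not in best or d_k < best[key][0]:
            best[key] = (d_k, k)
    return [k for _, k in best.values()]
```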
After the neighborhood picture set is obtained, each picture in the set is rendered into the virtual viewpoint according to its corresponding simplified triangular mesh. Specifically:
a) Compute a robust depth map. For each pixel in the fragment shader, compute its rendering cost c(t k,t n,x):
c(t k,t n,x)=∠(t k-x,t n-x)*π/180+max(0,1-‖t n-x‖/‖t k-x‖)
where t k and t n are the three-dimensional coordinates of the optical centers of the picture and the virtual camera, and x is the three-dimensional coordinate of the 3D point corresponding to the pixel. A series of triangular patches is rendered onto each pixel; here a "point" denotes the intersection of a patch with the ray determined by the pixel. If the rendering cost of a point is too large, i.e., greater than the minimum rendering cost of all points in the pixel plus the range threshold λ (λ = 0.17 in this embodiment), the point does not participate in the depth-map computation; the depths of all points participating in the depth-map computation are compared and the minimum is taken as the depth value of the pixel.
b) After the depth map of the virtual camera has been computed, the pictures are used as texture maps on the triangular meshes for rendering. For each pixel of the virtual camera image, the colors of the points near the depth map (within a distance of 3 cm) are blended according to the set weights w k (w k=exp(-d k/0.033)) to obtain the final rendering color.
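The per-point rendering cost is given explicitly in the text, and the blending weight is w k = exp(-d k/0.033). The sketch below applies both to the candidate points of a single pixel, taking d k to be each point's distance from the robust depth and camera-space z as the depth; both of these interpretations are assumptions of the sketch.

```python
import numpy as np

def render_cost(t_k, t_n, x):
    """c(t_k, t_n, x) = angle(t_k - x, t_n - x) in degrees * pi/180
       + max(0, 1 - ||t_n - x|| / ||t_k - x||)."""
    a, b = np.asarray(t_k, float) - x, np.asarray(t_n, float) - x
    cosang = np.clip(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)), -1.0, 1.0)
    angle_deg = np.degrees(np.arccos(cosang))
    return angle_deg * np.pi / 180.0 + max(0.0, 1.0 - np.linalg.norm(b) / np.linalg.norm(a))

def blend_pixel(points, colors, costs, lam=0.17, depth_window=0.03):
    """points/colors/costs: candidate surface points rendered onto one pixel.
    Points whose cost exceeds min cost + lam are discarded; the minimum depth of the
    survivors defines the pixel depth, and colors of points within 3 cm of it are
    blended with w_k = exp(-d_k / 0.033)."""
    costs = np.asarray(costs, float)
    keep = costs <= costs.min() + lam
    depths = np.asarray([p[2] for p in points], float)[keep]   # camera-space z as depth
    cols = np.asarray(colors, float)[keep]
    z = depths.min()
    near = np.abs(depths - z) < depth_window
    w = np.exp(-np.abs(depths[near] - z) / 0.033)
    return z, (w[:, None] * cols[near]).sum(axis=0) / w.sum()
```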
Specifically, the reflection regions β k of the neighborhood pictures are also rendered into the current virtual viewpoint to obtain the reflection region β n of the current virtual viewpoint. Pixels inside the reflection region must be rendered with the two-layer foreground/background images and the simplified triangular meshes: depth-map computation and color blending are performed separately for the two layers according to the steps above. Because the two layer images were obtained by decomposition after inverse gamma correction, in the rendering stage the blended images of the two layers are added and one gamma correction is then applied to obtain the correct picture with the reflection effect.
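The layer compositing reduces to a few lines once both blended layer images are available in linear (inverse-gamma-corrected) space; γ = 2.2 is assumed here because the text only says "gamma correction" without stating the exponent.

```python
import numpy as np

GAMMA = 2.2  # assumed display gamma; the patent only mentions "gamma correction"

def decompose_input(srgb):
    """Inverse gamma correction applied to the captured pictures before decomposition."""
    return np.power(np.clip(srgb, 0.0, 1.0), GAMMA)

def composite_layers(fg_linear, bg_linear):
    """Add the two blended layer images in linear space, then apply one gamma
    correction to obtain the final picture with the reflection effect."""
    return np.power(np.clip(fg_linear + bg_linear, 0.0, 1.0), 1.0 / GAMMA)
```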
Specifically, in the above rendering steps, to reduce storage, all pictures are downsampled to 1/n of their size for storage (n ≥ 1; n = 4 in this embodiment), and the virtual window is set to the original image size during rendering. The rendered virtual-viewpoint picture therefore keeps the original resolution but is somewhat blurry, and its sharpness is improved with a super-resolution neural network in the next step.
(5) Train a super-resolution neural network to compensate for the loss of sharpness caused by downsampling the stored pictures and, at the same time, to reduce possible rendering errors. FIG. 5 is a comparison of results obtained with and without the super-resolution neural network provided by an embodiment of the present invention, and FIG. 6 is a structural diagram of the super-resolution neural network provided by an embodiment of the present invention.
Specifically, after each new virtual view is rendered to obtain a depth image and a color image, a deep neural network is used to reduce rendering errors and improve sharpness. Specifically, the network takes the current-frame color and depth images plus the previous-frame color and depth images as input; the purpose of using two consecutive frames is to add more useful information and to improve temporal stability. First, a three-layer convolutional network extracts features from the current-frame depth/color images and the previous-frame depth/color images separately. The next step warps the previous-frame features to the current frame, with the initial correspondence computed from the depth map. Because the depth map is not completely accurate, an alignment module (implemented as a convolutional neural network with three convolutional layers) further fits a local two-dimensional offset to further align the features of the two frames; the aligned features of the two frames are concatenated and fed into a super-resolution module (implemented with a U-Net convolutional neural network), which outputs the high-definition picture of the current frame.
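A compact PyTorch sketch of the network topology described above (three-convolution feature extractors on RGB-D input, depth-based warping, a three-layer alignment module predicting a residual 2D offset, and a small U-Net-style super-resolution head). Channel counts, the use of grid_sample for warping and the depth of the U-Net are assumptions; only the overall structure follows the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(cin, cout, n=3):
    layers = []
    for i in range(n):
        layers += [nn.Conv2d(cin if i == 0 else cout, cout, 3, padding=1), nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

class SuperResNet(nn.Module):
    def __init__(self, feat=32):
        super().__init__()
        self.extract = conv_block(4, feat)                 # RGB + depth -> features (3 conv layers)
        self.align = nn.Sequential(                        # alignment module: 3 conv layers -> 2D offset
            conv_block(2 * feat, feat, n=2), nn.Conv2d(feat, 2, 3, padding=1))
        self.enc1, self.enc2 = conv_block(2 * feat, feat), conv_block(feat, 2 * feat)
        self.dec1 = conv_block(3 * feat, feat)             # small U-Net-style head (assumed depth)
        self.out = nn.Conv2d(feat, 3, 3, padding=1)

    def warp(self, feat_prev, flow):
        """Backward-warp previous-frame features with a per-pixel 2D flow (in pixels)."""
        b, _, h, w = feat_prev.shape
        ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
        grid = torch.stack((xs, ys), dim=-1).float().to(feat_prev.device) + flow.permute(0, 2, 3, 1)
        grid[..., 0] = 2 * grid[..., 0] / (w - 1) - 1      # normalise to [-1, 1] for grid_sample
        grid[..., 1] = 2 * grid[..., 1] / (h - 1) - 1
        return F.grid_sample(feat_prev, grid, align_corners=True)

    def forward(self, rgbd_cur, rgbd_prev, flow_from_depth):
        f_cur = self.extract(rgbd_cur)
        f_prev = self.warp(self.extract(rgbd_prev), flow_from_depth)   # initial, depth-based warp
        offset = self.align(torch.cat([f_cur, f_prev], dim=1))         # residual 2D offset
        f_prev = self.warp(f_prev, offset)
        x = torch.cat([f_cur, f_prev], dim=1)
        e1 = self.enc1(x)
        e2 = self.enc2(F.avg_pool2d(e1, 2))
        d1 = self.dec1(torch.cat([F.interpolate(e2, scale_factor=2, mode="bilinear",
                                                align_corners=False), e1], dim=1))
        return self.out(d1)
```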
In one embodiment, a computer device is provided, comprising a memory and a processor, wherein computer-readable instructions are stored in the memory; when the computer-readable instructions are executed by the processor, the processor is caused to perform the steps of the reflection-decomposition-based indoor scene virtual roaming method in the above embodiments.
In one embodiment, a storage medium storing computer-readable instructions is provided; when the computer-readable instructions are executed by one or more processors, the one or more processors are caused to perform the steps of the reflection-decomposition-based indoor scene virtual roaming method in the above embodiments. The storage medium may be a non-volatile storage medium.
Those of ordinary skill in the art will understand that all or part of the steps of the various methods in the above embodiments can be completed by instructing the relevant hardware through a program; the program may be stored in a computer-readable storage medium, which may include read-only memory (ROM), random access memory (RAM), magnetic disks, optical discs, and the like.
The above are only preferred embodiments of one or more embodiments of this specification and are not intended to limit them. Any modification, equivalent replacement or improvement made within the spirit and principles of one or more embodiments of this specification shall fall within the scope of protection of one or more embodiments of this specification.

Claims (6)

  1. A method for virtual roaming of indoor scenes based on reflection decomposition, characterized in that it comprises the following steps:
    S1: capturing enough pictures in the target indoor scene to cover the scene, performing 3D reconstruction of the indoor scene from the captured pictures, and obtaining the camera intrinsic and extrinsic parameters and a rough global triangular mesh model of the indoor scene;
    S2: for each picture, projecting the global triangular mesh model into the corresponding depth map, aligning the depth edges to the color edges, converting the aligned depth map into a triangular mesh, and simplifying the triangular mesh;
    S3: detecting planes in the global triangular mesh model and using the color consistency between adjacent images to determine whether each plane is a reflective plane; if it is, constructing a two-layer representation on the reflection region of every picture in which the reflective plane is visible, which is used to render the reflection on the object surface correctly; the two-layer representation consists of foreground and background triangular meshes and decomposed foreground and background images, in which the foreground triangular mesh expresses the surface geometry of the object, the background triangular mesh expresses the mirror image of the scene geometry about the reflection plane, the foreground image expresses the surface texture of the object with the reflection component removed, and the background image expresses the reflection component of the scene on the object surface; this specifically comprises the following sub-steps:
    S31: detecting planes in the global triangular mesh model, keeping the planes whose area exceeds an area threshold, projecting each plane onto the pictures in which it is visible, and denoting the set of such pictures as the visible-picture set; for each picture I k in this set, computing its K-nearest-neighbor picture set, the K nearest neighbors being obtained by ranking the overlap rate of the vertices of the global triangular mesh model after reflection about the plane; using the K-nearest-neighbor set to construct a matching cost volume and judging whether the plane has a sufficient reflection component in picture I k, the judgment method being: for each pixel, after mirroring the global triangular mesh model according to the plane equation, looking up the cost corresponding to the mirrored depth value in the matching cost volume and checking whether that cost is a local minimum; if the number of pixels whose cost is a local minimum exceeds a pixel-count threshold, the plane is considered to have a reflection component in that picture, and if the number of visible pictures in which a plane has a reflection component exceeds a picture-count threshold, the plane is considered a reflective plane;
    S32: for each reflective plane, computing its two-dimensional reflection region β k in each visible picture, specifically: projecting the reflective plane onto the visible picture to obtain a projected depth map, dilating the projected depth map, and then comparing the dilated projected depth map with the aligned depth map to obtain an accurate two-dimensional reflection region; for each pixel with a depth value in the projected depth map, filtering using the three-dimensional point distance and the normal angle, and taking the filtered pixel region as the reflection region β k of the reflective plane in that picture;
    S33: for each picture in which the reflective plane is visible, constructing a two-layer representation on the reflection region, specifically: taking the projected depth map as the initial foreground depth map, mirroring the camera intrinsic and extrinsic parameters of the picture about the plane equation to obtain a virtual camera, then rendering the initial background depth map in this virtual camera using the global triangular mesh model, and converting the initial foreground and background depth maps into simplified two-layer triangular meshes; using an iterative optimization algorithm to compute the two-layer foreground and background images and to further optimize the two layer meshes, all the original pictures involved being inverse-gamma-corrected in advance for the subsequent decomposition;
    the goal of the optimization is to minimize an energy function (given as a formula image in the original publication), among whose variables is a rigid transform of the reflection-layer triangular mesh, initialized to the identity matrix and zero translation; the two layer meshes are optimized only in the three-dimensional positions of their vertices, without changing the topology; E d, E s and E p are the data term, smoothness term and prior term, λ s and λ p are their weights, and u denotes a pixel of the reflection region; the individual terms are likewise given as formula images in the original, where H is the Laplacian matrix, the function ω -1 returns two-dimensional coordinates and projects a point u of image I k′ into image I k according to the depth value and the camera intrinsic and extrinsic parameters, and the remaining symbols denote the depth map obtained by projection and the vertices v of the layer meshes;
    to minimize the above energy function, an alternating optimization scheme is used: in each round, part of the unknowns are first fixed while the others are optimized, their initial values being computed with the formulas given in the original, and the nonlinear conjugate gradient method is used for the optimization; afterwards, the roles are swapped and the remaining unknowns are optimized, again with the conjugate gradient method; one alternation constitutes one round, and the whole optimization performs two rounds in total; after the first round, the consistency constraint of the foreground images across multiple views is used to denoise the foreground images obtained in the first round, the denoised images are obtained with the formula given in the original, and they are used as initial values to continue the second round of optimization; furthermore, in the second round an additional prior term is added to the total energy equation, where λ g is the prior weight used to constrain the second round of optimization;
    after the two rounds of optimization, the rigid transform is applied to the reflection-layer mesh to obtain the final two simplified layer triangular meshes and the decomposed foreground and background images, which are used to render the reflection on the object surface correctly;
    S4: given a virtual viewpoint, rendering the virtual-view picture using the neighborhood pictures and triangular meshes; for the reflection region, rendering using the foreground/background images and the foreground/background triangular meshes, specifically: rendering the reflection regions β k of the neighborhood pictures into the current virtual viewpoint to obtain the reflection region β n of the current virtual viewpoint; pixels inside the reflection region are rendered with the two-layer foreground/background images and the simplified triangular meshes, depth-map computation and color blending being performed separately for the two layers; because the two layer images were obtained by decomposition after inverse gamma correction, in the rendering stage the blended images of the two layers are added and one gamma correction is applied to obtain the correct picture with the reflection effect.
  2. The method for virtual roaming of indoor scenes based on reflection decomposition according to claim 1, characterized in that, in S2, the depth edges of the depth map are aligned to the color edges of the original picture to obtain the aligned depth map, specifically:
    first compute the normal map corresponding to the depth map; then, for each pixel i of the depth map, convert its depth value d i into a three-dimensional point v i in the local coordinate system according to the camera intrinsics, and compute the plane distance between adjacent points i, j as dt ij=max(|(v i-v j)·n i|,|(v i-v j)·n j|), where n i and n j are the normal vectors of points i and j; if dt ij is greater than λ*max(1,min(d i,d j)), mark the pixel as a depth-edge pixel, where λ is the edge-detection threshold;
    for each picture, after all depth-edge pixels are obtained, compute the local two-dimensional gradient of the depth edge with a Sobel convolution; then, starting from each depth-edge pixel, traverse pixel by pixel simultaneously along the two-dimensional edge-gradient direction and its opposite direction until one of the two sides reaches a color-edge pixel; after a color-edge pixel is reached, delete the depth values of all pixels on the path between the starting pixel and that color-edge pixel; define the pixels whose depth values were deleted as unaligned pixels and those whose depth values were not deleted as aligned pixels, and fill each deleted depth value by interpolation from the surrounding undeleted depth values.
  3. The method for virtual roaming of indoor scenes based on reflection decomposition according to claim 2, characterized in that each deleted depth value is filled by interpolation from the surrounding undeleted depth values, specifically: for each unaligned pixel p i to be interpolated, compute its geodesic distance d g(p i,p j) to all other aligned pixels, use the geodesic distances to find the m nearest aligned pixels, and compute the interpolated depth value as a weighted combination (the formula is given as an image in the original), where the neighbor set denotes the nearest aligned pixels of pixel p i, w g(i,j)=exp(-d g(p i,p j)), and the projection term denotes projecting pixel p i onto the local plane equation of p j, which is computed from v j and n j.
  4. The method for virtual roaming of indoor scenes based on reflection decomposition according to claim 1, characterized in that, in S4, the neighborhood picture set is computed from the intrinsic and extrinsic parameters of the virtual camera: the local coordinate system of the current virtual camera is divided into 8 octants by the coordinate-axis planes; within each octant, a series of neighborhood pictures is further selected, and each octant is divided again into several regions using the angle between the optical-axis direction of the picture and that of the virtual camera and the distance ‖t k-t n‖ between the optical center t k of the picture and the optical center t n of the virtual camera; then, within each region, the single picture with the smallest similarity value d k (computed with the formula given in the original) is added to the neighborhood picture set, where λ is the distance weight;
    after the neighborhood picture set is obtained, each picture in the set is rendered into the virtual viewpoint according to its corresponding simplified triangular mesh, specifically:
    a) compute a robust depth map: for each pixel in the fragment shader, compute its rendering cost c(t k,t n,x):
    c(t k,t n,x)=∠(t k-x,t n-x)*π/180+max(0,1-‖t n-x‖/‖t k-x‖)
    where t k and t n are the three-dimensional coordinates of the optical centers of the picture and the virtual camera and x is the three-dimensional coordinate of the 3D point corresponding to the pixel; a series of triangular patches is rendered onto each pixel, and a "point" here denotes the intersection of a patch with the ray determined by the pixel; if the rendering cost of a point is greater than the minimum rendering cost of all points in the pixel plus the range threshold λ, the point does not participate in the depth-map computation; the depths of all points participating in the depth-map computation are compared and the minimum is taken as the depth value of the pixel;
    b) after the depth map of the virtual camera has been computed, the pictures are used as texture maps on the triangular meshes for rendering; for each pixel of the virtual camera image, the colors of the points near the depth map are blended according to the set weights w k to obtain the final rendering color.
  5. The method for virtual roaming of indoor scenes based on reflection decomposition according to claim 1, characterized in that, in S4, to reduce the storage footprint, all pictures are downsampled to 1/n of their size for storage, n≥1, and the virtual window is set to the original image size during rendering.
  6. The method for virtual roaming of indoor scenes based on reflection decomposition according to any one of claims 1-5, characterized in that a super-resolution neural network is trained to compensate for the loss of sharpness caused by downsampling the stored pictures and, at the same time, to reduce possible rendering errors, specifically: after each new virtual view is rendered to obtain a depth image and a color image, a deep neural network is used to reduce rendering errors and improve sharpness; the network takes the current-frame color and depth images plus the previous-frame color and depth images as input; first, a three-layer convolutional network extracts features from the current-frame depth/color images and the previous-frame depth/color images separately; the next step warps the previous-frame features to the current frame, with the initial correspondence computed from the depth map; because the depth map is not completely accurate, an alignment module further fits a local two-dimensional offset to further align the features of the two frames, and the aligned features of the two frames are concatenated and fed into a super-resolution module implemented with a U-Net convolutional neural network, which outputs the high-definition picture of the current frame.
PCT/CN2021/088788 2021-04-21 2021-04-21 Indoor scene virtual roaming method based on reflection decomposition WO2022222077A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2021/088788 WO2022222077A1 (zh) 2021-04-21 2021-04-21 Indoor scene virtual roaming method based on reflection decomposition
US18/490,790 US20240169674A1 (en) 2021-04-21 2023-10-20 Indoor scene virtual roaming method based on reflection decomposition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/088788 WO2022222077A1 (zh) 2021-04-21 2021-04-21 Indoor scene virtual roaming method based on reflection decomposition

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/490,790 Continuation US20240169674A1 (en) 2021-04-21 2023-10-20 Indoor scene virtual roaming method based on reflection decomposition

Publications (1)

Publication Number Publication Date
WO2022222077A1 true WO2022222077A1 (zh) 2022-10-27

Family

ID=83723623

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/088788 WO2022222077A1 (zh) 2021-04-21 2021-04-21 基于反射分解的室内场景虚拟漫游方法

Country Status (2)

Country Link
US (1) US20240169674A1 (zh)
WO (1) WO2022222077A1 (zh)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116546183A (zh) * 2023-04-06 2023-08-04 华中科技大学 A 3D dynamic video generation method based on a single-frame image
CN116758136A (zh) * 2023-08-21 2023-09-15 杭州蓝芯科技有限公司 Real-time online cargo volume recognition method, system, device and medium
CN116883607A (zh) * 2023-09-06 2023-10-13 四川物通科技有限公司 Virtual reality scene generation system based on radiative transfer
CN116958449A (zh) * 2023-09-12 2023-10-27 北京邮电大学 Three-dimensional modeling method and apparatus for urban scenes, and electronic device
CN117011446A (zh) * 2023-08-23 2023-11-07 苏州深捷信息科技有限公司 A real-time rendering method for dynamic environment lighting
CN117934700A (zh) * 2023-11-15 2024-04-26 广州极点三维信息科技有限公司 Neural-rendering-based method, system and medium for reconstructing three-dimensional home roaming scenes
CN117994444A (zh) * 2024-04-03 2024-05-07 浙江华创视讯科技有限公司 Reconstruction method, device and storage medium for complex scenes
CN118135079A (zh) * 2024-05-07 2024-06-04 中国人民解放军国防科技大学 Cloud-fusion-based method, apparatus and device for roaming rendering of three-dimensional scenes
CN118135079B (zh) * 2024-05-07 2024-07-09 中国人民解放军国防科技大学 Cloud-fusion-based method, apparatus and device for roaming rendering of three-dimensional scenes

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150178988A1 (en) * 2012-05-22 2015-06-25 Telefonica, S.A. Method and a system for generating a realistic 3d reconstruction model for an object or being
CN109064533A (zh) * 2018-07-05 2018-12-21 深圳奥比中光科技有限公司 A 3D roaming method and system
CN110288712A (zh) * 2019-03-30 2019-09-27 天津大学 Sparse multi-view three-dimensional reconstruction method for indoor scenes
CN110458939A (zh) * 2019-07-24 2019-11-15 大连理工大学 Indoor scene modeling method based on view generation
CN111652963A (zh) * 2020-05-07 2020-09-11 浙江大学 An augmented reality rendering method based on a neural network

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150178988A1 (en) * 2012-05-22 2015-06-25 Telefonica, S.A. Method and a system for generating a realistic 3d reconstruction model for an object or being
CN109064533A (zh) * 2018-07-05 2018-12-21 深圳奥比中光科技有限公司 A 3D roaming method and system
CN110288712A (zh) * 2019-03-30 2019-09-27 天津大学 Sparse multi-view three-dimensional reconstruction method for indoor scenes
CN110458939A (zh) * 2019-07-24 2019-11-15 大连理工大学 Indoor scene modeling method based on view generation
CN111652963A (zh) * 2020-05-07 2020-09-11 浙江大学 An augmented reality rendering method based on a neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIANG HANGQING;ZHAO CHANGFEI;ZHANG GUOFENG;WANG HUIYAN;BAO HUJUN: "Multi-View Depth Map Sampling for 3D Reconstruction of Natural Scene", JOURNAL OF COMPUTER-AIDED DESIGN & COMPUTER GRAPHICS, vol. 27, no. 10, 15 October 2015 (2015-10-15), pages 1805 - 1815, XP055979893 *
REVAUD JEROME; WEINZAEPFEL PHILIPPE; HARCHAOUI ZAID; SCHMID CORDELIA: "EpicFlow: Edge-preserving interpolation of correspondences for optical flow", 2015 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 7 June 2015 (2015-06-07), pages 1164 - 1172, XP032793523, DOI: 10.1109/CVPR.2015.7298720 *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116546183B (zh) * 2023-04-06 2024-03-22 华中科技大学 Method and system for generating dynamic images with a parallax effect from a single-frame image
CN116546183A (zh) * 2023-04-06 2023-08-04 华中科技大学 A 3D dynamic video generation method based on a single-frame image
CN116758136A (zh) * 2023-08-21 2023-09-15 杭州蓝芯科技有限公司 Real-time online cargo volume recognition method, system, device and medium
CN116758136B (zh) * 2023-08-21 2023-11-10 杭州蓝芯科技有限公司 Real-time online cargo volume recognition method, system, device and medium
CN117011446A (zh) * 2023-08-23 2023-11-07 苏州深捷信息科技有限公司 A real-time rendering method for dynamic environment lighting
CN117011446B (zh) * 2023-08-23 2024-03-08 苏州深捷信息科技有限公司 A real-time rendering method for dynamic environment lighting
CN116883607A (zh) * 2023-09-06 2023-10-13 四川物通科技有限公司 Virtual reality scene generation system based on radiative transfer
CN116883607B (zh) * 2023-09-06 2023-12-05 四川物通科技有限公司 Virtual reality scene generation system based on radiative transfer
CN116958449A (zh) * 2023-09-12 2023-10-27 北京邮电大学 Three-dimensional modeling method and apparatus for urban scenes, and electronic device
CN116958449B (zh) * 2023-09-12 2024-04-30 北京邮电大学 Three-dimensional modeling method and apparatus for urban scenes, and electronic device
CN117934700A (zh) * 2023-11-15 2024-04-26 广州极点三维信息科技有限公司 Neural-rendering-based method, system and medium for reconstructing three-dimensional home roaming scenes
CN117994444A (zh) * 2024-04-03 2024-05-07 浙江华创视讯科技有限公司 Reconstruction method, device and storage medium for complex scenes
CN118135079A (zh) * 2024-05-07 2024-06-04 中国人民解放军国防科技大学 Cloud-fusion-based method, apparatus and device for roaming rendering of three-dimensional scenes
CN118135079B (zh) * 2024-05-07 2024-07-09 中国人民解放军国防科技大学 Cloud-fusion-based method, apparatus and device for roaming rendering of three-dimensional scenes

Also Published As

Publication number Publication date
US20240169674A1 (en) 2024-05-23

Similar Documents

Publication Publication Date Title
WO2022222077A1 (zh) Indoor scene virtual roaming method based on reflection decomposition
US11727587B2 (en) Method and system for scene image modification
Penner et al. Soft 3d reconstruction for view synthesis
CN105245841B (zh) A CUDA-based panoramic video surveillance system
CN106651938B (zh) A depth map enhancement method fusing high-resolution color images
Lei et al. Depth map super-resolution considering view synthesis quality
CN106570507B (zh) Multi-view-consistent plane detection and analysis method for the three-dimensional structure of monocular video scenes
Li et al. Detail-preserving and content-aware variational multi-view stereo reconstruction
Bradley et al. Accurate multi-view reconstruction using robust binocular stereo and surface meshing
Dolson et al. Upsampling range data in dynamic environments
US5963664A (en) Method and system for image combination using a parallax-based technique
CN111243071A (zh) Texture rendering method, system, chip, device and medium for real-time three-dimensional human body reconstruction
CN113223132B (zh) Indoor scene virtual roaming method based on reflection decomposition
WO2016110239A1 (zh) Image processing method and apparatus
CN112434709A (zh) Aerial survey method and system based on real-time dense 3D point clouds and DSM from unmanned aerial vehicles
Ma et al. An operational superresolution approach for multi-temporal and multi-angle remotely sensed imagery
CN111553841A (zh) A real-time video stitching algorithm based on optimal seam updating
CN116958437A (zh) Multi-view reconstruction method and system incorporating an attention mechanism
Xu et al. Hybrid mesh-neural representation for 3D transparent object reconstruction
Chen et al. Kinect depth recovery using a color-guided, region-adaptive, and depth-selective framework
Pan et al. Depth map completion by jointly exploiting blurry color images and sparse depth maps
Xu et al. Scalable image-based indoor scene rendering with reflections
Galea et al. Denoising of 3D point clouds constructed from light fields
CN112132971A (zh) Three-dimensional human body modeling method, apparatus, electronic device and storage medium
Coorg Pose imagery and automated three-dimensional modeling of urban environments

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21937320

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21937320

Country of ref document: EP

Kind code of ref document: A1