WO2024007182A1 - Video rendering method and system fusing a static NeRF model and a dynamic NeRF model - Google Patents

Video rendering method and system fusing a static NeRF model and a dynamic NeRF model

Info

Publication number
WO2024007182A1
Authority
WO
WIPO (PCT)
Prior art keywords
nerf
static
dynamic
model
rendering
Prior art date
Application number
PCT/CN2022/104048
Other languages
English (en)
French (fr)
Inventor
许杭锟
张岩
李兆涵
李阮存
Original Assignee
北京原创力科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京原创力科技有限公司 filed Critical 北京原创力科技有限公司
Priority to PCT/CN2022/104048 priority Critical patent/WO2024007182A1/zh
Publication of WO2024007182A1 publication Critical patent/WO2024007182A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/08Volume rendering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/90Determination of colour characteristics

Definitions

  • The invention relates to the technical fields of three-dimensional scene reconstruction and three-dimensional video editing and rendering, and in particular to a video rendering method and system that fuse a static NeRF model and a dynamic NeRF model.
  • Given multi-view RGB images of a static scene as input, Neural Radiance Fields (NeRF) can reconstruct a 3D model of that static scene. Based on NeRF, images from entirely new viewpoints can be generated, and the imaging quality is photorealistic.
  • Given multi-view RGB videos of a dynamic scene as input, some dynamic NeRF variants that add time information can reconstruct a 3D model of the dynamic scene.
  • Based on a dynamic NeRF, images can be rendered at any viewing angle and any time.
  • However, in many current dynamic scenes a large proportion of the objects or background is static and only a small portion of the objects is dynamic. Dynamic NeRF does not make this distinction, so the entire scene has to be reconstructed with a larger model, and training a larger dynamic NeRF requires excessive computing resources.
  • To address the shortcomings of the prior art, the present invention proposes a video rendering method that fuses a static NeRF model and a dynamic NeRF model, which includes:
  • Step 1: Obtain a training video captured in the specified scene, and build a video rendering model comprising a static NeRF, a dynamic NeRF and an activation-function network layer;
  • Step 2: Connect the optical center of a video frame of the training video to a specified pixel in that frame to obtain a sampling ray; select multiple sampling points on the sampling ray and input them into the static NeRF to obtain the volume density and color information at the sampling points; perform volume rendering based on the positions, volume densities and color information of the sampling points to obtain the rendered color of the given pixel; and construct a loss function from the difference between the rendered color and the actual color to train the static NeRF;
  • Step 3: Sample a certain number of points on the sampling ray and input them into the trained static NeRF to obtain the static volume density and static color information at the sampling points; combine the sampling points on the sampling ray with the time information and input them into the dynamic NeRF to obtain the dynamic volume density and dynamic color information of the sampling points; add the dynamic and static volume densities and input the sum into the activation-function network layer to obtain the final volume density; add the dynamic and static color information and input the sum into the activation-function network layer to obtain the final color information; perform volume rendering based on the positions, final volume densities and final color information of the sampling points to obtain the final color of the given pixel; and construct a loss function from the difference between the final color and the actual color to train the dynamic NeRF;
  • Step 4: Based on the camera parameters to be rendered, the camera extrinsic parameters and the time period, compute the sampling rays of the pixels to be rendered for each specific time point in the time period, select sampling points and input them into the trained video rendering model to obtain the volume density and color information of all sampling points; combine these with volume rendering to obtain the colors of all given pixels as the single-frame rendering result for that time point, and combine the single-frame rendering results of all time points to generate the video rendering result.
  • The optical center is obtained from the intrinsic and extrinsic parameters of the camera.
  • The intrinsic parameters are the focal length and pixel size of the camera, and the extrinsic parameters are the position and orientation of the camera in the selected coordinate system.
  • The training video consists of multiple frame-synchronized video streams.
  • The present invention also proposes a video rendering system that fuses a static NeRF model and a dynamic NeRF model, which includes:
  • an initialization module, used to obtain a training video captured in the specified scene and build a video rendering model comprising a static NeRF, a dynamic NeRF and an activation-function network layer;
  • a static NeRF training module, used to connect the optical center of a video frame of the training video to a specified pixel in that frame to obtain a sampling ray, select multiple sampling points on the sampling ray and input them into the static NeRF to obtain the volume density and color information at the sampling points, perform volume rendering based on the positions, volume densities and color information of the sampling points to obtain the rendered color of the given pixel, and construct a loss function from the difference between the rendered color and the actual color to train the static NeRF;
  • a dynamic NeRF training module, used to sample a certain number of points on the sampling ray and input them into the trained static NeRF to obtain the static volume density and static color information at the sampling points; combine the sampling points on the sampling ray with time information and input them into the dynamic NeRF to obtain the dynamic volume density and dynamic color information of the sampling points; add the dynamic and static volume densities and input the sum into the activation-function network layer to obtain the final volume density; add the dynamic and static color information and input the sum into the activation-function network layer to obtain the final color information; perform volume rendering based on the positions, final volume densities and final color information of the sampling points to obtain the final color of the given pixel; and construct a loss function from the difference between the final color and the actual color to train the dynamic NeRF;
  • a rendering module, used to compute, based on the camera parameters to be rendered, the camera extrinsic parameters and the time period, the sampling rays of the pixels to be rendered for each specific time point in the time period, select sampling points and input them into the trained video rendering model to obtain the volume density and color information of all sampling points, combine these with volume rendering to obtain the colors of all given pixels as the single-frame rendering result for that time point, and combine the single-frame rendering results of all time points to generate the video rendering result.
  • The optical center is obtained from the intrinsic and extrinsic parameters of the camera.
  • The intrinsic parameters are the focal length and pixel size of the camera, and the extrinsic parameters are the position and orientation of the camera in the selected coordinate system.
  • The training video consists of multiple frame-synchronized video streams.
  • The present invention also provides a storage medium storing a program for executing any one of the above video rendering methods that fuse a static NeRF model and a dynamic NeRF model.
  • The present invention also proposes a client for any one of the above video rendering systems that fuse a static NeRF model and a dynamic NeRF model.
  • The static NeRF does not contain time information, so its model capacity can be reduced considerably without losing the rendering quality of the static part of the scene.
  • Figure 1 is a schematic diagram of how the static NeRF and the dynamic NeRF are combined;
  • Figure 2 is the overall flow chart of this application;
  • Figure 3 is a schematic diagram of NeRF model inference and training;
  • Figure 4 is a schematic diagram of the shooting scene of the embodiment.
  • The overall model structure of the present invention is shown in Figure 1, where (x, y, z) are the three-dimensional coordinates of a position, t is the time, σ is the volume density, and rgb is the value of the three color channels of the image; σ_h and rgb_h are the feature vectors corresponding to σ and rgb.
  • The steps of the present invention include:
  • Training the static NeRF: as shown in Figure 3, the line connecting time t to the model is dotted, indicating that static NeRF training does not require a time input; only the dynamic NeRF requires a time input.
  • For an input image, a certain number of pixels are first selected at random.
  • The optical center of the image can be obtained from the intrinsic and extrinsic parameters of the corresponding camera.
  • Starting from the optical center, a sampling ray is obtained by connecting it to a given pixel.
  • A certain number of points are sampled on the ray and input into the static NeRF model to obtain the volume density and color (rgb) information at the sampling points.
  • By feeding the positions, volume densities and color information of all sampling points on the sampling ray into the volume rendering module, the rendered color of the given pixel is obtained. The static NeRF model is obtained by computing the difference between the rendered color and the actually captured color and optimizing this difference.
  • In the process of training the static NeRF, time information is not used. That is, for all frames of the same camera, the rays cast from the same pixels, the sampling points, the static NeRF outputs and the volume-rendering results are identical. In the static regions captured by the camera, the color of the same pixel is consistent across all frames, so the static NeRF converges well in these regions and achieves good results. In the dynamic regions captured by the camera, the color differs between frames, so the static NeRF cannot converge well there; as a result, the static NeRF's rendering of dynamic regions is blurry.
  • During inference, the volume density features output by the static NeRF are added to the volume density features output by the dynamic NeRF, and the volume density is then obtained through the activation function.
  • Likewise, the color (RGB) features output by the static NeRF are added to the color features output by the dynamic NeRF, and the color is then obtained through the activation function (a code sketch of this fusion is given after this list).
  • During training, the parameters of the static NeRF remain unchanged and only the parameters of the dynamic NeRF are trained. In other words, the dynamic NeRF only needs to compensate for regions that the static NeRF does not express clearly.
  • Taking the shooting environment of Figure 4 as an example, the steps of the present invention include:
  • Video capture: as shown in Figure 4, a camera array is used to film a dancing person.
  • The cameras in the array are frame-synchronized.
  • The intrinsic parameters refer to parameters such as the focal length of the camera and the pixel size.
  • The extrinsic parameters refer to the position and orientation of the camera relative to a chosen coordinate system. By feeding a frame captured at the same instant by every camera in the array into the COLMAP open-source package, the intrinsic and extrinsic parameters of all cameras can be computed.
  • Training the dynamic NeRF: for an input image, a certain number of pixels are first selected at random. From the intrinsic and extrinsic parameters of the corresponding camera, the optical center is obtained. Starting from the optical center, a sampling ray is obtained by connecting it to a given pixel. A certain number of points are sampled on the ray and input into the static NeRF model to obtain the pre-activation volume density and color (rgb) information at the sampling points. The same points, combined with the time information, are input into the dynamic NeRF model to obtain its pre-activation volume density and color information at the sampling points.
  • The pre-activation volume density and color information from the dynamic NeRF and the static NeRF are added and input into the final activation-function network layer to obtain the final volume density and color information.
  • By feeding the positions, final volume densities and final color information of all sampling points on the sampling ray into the volume rendering module, the rendered color of the given pixel is obtained.
  • By computing the difference between the rendered color and the actually captured color and optimizing this difference while keeping the static NeRF parameters fixed, the dynamic NeRF model is obtained.
  • The present invention also proposes a video rendering system that fuses a static NeRF model and a dynamic NeRF model, which includes:
  • an initialization module, used to obtain a training video captured in the specified scene and build a video rendering model comprising a static NeRF, a dynamic NeRF and an activation-function network layer;
  • a static NeRF training module, used to connect the optical center of a video frame of the training video to a specified pixel in that frame to obtain a sampling ray, select multiple sampling points on the sampling ray and input them into the static NeRF to obtain the volume density and color information at the sampling points, perform volume rendering based on the positions, volume densities and color information of the sampling points to obtain the rendered color of the given pixel, and construct a loss function from the difference between the rendered color and the actual color to train the static NeRF;
  • a dynamic NeRF training module, used to sample a certain number of points on the sampling ray and input them into the trained static NeRF to obtain the static volume density and static color information at the sampling points; combine the sampling points on the sampling ray with time information and input them into the dynamic NeRF to obtain the dynamic volume density and dynamic color information of the sampling points; add the dynamic and static volume densities and input the sum into the activation-function network layer to obtain the final volume density; add the dynamic and static color information and input the sum into the activation-function network layer to obtain the final color information; perform volume rendering based on the positions, final volume densities and final color information of the sampling points to obtain the final color of the given pixel; and construct a loss function from the difference between the final color and the actual color to train the dynamic NeRF;
  • a rendering module, used to compute, based on the camera parameters to be rendered, the camera extrinsic parameters and the time period, the sampling rays of the pixels to be rendered for each specific time point in the time period, select sampling points and input them into the trained video rendering model to obtain the volume density and color information of all sampling points, combine these with volume rendering to obtain the colors of all given pixels as the single-frame rendering result for that time point, and combine the single-frame rendering results of all time points to generate the video rendering result.
  • The optical center is obtained from the intrinsic and extrinsic parameters of the camera.
  • The intrinsic parameters are the focal length and pixel size of the camera, and the extrinsic parameters are the position and orientation of the camera in the selected coordinate system.
  • The training video consists of multiple frame-synchronized video streams.
  • The present invention also provides a storage medium storing a program for executing any one of the above video rendering methods that fuse a static NeRF model and a dynamic NeRF model.
  • The present invention also proposes a client for any one of the above video rendering systems that fuse a static NeRF model and a dynamic NeRF model.
  • In summary, the present invention proposes a video rendering method and system that fuse a static NeRF model and a dynamic NeRF model, including: connecting the optical center of a video frame of the training video to a designated pixel in that frame to obtain a sampling ray, selecting multiple sampling points on the sampling ray to train the static NeRF; and obtaining the static volume density and static color information of the sampling points from the trained static NeRF and, combined with the time information, training the dynamic NeRF to obtain the dynamic volume density and dynamic color information of the sampling points.
  • The static NeRF used in the present invention does not contain time information, so the model size is small and the rendering quality of the static part of the scene is not lost.
  • The volume density features and color features of the dynamic NeRF in static regions of the scene are very close to 0. Therefore, during rendering, the dynamic NeRF requires no inference in regions where these features are close to 0, which further speeds up inference.
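As a concrete illustration of the fusion described above, the following is a minimal sketch rather than the patented implementation: the module and variable names, the treatment of σ_h and rgb_h as a pre-activation scalar and 3-vector (the application describes them more generally as feature vectors), and the choice of ReLU and sigmoid activations are all assumptions.

    import torch
    import torch.nn as nn

    class FusedNeRF(nn.Module):
        """Sum the pre-activation outputs of a static and a dynamic NeRF, then activate."""

        def __init__(self, static_nerf: nn.Module, dynamic_nerf: nn.Module):
            super().__init__()
            self.static_nerf = static_nerf    # assumed to map (x, y, z)    -> (sigma_h, rgb_h), no activation
            self.dynamic_nerf = dynamic_nerf  # assumed to map (x, y, z, t) -> (sigma_h, rgb_h), no activation

        def forward(self, xyz: torch.Tensor, t: torch.Tensor):
            sigma_h_s, rgb_h_s = self.static_nerf(xyz)      # static features
            sigma_h_d, rgb_h_d = self.dynamic_nerf(xyz, t)  # dynamic features (compensation term)
            sigma = torch.relu(sigma_h_s + sigma_h_d)       # final volume density
            rgb = torch.sigmoid(rgb_h_s + rgb_h_d)          # final color in [0, 1]
            return sigma, rgb

In such a sketch, only the parameters of dynamic_nerf would receive gradients when the dynamic NeRF is trained, consistent with the description above.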

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Generation (AREA)

Abstract

The present invention proposes a video rendering method and system that fuse a static NeRF model and a dynamic NeRF model, including: selecting multiple sampling points on a sampling ray of a training video to train a static NeRF; based on the static volume density and color information extracted by the trained static NeRF, combined with time information, training a dynamic NeRF to obtain the dynamic volume density and color information of the sampling points; adding the dynamic and static volume densities and inputting the sum into an activation-function network layer to obtain the final volume density; adding the dynamic and static color information and inputting the sum into the activation-function network layer to obtain the final color information; and inputting the parameters to be rendered into the static NeRF and the dynamic NeRF to obtain the volume density and color information of the sampling points, combining these with volume rendering to obtain single-frame rendering results, and combining all single-frame rendering results to generate the video rendering result. By combining a static NeRF model with a dynamic NeRF model, the present invention speeds up video rendering.

Description

Video rendering method and system fusing a static NeRF model and a dynamic NeRF model
Technical Field
The present invention relates to the technical fields of three-dimensional scene reconstruction and three-dimensional video editing and rendering, and in particular to a method and system for fusing a static NeRF model with a dynamic NeRF model.
Background Art
Given multi-view RGB images of a static scene as input, Neural Radiance Fields (NeRF) can reconstruct a 3D model of that static scene. Based on NeRF, images from entirely new viewpoints can be generated, and the imaging quality is photorealistic.
Given multi-view RGB videos of a dynamic scene as input, some dynamic NeRF variants that incorporate time information can reconstruct a 3D model of the dynamic scene. Based on a dynamic NeRF, images can be rendered at any viewpoint and any time.
However, in many current dynamic scenes a large proportion of the objects or background is static and only a small portion of the objects is dynamic. Dynamic NeRF does not make this distinction, so the entire scene has to be reconstructed with a larger model, and training a larger dynamic NeRF requires excessive computing resources.
Disclosure of the Invention
To address the shortcomings of the prior art, the present invention proposes a video rendering method that fuses a static NeRF model and a dynamic NeRF model, comprising:
Step 1: obtaining a training video captured in a specified scene, and building a video rendering model comprising a static NeRF, a dynamic NeRF and an activation-function network layer;
Step 2: connecting the optical center of a video frame of the training video to a specified pixel in that frame to obtain a sampling ray, selecting multiple sampling points on the sampling ray and inputting them into the static NeRF to obtain the volume density and color information at the sampling points, performing volume rendering based on the positions, volume densities and color information of the sampling points to obtain the rendered color of the given pixel, and constructing a loss function from the difference between the rendered color and the actual color to train the static NeRF;
Step 3: sampling a certain number of points on the sampling ray and inputting them into the trained static NeRF to obtain the static volume density and static color information at the sampling points; combining the sampling points on the sampling ray with time information and inputting them into the dynamic NeRF to obtain the dynamic volume density and dynamic color information of the sampling points; adding the dynamic and static volume densities and inputting the sum into the activation-function network layer to obtain the final volume density; adding the dynamic and static color information and inputting the sum into the activation-function network layer to obtain the final color information; performing volume rendering based on the positions, final volume densities and final color information of the sampling points to obtain the final color of the given pixel; and constructing a loss function from the difference between the final color and the actual color to train the dynamic NeRF;
Step 4: based on the camera parameters to be rendered, the camera extrinsic parameters and the time period, computing the sampling rays of the pixels to be rendered for each specific time point in the time period, selecting sampling points and inputting them into the trained video rendering model to obtain the volume density and color information of all sampling points, combining these with volume rendering to obtain the colors of all given pixels as the single-frame rendering result for that time point, and combining the single-frame rendering results of all time points to generate the video rendering result.
In the above video rendering method fusing a static NeRF model and a dynamic NeRF model, the optical center is obtained from the intrinsic and extrinsic parameters of the camera.
In the above video rendering method, the intrinsic parameters are the focal length and pixel size of the camera, and the extrinsic parameters are the position and orientation of the camera in the selected coordinate system.
In the above video rendering method, the training video consists of multiple frame-synchronized video streams.
The present invention also proposes a video rendering system that fuses a static NeRF model and a dynamic NeRF model, comprising:
an initialization module, used to obtain a training video captured in the specified scene and build a video rendering model comprising a static NeRF, a dynamic NeRF and an activation-function network layer;
a static NeRF training module, used to connect the optical center of a video frame of the training video to a specified pixel in that frame to obtain a sampling ray, select multiple sampling points on the sampling ray and input them into the static NeRF to obtain the volume density and color information at the sampling points, perform volume rendering based on the positions, volume densities and color information of the sampling points to obtain the rendered color of the given pixel, and construct a loss function from the difference between the rendered color and the actual color to train the static NeRF;
a dynamic NeRF training module, used to sample a certain number of points on the sampling ray and input them into the trained static NeRF to obtain the static volume density and static color information at the sampling points; combine the sampling points on the sampling ray with time information and input them into the dynamic NeRF to obtain the dynamic volume density and dynamic color information of the sampling points; add the dynamic and static volume densities and input the sum into the activation-function network layer to obtain the final volume density; add the dynamic and static color information and input the sum into the activation-function network layer to obtain the final color information; perform volume rendering based on the positions, final volume densities and final color information of the sampling points to obtain the final color of the given pixel; and construct a loss function from the difference between the final color and the actual color to train the dynamic NeRF;
a rendering module, used to compute, based on the camera parameters to be rendered, the camera extrinsic parameters and the time period, the sampling rays of the pixels to be rendered for each specific time point in the time period, select sampling points and input them into the trained video rendering model to obtain the volume density and color information of all sampling points, combine these with volume rendering to obtain the colors of all given pixels as the single-frame rendering result for that time point, and combine the single-frame rendering results of all time points to generate the video rendering result.
In the above video rendering system fusing a static NeRF model and a dynamic NeRF model, the optical center is obtained from the intrinsic and extrinsic parameters of the camera.
In the above video rendering system, the intrinsic parameters are the focal length and pixel size of the camera, and the extrinsic parameters are the position and orientation of the camera in the selected coordinate system.
In the above video rendering system, the training video consists of multiple frame-synchronized video streams.
The present invention also provides a storage medium for storing a program that executes any one of the above video rendering methods fusing a static NeRF model and a dynamic NeRF model.
The present invention also proposes a client for any one of the above video rendering systems fusing a static NeRF model and a dynamic NeRF model.
As can be seen from the above solutions, the advantages of the present invention are:
1. The static NeRF does not contain time information, so its model capacity can be reduced considerably without losing the rendering quality of the static part of the scene.
2. Because the static NeRF already reconstructs the static part of the scene well, the volume density features and color features output by the dynamic NeRF in these regions are very close to 0. This greatly reduces the amount of information the dynamic NeRF has to express, so the capacity of the dynamic NeRF model can also be greatly reduced.
3. Because the capacities of both the static NeRF and the dynamic NeRF are reduced, the inference time of the overall model drops considerably, which speeds up rendering.
4. The volume density features and color features of the dynamic NeRF in static regions of the scene are very close to 0. Consequently, during rendering, the dynamic NeRF requires no inference in these regions, which further speeds up inference.
Brief Description of the Drawings
Figure 1 is a schematic diagram of how the static NeRF and the dynamic NeRF are combined;
Figure 2 is the overall flow chart of this application;
Figure 3 is a schematic diagram of NeRF model inference and training;
Figure 4 is a schematic diagram of the shooting scene of the embodiment.
Best Mode for Carrying Out the Invention
In order to make the above features and effects of the present invention clearer and easier to understand, embodiments are described in detail below with reference to the accompanying drawings.
The overall model structure of the present invention is shown in Figure 1, where (x, y, z) are the three-dimensional coordinates of a position, t is the time, σ is the volume density, and rgb is the value of the three color channels of the image; σ_h and rgb_h are the feature vectors corresponding to σ and rgb. The steps of the present invention include:
Training the static NeRF. As shown in Figure 3, the line connecting time t to the model is dotted, indicating that static NeRF training does not require a time input; only the dynamic NeRF requires a time input. For an input image, a certain number of pixels are first selected at random. The optical center of the image is obtained from the intrinsic and extrinsic parameters of the corresponding camera. Starting from the optical center, a sampling ray is obtained by connecting it to a given pixel. A certain number of points are sampled on the ray and input into the static NeRF model to obtain the volume density and color (rgb) information at the sampling points. By feeding the positions, volume densities and color information of all sampling points on the sampling ray into the volume rendering module, the rendered color of the given pixel is obtained. The static NeRF model is obtained by computing the difference between the rendered color and the actually captured color and optimizing this difference.
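For readers unfamiliar with the volume-rendering step, the following is a minimal sketch of standard NeRF-style alpha compositing along one ray together with the per-pixel training loss; the sampling scheme, tensor shapes and the use of a mean-squared-error loss are assumptions rather than details fixed by this application.

    import torch

    def render_ray(sigma: torch.Tensor, rgb: torch.Tensor, deltas: torch.Tensor) -> torch.Tensor:
        """Composite N samples along one ray into a single pixel color.

        sigma:  (N,)   volume densities at the sample points
        rgb:    (N, 3) colors at the sample points
        deltas: (N,)   distances between consecutive sample points
        """
        alpha = 1.0 - torch.exp(-sigma * deltas)                 # opacity contributed by each sample
        trans = torch.cumprod(
            torch.cat([torch.ones(1), 1.0 - alpha + 1e-10], dim=0), dim=0
        )[:-1]                                                   # transmittance up to each sample
        weights = alpha * trans
        return (weights[:, None] * rgb).sum(dim=0)               # rendered pixel color

    # Training the static NeRF then amounts to minimizing the difference between
    # the rendered color and the color actually captured at that pixel, e.g.:
    # loss = ((render_ray(sigma, rgb, deltas) - true_rgb) ** 2).mean()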
In the process of training the static NeRF, time information is not used. That is, for all frames of the same camera, the rays cast from the same pixels, the sampling points, the static NeRF outputs and the volume-rendering results are identical. In the static regions captured by the camera, the color of the same pixel is consistent across all frames, so the static NeRF converges well in these regions and achieves good results. In the dynamic regions captured by the camera, the color differs between frames, so the static NeRF cannot converge well there; as a result, its rendering of dynamic regions is blurry.
Training the dynamic NeRF model. During inference with the model, the volume density features output by the static NeRF are added to the volume density features output by the dynamic NeRF, and the volume density is then obtained through the activation function. Likewise, the color (RGB) features output by the static NeRF are added to the color features output by the dynamic NeRF, and the color is then obtained through the activation function. During training, the parameters of the static NeRF are kept fixed and only the parameters of the dynamic NeRF are trained. In other words, the dynamic NeRF only needs to compensate for regions that the static NeRF does not express clearly.
Separating the static NeRF and the dynamic NeRF to accelerate rendering. Because the static NeRF already models the static part of the scene well, the static regions are already fully expressed. For example, if the volume density of a static point in space is 1 and this already matches reality well, while the true volume density at that point is 1.01, then when the dynamic model is added it only needs to converge by 0.01 during training, so the compensation of the dynamic NeRF in such static regions is close to 0. Accordingly, regions where the magnitude of the volume density and color features output by the dynamic NeRF is below a certain threshold can be classified as static regions, and the rest as dynamic regions. During rendering, the dynamic NeRF performs no inference in static regions, which are rendered entirely by the static NeRF, improving rendering speed.
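A minimal sketch of how such a skip could be realized is given below. The callable is_dynamic_region, the idea of precomputing a spatial mask by thresholding the dynamic NeRF's feature magnitudes, and the per-point tensor shapes are assumptions introduced for illustration.

    import torch

    def fused_point_query(static_nerf, dynamic_nerf, xyz, t, is_dynamic_region):
        """Query sample points, running the dynamic NeRF only where the scene is dynamic.

        xyz: (B, 3) sample positions;  t: (B, 1) sample times.
        is_dynamic_region: hypothetical callable mapping (B, 3) positions to a (B,)
        boolean mask, assumed to be precomputed by thresholding the magnitude of the
        dynamic NeRF's density/color features over the scene.
        """
        sigma_h, rgb_h = static_nerf(xyz)            # static branch runs everywhere

        mask = is_dynamic_region(xyz)                # True where the dynamic branch matters
        if mask.any():
            sigma_h_d, rgb_h_d = dynamic_nerf(xyz[mask], t[mask])
            sigma_h = sigma_h.clone()
            rgb_h = rgb_h.clone()
            sigma_h[mask] = sigma_h[mask] + sigma_h_d   # add compensation in dynamic regions only
            rgb_h[mask] = rgb_h[mask] + rgb_h_d

        return torch.relu(sigma_h), torch.sigmoid(rgb_h)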
Taking the shooting environment of Figure 4 as an example, the steps of the present invention include:
(1) Video capture. As shown in Figure 4, a camera array is used to film a dancing person. The cameras in the array are frame-synchronized.
(2) Obtaining the intrinsic and extrinsic parameters of the cameras. The intrinsic parameters refer to parameters such as the focal length of the camera and the pixel size. The extrinsic parameters refer to the position and orientation of the camera relative to a chosen coordinate system. By taking one frame captured at the same instant by every camera in the array and feeding these frames into the COLMAP open-source package, the intrinsic and extrinsic parameters of all cameras can be computed.
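A minimal sketch of how a sampling ray could be derived from these parameters follows; it assumes a pinhole model with a world-to-camera rotation R and translation t, and the variable names and conventions are illustrative rather than prescribed by this application.

    import torch

    def pixel_ray(K: torch.Tensor, R: torch.Tensor, t: torch.Tensor, u: float, v: float):
        """Return (origin, direction) of the ray through pixel (u, v).

        K:    3x3 intrinsic matrix (focal length, pixel size / principal point)
        R, t: world-to-camera rotation and translation (extrinsics)
        """
        origin = -R.T @ t                                         # optical center in world coordinates
        d_cam = torch.linalg.inv(K) @ torch.tensor([u, v, 1.0])   # back-project the pixel into camera space
        d_world = R.T @ d_cam                                     # rotate the direction into world coordinates
        return origin, d_world / torch.linalg.norm(d_world)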
(3) Training the static NeRF. For an input image, a certain number of pixels are first selected at random. From the intrinsic and extrinsic parameters of the corresponding camera, the optical center is obtained. Starting from the optical center, a sampling ray is obtained by connecting it to a given pixel. A certain number of points are sampled on the ray and input into the static NeRF model to obtain the volume density and color (rgb) information at the sampling points. By feeding the positions, volume densities and color information of all sampling points on the sampling ray into the volume rendering module, the rendered color of the given pixel is obtained. The static NeRF model is obtained by computing the difference between the rendered color and the actually captured color and optimizing this difference. The actually captured rgb information is the color of the pixel in the input image corresponding to the given pixel.
(4) Training the dynamic NeRF. For an input image, a certain number of pixels are first selected at random. From the intrinsic and extrinsic parameters of the corresponding camera, the optical center is obtained. Starting from the optical center, a sampling ray is obtained by connecting it to a given pixel. A certain number of points are sampled on the ray and input into the static NeRF model to obtain the pre-activation volume density and color (rgb) information at the sampling points. The same number of points sampled on the ray, combined with the time information, are input into the dynamic NeRF model to obtain its pre-activation volume density and color information at the sampling points. The pre-activation volume density and color information obtained from the dynamic NeRF and the static NeRF are added and input into the final activation-function network layer to obtain the final volume density and color information. By feeding the positions, final volume densities and final color information of all sampling points on the sampling ray into the volume rendering module, the rendered color of the given pixel is obtained. The difference between the rendered color and the actually captured color is computed; optimizing this difference while keeping the static NeRF parameters fixed yields the dynamic NeRF model.
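A minimal training-step sketch for this stage is given below; the optimizer, the per-ray batch construction and the mean-squared-error loss are assumptions, and render_ray refers to the earlier volume-rendering sketch. The key point it illustrates is that only the dynamic NeRF's parameters receive gradients while the static NeRF stays fixed.

    import torch

    def train_dynamic_step(static_nerf, dynamic_nerf, optimizer, xyz, t, deltas, true_rgb):
        """One optimization step for the dynamic NeRF on a single ray; the static NeRF is frozen."""
        for p in static_nerf.parameters():
            p.requires_grad_(False)                  # static parameters stay fixed

        with torch.no_grad():
            sigma_h_s, rgb_h_s = static_nerf(xyz)    # pre-activation static features
        sigma_h_d, rgb_h_d = dynamic_nerf(xyz, t)    # pre-activation dynamic features

        sigma = torch.relu(sigma_h_s + sigma_h_d)    # final density after the activation layer
        rgb = torch.sigmoid(rgb_h_s + rgb_h_d)       # final color after the activation layer

        pred = render_ray(sigma, rgb, deltas)        # volume rendering (see the earlier sketch)
        loss = ((pred - true_rgb) ** 2).mean()       # difference to the actually captured color

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()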
(5) Rendering with the trained model. Given the camera intrinsic parameters, camera extrinsic parameters and time to be rendered, the rays of the pixels to be rendered are computed, sampling points are selected and input into the static NeRF and the dynamic NeRF, and the volume density and color (rgb) information of each sampling point is obtained; combined with volume rendering, this gives the color of each given pixel. Composing all pixels into an image completes the rendering for the given time and viewpoint. For a given time period, based on the extrinsic parameters (for example the camera position and orientation), times are sampled within the period, an image is rendered for the chosen viewpoint at each specific time, and all images are combined to generate a video.
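A minimal sketch of this rendering loop follows. It reuses the pixel_ray and render_ray sketches above, sample_along_ray is a hypothetical helper (not defined by this application) that places points between near and far bounds, and the per-pixel loop is kept only for clarity; a practical implementation would batch rays.

    import torch

    def render_video(model, K, R, t_ext, times, H, W, n_samples=64):
        """Render one frame per requested time point and return the stack of frames."""
        frames = []
        for time in times:                                        # one frame per time point
            t = torch.full((n_samples, 1), float(time))           # per-sample time input
            frame = torch.zeros(H, W, 3)
            for v in range(H):
                for u in range(W):
                    origin, direction = pixel_ray(K, R, t_ext, u, v)           # see earlier sketch
                    xyz, deltas = sample_along_ray(origin, direction, n_samples)  # hypothetical sampler
                    sigma, rgb = model(xyz, t)                    # fused static + dynamic query
                    frame[v, u] = render_ray(sigma, rgb, deltas)  # volume rendering per pixel
            frames.append(frame)
        return torch.stack(frames)                                # frames are then encoded into a video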
The following is a system embodiment corresponding to the above method embodiment; this embodiment can be implemented in cooperation with the above embodiment. The relevant technical details mentioned in the above embodiment remain valid in this embodiment and, to reduce repetition, are not described again here. Correspondingly, the relevant technical details mentioned in this embodiment can also be applied to the above embodiment.
The present invention also proposes a video rendering system that fuses a static NeRF model and a dynamic NeRF model, comprising:
an initialization module, used to obtain a training video captured in the specified scene and build a video rendering model comprising a static NeRF, a dynamic NeRF and an activation-function network layer;
a static NeRF training module, used to connect the optical center of a video frame of the training video to a specified pixel in that frame to obtain a sampling ray, select multiple sampling points on the sampling ray and input them into the static NeRF to obtain the volume density and color information at the sampling points, perform volume rendering based on the positions, volume densities and color information of the sampling points to obtain the rendered color of the given pixel, and construct a loss function from the difference between the rendered color and the actual color to train the static NeRF;
a dynamic NeRF training module, used to sample a certain number of points on the sampling ray and input them into the trained static NeRF to obtain the static volume density and static color information at the sampling points; combine the sampling points on the sampling ray with time information and input them into the dynamic NeRF to obtain the dynamic volume density and dynamic color information of the sampling points; add the dynamic and static volume densities and input the sum into the activation-function network layer to obtain the final volume density; add the dynamic and static color information and input the sum into the activation-function network layer to obtain the final color information; perform volume rendering based on the positions, final volume densities and final color information of the sampling points to obtain the final color of the given pixel; and construct a loss function from the difference between the final color and the actual color to train the dynamic NeRF;
a rendering module, used to compute, based on the camera parameters to be rendered, the camera extrinsic parameters and the time period, the sampling rays of the pixels to be rendered for each specific time point in the time period, select sampling points and input them into the trained video rendering model to obtain the volume density and color information of all sampling points, combine these with volume rendering to obtain the colors of all given pixels as the single-frame rendering result for that time point, and combine the single-frame rendering results of all time points to generate the video rendering result.
In the above video rendering system fusing a static NeRF model and a dynamic NeRF model, the optical center is obtained from the intrinsic and extrinsic parameters of the camera.
In the above video rendering system, the intrinsic parameters are the focal length and pixel size of the camera, and the extrinsic parameters are the position and orientation of the camera in the selected coordinate system.
In the above video rendering system, the training video consists of multiple frame-synchronized video streams.
The present invention also provides a storage medium for storing a program that executes any one of the above video rendering methods fusing a static NeRF model and a dynamic NeRF model.
The present invention also proposes a client for any one of the above video rendering systems fusing a static NeRF model and a dynamic NeRF model.
Industrial Applicability
The present invention proposes a video rendering method and system that fuse a static NeRF model and a dynamic NeRF model, including: connecting the optical center of a video frame of the training video to a designated pixel in the frame to obtain a sampling ray, and selecting multiple sampling points on the sampling ray to train a static NeRF; obtaining the static volume density and static color information of the sampling points from the trained static NeRF and, combined with time information, training a dynamic NeRF to obtain the dynamic volume density and dynamic color information of the sampling points; adding the dynamic and static volume densities and inputting the sum into an activation-function network layer to obtain the final volume density; adding the dynamic and static color information and inputting the sum into the activation-function network layer to obtain the final color information; and inputting the parameters to be rendered into the static NeRF and the dynamic NeRF to obtain the volume density and color information of the sampling points, combining these with volume rendering to obtain the colors of all given pixels as the single-frame rendering result for a specific time point, and combining the single-frame rendering results of all time points to generate the video rendering result. The static NeRF used in the present invention does not contain time information, so the model is small and the rendering quality of the static part of the scene is not lost. The volume density features and color features of the dynamic NeRF in static regions of the scene are very close to 0, so during rendering the dynamic NeRF requires no inference in regions where these features are close to 0, which further speeds up inference.

Claims (10)

  1. A video rendering method fusing a static NeRF model and a dynamic NeRF model, characterized by comprising:
    Step 1: obtaining a training video captured in a specified scene, and building a video rendering model comprising a static NeRF, a dynamic NeRF and an activation-function network layer;
    Step 2: connecting the optical center of a video frame of the training video to a specified pixel in the video frame to obtain a sampling ray, selecting multiple sampling points on the sampling ray and inputting them into the static NeRF to obtain volume density and color information at the sampling points, performing volume rendering based on the positions, volume densities and color information of the sampling points to obtain the rendered color of the given pixel, and constructing a loss function from the difference between the rendered color and the actual color to train the static NeRF;
    Step 3: sampling a certain number of points on the sampling ray and inputting them into the trained static NeRF to obtain static volume density and static color information at the sampling points; combining the sampling points on the sampling ray with time information and inputting them into the dynamic NeRF to obtain dynamic volume density and dynamic color information of the sampling points; adding the dynamic and static volume density features and inputting the sum into the activation-function network layer to obtain the final volume density; adding the dynamic and static color features and inputting the sum into the activation-function network layer to obtain the final color information; performing volume rendering based on the positions, final volume densities and final color information of the sampling points to obtain the final color of the given pixel; and constructing a loss function from the difference between the final color and the actual color to train the dynamic NeRF;
    Step 4: based on camera parameters to be rendered, camera extrinsic parameters and a time period, computing the sampling rays of the pixels to be rendered for each specific time point in the time period, selecting sampling points and inputting them into the trained video rendering model to obtain the volume density and color information of all sampling points, combining these with volume rendering to obtain the colors of all given pixels as the single-frame rendering result for that specific time point, and combining the single-frame rendering results of all specific time points to generate the video rendering result.
  2. The video rendering method fusing a static NeRF model and a dynamic NeRF model according to claim 1, characterized in that the optical center is obtained from the intrinsic and extrinsic parameters of the camera.
  3. The video rendering method fusing a static NeRF model and a dynamic NeRF model according to claim 1, characterized in that the intrinsic parameters are the focal length and pixel size of the camera, and the extrinsic parameters are the position and orientation of the camera in the selected coordinate system.
  4. The video rendering method fusing a static NeRF model and a dynamic NeRF model according to claim 1, characterized in that the training video consists of multiple frame-synchronized video streams.
  5. A video rendering system fusing a static NeRF model and a dynamic NeRF model, characterized by comprising:
    an initialization module, used to obtain a training video captured in a specified scene and build a video rendering model comprising a static NeRF, a dynamic NeRF and an activation-function network layer;
    a static NeRF training module, used to connect the optical center of a video frame of the training video to a specified pixel in the video frame to obtain a sampling ray, select multiple sampling points on the sampling ray and input them into the static NeRF to obtain volume density and color information at the sampling points, perform volume rendering based on the positions, volume densities and color information of the sampling points to obtain the rendered color of the given pixel, and construct a loss function from the difference between the rendered color and the actual color to train the static NeRF;
    a dynamic NeRF training module, used to sample a certain number of points on the sampling ray and input them into the trained static NeRF to obtain static volume density and static color information at the sampling points; combine the sampling points on the sampling ray with time information and input them into the dynamic NeRF to obtain dynamic volume density and dynamic color information of the sampling points; add the dynamic and static volume density features and input the sum into the activation-function network layer to obtain the final volume density; add the dynamic and static color information and input the sum into the activation-function network layer to obtain the final color information; perform volume rendering based on the positions, final volume densities and final color information of the sampling points to obtain the final color of the given pixel; and construct a loss function from the difference between the final color and the actual color to train the dynamic NeRF;
    a rendering module, used to compute, based on camera parameters to be rendered, camera extrinsic parameters and a time period, the sampling rays of the pixels to be rendered for each specific time point in the time period, select sampling points and input them into the trained video rendering model to obtain the volume density and color information of all sampling points, combine these with volume rendering to obtain the colors of all given pixels as the single-frame rendering result for that specific time point, and combine the single-frame rendering results of all specific time points to generate the video rendering result.
  6. The video rendering system fusing a static NeRF model and a dynamic NeRF model according to claim 5, characterized in that the optical center is obtained from the intrinsic and extrinsic parameters of the camera.
  7. The video rendering system fusing a static NeRF model and a dynamic NeRF model according to claim 5, characterized in that the intrinsic parameters are the focal length and pixel size of the camera, and the extrinsic parameters are the position and orientation of the camera in the selected coordinate system.
  8. The video rendering system fusing a static NeRF model and a dynamic NeRF model according to claim 5, characterized in that the training video consists of multiple frame-synchronized video streams.
  9. A storage medium for storing a program for executing the video rendering method fusing a static NeRF model and a dynamic NeRF model according to any one of claims 1 to 4.
  10. A client for the video rendering system fusing a static NeRF model and a dynamic NeRF model according to any one of claims 5 to 8.
PCT/CN2022/104048 2022-07-06 2022-07-06 静态NeRF模型与动态NeRF模型融合的视频渲染方法及系统 WO2024007182A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/104048 WO2024007182A1 (zh) 2022-07-06 2022-07-06 静态NeRF模型与动态NeRF模型融合的视频渲染方法及系统

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/104048 WO2024007182A1 (zh) 2022-07-06 2022-07-06 静态NeRF模型与动态NeRF模型融合的视频渲染方法及系统

Publications (1)

Publication Number Publication Date
WO2024007182A1 true WO2024007182A1 (zh) 2024-01-11

Family

ID=89454751

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/104048 WO2024007182A1 (zh) 2022-07-06 2022-07-06 静态NeRF模型与动态NeRF模型融合的视频渲染方法及系统

Country Status (1)

Country Link
WO (1) WO2024007182A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117593470A (zh) * 2024-01-18 2024-02-23 深圳奥雅设计股份有限公司 一种基于ai模型的街景重构方法及系统

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113099208A (zh) * 2021-03-31 2021-07-09 清华大学 基于神经辐射场的动态人体自由视点视频生成方法和装置
US20220036602A1 (en) * 2020-07-31 2022-02-03 Google Llc View Synthesis Robust To Unconstrained Image Data
CN114049434A (zh) * 2021-11-05 2022-02-15 成都艾特能电气科技有限责任公司 一种基于全卷积神经网络的3d建模方法及系统
CN114241113A (zh) * 2021-11-26 2022-03-25 浙江大学 一种基于深度引导采样的高效神经辐射场渲染方法
CN114663603A (zh) * 2022-05-24 2022-06-24 成都索贝数码科技股份有限公司 一种基于神经辐射场的静态对象三维网格模型生成方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220036602A1 (en) * 2020-07-31 2022-02-03 Google Llc View Synthesis Robust To Unconstrained Image Data
CN113099208A (zh) * 2021-03-31 2021-07-09 清华大学 基于神经辐射场的动态人体自由视点视频生成方法和装置
CN114049434A (zh) * 2021-11-05 2022-02-15 成都艾特能电气科技有限责任公司 一种基于全卷积神经网络的3d建模方法及系统
CN114241113A (zh) * 2021-11-26 2022-03-25 浙江大学 一种基于深度引导采样的高效神经辐射场渲染方法
CN114663603A (zh) * 2022-05-24 2022-06-24 成都索贝数码科技股份有限公司 一种基于神经辐射场的静态对象三维网格模型生成方法

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHANG, YUAN; GAI, MENG: "A Review on Neural Radiance Fields Based View Synthesis", JOURNAL OF GRAPHICS, vol. 42, no. 3, 4 June 2021 (2021-06-04), pages 376 - 384, XP009547824, ISSN: 2095-302X, DOI: 10.11996/JG.j.2095-302X.2021030376 *
MILDENHALL BEN; SRINIVASAN PRATUL P.; TANCIK MATTHEW; BARRON JONATHAN T.; RAMAMOORTHI RAVI; NG REN: "NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis", COMMUNICATIONS OF THE ACM, ASSOCIATION FOR COMPUTING MACHINERY, INC, UNITED STATES, vol. 65, no. 1, 17 December 2021 (2021-12-17), United States, pages 99 - 106, XP058924963, ISSN: 0001-0782, DOI: 10.1145/3503250 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117593470A (zh) * 2024-01-18 2024-02-23 深圳奥雅设计股份有限公司 一种基于ai模型的街景重构方法及系统
CN117593470B (zh) * 2024-01-18 2024-04-02 深圳奥雅设计股份有限公司 一种基于ai模型的街景重构方法及系统

Similar Documents

Publication Publication Date Title
US20220060639A1 (en) Live style transfer on a mobile device
CN109102462A (zh) 一种基于深度学习的视频超分辨率重建方法
US11538138B2 (en) Methods and apparatus for applying motion blur to overcaptured content
CN110120049B (zh) 由单张图像联合估计场景深度与语义的方法
CN111951368B (zh) 一种点云、体素和多视图融合的深度学习方法
RU2770748C1 (ru) Способ и аппарат для обработки изображений, устройство и носитель данных
US11328436B2 (en) Using camera effect in the generation of custom synthetic data for use in training an artificial intelligence model to produce an image depth map
KR20200021891A (ko) 라이트 필드의 중간 뷰 합성 방법, 라이트 필드의 중간 뷰 합성 시스템과 라이트 필드 압축 방법
WO2024007182A1 (zh) 静态NeRF模型与动态NeRF模型融合的视频渲染方法及系统
US20230230304A1 (en) Volumetric capture and mesh-tracking based machine learning 4d face/body deformation training
CN115239857B (zh) 图像生成方法以及电子设备
US10354399B2 (en) Multi-view back-projection to a light-field
Dastjerdi et al. EverLight: Indoor-outdoor editable HDR lighting estimation
WO2022217470A1 (en) Hair rendering system based on deep neural network
Pagés et al. Volograms & v-sense volumetric video dataset
Chen et al. Flow supervised neural radiance fields for static-dynamic decomposition
WO2023217138A1 (zh) 一种参数配置方法、装置、设备、存储介质及产品
Lucas et al. 3D Video: From Capture to Diffusion
WO2022257184A1 (zh) 图像生成装置获取方法及图像生成装置
CN116228855A (zh) 视角图像的处理方法、装置、电子设备及计算机存储介质
CN111064905A (zh) 面向自动驾驶的视频场景转换方法
Huynh et al. A new dimension in testimony: Relighting video with reflectance field exemplars
KR102561903B1 (ko) 클라우드 서버를 이용한 ai 기반의 xr 콘텐츠 서비스 방법
US11893688B2 (en) Method of fusing mesh sequences within volumetric video
CN113824967B (zh) 一种基于深度学习的视频压缩方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22949770

Country of ref document: EP

Kind code of ref document: A1