CN117788672A - Real-time dynamic human body new view angle rendering method and system based on multi-view video - Google Patents

Real-time dynamic human body new view angle rendering method and system based on multi-view video

Info

Publication number
CN117788672A
Authority
CN
China
Prior art keywords
human body
dimensional
image
field
rendering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311767083.2A
Other languages
Chinese (zh)
Inventor
徐枫 (Xu Feng)
林文镔 (Lin Wenbin)
雍俊海 (Yong Junhai)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202311767083.2A priority Critical patent/CN117788672A/en
Publication of CN117788672A publication Critical patent/CN117788672A/en
Pending legal-status Critical Current

Landscapes

  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a real-time dynamic human body new view angle rendering method and system based on multi-view video. The method estimates the pose parameters of a human body from multi-view human body image information; constructs a three-dimensional geometric field and a texture feature field of the human body based on the pose parameters; performs volume rendering under the captured view angles through an implicit neural network to obtain captured-view rendered images, and constructs a consistency constraint between each captured-view rendered image and the corresponding captured image, taking the three-dimensional geometric field and the texture feature field as optimization variables to obtain an optimized texture feature field; and performs new view angle rendering of the human body based on the optimized texture feature field and the human body image information to obtain a new view angle rendered image. The invention can realize new view angle rendering of a dynamic three-dimensional human body with a stereoscopic effect.

Description

Real-time dynamic human body new view angle rendering method and system based on multi-view video
Technical Field
The invention relates to the technical field of computer vision and computer graphics, in particular to a real-time dynamic human body new view angle rendering method and system based on multi-view video.
Background
In production and daily life, real-time remote video call technology is already widely used. As people's pursuit of immersion and experience in remote communication continues to rise, three-dimensional, real-time remote communication with freely changeable viewpoints has become a new technical demand. Dynamic three-dimensional human body reconstruction technology has broad application prospects and important application value in fields such as virtual reality, augmented reality, remote communication, and video animation. In practical applications, people often require real-time communication, and compared with conventional two-dimensional video, three-dimensional video with a stereoscopic effect can bring a more immersive experience. How to realize real-time new view angle rendering of a dynamic human body therefore remains an unsolved problem in the current art.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems in the related art to some extent.
Therefore, the invention provides a real-time dynamic human body new view angle rendering method based on multi-view video, which can realize real-time three-dimensional reconstruction of a target human body from multi-view human motion video captured by a user, render images under arbitrary view angles, and achieve a rendering effect with three-dimensional stereoscopic quality.
Another object of the present invention is to provide a real-time dynamic human new view angle rendering system based on multi-view video.
In order to achieve the above objective, in one aspect, the present invention provides a real-time dynamic human body new view angle rendering method based on multi-view video, comprising:
estimating pose parameters of a human body based on multi-view human body image information;
constructing a three-dimensional geometric field and a texture feature field of the human body based on the pose parameters;
performing volume rendering under the captured view angle based on an implicit neural network to obtain a captured-view rendered image, and constructing a consistency constraint between the captured-view rendered image and the captured image, taking the three-dimensional geometric field and the texture feature field as optimization variables to obtain an optimized texture feature field;
and performing new view angle rendering of the human body based on the optimized texture feature field and the human body image information to obtain a new view angle rendered image.
The real-time dynamic human new view angle rendering method based on the multi-view video provided by the embodiment of the invention can also have the following additional technical characteristics:
in one embodiment of the present invention, estimating a pose parameter of a human body based on multi-view human body image information includes:
acquiring multi-view human body image information;
calculating two-dimensional coordinates of a human body joint point in the human body image information by using a two-dimensional human body posture estimation tool;
solving three-dimensional human body posture based on two-dimensional coordinates of human body joint points and human body multi-view geometric information
In one embodiment of the invention, constructing a three-dimensional geometric field and a texture feature field of a human body based on the pose parameters comprises:
constructing three-dimensional voxels for characterizing the three-dimensional geometric field and the texture feature field;
and representing the three-dimensional human body gesture by using the three-dimensional voxels so as to obtain human body geometric information and human body surface texture characteristics in a standard space.
In one embodiment of the present invention, the three-dimensional geometric field records a directional distance function value of a current three-dimensional voxel, that is, a distance value of a closest point to the surface of the human body on the current three-dimensional voxel position; the distance value is negative for three-dimensional voxels inside the human body and positive for three-dimensional voxels outside the human body.
In one embodiment of the present invention, performing volume rendering on a shooting view image based on an implicit neural network to obtain a shooting view rendered image includes:
projecting a light ray in the direction of a pixel of a photographed visual angle image, sampling a plurality of light ray points, and calculating the color value of the light ray points through an implicit neural network;
obtaining density information of the light points by inquiring the three-dimensional geometric field;
and carrying out weighted integration based on the color value and the density information of the light points to obtain the color data of the pixels so as to obtain a shooting visual angle rendering image.
In order to achieve the above object, another aspect of the present invention provides a real-time dynamic human body new view angle rendering system based on multi-view video, comprising:
a pose parameter estimation module, configured to estimate pose parameters of a human body based on multi-view human body image information;
a human body feature characterization module, configured to construct a three-dimensional geometric field and a texture feature field of the human body based on the pose parameters;
a feature variable optimization module, configured to perform volume rendering under the captured view angle based on an implicit neural network to obtain a captured-view rendered image, and construct a consistency constraint between the captured-view rendered image and the captured image, taking the three-dimensional geometric field and the texture feature field as optimization variables to obtain an optimized texture feature field;
and a view angle image rendering module, configured to perform new view angle rendering of the human body based on the optimized texture feature field and the human body image information to obtain a new view angle rendered image.
According to the real-time dynamic human body new view angle rendering method and system based on multi-view video of the embodiments of the invention, real-time three-dimensional reconstruction of a target human body can be realized from multi-view human motion video captured by a user, images under arbitrary view angles can be rendered, and a rendering effect with three-dimensional stereoscopic quality is achieved.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a flow chart of a method for real-time dynamic human new view rendering based on multi-view video according to an embodiment of the present invention;
FIG. 2 is a network architecture diagram of real-time dynamic human new view rendering based on multi-view video according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a real-time dynamic human new view angle rendering system based on multi-view video according to an embodiment of the present invention.
Detailed Description
It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other. The invention will be described in detail below with reference to the drawings in connection with embodiments.
In order that those skilled in the art may better understand the present invention, the technical solution in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some embodiments of the present invention, not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
The method and the system for rendering the real-time dynamic human new view angle based on the multi-view video according to the embodiment of the invention are described below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a real-time dynamic human new view rendering method based on multi-view video according to an embodiment of the present invention.
As shown in fig. 1, the method includes, but is not limited to, the steps of:
s1, estimating the posture parameters of a human body based on multi-view human body image information.
It can be understood that the invention estimates the posture parameters of the human body through the multi-view human body motion video.
Specifically, multi-view human body image information is acquired, two-dimensional coordinates of human body joint points in the human body image information are calculated by using a two-dimensional human body posture estimation tool, and three-dimensional human body postures are solved based on the two-dimensional coordinates of the human body joint points and the human body multi-view geometric information.
In one embodiment of the invention, the two-dimensional human body posture estimation tool OpenPose is used for calculating the two-dimensional coordinates of the human body joint points, and then the three-dimensional human body posture is solved by combining the multi-view geometric information.
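The patent does not specify how the multi-view geometric information is combined to recover the 3D joints; a common approach consistent with the description is linear (DLT) triangulation of each joint from its 2D detections in calibrated views. The sketch below is illustrative only — the function name and the assumption of known camera projection matrices are not taken from the patent.

```python
import numpy as np

def triangulate_joint(points_2d, proj_mats):
    """Linear (DLT) triangulation of one joint from several calibrated views.

    points_2d : (V, 2) pixel coordinates of the joint detected in V views
                (e.g., by a 2D pose estimator such as OpenPose)
    proj_mats : (V, 3, 4) camera projection matrices (assumed known)
    Returns the 3D joint position as a length-3 array.
    """
    rows = []
    for (u, v), P in zip(points_2d, proj_mats):
        # Each view contributes two linear constraints on the homogeneous
        # point X: u * (P[2] @ X) = P[0] @ X and v * (P[2] @ X) = P[1] @ X.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector of A with the smallest
    # singular value, dehomogenized by the last coordinate.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]
```

Repeating this per joint yields 3D joint positions from which the body pose parameters can then be fitted.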
In one embodiment of the invention, solving the pose of the dynamic human body makes it possible to align human bodies in different poses into a canonical spread-limb ("大"-shaped) pose space, as shown in FIG. 2, so that information from different human body poses can be fused in a unified canonical space.
S2, constructing a three-dimensional geometric field and a texture feature field of the human body based on the pose parameters.
It can be understood that the invention further constructs the three-dimensional geometric field and texture feature field of the human body.
Specifically, three-dimensional voxels for characterizing the three-dimensional geometric field and the texture feature field are constructed, and the three-dimensional human body pose is represented with the three-dimensional voxels to obtain the human body geometric information and human body surface texture features in the canonical space.
In one embodiment of the invention, to enable modeling of the three-dimensional human body geometry and surface texture, the invention introduces a three-dimensional geometric field and a texture feature field, both stored in the form of three-dimensional voxels and both represented in the canonical "大"-pose space of the human body.
The three-dimensional geometric field records the signed distance function value of each voxel, that is, the distance from the voxel position to the closest point on the human body surface; the distance value is negative for voxels inside the human body and positive for voxels outside. The human body geometry in the canonical space can thus be determined from the three-dimensional geometric field.
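As a concrete illustration of querying such a geometric field, the sketch below trilinearly interpolates a signed-distance voxel grid at continuous points; the sign convention matches the description (negative inside the body, positive outside). The grid layout and function name are assumptions for illustration, not taken from the patent.

```python
import numpy as np

def query_sdf(sdf_grid, bbox_min, voxel_size, points):
    """Trilinearly interpolate a signed-distance voxel grid at continuous points.

    sdf_grid   : (Nx, Ny, Nz) signed distances stored per voxel
                 (negative inside the body, positive outside)
    bbox_min   : (3,) world coordinate of voxel (0, 0, 0)
    voxel_size : scalar edge length of a voxel
    points     : (M, 3) query positions in world space
    """
    g = (points - bbox_min) / voxel_size      # continuous grid coordinates
    i0 = np.floor(g).astype(int)              # lower corner voxel index
    f = g - i0                                # fractional interpolation weights
    vals = np.zeros(len(points))
    for dx in (0, 1):                         # accumulate the 8 corner values
        for dy in (0, 1):
            for dz in (0, 1):
                w = (np.where(dx, f[:, 0], 1 - f[:, 0]) *
                     np.where(dy, f[:, 1], 1 - f[:, 1]) *
                     np.where(dz, f[:, 2], 1 - f[:, 2]))
                vals += w * sdf_grid[i0[:, 0] + dx, i0[:, 1] + dy, i0[:, 2] + dz]
    return vals
```

Points whose interpolated value is near zero lie on the reconstructed body surface.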
Further, the texture feature field stores texture features of the human body surface, which are used to assist in rendering new view angle images.
S3, performing volume rendering under the captured view angle based on the implicit neural network to obtain a captured-view rendered image, and constructing a consistency constraint between the captured-view rendered image and the captured image, taking the three-dimensional geometric field and the texture feature field as optimization variables to obtain an optimized texture feature field.
Specifically, after the three-dimensional geometric field and texture feature field are obtained, the invention performs volume rendering under the captured view angles through the implicit neural network to obtain a series of rendered images.
In one embodiment of the present invention, for each pixel in an image, the invention casts a ray through that pixel, samples a plurality of points along the ray, and queries the color values of the sample points through the implicit neural network. Meanwhile, the density of each sample point can be obtained by querying the three-dimensional geometric field. Combining density and color, weighted integration along the ray yields the color of the pixel. The implicit neural network here takes as input the texture features at a sample point and outputs the color of that point.
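The weighted integration described above is the standard volume-rendering quadrature used with implicit fields (as popularized by NeRF); a minimal sketch, with the variable names assumed:

```python
import numpy as np

def volume_render(colors, densities, deltas):
    """Weighted integration of ray samples into a single pixel color.

    colors    : (S, 3) colors predicted by the implicit network at S samples
    densities : (S,)   densities obtained by querying the geometric field
    deltas    : (S,)   distances between consecutive samples along the ray
    """
    alpha = 1.0 - np.exp(-densities * deltas)             # opacity of each segment
    # Transmittance: fraction of light surviving up to each sample.
    trans = np.cumprod(np.append(1.0, 1.0 - alpha))[:-1]
    weights = alpha * trans
    return (weights[:, None] * colors).sum(axis=0)
```

Each sample's weight is its opacity times the transmittance of the ray up to that sample, so samples occluded by earlier dense regions contribute little to the pixel color.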
After an image is rendered, a consistency constraint between the rendered image and the captured image can be constructed, requiring the pixel-wise L2 error between the two images to be as small as possible; with the three-dimensional geometric field and texture feature field as optimizable variables, both fields can be iteratively optimized.
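A minimal numeric sketch of this consistency objective: the per-pixel L2 error, plus a toy gradient-descent loop in which a stand-in linear "renderer" maps the texture-field variables to pixels so the gradient is closed-form. The linear map is purely an assumption for illustration — in the patent the renderer is the implicit network with volume rendering, optimized by the same principle.

```python
import numpy as np

def photometric_l2(rendered, captured):
    """Mean per-pixel squared error between a rendered view and the photo."""
    return ((rendered - captured) ** 2).mean()

# Toy stand-in for the differentiable renderer (hypothetical): pixels are a
# fixed linear function of the texture-field variables.
rng = np.random.default_rng(0)
A = rng.normal(size=(16, 8))      # "renderer" mapping field -> 16 pixels
target = rng.normal(size=16)      # captured-view pixel values
tex = np.zeros(8)                 # texture field, the optimized variable
for _ in range(500):
    rendered = A @ tex
    grad = 2 * A.T @ (rendered - target) / len(target)  # d(loss)/d(tex)
    tex -= 0.05 * grad            # gradient step on the field variables
```

The loop drives the rendered pixels toward the captured ones, exactly the role the consistency constraint plays for the geometric and texture fields.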
S4, performing new view angle rendering of the human body based on the optimized texture feature field and the human body image information to obtain a new view angle rendered image.
Specifically, in order to achieve a higher-quality new view angle rendering effect, the invention further combines the texture feature field optimized in step S3 with features extracted from the input images to perform image rendering.
It will be appreciated that the texture feature field obtained by multi-frame fusion and optimization provides relatively complete information but lacks texture details; on the other hand, the input multi-view image features contain richer texture details, but information is missing in regions not observed by any view.
The invention therefore combines the advantages of both: the texture features at each sample point and the image features obtained by projection are fed into an implicit neural network, which outputs the color of the sample point for human body image rendering.
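A toy version of such a fusion network, with assumed feature sizes, weights, and view pooling (none of these specifics are disclosed in the patent): the point's texture-field feature is concatenated with image features projected from the input views and mapped to an RGB color.

```python
import numpy as np

def fuse_and_color(tex_feat, img_feats, W1, b1, W2, b2):
    """Toy implicit network: fuse texture-field and projected image features.

    tex_feat  : (Ft,)   feature queried from the optimized texture field
    img_feats : (V, Fi) features sampled from the V input-view images at the
                point's projections
    W1, b1, W2, b2 : hypothetical MLP weights
    Returns an RGB color in [0, 1].
    """
    pooled = img_feats.mean(axis=0)              # fuse views (simple average)
    x = np.concatenate([tex_feat, pooled])       # combine both feature sources
    h = np.maximum(0.0, W1 @ x + b1)             # ReLU hidden layer
    rgb = 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))   # sigmoid keeps color in [0, 1]
    return rgb
```

The averaged image features supply fine texture detail where views observe the point, while the texture-field feature covers unobserved regions, matching the complementary strengths described above.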
The invention thus realizes real-time reconstruction of the target three-dimensional human body and supports new view angle rendering with a stereoscopic effect.
Fig. 2 is a schematic diagram of the present invention. As shown in fig. 2, the pose parameters of the human body are estimated from the multi-view human motion video; the three-dimensional geometric field and texture feature field of the human body are constructed; the three-dimensional geometric field and texture feature field are optimized; and real-time rendering is performed by combining the input image information. The invention takes multi-view human motion video as input, realizes real-time dynamic three-dimensional human body reconstruction, and supports real-time new view angle video rendering with a stereoscopic effect.
According to the real-time dynamic human body new view angle rendering method based on multi-view video of the embodiment of the invention, real-time three-dimensional reconstruction of a target human body can be realized from multi-view human motion video captured by a user, and images under arbitrary view angles can be rendered, achieving a three-dimensional stereoscopic rendering effect.
In order to implement the above embodiments, as shown in fig. 3, this embodiment further provides a real-time dynamic human body new view angle rendering system 10 based on multi-view video, where the system 10 includes a pose parameter estimation module 100, a human body feature characterization module 200, a feature variable optimization module 300, and a view angle image rendering module 400;
the pose parameter estimation module 100 is configured to estimate pose parameters of a human body based on multi-view human body image information;
the human body feature characterization module 200 is configured to construct a three-dimensional geometric field and a texture feature field of the human body based on the pose parameters;
the feature variable optimization module 300 is configured to perform volume rendering under the captured view angle based on the implicit neural network to obtain a captured-view rendered image, and construct a consistency constraint between the captured-view rendered image and the captured image, taking the three-dimensional geometric field and the texture feature field as optimization variables to obtain an optimized texture feature field;
the view angle image rendering module 400 is configured to perform new view angle rendering of the human body based on the optimized texture feature field and the human body image information to obtain a new view angle rendered image.
Further, the pose parameter estimation module 100 is further configured to:
acquire multi-view human body image information;
calculate two-dimensional coordinates of human body joint points in the human body image information by using a two-dimensional human body pose estimation tool;
and solve the three-dimensional human body pose based on the two-dimensional coordinates of the human body joint points and the multi-view geometric information.
Further, the human body feature characterization module 200 is further configured to:
construct three-dimensional voxels for characterizing the three-dimensional geometric field and the texture feature field;
and represent the three-dimensional human body pose with the three-dimensional voxels to obtain the human body geometric information and human body surface texture features in the canonical space.
Further, the three-dimensional geometric field records the signed distance function value of each three-dimensional voxel, that is, the distance from the voxel position to the closest point on the human body surface; the distance value is negative for voxels inside the human body and positive for voxels outside the human body.
Further, the feature variable optimization module 300 is further configured to:
cast a ray through each pixel of the captured view image, sample a plurality of points along the ray, and compute the color values of the sample points through the implicit neural network;
obtain density information of the sample points by querying the three-dimensional geometric field;
and perform weighted integration of the color values and density information of the sample points to obtain the pixel color, thereby obtaining the captured-view rendered image.
According to the real-time dynamic human body new view angle rendering system based on multi-view video of the embodiment of the invention, real-time three-dimensional reconstruction of a target human body can be realized from multi-view human motion video captured by a user, and images under arbitrary view angles can be rendered, achieving a three-dimensional stereoscopic rendering effect.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present invention, the meaning of "plurality" means at least two, for example, two, three, etc., unless specifically defined otherwise.

Claims (10)

1. A real-time dynamic human body new view angle rendering method based on multi-view video, characterized by comprising the following steps:
estimating pose parameters of a human body based on multi-view human body image information;
constructing a three-dimensional geometric field and a texture feature field of the human body based on the pose parameters;
performing volume rendering under the captured view angle based on an implicit neural network to obtain a captured-view rendered image, and constructing a consistency constraint between the captured-view rendered image and the captured image, taking the three-dimensional geometric field and the texture feature field as optimization variables to obtain an optimized texture feature field;
and performing new view angle rendering of the human body based on the optimized texture feature field and the human body image information to obtain a new view angle rendered image.
2. The method of claim 1, wherein estimating pose parameters of the human body based on multi-view human body image information comprises:
acquiring multi-view human body image information;
calculating two-dimensional coordinates of human body joint points in the human body image information by using a two-dimensional human body pose estimation tool;
and solving the three-dimensional human body pose based on the two-dimensional coordinates of the human body joint points and the multi-view geometric information.
3. The method of claim 2, wherein constructing the three-dimensional geometric field and the texture feature field of the human body based on the pose parameters comprises:
constructing three-dimensional voxels for characterizing the three-dimensional geometric field and the texture feature field;
and representing the three-dimensional human body pose with the three-dimensional voxels to obtain human body geometric information and human body surface texture features in a canonical space.
4. The method according to claim 3, wherein the three-dimensional geometric field records the signed distance function value of each three-dimensional voxel, that is, the distance from the voxel position to the closest point on the human body surface; the distance value is negative for voxels inside the human body and positive for voxels outside the human body.
5. The method of claim 4, wherein performing volume rendering under the captured view angle based on the implicit neural network to obtain the captured-view rendered image comprises:
casting a ray through each pixel of the captured view image, sampling a plurality of points along the ray, and computing the color values of the sample points through the implicit neural network;
obtaining density information of the sample points by querying the three-dimensional geometric field;
and performing weighted integration of the color values and density information of the sample points to obtain the pixel color, thereby obtaining the captured-view rendered image.
6. A real-time dynamic human body new view angle rendering system based on multi-view video, characterized by comprising:
a pose parameter estimation module, configured to estimate pose parameters of a human body based on multi-view human body image information;
a human body feature characterization module, configured to construct a three-dimensional geometric field and a texture feature field of the human body based on the pose parameters;
a feature variable optimization module, configured to perform volume rendering under the captured view angle based on an implicit neural network to obtain a captured-view rendered image, and construct a consistency constraint between the captured-view rendered image and the captured image, taking the three-dimensional geometric field and the texture feature field as optimization variables to obtain an optimized texture feature field;
and a view angle image rendering module, configured to perform new view angle rendering of the human body based on the optimized texture feature field and the human body image information to obtain a new view angle rendered image.
7. The system of claim 6, wherein the pose parameter estimation module is further configured to:
acquire multi-view human body image information;
calculate two-dimensional coordinates of human body joint points in the human body image information by using a two-dimensional human body pose estimation tool;
and solve the three-dimensional human body pose based on the two-dimensional coordinates of the human body joint points and the multi-view geometric information.
8. The system of claim 7, wherein the human body feature characterization module is further configured to:
construct three-dimensional voxels for characterizing the three-dimensional geometric field and the texture feature field;
and represent the three-dimensional human body pose with the three-dimensional voxels to obtain human body geometric information and human body surface texture features in a canonical space.
9. The system of claim 8, wherein the three-dimensional geometric field records the signed distance function value of each three-dimensional voxel, that is, the distance from the voxel position to the closest point on the human body surface; the distance value is negative for voxels inside the human body and positive for voxels outside the human body.
10. The system of claim 9, wherein the feature variable optimization module is further configured to:
cast a ray through each pixel of the captured view image, sample a plurality of points along the ray, and compute the color values of the sample points through the implicit neural network;
obtain density information of the sample points by querying the three-dimensional geometric field;
and perform weighted integration of the color values and density information of the sample points to obtain the pixel color, thereby obtaining the captured-view rendered image.
CN202311767083.2A 2023-12-20 2023-12-20 Real-time dynamic human body new view angle rendering method and system based on multi-view video Pending CN117788672A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311767083.2A CN117788672A (en) 2023-12-20 2023-12-20 Real-time dynamic human body new view angle rendering method and system based on multi-view video


Publications (1)

Publication Number Publication Date
CN117788672A true CN117788672A (en) 2024-03-29

Family

ID=90388445

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311767083.2A Pending CN117788672A (en) 2023-12-20 2023-12-20 Real-time dynamic human body new view angle rendering method and system based on multi-view video

Country Status (1)

Country Link
CN (1) CN117788672A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination