CN117788672A - Real-time dynamic human body new view angle rendering method and system based on multi-view video - Google Patents

Real-time dynamic human body new view angle rendering method and system based on multi-view video

Info

Publication number
CN117788672A
Authority
CN
China
Prior art keywords
human body
dimensional
image
field
rendering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311767083.2A
Other languages
Chinese (zh)
Inventor
徐枫 (Xu Feng)
林文镔 (Lin Wenbin)
雍俊海 (Yong Junhai)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202311767083.2A priority Critical patent/CN117788672A/en
Publication of CN117788672A publication Critical patent/CN117788672A/en
Pending legal-status Critical Current

Landscapes

  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a real-time dynamic human body new view angle rendering method and system based on multi-view video. The method estimates the pose parameters of a human body from multi-view human body image information; constructs a three-dimensional geometric field and a texture feature field of the human body based on the pose parameters; performs volume rendering under the captured view angles through an implicit neural network to obtain captured-view rendered images, and constructs a consistency constraint between each captured-view rendered image and the corresponding captured image, taking the three-dimensional geometric field and the texture feature field as optimization variables to obtain an optimized texture feature field; and performs new view angle rendering of the human body based on the optimized texture feature field and the human body image information to obtain a new view angle rendered image. The invention can realize new view angle rendering of a dynamic three-dimensional human body with a stereoscopic effect.

Description

Real-time dynamic human body new view angle rendering method and system based on multi-view video
Technical Field
The invention relates to the technical field of computer vision and computer graphics, in particular to a real-time dynamic human body new view angle rendering method and system based on multi-view video.
Background
In production and daily life, real-time remote video call technology is already widely used. As people's pursuit of immersion and experience in remote communication continues to rise, three-dimensional, real-time remote communication with freely changeable viewpoints has become a new technical demand. Dynamic three-dimensional human body reconstruction technology has broad application prospects and important application value in fields such as virtual reality, augmented reality, remote communication, and video animation. In practical applications, people often require real-time communication, and compared with conventional two-dimensional video, three-dimensional video with a stereoscopic effect can bring a more immersive experience. How to realize real-time new view angle rendering of a dynamic human body therefore remains an unsolved problem in the current art.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems in the related art to some extent.
Therefore, the invention provides a real-time dynamic human body new view angle rendering method based on multi-view video, which can realize real-time three-dimensional reconstruction of a target human body from multi-view human motion video captured by a user, render images under arbitrary view angles, and achieve a rendering effect with three-dimensional stereoscopic quality.
Another object of the present invention is to provide a real-time dynamic human new view angle rendering system based on multi-view video.
In order to achieve the above objective, in one aspect, the present invention provides a real-time dynamic human body new view angle rendering method based on multi-view video, comprising:
estimating pose parameters of a human body based on multi-view human body image information;
constructing a three-dimensional geometric field and a texture feature field of the human body based on the pose parameters;
performing volume rendering under the captured view angle based on an implicit neural network to obtain a captured-view rendered image, and constructing a consistency constraint between the captured-view rendered image and the captured image, taking the three-dimensional geometric field and the texture feature field as optimization variables to obtain an optimized texture feature field;
and performing new view angle rendering of the human body based on the optimized texture feature field and the human body image information to obtain a new view angle rendered image.
The real-time dynamic human new view angle rendering method based on the multi-view video provided by the embodiment of the invention can also have the following additional technical characteristics:
in one embodiment of the present invention, estimating a pose parameter of a human body based on multi-view human body image information includes:
acquiring multi-view human body image information;
calculating two-dimensional coordinates of a human body joint point in the human body image information by using a two-dimensional human body posture estimation tool;
solving three-dimensional human body posture based on two-dimensional coordinates of human body joint points and human body multi-view geometric information
In one embodiment of the invention, constructing a three-dimensional geometric field and a texture feature field of a human body based on the pose parameters comprises:
constructing three-dimensional voxels for characterizing the three-dimensional geometric field and the texture feature field;
and representing the three-dimensional human body gesture by using the three-dimensional voxels so as to obtain human body geometric information and human body surface texture characteristics in a standard space.
In one embodiment of the present invention, the three-dimensional geometric field records a directional distance function value of a current three-dimensional voxel, that is, a distance value of a closest point to the surface of the human body on the current three-dimensional voxel position; the distance value is negative for three-dimensional voxels inside the human body and positive for three-dimensional voxels outside the human body.
In one embodiment of the present invention, performing volume rendering on a shooting view image based on an implicit neural network to obtain a shooting view rendered image includes:
projecting a light ray in the direction of a pixel of a photographed visual angle image, sampling a plurality of light ray points, and calculating the color value of the light ray points through an implicit neural network;
obtaining density information of the light points by inquiring the three-dimensional geometric field;
and carrying out weighted integration based on the color value and the density information of the light points to obtain the color data of the pixels so as to obtain a shooting visual angle rendering image.
In order to achieve the above object, another aspect of the present invention provides a real-time dynamic human body new view angle rendering system based on multi-view video, comprising:
a pose parameter estimation module, configured to estimate pose parameters of a human body based on multi-view human body image information;
a human body feature characterization module, configured to construct a three-dimensional geometric field and a texture feature field of the human body based on the pose parameters;
a feature variable optimization module, configured to perform volume rendering under the captured view angle based on an implicit neural network to obtain a captured-view rendered image, and construct a consistency constraint between the captured-view rendered image and the captured image, taking the three-dimensional geometric field and the texture feature field as optimization variables to obtain an optimized texture feature field;
and a view angle image rendering module, configured to perform new view angle rendering of the human body based on the optimized texture feature field and the human body image information to obtain a new view angle rendered image.
According to the real-time dynamic human body new view angle rendering method and system based on multi-view video of the embodiments of the invention, real-time three-dimensional reconstruction of a target human body can be realized from multi-view human motion video captured by a user, images under arbitrary view angles can be rendered, and a rendering effect with three-dimensional stereoscopic quality is achieved.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a flow chart of a method for real-time dynamic human new view rendering based on multi-view video according to an embodiment of the present invention;
FIG. 2 is a network architecture diagram of real-time dynamic human new view rendering based on multi-view video according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a real-time dynamic human new view angle rendering system based on multi-view video according to an embodiment of the present invention.
Detailed Description
It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other. The invention will be described in detail below with reference to the drawings in connection with embodiments.
In order that those skilled in the art may better understand the present invention, the technical solution in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some embodiments of the present invention, not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
The method and the system for rendering the real-time dynamic human new view angle based on the multi-view video according to the embodiment of the invention are described below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a real-time dynamic human new view rendering method based on multi-view video according to an embodiment of the present invention.
As shown in fig. 1, the method includes, but is not limited to, the steps of:
s1, estimating the posture parameters of a human body based on multi-view human body image information.
It can be understood that the invention estimates the posture parameters of the human body through the multi-view human body motion video.
Specifically, multi-view human body image information is acquired, two-dimensional coordinates of human body joint points in the human body image information are calculated by using a two-dimensional human body posture estimation tool, and three-dimensional human body postures are solved based on the two-dimensional coordinates of the human body joint points and the human body multi-view geometric information.
In one embodiment of the invention, the two-dimensional human body posture estimation tool OpenPose is used for calculating the two-dimensional coordinates of the human body joint points, and then the three-dimensional human body posture is solved by combining the multi-view geometric information.
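The patent does not specify how the multi-view geometric information is combined to recover the 3D joints; a common approach consistent with the description is linear (DLT) triangulation of each joint from its 2D detections in calibrated views. The sketch below is illustrative only — the function name and the assumption of known camera projection matrices are not taken from the patent.

```python
import numpy as np

def triangulate_joint(points_2d, proj_mats):
    """Linear (DLT) triangulation of one joint from several calibrated views.

    points_2d : (V, 2) pixel coordinates of the joint detected in V views
                (e.g., by a 2D pose estimator such as OpenPose)
    proj_mats : (V, 3, 4) camera projection matrices (assumed known)
    Returns the 3D joint position as a length-3 array.
    """
    rows = []
    for (u, v), P in zip(points_2d, proj_mats):
        # Each view contributes two linear constraints on the homogeneous
        # point X: u * (P[2] @ X) = P[0] @ X and v * (P[2] @ X) = P[1] @ X.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector of A with the smallest
    # singular value, dehomogenized by the last coordinate.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]
```

Repeating this per joint yields 3D joint positions from which the body pose parameters can then be fitted.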
In one embodiment of the invention, solving the pose of the dynamic human body makes it possible to align human bodies in different poses into a canonical spread-limb ("大"-shaped) pose space, as shown in FIG. 2, so that information from different human body poses can be fused in a unified canonical space.
S2, constructing a three-dimensional geometric field and a texture feature field of the human body based on the pose parameters.
It can be understood that the invention further constructs the three-dimensional geometric field and texture feature field of the human body.
Specifically, three-dimensional voxels for characterizing the three-dimensional geometric field and the texture feature field are constructed, and the three-dimensional human body pose is represented with the three-dimensional voxels to obtain the human body geometric information and human body surface texture features in the canonical space.
In one embodiment of the invention, to enable modeling of the three-dimensional human body geometry and surface texture, the invention introduces a three-dimensional geometric field and a texture feature field, both stored in the form of three-dimensional voxels and both represented in the canonical "大"-pose space of the human body.
The three-dimensional geometric field records the signed distance function value of each voxel, that is, the distance from the voxel position to the closest point on the human body surface; the distance value is negative for voxels inside the human body and positive for voxels outside. The human body geometry in the canonical space can thus be determined from the three-dimensional geometric field.
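As a concrete illustration of querying such a geometric field, the sketch below trilinearly interpolates a signed-distance voxel grid at continuous points; the sign convention matches the description (negative inside the body, positive outside). The grid layout and function name are assumptions for illustration, not taken from the patent.

```python
import numpy as np

def query_sdf(sdf_grid, bbox_min, voxel_size, points):
    """Trilinearly interpolate a signed-distance voxel grid at continuous points.

    sdf_grid   : (Nx, Ny, Nz) signed distances stored per voxel
                 (negative inside the body, positive outside)
    bbox_min   : (3,) world coordinate of voxel (0, 0, 0)
    voxel_size : scalar edge length of a voxel
    points     : (M, 3) query positions in world space
    """
    g = (points - bbox_min) / voxel_size      # continuous grid coordinates
    i0 = np.floor(g).astype(int)              # lower corner voxel index
    f = g - i0                                # fractional interpolation weights
    vals = np.zeros(len(points))
    for dx in (0, 1):                         # accumulate the 8 corner values
        for dy in (0, 1):
            for dz in (0, 1):
                w = (np.where(dx, f[:, 0], 1 - f[:, 0]) *
                     np.where(dy, f[:, 1], 1 - f[:, 1]) *
                     np.where(dz, f[:, 2], 1 - f[:, 2]))
                vals += w * sdf_grid[i0[:, 0] + dx, i0[:, 1] + dy, i0[:, 2] + dz]
    return vals
```

Points whose interpolated value is near zero lie on the reconstructed body surface.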
Further, the texture feature field stores texture features of the human body surface, which are used to assist in rendering new view angle images.
S3, performing volume rendering under the captured view angle based on the implicit neural network to obtain a captured-view rendered image, and constructing a consistency constraint between the captured-view rendered image and the captured image, taking the three-dimensional geometric field and the texture feature field as optimization variables to obtain an optimized texture feature field.
Specifically, after the three-dimensional geometric field and texture feature field are obtained, the invention performs volume rendering under the captured view angles through the implicit neural network to obtain a series of rendered images.
In one embodiment of the present invention, for each pixel in an image, the invention casts a ray through that pixel, samples a plurality of points along the ray, and queries the color values of the sample points through the implicit neural network. Meanwhile, the density of each sample point can be obtained by querying the three-dimensional geometric field. Combining density and color, weighted integration along the ray yields the color of the pixel. The implicit neural network here takes as input the texture features at a sample point and outputs the color of that point.
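The weighted integration described above is the standard volume-rendering quadrature used with implicit fields (as popularized by NeRF); a minimal sketch, with the variable names assumed:

```python
import numpy as np

def volume_render(colors, densities, deltas):
    """Weighted integration of ray samples into a single pixel color.

    colors    : (S, 3) colors predicted by the implicit network at S samples
    densities : (S,)   densities obtained by querying the geometric field
    deltas    : (S,)   distances between consecutive samples along the ray
    """
    alpha = 1.0 - np.exp(-densities * deltas)             # opacity of each segment
    # Transmittance: fraction of light surviving up to each sample.
    trans = np.cumprod(np.append(1.0, 1.0 - alpha))[:-1]
    weights = alpha * trans
    return (weights[:, None] * colors).sum(axis=0)
```

Each sample's weight is its opacity times the transmittance of the ray up to that sample, so samples occluded by earlier dense regions contribute little to the pixel color.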
After an image is rendered, a consistency constraint between the rendered image and the captured image can be constructed, requiring the pixel-wise L2 error between the two images to be as small as possible; with the three-dimensional geometric field and texture feature field as optimizable variables, both fields can be iteratively optimized.
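A minimal numeric sketch of this consistency objective: the per-pixel L2 error, plus a toy gradient-descent loop in which a stand-in linear "renderer" maps the texture-field variables to pixels so the gradient is closed-form. The linear map is purely an assumption for illustration — in the patent the renderer is the implicit network with volume rendering, optimized by the same principle.

```python
import numpy as np

def photometric_l2(rendered, captured):
    """Mean per-pixel squared error between a rendered view and the photo."""
    return ((rendered - captured) ** 2).mean()

# Toy stand-in for the differentiable renderer (hypothetical): pixels are a
# fixed linear function of the texture-field variables.
rng = np.random.default_rng(0)
A = rng.normal(size=(16, 8))      # "renderer" mapping field -> 16 pixels
target = rng.normal(size=16)      # captured-view pixel values
tex = np.zeros(8)                 # texture field, the optimized variable
for _ in range(500):
    rendered = A @ tex
    grad = 2 * A.T @ (rendered - target) / len(target)  # d(loss)/d(tex)
    tex -= 0.05 * grad            # gradient step on the field variables
```

The loop drives the rendered pixels toward the captured ones, exactly the role the consistency constraint plays for the geometric and texture fields.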
S4, performing new view angle rendering of the human body based on the optimized texture feature field and the human body image information to obtain a new view angle rendered image.
Specifically, in order to achieve a higher-quality new view angle rendering effect, the invention further combines the texture feature field optimized in step S3 with features extracted from the input images to perform image rendering.
It will be appreciated that the texture feature field obtained by multi-frame fusion and optimization provides relatively complete information but lacks texture details; on the other hand, the input multi-view image features contain richer texture details, but information is missing in regions not observed by any view.
The invention therefore combines the advantages of both: the texture features at each sample point and the image features obtained by projection are fed into an implicit neural network, which outputs the color of the sample point for human body image rendering.
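A toy version of such a fusion network, with assumed feature sizes, weights, and view pooling (none of these specifics are disclosed in the patent): the point's texture-field feature is concatenated with image features projected from the input views and mapped to an RGB color.

```python
import numpy as np

def fuse_and_color(tex_feat, img_feats, W1, b1, W2, b2):
    """Toy implicit network: fuse texture-field and projected image features.

    tex_feat  : (Ft,)   feature queried from the optimized texture field
    img_feats : (V, Fi) features sampled from the V input-view images at the
                point's projections
    W1, b1, W2, b2 : hypothetical MLP weights
    Returns an RGB color in [0, 1].
    """
    pooled = img_feats.mean(axis=0)              # fuse views (simple average)
    x = np.concatenate([tex_feat, pooled])       # combine both feature sources
    h = np.maximum(0.0, W1 @ x + b1)             # ReLU hidden layer
    rgb = 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))   # sigmoid keeps color in [0, 1]
    return rgb
```

The averaged image features supply fine texture detail where views observe the point, while the texture-field feature covers unobserved regions, matching the complementary strengths described above.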
The invention thus realizes real-time reconstruction of the target three-dimensional human body and supports new view angle rendering with a stereoscopic effect.
Fig. 2 is a schematic diagram of the present invention. As shown in fig. 2, the pose parameters of the human body are estimated from the multi-view human motion video; the three-dimensional geometric field and texture feature field of the human body are constructed; the three-dimensional geometric field and texture feature field are optimized; and real-time rendering is performed by combining the input image information. The invention takes multi-view human motion video as input, realizes real-time dynamic three-dimensional human body reconstruction, and supports real-time new view angle video rendering with a stereoscopic effect.
According to the real-time dynamic human body new view angle rendering method based on multi-view video of the embodiment of the invention, real-time three-dimensional reconstruction of a target human body can be realized from multi-view human motion video captured by a user, and images under arbitrary view angles can be rendered, achieving a three-dimensional stereoscopic rendering effect.
In order to implement the above embodiments, as shown in fig. 3, this embodiment further provides a real-time dynamic human body new view angle rendering system 10 based on multi-view video, where the system 10 includes a pose parameter estimation module 100, a human body feature characterization module 200, a feature variable optimization module 300, and a view angle image rendering module 400;
the pose parameter estimation module 100 is configured to estimate pose parameters of a human body based on multi-view human body image information;
the human body feature characterization module 200 is configured to construct a three-dimensional geometric field and a texture feature field of the human body based on the pose parameters;
the feature variable optimization module 300 is configured to perform volume rendering under the captured view angle based on the implicit neural network to obtain a captured-view rendered image, and construct a consistency constraint between the captured-view rendered image and the captured image, taking the three-dimensional geometric field and the texture feature field as optimization variables to obtain an optimized texture feature field;
the view angle image rendering module 400 is configured to perform new view angle rendering of the human body based on the optimized texture feature field and the human body image information to obtain a new view angle rendered image.
Further, the pose parameter estimation module 100 is further configured to:
acquire multi-view human body image information;
calculate two-dimensional coordinates of human body joint points in the human body image information by using a two-dimensional human body pose estimation tool;
and solve the three-dimensional human body pose based on the two-dimensional coordinates of the human body joint points and the multi-view geometric information.
Further, the human body feature characterization module 200 is further configured to:
construct three-dimensional voxels for characterizing the three-dimensional geometric field and the texture feature field;
and represent the three-dimensional human body pose with the three-dimensional voxels to obtain the human body geometric information and human body surface texture features in the canonical space.
Further, the three-dimensional geometric field records the signed distance function value of each three-dimensional voxel, that is, the distance from the voxel position to the closest point on the human body surface; the distance value is negative for voxels inside the human body and positive for voxels outside the human body.
Further, the feature variable optimization module 300 is further configured to:
cast a ray through each pixel of the captured view image, sample a plurality of points along the ray, and compute the color values of the sample points through the implicit neural network;
obtain density information of the sample points by querying the three-dimensional geometric field;
and perform weighted integration of the color values and density information of the sample points to obtain the pixel color, thereby obtaining the captured-view rendered image.
According to the real-time dynamic human body new view angle rendering system based on multi-view video of the embodiment of the invention, real-time three-dimensional reconstruction of a target human body can be realized from multi-view human motion video captured by a user, and images under arbitrary view angles can be rendered, achieving a three-dimensional stereoscopic rendering effect.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present invention, the meaning of "plurality" means at least two, for example, two, three, etc., unless specifically defined otherwise.

Claims (10)

1. A real-time dynamic human body new view angle rendering method based on multi-view video, characterized by comprising the following steps:
estimating pose parameters of a human body based on multi-view human body image information;
constructing a three-dimensional geometric field and a texture feature field of the human body based on the pose parameters;
performing volume rendering under the captured view angle based on an implicit neural network to obtain a captured-view rendered image, and constructing a consistency constraint between the captured-view rendered image and the captured image, taking the three-dimensional geometric field and the texture feature field as optimization variables to obtain an optimized texture feature field;
and performing new view angle rendering of the human body based on the optimized texture feature field and the human body image information to obtain a new view angle rendered image.
2. The method of claim 1, wherein estimating pose parameters of the human body based on multi-view human body image information comprises:
acquiring multi-view human body image information;
calculating two-dimensional coordinates of human body joint points in the human body image information by using a two-dimensional human body pose estimation tool;
and solving the three-dimensional human body pose based on the two-dimensional coordinates of the human body joint points and the multi-view geometric information.
3. The method of claim 2, wherein constructing the three-dimensional geometric field and the texture feature field of the human body based on the pose parameters comprises:
constructing three-dimensional voxels for characterizing the three-dimensional geometric field and the texture feature field;
and representing the three-dimensional human body pose with the three-dimensional voxels to obtain human body geometric information and human body surface texture features in a canonical space.
4. The method according to claim 3, wherein the three-dimensional geometric field records the signed distance function value of each three-dimensional voxel, that is, the distance from the voxel position to the closest point on the human body surface; the distance value is negative for voxels inside the human body and positive for voxels outside the human body.
5. The method of claim 4, wherein performing volume rendering under the captured view angle based on the implicit neural network to obtain the captured-view rendered image comprises:
casting a ray through each pixel of the captured view image, sampling a plurality of points along the ray, and computing the color values of the sample points through the implicit neural network;
obtaining density information of the sample points by querying the three-dimensional geometric field;
and performing weighted integration of the color values and density information of the sample points to obtain the pixel color, thereby obtaining the captured-view rendered image.
6. A real-time dynamic human body new view angle rendering system based on multi-view video, characterized by comprising:
a pose parameter estimation module, configured to estimate pose parameters of a human body based on multi-view human body image information;
a human body feature characterization module, configured to construct a three-dimensional geometric field and a texture feature field of the human body based on the pose parameters;
a feature variable optimization module, configured to perform volume rendering under the captured view angle based on an implicit neural network to obtain a captured-view rendered image, and construct a consistency constraint between the captured-view rendered image and the captured image, taking the three-dimensional geometric field and the texture feature field as optimization variables to obtain an optimized texture feature field;
and a view angle image rendering module, configured to perform new view angle rendering of the human body based on the optimized texture feature field and the human body image information to obtain a new view angle rendered image.
7. The system of claim 6, wherein the pose parameter estimation module is further configured to:
acquire multi-view human body image information;
calculate two-dimensional coordinates of human body joint points in the human body image information by using a two-dimensional human body pose estimation tool;
and solve the three-dimensional human body pose based on the two-dimensional coordinates of the human body joint points and the multi-view geometric information.
8. The system of claim 7, wherein the human body feature characterization module is further configured to:
construct three-dimensional voxels for characterizing the three-dimensional geometric field and the texture feature field;
and represent the three-dimensional human body pose with the three-dimensional voxels to obtain human body geometric information and human body surface texture features in a canonical space.
9. The system of claim 8, wherein the three-dimensional geometric field records the signed distance function value of each three-dimensional voxel, that is, the distance from the voxel position to the closest point on the human body surface; the distance value is negative for voxels inside the human body and positive for voxels outside the human body.
10. The system of claim 9, wherein the feature variable optimization module is further configured to:
cast a ray through each pixel of the captured view image, sample a plurality of points along the ray, and compute the color values of the sample points through the implicit neural network;
obtain density information of the sample points by querying the three-dimensional geometric field;
and perform weighted integration of the color values and density information of the sample points to obtain the pixel color, thereby obtaining the captured-view rendered image.
CN202311767083.2A 2023-12-20 2023-12-20 Real-time dynamic human body new view angle rendering method and system based on multi-view video Pending CN117788672A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311767083.2A CN117788672A (en) 2023-12-20 2023-12-20 Real-time dynamic human body new view angle rendering method and system based on multi-view video


Publications (1)

Publication Number Publication Date
CN117788672A true CN117788672A (en) 2024-03-29

Family

ID=90388445

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311767083.2A Pending CN117788672A (en) 2023-12-20 2023-12-20 Real-time dynamic human body new view angle rendering method and system based on multi-view video

Country Status (1)

Country Link
CN (1) CN117788672A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination