CN113096228B - Real-time illumination estimation and rendering method and system based on neural network - Google Patents

Real-time illumination estimation and rendering method and system based on neural network

Info

Publication number
CN113096228B
CN113096228B (application CN202110639919.5A)
Authority
CN
China
Prior art keywords
real
rendering
virtual object
neural network
time
Prior art date
Legal status
Active
Application number
CN202110639919.5A
Other languages
Chinese (zh)
Other versions
CN113096228A (en)
Inventor
徐迪
李臻
毛文涛
孙立
Current Assignee
Shanghai Shadow Creator Information Technology Co Ltd
Original Assignee
Shanghai Shadow Creator Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Shadow Creator Information Technology Co Ltd filed Critical Shanghai Shadow Creator Information Technology Co Ltd
Priority to CN202110639919.5A priority Critical patent/CN113096228B/en
Publication of CN113096228A publication Critical patent/CN113096228A/en
Application granted granted Critical
Publication of CN113096228B publication Critical patent/CN113096228B/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/10Geometric effects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • Image Generation (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention provides a real-time illumination estimation and rendering system based on a neural network, comprising: a camera reading module, which acquires the real-time camera video stream of an end-side device; a neural network inference module, which uses a lightweight network to generate the illumination estimation result as spherical harmonic coefficients for rendering the virtual object; a virtual object loading module, which loads the three-dimensional virtual object to be rendered with the illumination result; and a spherical harmonic real-time rendering and fusion module, which renders the virtual object with the spherical harmonic coefficients to achieve cross-platform rendering and fusion. The lightweight neural network computes the spherical harmonic coefficients from video sequence frames, achieving real-time, efficient illumination estimation; meanwhile, a virtual-real fusion rendering module renders the three-dimensional virtual object in real time and displays it over the camera video stream, fusing the virtual object with the real background in real time.

Description

Real-time illumination estimation and rendering method and system based on neural network
Technical Field
The invention relates to the technical field of illumination, in particular to a real-time illumination estimation and rendering method and system based on a neural network.
Background
Among existing illumination estimation techniques on the market, one class requires extra hardware to collect illumination information, while the other requires a neural network with a huge number of parameters to compute a high-dynamic-range panoramic illumination map; both classes struggle to achieve real-time illumination estimation. In addition, once the illumination information is obtained, the virtual object must be rendered in parallel with the camera reading module and the neural network inference module, yet the computing power of mobile platforms is often limited. The prior art therefore struggles to deliver real-time illumination estimation and rendering on mobile devices.
A search of the prior art finds patent document CN103440684A, which discloses a method for applying spherical harmonic lighting to surface rendering: first, each patch in the surface model is discretely sampled to convert it into a point model; the spherical harmonic coefficient group of each point is then obtained by the spherical harmonic lighting method; a group of spherical harmonic coefficient textures is generated for each surface by texture backfill, from the points sampled from that surface and their corresponding coefficient groups; finally, the surface model is drawn with a surface-drawing method, programmed through the GPU rendering pipeline, to complete the spherical harmonic lighting simulation. The drawback of this method is that the spherical harmonic coefficients can be computed only after the surface model is converted into a point model and must then be drawn back into the surface model; the computation is heavy, the conversions needed to obtain the relevant data make the process cumbersome, and efficiency is low.
Therefore, it is necessary to develop a system with a small parameter count that can perform illumination estimation and rendering in real time.
Disclosure of Invention
In view of the defects in the prior art, the object of the invention is to provide a real-time illumination estimation and rendering method and system based on a neural network, achieving real-time illumination estimation and rendering on a mobile terminal without extra hardware to collect illumination information and without a huge-parameter neural network to compute a high-dynamic-range panoramic illumination map.
The invention provides a real-time illumination estimation and rendering system based on a neural network, which comprises the following modules:
a camera reading module: acquires the real-time camera video stream of the end-side device;
a neural network inference module: uses a lightweight network to generate the illumination estimation result as spherical harmonic coefficients for rendering the virtual object;
a virtual object loading module: loads the three-dimensional virtual object to be rendered with the illumination result;
a spherical harmonic real-time rendering and fusion module: renders the virtual object with the spherical harmonic coefficients to achieve cross-platform rendering and fusion.
Preferably, the camera reading module continuously acquires the real-time video stream of the camera and preprocesses the video frame by frame to generate video sequence frames.
Preferably, the preprocessed video sequence frames are used as the input for illumination estimation.
Preferably, the neural network inference module adopts a lightweight convolutional neural network as a backbone network.
Preferably, the neural network inference module uses an optimized inverted residual bottleneck structure to improve its learning capability.
Preferably, for a video sequence frame input to the network, the neural network inference module generates a 1280-dimensional latent space vector, and a fully connected layer then maps the latent vector to the illumination result, namely the spherical harmonic coefficients.
Preferably, when the virtual object loading module loads a three-dimensional virtual object, the entire model is encapsulated into a scene object, which contains a series of scene nodes storing vertex data and face indices.
Preferably, the spherical harmonic real-time rendering and fusion module renders the loaded three-dimensional virtual object in real time according to the spherical harmonic coefficients.
The real-time illumination estimation and rendering method based on a neural network provided by the invention uses the above real-time illumination estimation and rendering system to perform real-time illumination estimation and rendering.
Preferably, the method comprises the following steps:
a camera reading step: acquiring a real-time camera video stream of a terminal side device;
neural network reasoning step: generating an illumination estimation result by utilizing a lightweight network;
loading a virtual object: loading a three-dimensional virtual object as an object rendered by an illumination result;
a spherical harmonic real-time rendering and fusion step: rendering the virtual object with the spherical harmonic coefficients to achieve cross-platform rendering and fusion.
Compared with the prior art, the invention has the following beneficial effects:
1. the invention solves the problems of real-time illumination estimation and rendering of the mobile terminal, does not need extra hardware to collect illumination information, and does not need a neural network with huge parameter quantity to calculate a high-dynamic panoramic illumination map.
2. The invention adopts a lightweight neural network and computes the spherical harmonic coefficients from video sequence frames, achieving real-time and efficient illumination estimation.
3. The invention adopts a virtual-real fusion rendering module to render the three-dimensional virtual object in real time and display it over the camera video stream, achieving real-time fusion of the virtual object and the real background.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a general flowchart of a real-time illumination estimation and rendering method based on neural network according to the present invention;
FIG. 2 is a flowchart of the operation of the rendering and fusion module in the real-time illumination estimation and rendering system based on neural network according to the present invention;
FIG. 3 is a flowchart illustrating the operation of the neural network module in the real-time illumination estimation and rendering system based on the neural network according to the present invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit it in any way. It should be noted that various changes and modifications can be made by those skilled in the art without departing from the spirit of the invention, all of which fall within the scope of the present invention.
As shown in fig. 1, the present invention provides a real-time illumination estimation and rendering system based on a neural network, which includes any one or more of the following modules:
a camera reading module: the camera reading module is responsible for acquiring the real-time camera video stream of the end-side device, continuously acquiring the real-time video stream of the camera, and performing preprocessing operations such as interception, scaling, filtering and the like on the video frame by frame. The preprocessed video sequence frames will be used as input for the illumination estimation. The present invention is capable of calculating illumination in real time at a rate of 60 frames per second.
A neural network module: adopts a lightweight convolutional neural network as the backbone network for generating the illumination estimation result. 95% of the network's convolutional layers use 1x1 convolutions, and depthwise separable convolutions reduce the amount of computation, so the method runs efficiently and in real time on the mobile terminal. Meanwhile, an optimized inverted residual bottleneck structure improves the learning capability of the network. For a video sequence frame input to the network, the backbone generates a 1280-dimensional latent space vector, and a fully connected layer then maps the latent vector to the illumination result, namely the spherical harmonic coefficients. As shown in fig. 3, specifically, the lightweight convolutional neural network applies convolution and deconvolution operations to the input video sequence frame to obtain the 1280-dimensional latent space vector; the fully connected layer then outputs the 27-dimensional spherical harmonic coefficients from that vector. The parameters of the neural network are trained on pairs of input images and their ground-truth spherical harmonic coefficients.
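The final stage of this inference path, the fully connected layer mapping the 1280-dimensional latent vector to the 27 spherical harmonic coefficients (9 second-order basis functions per RGB channel), can be sketched with plain matrix arithmetic. The weights below are random stand-ins; in the patent they are learned from the paired images and ground-truth coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)

def sh_head(latent, weight, bias):
    """Fully connected layer: 1280-d latent vector -> 27 spherical
    harmonic coefficients (9 basis functions x 3 color channels)."""
    return latent @ weight + bias

latent = rng.standard_normal(1280)               # backbone output for one frame
weight = rng.standard_normal((1280, 27)) * 0.01  # random stand-in weights
bias = np.zeros(27)
sh = sh_head(latent, weight, bias)
print(sh.shape)  # prints (27,)
```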
A virtual object loading module: loads the three-dimensional virtual object to be rendered with the illumination result. When a three-dimensional virtual object is loaded, the module encapsulates the entire model into a scene object that contains a series of scene nodes storing vertex data and face indices, together with material information. The module uses a recursive parsing method, recursively obtaining each mesh object and processing it to extract the vertex data, face indices, textures, material data, and so on required for rendering.
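The recursive parse can be sketched as a depth-first walk over scene nodes. The `SceneNode` fields here are illustrative stand-ins for the structure a model loader such as Assimp exposes, not the patent's actual data layout:

```python
from dataclasses import dataclass, field

@dataclass
class SceneNode:
    meshes: list = field(default_factory=list)    # mesh indices at this node
    children: list = field(default_factory=list)  # child scene nodes

def collect_meshes(node):
    """Depth-first recursive walk gathering every mesh index in the
    scene-node tree, mirroring the recursive parse described above."""
    found = list(node.meshes)
    for child in node.children:
        found.extend(collect_meshes(child))
    return found

root = SceneNode(meshes=[0],
                 children=[SceneNode(meshes=[1, 2]),
                           SceneNode(children=[SceneNode(meshes=[3])])])
print(collect_meshes(root))  # prints [0, 1, 2, 3]
```

A real loader would process each mesh as it is found, extracting vertex data, face indices, textures, and materials rather than just indices.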
A spherical harmonic real-time rendering and fusion module: implements cross-platform rendering and fusion, rendering the loaded three-dimensional virtual object in real time according to the spherical harmonic coefficients. Working with the virtual object loading module, it first binds the vertex data, face indices, textures, and material data to a Vertex Array Object (VAO), then obtains the camera video stream, the spherical harmonic coefficients, and related information, and finally updates the shader and displays the rendered three-dimensional virtual object over the camera video stream, achieving the virtual-real fusion effect.
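Once the 27 coefficients reach the shader, diffuse lighting is typically evaluated with the standard second-order spherical harmonic irradiance formula of Ramamoorthi and Hanrahan. The patent does not spell out the shading model, so the following NumPy sketch of that per-normal evaluation is an assumption:

```python
import numpy as np

def sh_irradiance(normal, sh):
    """Evaluate diffuse irradiance for a unit surface normal from 27
    coefficients reshaped to 9 x 3 (nine second-order spherical
    harmonic basis functions per RGB channel)."""
    x, y, z = normal
    basis = np.array([
        0.282095,                    # l=0
        0.488603 * y,                # l=1, m=-1
        0.488603 * z,                # l=1, m=0
        0.488603 * x,                # l=1, m=1
        1.092548 * x * y,            # l=2, m=-2
        1.092548 * y * z,            # l=2, m=-1
        0.315392 * (3 * z * z - 1),  # l=2, m=0
        1.092548 * x * z,            # l=2, m=1
        0.546274 * (x * x - y * y),  # l=2, m=2
    ])
    return basis @ sh.reshape(9, 3)  # RGB irradiance

sh = np.zeros((9, 3))
sh[0] = 1.0  # constant (ambient) term only, for illustration
print(sh_irradiance(np.array([0.0, 0.0, 1.0]), sh))  # prints [0.282095 0.282095 0.282095]
```

In the actual pipeline this evaluation would run per fragment in the GLSL shader, with the 27 coefficients uploaded as uniforms each frame.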
As shown in fig. 2, the present invention further provides a real-time illumination estimation and rendering method based on the neural network, which adopts the real-time illumination estimation and rendering system based on the neural network to perform real-time illumination estimation and rendering; comprising any one or more of the following steps:
a camera reading step: acquiring a real-time camera video stream of a terminal side device;
neural network reasoning step: generating a result of the illumination estimation;
loading a virtual object: loading a three-dimensional virtual object as an object rendered by an illumination result;
a spherical harmonic real-time rendering and fusion step: achieving cross-platform rendering and fusion.
Specifically, the bound mesh model is loaded first, then the neural network is loaded, and the camera is initialized. After initialization, the camera video stream is acquired; if an end command is received, the process terminates, otherwise neural network inference generates the illumination estimation result, and the illumination, motion, and viewing angle are continuously updated. Finally, the rendered result is composited with the camera video stream and displayed, and the process ends when an end command is received.
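The control loop described above can be sketched as follows; the three collaborator objects are hypothetical interfaces standing in for the real camera, inference, and rendering modules:

```python
class StubCamera:
    """Hypothetical camera: yields frames, then None as the end signal."""
    def __init__(self, frames):
        self.frames = list(frames)
    def read(self):
        return self.frames.pop(0) if self.frames else None

class StubNetwork:
    """Hypothetical inference module: frame -> 27 SH coefficients."""
    def infer(self, frame):
        return [0.0] * 27

class StubRenderer:
    """Hypothetical renderer: updates lighting, composites over the frame."""
    def update_lighting(self, sh):
        self.sh = sh
    def draw(self, background):
        pass

def run(camera, network, renderer):
    """Skeleton of the loop described above: read a frame, infer the
    spherical harmonic coefficients, update the lighting, and render
    the virtual object over the frame until the stream ends."""
    frames = 0
    while True:
        frame = camera.read()
        if frame is None:          # end command received / stream closed
            break
        sh = network.infer(frame)
        renderer.update_lighting(sh)
        renderer.draw(background=frame)
        frames += 1
    return frames

print(run(StubCamera(["f0", "f1", "f2"]), StubNetwork(), StubRenderer()))  # prints 3
```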
Those skilled in the art will appreciate that, in addition to implementing the system and its devices, modules, and units as pure computer-readable program code, the same functionality can be achieved entirely in hardware by logically programming the method steps into logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. The system and its devices, modules, and units can therefore be regarded as hardware components, and the devices, modules, and units realizing the various functions can be regarded as structures within those components; they may equally be regarded as software modules implementing the method, as structures within hardware components, or as both.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (4)

1. A real-time illumination estimation and rendering system based on a neural network is characterized by comprising the following modules:
a camera reading module: the method comprises the steps that a real-time camera video stream used for end-side equipment is obtained, and video sequence frames are generated by performing preprocessing operation on the video frame by frame;
a neural network inference module: adopts a lightweight convolutional neural network as the backbone network to generate the illumination estimation result, uses an optimized inverted residual bottleneck structure to improve the learning capability of the network, inputs the video sequence frames into the lightweight convolutional neural network, generates a 1280-dimensional latent space vector through the backbone network, and then obtains the illumination result, namely the spherical harmonic coefficients, from the latent vector through a fully connected layer;
a virtual object loading module: the system is used for loading a three-dimensional virtual object as an object rendered by an illumination result, and when a three-dimensional virtual object model is loaded, the whole model can be packaged into a scene object, the scene object comprises a series of scene nodes for storing vertex data and surface indexes, and the vertex data, the surface indexes, the textures and the material data required by rendering are obtained by recursively obtaining a grid object and processing the grid object;
a spherical harmonic real-time rendering and fusion module: renders the virtual object with the spherical harmonic coefficients to achieve cross-platform rendering and fusion; working with the virtual object loading module, it binds the vertex data, face indices, textures, and material data to a vertex array object, then obtains the camera video stream and the spherical harmonic coefficients, and finally updates the shader and displays the rendered three-dimensional virtual object over the camera video stream.
2. The neural-network-based real-time illumination estimation and rendering system of claim 1, wherein the preprocessed video sequence frames are used as the input for illumination estimation.
3. A real-time illumination estimation and rendering method based on a neural network, characterized in that real-time illumination estimation and rendering is performed using the neural-network-based real-time illumination estimation and rendering system of claim 1 or 2.
4. The real-time illumination estimation and rendering method based on neural network as claimed in claim 3, comprising the steps of:
a camera reading step: acquiring a real-time camera video stream of a terminal side device;
neural network reasoning step: generating a result of the illumination estimation by using the lightweight network;
loading a virtual object: loading a three-dimensional virtual object as an object rendered by an illumination result;
a spherical harmonic real-time rendering and fusion step: rendering the virtual object with the spherical harmonic coefficients to achieve cross-platform rendering and fusion.
CN202110639919.5A 2021-06-09 2021-06-09 Real-time illumination estimation and rendering method and system based on neural network Active CN113096228B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110639919.5A CN113096228B (en) 2021-06-09 2021-06-09 Real-time illumination estimation and rendering method and system based on neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110639919.5A CN113096228B (en) 2021-06-09 2021-06-09 Real-time illumination estimation and rendering method and system based on neural network

Publications (2)

Publication Number Publication Date
CN113096228A CN113096228A (en) 2021-07-09
CN113096228B true CN113096228B (en) 2021-08-31

Family

ID=76664482

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110639919.5A Active CN113096228B (en) 2021-06-09 2021-06-09 Real-time illumination estimation and rendering method and system based on neural network

Country Status (1)

Country Link
CN (1) CN113096228B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115619989B (en) * 2022-10-28 2024-04-26 如你所视(北京)科技有限公司 Fusion effect diagram generation method and device, electronic equipment and storage medium
CN116152419B (en) * 2023-04-14 2023-07-11 腾讯科技(深圳)有限公司 Data processing method, device, equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108509887A (en) * 2018-03-26 2018-09-07 深圳超多维科技有限公司 A kind of acquisition ambient lighting information approach, device and electronic equipment
CN109410310A (en) * 2018-10-30 2019-03-01 安徽虚空位面信息科技有限公司 A kind of real-time lighting Rendering algorithms based on deep learning network
CN109523617A (en) * 2018-10-15 2019-03-26 中山大学 A kind of illumination estimation method based on monocular-camera
CN110211061A (en) * 2019-05-20 2019-09-06 清华大学 List depth camera depth map real time enhancing method and device neural network based
CN110310224A (en) * 2019-07-04 2019-10-08 北京字节跳动网络技术有限公司 Light efficiency rendering method and device
CN110458964A (en) * 2019-08-21 2019-11-15 四川大学 A kind of real-time computing technique of actual environment dynamic illumination
CN111698497A (en) * 2020-06-15 2020-09-22 中航华东光电有限公司 Real-time transmission and monitoring method of panoramic display system on AR glasses
CN112509109A (en) * 2020-12-10 2021-03-16 上海影创信息科技有限公司 Single-view illumination estimation method based on neural network model

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070024835A1 (en) * 2005-08-01 2007-02-01 Kuo-Chun Huang Method for improving illumination uniformity in exposure process, and exposure apparatus
US20090102843A1 (en) * 2007-10-17 2009-04-23 Microsoft Corporation Image-based proxy accumulation for realtime soft global illumination
CN109448084A (en) * 2017-08-23 2019-03-08 当家移动绿色互联网技术集团有限公司 It is a kind of to carry out the algorithm that light textures are baked and banked up with earth based on voxelization global illumination algorithm


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Reconstructing Reflection Maps using a Stacked-CNN for Mixed Reality Rendering; Andrew Chalmers et al.; IEEE Transactions on Visualization and Computer Graphics; 2020-06-12; 1-12 *
Research on human pose estimation methods based on convolutional neural networks; Zhang Feng; China Doctoral Dissertations Full-text Database, Information Science and Technology; 2021-01-15 (No. 1); I138-214 *
Research and implementation of a lightweight real-time multi-object detection system based on deep learning; Zhang Yifan; China Master's Theses Full-text Database, Information Science and Technology; 2018-10-15 (No. 10); I138-571 *
Research on real-time face recognition algorithms based on lightweight networks; Zhang Dian et al.; Journal of Frontiers of Computer Science and Technology; 2020-02-29; Vol. 14, No. 2; 317-324 *

Also Published As

Publication number Publication date
CN113096228A (en) 2021-07-09

Similar Documents

Publication Publication Date Title
Liu et al. Infinite nature: Perpetual view generation of natural scenes from a single image
CN110910486B (en) Indoor scene illumination estimation model, method and device, storage medium and rendering method
CN113096228B (en) Real-time illumination estimation and rendering method and system based on neural network
CN115100339B (en) Image generation method, device, electronic equipment and storage medium
CN114820906B (en) Image rendering method and device, electronic equipment and storage medium
CN114863038B (en) Real-time dynamic free visual angle synthesis method and device based on explicit geometric deformation
CN112950471A (en) Video super-resolution processing method and device, super-resolution reconstruction model and medium
CN110942512B (en) Indoor scene reconstruction method based on meta-learning
CN116977522A (en) Rendering method and device of three-dimensional model, computer equipment and storage medium
CN111028279A (en) Point cloud data processing method and device, electronic equipment and storage medium
Liu et al. Real-time neural rasterization for large scenes
CN111768467B (en) Image filling method, device, equipment and storage medium
CN116385667B (en) Reconstruction method of three-dimensional model, training method and device of texture reconstruction model
CN113066018A (en) Image enhancement method and related device
CN103116897B (en) A kind of Three-Dimensional Dynamic data compression based on image space and smoothing method
CN115346000A (en) Three-dimensional human body reconstruction method and device, computer readable medium and electronic equipment
CN116342782A (en) Method and apparatus for generating avatar rendering model
CN115797561A (en) Three-dimensional reconstruction method, device and readable storage medium
CN117576312A (en) Hand model construction method and device and computer equipment
CN117456128A (en) Three-dimensional reconstruction method, device, equipment and storage medium
CN113902631A (en) Image processing method, electronic device, and storage medium
CN115713581A (en) Dynamic model generation method, device and equipment
CN115578561A (en) Real-time semantic segmentation method and device based on multi-scale context aggregation network
CN115393498A (en) Drawing method and device based on implicit light transfer function combination
Zhang et al. Research on image super-resolution reconstruction based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PP01 Preservation of patent right

Effective date of registration: 20221226

Granted publication date: 20210831