CN114445546A - Rendering model training method, rendering method, apparatus, device and storage medium

Info

Publication number
CN114445546A
CN114445546A
Authority
CN
China
Prior art keywords
rendering
scene
machine learning
illumination
image
Legal status
Pending
Application number
CN202210118001.0A
Other languages
Chinese (zh)
Inventor
张彤
胡忠冰
Current Assignee
Bigo Technology Singapore Pte Ltd
Original Assignee
Bigo Technology Singapore Pte Ltd
Application filed by Bigo Technology Singapore Pte Ltd
Priority to CN202210118001.0A
Publication of CN114445546A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 15/00 3D [Three Dimensional] image rendering
    • G06T 15/50 Lighting effects
    • G06T 15/506 Illumination models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Abstract

The invention relates to a rendering model training method, a rendering method, an apparatus, a device and a storage medium. The training method comprises the following steps: inputting an image of a second scene into a first machine learning module and extracting the corresponding high-dimensional features; inputting the high-dimensional features into a second machine learning module to obtain the corresponding illumination parameters; inputting the illumination parameters and the data to be rendered of a first scene into a differentiable rendering module, which renders the first scene according to the illumination parameters in a differentiable manner to obtain an illumination-fused rendered image; and calculating a loss from the rendered image and a label image, calculating a gradient map of the loss, performing back propagation, and updating the weights of the first machine learning module and the second machine learning module by gradient descent. With the rendering model training method and the rendering method, illumination parameters can be extracted accurately, the illumination of the rendered image is consistent across the whole image, and a result image with a strong sense of realism can be obtained.

Description

Rendering model training method, rendering method, apparatus, device and storage medium
Technical Field
The invention relates to the technical field of computers, and in particular to a rendering model training method, a rendering method, an apparatus, a device and a storage medium.
Background
The statements herein merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Rendering models can be divided into local illumination models and global illumination models. A local illumination model only considers the effect of the light sources on object surfaces, whereas a global illumination model considers not only the effect of the light sources on the objects but also the mutual influence between objects. Global illumination models generally use physics-based methods that apply the laws of the physical world to simulate and compute the propagation and intensity of light everywhere in the scene, so their rendering results have very high realism. Classical global illumination algorithms include radiosity, path tracing and photon mapping.
The acquisition and application of illumination parameters are indispensable links in the rendering process. To achieve high-quality, realistic rendering results, conventional rendering methods still have inconveniences and shortcomings in acquiring and applying illumination parameters and need to be further improved.
Disclosure of Invention
The invention aims to provide a novel rendering model training method, a rendering method, a device, equipment and a storage medium.
The purpose of the invention is realized by adopting the following technical scheme. The invention provides a rendering model training method, which comprises the following steps: acquiring one or more training data, wherein the training data comprises data to be rendered of a first scene, an image of a second scene with illumination, and a label image of the first scene; inputting the image of the second scene into a first machine learning module, wherein the first machine learning module is used for extracting high-dimensional features corresponding to the image of the second scene, and the high-dimensional features are used for representing information of objects and/or environments contained in the image of the second scene; inputting the high-dimensional features into a second machine learning module, wherein the second machine learning module is used for obtaining illumination parameters corresponding to the second scene according to the high-dimensional features; inputting the illumination parameter and the data to be rendered of the first scene into a differentiable rendering module, wherein the differentiable rendering module is used for rendering the first scene according to the illumination parameter by adopting a differentiable rendering mode to obtain an illumination-fused rendering image; calculating loss according to the illumination fused rendering image and the label image, calculating a gradient map of the loss, and performing back propagation on the differentiable rendering module, the first machine learning module and the second machine learning module so as to update the weights of the first machine learning module and the second machine learning module by using a gradient descent mode; and after multiple rounds of training iteration, obtaining a rendering model comprising the first machine learning module, the second machine learning module and the differentiable rendering module after a target condition is reached.
The purpose of the invention is realized by adopting the following technical scheme. According to the rendering method provided by the application, the rendering method comprises the following steps: inputting data to be rendered of a third scene into a differentiable rendering module in a trained rendering model obtained according to the rendering model training method, and inputting an image of a fourth scene into a first machine learning module in the rendering model, so as to extract an illumination parameter from the image of the fourth scene and render the third scene according to the illumination parameter, thereby obtaining a rendered image with illumination fused with the illumination of the third scene.
The purpose of the invention is realized by adopting the following technical scheme. According to this application, a render model training device includes: an acquisition module and a training module;
the acquisition module is configured to: acquiring one or more training data, wherein the training data comprises data to be rendered of a first scene, an image of a second scene with illumination, and a label image of the first scene;
the training module is used for: inputting the image of the second scene into a first machine learning module, wherein the first machine learning module is used for extracting high-dimensional features corresponding to the image of the second scene, and the high-dimensional features are used for representing information of objects and/or environments contained in the image of the second scene; inputting the high-dimensional features into a second machine learning module, wherein the second machine learning module is used for obtaining illumination parameters corresponding to the second scene according to the high-dimensional features; inputting the illumination parameter and the data to be rendered of the first scene into a differentiable rendering module, wherein the differentiable rendering module is used for rendering the first scene according to the illumination parameter by adopting a differentiable rendering mode to obtain an illumination-fused rendering image; calculating loss according to the illumination fused rendering image and the label image, calculating a gradient map of the loss, and performing back propagation on the differentiable rendering module, the first machine learning module and the second machine learning module so as to update the weights of the first machine learning module and the second machine learning module by using a gradient descent mode; and after multiple rounds of training iteration, obtaining a rendering model comprising the first machine learning module, the second machine learning module and the differentiable rendering module after a target condition is reached.
The purpose of the invention is realized by adopting the following technical scheme. The rendering device provided by the application comprises an inference module, and is used for: inputting data to be rendered of a third scene to a differentiable rendering module in a trained rendering model obtained by the rendering model training device according to claim 9, and inputting an image of a fourth scene to a first machine learning module in the rendering model, so as to extract an illumination parameter from the image of the fourth scene and render the third scene according to the illumination parameter, thereby obtaining an illumination-fused rendering image of the third scene.
The purpose of the invention is realized by adopting the following technical scheme. A rendering device proposed according to the present application includes: a memory for storing non-transitory computer readable instructions; and a processor for executing the computer readable instructions, such that the processor implements any one of the aforementioned rendering model training methods or rendering methods when executed.
The purpose of the invention is realized by adopting the following technical scheme. A computer readable storage medium according to the present application is provided for storing non-transitory computer readable instructions which, when executed by a computer, cause the computer to perform any one of the rendering model training methods or the rendering methods described above.
Compared with the prior art, the invention has obvious advantages and beneficial effects. By the technical scheme, the rendering model training method, the rendering device, the rendering equipment and the storage medium provided by the invention at least have the following advantages and beneficial effects:
1. The invention provides a rendering method based on differentiable rendering that integrates illumination parameter acquisition and the corresponding illumination fusion.
2. With the method and the device of the present application, illumination information can be extracted from a real scene, illumination fusion can be performed on a reconstructed or to-be-rendered scene, and rendering can be carried out by means such as path tracing, so as to obtain a result image, streaming media, or a synthesized short video whose illumination is consistent across the whole image and which has a high sense of realism.
The foregoing description is only an overview of the technical solutions of the present invention. In order that the technical means of the present invention may be more clearly understood and implemented in accordance with the content of the description, and in order to make the above and other objects, features and advantages of the present invention more readily understandable, preferred embodiments are described in detail below with reference to the accompanying drawings.
Drawings
FIG. 1 is a flow chart of a rendering model training method according to an embodiment of the invention;
FIG. 2 is a flow diagram of a rendering method according to an embodiment of the invention;
FIG. 3 is a flow chart of a rendering method according to another embodiment of the invention;
FIG. 4 is a block diagram of a first machine learning module according to an embodiment of the present disclosure;
FIG. 5 is a diagram illustrating a second machine learning module according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a neural network block in the second machine learning module according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a rendering model training apparatus according to an embodiment of the present invention;
fig. 8 is a schematic diagram of a rendering apparatus according to an embodiment of the present invention.
Detailed Description
To further illustrate the technical means and effects of the present invention adopted to achieve the predetermined objects, the following detailed description will be given of specific embodiments, structures, features and effects of a rendering model training method, a rendering method, an apparatus, a device and a storage medium according to the present invention with reference to the accompanying drawings and preferred embodiments.
It is noted that, in this document, relational terms such as "first," "second," and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. In addition, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a(n) ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
It should be noted that the image mentioned in this application may be a stand-alone picture or may also be a video frame.
FIG. 1 is a schematic flow chart diagram of an embodiment of a rendering model training method of the present invention. In some embodiments of the present invention, referring to fig. 1, an exemplary rendering model training method of the present invention mainly includes the following steps:
step S11, one or more training data are acquired. The training data includes data to be rendered for a first scene, an image of a second scene, and a label image of the first scene. The data to be rendered of the first scene and the image of the second scene are used as training input, and the label image of the first scene is used as a training label.
The data to be rendered of the first scene may comprise a three-dimensional model of the first scene and other necessary parameters; optionally, the data to be rendered of the first scene need not comprise illumination parameters.
The image of the second scene is an image with illumination. In fact, the environment and objects in a scene can only be seen when there is light; otherwise everything is pitch black. Optionally, the second scene may contain no specific object but merely a background environment.
It is noted that the images referred to herein generally refer to two-dimensional images.
In some optional examples, in the first scene, virtual objects are included, and optionally, the first scene may even consist of only virtual objects; or in other alternative examples, in the first scene, both the virtual object and the real object in the real scene are included.
In some optional examples, in the second scene, a real environment and/or a real object in the real scene are included. Alternatively, the second scene may be a completely real scene consisting of real objects and a real environment.
Note that the label corresponds to the input of the model. Specifically, the label image may be a picture of a real scene acquired by photographing or the like (for example, when the training data is a reconstruction of the real scene), or a highly realistic picture obtained by manual rendering or by manually adjusting a rendered image (at least when the training data includes virtual objects that do not exist in the real scene). Optionally, during training, the training input is matched to the virtual objects, or real objects/background, that the label relates to. For example, the input may be a combination of a virtual object and a real scene, or a combination of purely virtual objects, and the label is a corresponding real photograph or highly realistic picture/video containing the same content (i.e., the combination of the virtual object and the real scene, or the purely virtual objects).
Generally, the label image includes all objects, environments, and scenes (e.g., indoor scenes, outdoor scenes, etc.) present in the rendered image, but is more realistic than the rendered image.
Note that the label image of the first scene may include virtual objects and may also include real objects from a real scene. For example, the label image may include an image of a real scene acquired by photographing or the like, and may include manually designed virtual 3D objects, and so on. Generally, the label image is a specific object scene combining real and virtual objects.
It should be noted that an object in this document is not limited to an "object" in the usual sense, but may also be a person, some part of a person (e.g., a human face), or even the environment (e.g., space, floor, wall, etc.).
It is to be noted that the present invention does not particularly limit the virtual object. For example, the virtual object may be placed into the scene by any means, including but not limited to a manually created virtual object, or a virtual object obtained by reconstructing a real object, such as a reconstructed virtual person, human face, object, or scene.
Optionally, the first scene and the second scene may be related, or the first scene and the second scene may not be related.
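For illustration only, a minimal sketch of how one training sample described in step S11 might be organized in code; the field names, types, and the use of a Python dataclass are assumptions and are not part of the disclosure.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class TrainingSample:
    """One training sample as described in step S11 (field names are illustrative)."""
    first_scene_data: dict          # data to be rendered: 3D model, materials, camera, etc.
                                    # (illumination parameters need not be included)
    second_scene_image: np.ndarray  # H x W x 3 image of the illuminated second scene (training input)
    label_image: np.ndarray         # H x W x 3 label image of the first scene (training target)
```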
In step S12, the image of the second scene is input to a first machine learning module (also referred to as an image feature extraction module) for extracting high-dimensional features corresponding to the image of the second scene. Wherein the high-dimensional features are used to characterize information of objects and/or environments contained in the image of the second scene. In some examples, the high-dimensional features may reflect the condition of the three-dimensional space to which the two-dimensional image corresponds, but generally the high-dimensional features cannot directly represent the three-dimensional model of the second scene.
Optionally, the high-dimensional features obtained in this step are not vectors, but feature maps (feature maps).
Step S13, inputting the high-dimensional features into a second machine learning module, where the second machine learning module is configured to obtain illumination parameters corresponding to the second scene according to the high-dimensional features.
The illumination parameters are used to represent attributes of the illumination. Optionally, the illumination parameters include, but are not limited to: the number of light sources, and the illumination type of each light source (e.g., point light source or area light source), its position, illumination intensity (or luminous flux), color, orientation, size, and the like. In fact, the environment and objects in a scene can only be seen when there is light; otherwise everything is pitch black.
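A possible in-code representation of the illumination parameters just listed is sketched below; the attribute names and types are assumptions for illustration, and which attributes apply depends on the illumination type, as noted later for point light sources.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class LightSource:
    """Attributes of one light source (names and types are illustrative)."""
    light_type: str                        # e.g. "point" or "area"
    position: Tuple[float, float, float]   # location of the light source
    intensity: float                       # illumination intensity / luminous flux
    color: Tuple[float, float, float]      # RGB color
    orientation: Optional[Tuple[float, float, float]] = None  # not meaningful for a point light
    size: Optional[Tuple[float, float]] = None                # not meaningful for a point light

# The illumination parameters of a scene are then a list of light sources;
# its length gives the number of light sources.
IlluminationParams = List[LightSource]
```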
Step S14, inputting the data to be rendered of the first scene into a differentiable rendering module, and inputting the illumination parameter extracted in step S13 into the differentiable rendering module, where the differentiable rendering module is configured to render the first scene according to the illumination parameter by using a differentiable rendering manner to obtain an illumination-fused rendered image.
Note that the illumination information can be fused (or transferred) into the rendering of any target scene during the rendering stage. Specifically, the aforementioned fusion includes, but is not limited to: directly using the extracted illumination parameters, replacing at least some of the original illumination parameters of the target scene with the extracted illumination parameters, or superimposing the extracted illumination parameters on at least some of the original illumination parameters of the target scene.
Optionally, rendering the first scene according to the illumination parameters specifically includes: performing global illumination rendering on the first scene according to the extracted illumination parameters. Optionally, the realistic rendering performed by the present invention refers, in computer graphics terms, to a physics-based global illumination model. It should be noted that the present invention is not limited to a specific way of performing global illumination rendering, including but not limited to path tracing (ray tracing) rendering, radiosity rendering, or photon mapping rendering.
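For reference, the physically based global illumination that path tracing, radiosity and photon mapping all approximate can be summarized by the standard rendering equation; this equation is general computer graphics background, not taken from the disclosure.

```latex
L_o(x, \omega_o) = L_e(x, \omega_o)
  + \int_{\Omega} f_r(x, \omega_i, \omega_o)\, L_i(x, \omega_i)\, (\omega_i \cdot n)\, \mathrm{d}\omega_i
```

Here L_o is the outgoing radiance, L_e the emitted radiance, f_r the BRDF, L_i the incoming radiance, and n the surface normal. The hemispherical integral over Ω is what a traditional renderer approximates by discrete, probability-based sampling, which, as discussed below, is what makes such a renderer non-differentiable.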
It should be noted that in a general rendering process, a rendered image can be output by inputting the data of the scene to be rendered into the rendering module. In step S14, the extracted illumination parameters are additionally fused into the scene to be rendered during rendering, so that a more realistic rendering result can be obtained.
Step S15, calculating a loss according to the illumination-fused rendered image of the first scene and the label image of the first scene, calculating a gradient map of the loss, and performing back propagation through the differentiable rendering module, the first machine learning module and the second machine learning module, so as to update the weights (also called the parameters of the machine learning modules) of the first machine learning module and the second machine learning module by gradient descent. In this way, the first machine learning module and the second machine learning module are jointly optimized as a whole.
Wherein, the gradient map refers to a visualized gradient result obtained in the process of gradient descent.
The aforementioned loss is also referred to as a loss function or a cost function.
Note that when calculating the loss, the label image is compared against the illumination-fused rendered image; therefore, in practice, the label image need not be correlated with the image of the second scene.
And step S16, obtaining a rendering model comprising a first machine learning module, a second machine learning module and a differentiable rendering module after multiple rounds of training iteration and the target condition is reached.
Wherein, the aforementioned target conditions include but are not limited to: the preset iteration round number is reached, and/or a convergence condition is reached. The objective is to minimize the loss function as much as possible.
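A minimal sketch of one end-to-end training iteration over steps S12 to S15, written here in PyTorch-style Python; the names F, M and differentiable_renderer, the L1 loss and the optimizer are assumptions for illustration, since the disclosure fixes neither the loss nor the optimizer nor a specific differentiable renderer.

```python
import torch.nn.functional as nnf

def train_one_epoch(F, M, differentiable_renderer, dataloader, optimizer):
    """One epoch of the end-to-end training in steps S12-S15 (sketch only)."""
    for first_scene_data, second_scene_image, label_image in dataloader:
        ft = F(second_scene_image)                   # S12: high-dimensional features
        light_params = M(ft)                         # S13: illumination parameters
        rendered = differentiable_renderer(first_scene_data, light_params)  # S14

        loss = nnf.l1_loss(rendered, label_image)    # S15: loss (L1 chosen for illustration)
        optimizer.zero_grad()
        loss.backward()     # back propagation through the renderer, M and F
        optimizer.step()    # gradient-descent update of the weights of F and M
```

In this sketch the optimizer would be built over the parameters of F and M only, e.g. torch.optim.Adam(list(F.parameters()) + list(M.parameters())), since, as explained below, the differentiable rendering module itself has no trainable parameters.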
In some embodiments of the present application, the rendering the first scene according to the illumination parameter in the differentiable rendering manner in the foregoing step S14 includes: in calculating an objective function (e.g., a spherical harmonic function), an integral with respect to an input parameter (also referred to as a parameter to be rendered) is calculated to obtain an output result (also referred to as a rendering result). The back propagation of the differentiable rendering module, the first machine learning module and the second machine learning module in the step S15 includes: when the objective function is propagated in the reverse direction, the loss corresponding to the output result is differentiated with respect to the input parameter.
With respect to differentiable rendering: the differentiable rendering module is not a neural network model traditionally built with perceptrons as the basic unit. For ease of understanding, it can be intuitively regarded as a module that can perform neural-network-like back propagation. Generally, the differentiable rendering module does not need its parameters trained, nor does it have a node/tree structure similar to a machine learning or deep learning visualization. Taking global illumination rendering as an example, spherical harmonics serve as a basis for global illumination rendering, and computing them involves a complex integration step, which can be regarded as analogous to the forward propagation of a machine learning model. Differentiable rendering then means the inverse of this forward rendering, i.e., differentiating the rendering result with respect to the renderer's input parameters, which can be regarded as analogous to the back propagation of a machine learning model. Because a traditional renderer often introduces a relatively complex probability-based sampling step to simplify the integration, its computation becomes a discrete summation and is not differentiable.
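To make "differentiating the rendering result with respect to the renderer's input parameters" concrete, a toy automatic-differentiation example is given below; the simple Lambertian shading term is purely illustrative and is not the differentiable rendering module of the disclosure.

```python
import torch

def toy_shade(light_dir, normal, albedo):
    """A trivially differentiable shading term (Lambertian), used only to show
    how a 'rendered' value can be differentiated w.r.t. an illumination parameter."""
    cos_term = torch.clamp((light_dir * normal).sum(), min=0.0)
    return albedo * cos_term

light_dir = torch.tensor([0.0, 0.0, 1.0], requires_grad=True)  # illumination parameter
normal = torch.tensor([0.0, 0.6, 0.8])
albedo = torch.tensor([0.7, 0.5, 0.3])

pixel = toy_shade(light_dir, normal, albedo)   # forward "rendering"
pixel.sum().backward()                         # reverse pass
print(light_dir.grad)                          # gradient of the output w.r.t. the light direction
```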
It should be noted that, unlike the gradient computation via parallelized numerical operations adopted by general machine learning/deep learning methods, in the training process of differentiable rendering the gradient descent obtains its numerical result through an image-to-numerical operation.
Therefore, in the solution of this example, when gradient descent is driven by the computation of the loss function, the first machine learning module, the second machine learning module, and the differentiable rendering module all participate in the gradient descent. The loss calculation and gradient descent used for back propagation are unified at the operational level, that is, their results can be passed between the modules to carry out the subsequent series of operations.
The rendering model training method provided by the application has the following three advantages.
First, in the traditional Computer Vision (CV) and Computer Graphics (CG) fields, machine learning/deep learning (ML/DL) and the renderer are used in a split pipeline: for example, ML/DL is used alone to obtain the corresponding parameters from a picture or video, and these parameters are then applied in the pipeline of a traditional renderer. This is a multi-stage process in which the stages are relatively independent tasks, requiring manual intervention and manual adjustment. In existing methods, the process of obtaining the illumination parameters and the rendering process are split, that is, the illumination parameters are obtained independently (for example, a neural network model for obtaining the illumination parameters is trained on its own) and the obtained illumination parameters are then used directly for rendering. In the scheme of this example, both during training and during forward use, the modules are integrated into a unified whole: images/videos are input and the corresponding images/videos are output. When additional virtual objects are introduced, a series of intermediate parameters, such as the illumination parameters, are used. For the training process, the present application adopts end-to-end (input-to-output) training.
Second, in the exemplary scheme of the present application, data is interpretable and transferable across modules. The differentiable rendering module used in the method is differentiable in the reverse direction, and the gradients back-propagated by the differentiable renderer can be passed directly to the preceding first machine learning module/second machine learning module, so no manual involvement is needed. Specifically, existing machine learning and deep learning focus on the numerical values of the data corresponding to an image rather than on a visualized result, whereas the input and output of the model proposed by the present invention are visualized. The output obtained by the differentiable renderer module is an image, but during back propagation the input part is converted into the corresponding parameters, for example the data corresponding to the image and the illumination parameters, and passed seamlessly and directly to the preceding training modules (without intermediate processing such as data regularization or data scaling, because the gradient data and the updated weights are consistent with each other), the data flowing from the image back to the front end.
That is, regardless of the internal form of the data, the image obtained by the differentiable renderer yields the corresponding numerical values when it is computed and propagated in the reverse direction. What the differentiable renderer does is: convert the image into a corresponding gradient map, and convert the gradient map into corresponding numbers (for example, in int8 format). A traditional renderer does not include this reverse gradient propagation because it adopts probability-based sampling. Thus, when a rendering equation (e.g., one involving the bidirectional reflectance distribution function, BRDF) contains a complex integration step, the probability-based sampling of a traditional renderer makes the integration process irreversible, whereas the differentiable renderer of the present application is the inverse of that integration process.
Third, in the exemplary scheme of the present application, the image-to-numerical operation is correct and feasible, and also embodies the unity of training and forward use as a whole.
It should be noted that the focus of the present invention is not the design of the differentiable renderer, and therefore the present invention is not limited to the specific structure of the differentiable rendering module, and any differentiable rendering module can be adopted.
As an alternative example of the first machine learning module of the present invention, as shown in fig. 4, the first machine learning module (image feature extraction module) may be a deep learning module, and a specific structure of the first machine learning module may include k down-sampling layers and 2k cyclic convolution layers, where k is a natural number.
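A sketch of one possible reading of this structure (k down-sampling layers followed by 2k convolution layers, FIG. 4) is given below; the use of stride-2 convolutions for down-sampling, circular padding as an interpretation of "cyclic convolution", and the channel widths are all assumptions, since the translation does not pin these details down.

```python
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Sketch of the first machine learning module (image feature extraction):
    k down-sampling layers followed by 2k 'cyclic' convolution layers."""
    def __init__(self, k=4, base_channels=32):
        super().__init__()
        layers, ch = [], 3
        for i in range(k):                          # k down-sampling layers
            out_ch = base_channels * (2 ** i)
            layers += [nn.Conv2d(ch, out_ch, 3, stride=2, padding=1),
                       nn.ReLU(inplace=True)]
            ch = out_ch
        for _ in range(2 * k):                      # 2k convolution layers
            layers += [nn.Conv2d(ch, ch, 3, padding=1, padding_mode="circular"),
                       nn.ReLU(inplace=True)]
        self.net = nn.Sequential(*layers)

    def forward(self, image):                       # image: (B, 3, H, W)
        return self.net(image)                      # high-dimensional feature map ft
```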
As an alternative example of the second machine learning module of the present invention, as shown in fig. 5 and fig. 6, the second machine learning module is a multi-layer perceptron module, and may include three neural network blocks, each of which shares parameters; each neural network block comprises a first hidden layer, a first active layer, a second hidden layer, a second active layer and a third hidden layer which are sequentially arranged.
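A sketch of the second machine learning module under the reading that "three blocks sharing parameters" means one block applied three times; the layer widths, the global average pooling of the feature map, and the final output head are assumptions added only to make the sketch runnable.

```python
import torch.nn as nn

class IlluminationRegressor(nn.Module):
    """Sketch of the second machine learning module: a multi-layer perceptron of
    three parameter-sharing blocks, each block being hidden / activation /
    hidden / activation / hidden as in FIGs. 5 and 6."""
    def __init__(self, feat_channels=256, hidden=256, out_dim=11):
        super().__init__()
        self.block = nn.Sequential(                    # one block, reused three times
            nn.Linear(feat_channels, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, feat_channels),          # keep the width so the block can be chained
        )
        self.head = nn.Linear(feat_channels, out_dim)  # l_position, l_intensity, l_pose, l_size, ...

    def forward(self, ft):                             # ft: (B, C, H, W) feature map
        x = ft.mean(dim=(2, 3))                        # global average pooling (simplification)
        for _ in range(3):                             # three blocks with shared parameters
            x = self.block(x)
        return self.head(x)
```

Whether the disclosure intends actual weight tying between the three blocks or merely identical block architectures cannot be determined from the translation; the sketch chooses weight tying.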
It is noted that, in general, rendering is based on three-dimensional data to obtain a two-dimensional image, and reverse rendering is based on two-dimensional data to obtain three-dimensional data. In the method of the present example, the inverse rendering is implemented by using a differentiable rendering module, and the process of "inverse" essentially refers to the inverse propagation of the gradient, i.e. the differentiable of the rendering calculation process. The back propagation process is an executable capability of a specific module (e.g. a differentiable rendering module) herein, and is used for transferring gradient data of gradient descent and the like in a reverse direction.
An embodiment of the present invention further provides a rendering method, referring to fig. 2, the rendering method according to the embodiment of the present invention mainly includes the following steps:
step S21, inputting the data to be rendered of the third scene into a differentiable rendering module in a trained rendering model, inputting the image of the fourth scene into a first machine learning module in the rendering model, extracting an illumination parameter from the image of the fourth scene by using the first machine learning module and a second machine learning module in the rendering model, and rendering the third scene by using the differentiable rendering module in the rendering model according to the illumination parameter to obtain a rendered image of the third scene with illumination fusion. The rendering model is obtained according to any one of the above-mentioned rendering model training methods.
Wherein the virtual object is contained in the third scene. Alternatively, the third scene may only contain the virtual object, or may include both the virtual object and the real object in the real scene.
Wherein a real object is contained in the fourth scene. Alternatively, the fourth scene may even contain only real objects.
Optionally, the third scene and the fourth scene may be related, or the third scene and the fourth scene may also be unrelated.
It should be noted that the illumination information can be fused into the rendering of any target scene during the rendering stage. Specifically, the aforementioned fusion includes, but is not limited to: directly using the extracted illumination parameters, replacing at least some of the original illumination parameters of the target scene with the extracted illumination parameters, or superimposing the extracted illumination parameters on at least some of the original illumination parameters of the target scene.
With the rendering method provided by the invention, illumination fusion is achieved: for example, when the third scene is rendered, a rendering result with a realistic illumination effect is obtained by using an image of a fourth scene captured from a real scene, together with virtual objects (such as human faces, people, objects, scenes and the like) that may already exist in, be artificially added to, or be placed into the third scene by various means.
The invention proposes an overall training method and usage method for the first machine learning module F (also called the inference module), the second machine learning module M (also called the illumination parameter extraction module) and the differentiable rendering module renderer. In the rendering stage, the extracted illumination information can be transferred to any scene for rendering, and after reconstruction a rendered image can be obtained whose illumination angle, brightness, color temperature and the like realistically match the input real-scene image. The method provided by the invention can transfer the illumination effect to various application targets such as human faces, human bodies and environment reconstruction, and ensures the illumination consistency of the whole image.
Fig. 3 shows the flow in the case of inference (model inference). A third scene (the scene to be rendered) may include "virtual objects to be added manually", that is, one or more virtual objects to be rendered that do not exist in reality may be added manually, and the fourth scene (a real scene) here plays the role of providing the information from which the illumination parameters are acquired. Referring to fig. 3, in an embodiment of the present invention, the rendering method provided by the invention includes: first, performing a reverse-rendering operation on the input real-scene image to obtain the high-dimensional features corresponding to the image; then, extracting illumination parameters with the multi-layer perceptron module as the desired illumination information; and finally, in the rendering stage, transferring the illumination information to any scene for rendering, so that after reconstruction a rendered image is obtained whose illumination angle, brightness, color temperature and the like realistically match the input real-scene image. With this method, the illumination effect can be transferred to various application targets such as human faces, human bodies and environment reconstruction, and the illumination consistency of the whole image is ensured.
The method extracts illumination information from a real scene, performs illumination fusion on a reconstructed or to-be-rendered scene, and renders by means such as path tracing.
In some embodiments of the present invention, a rendering method of examples of the present invention includes:
Step S1: drawing on experience in deep learning algorithm model design, an image feature extraction module (inference module F) based on a deep learning algorithm model is designed; its structure is shown in FIG. 4.
Wherein m and n are respectively 4 and 8. It should be noted that the values of m and n are not limited in the present invention, and m and n can be set to any positive integer according to the difference of the operational capability of the device. Alternatively, m and n may be set to k and 2 × k, respectively, where k is a natural number.
The expected effect can be obtained by using the image feature extraction module with the structure as shown in FIG. 4.
After the training process is finished, the image feature extraction module can be used for extracting features of the input real scene image. Based on the operation mode of the image feature extraction module, a real scene picture I can be input, high-dimensional features ft are obtained, and the process is as follows:
F(I)→ft
where F represents the image feature extraction module based on the deep learning algorithm model, I represents the input real-scene picture, and ft represents the obtained high-dimensional features.
It is noted that the format of the real scene picture I is not limited by the present invention, including but not limited to RGB format.
In step S2, a series of illumination parameters required by the next step can be obtained based on the multi-layer perceptron and the high-dimensional features obtained in the previous step, and the multi-layer perceptron is designed as shown in fig. 5.
In practice, the multi-layer perceptron may comprise three network blocks, and the blocks share parameters.
The structure of each block is shown in fig. 6.
A plurality of illumination parameters can be finally obtained through the multilayer perceptron, and the process is as follows:
M(ft) → l_position, l_intensity, l_pose, l_size
where M represents the multi-layer perceptron module composed of the blocks in fig. 5, ft represents the high-dimensional features obtained in the previous step, and l_position, l_intensity, l_pose, and l_size represent the position, intensity (color), orientation, and size of the obtained illumination parameters, respectively. The specific parameters may differ depending on the illumination type; for example, for a point light source, the orientation and size are not within the expected range of the obtained illumination parameters.
In step S3, to obtain the stable and available image feature extraction module F and the multi-layer perceptron module M, parameters in the modules need to be further trained through iterative training.
The overall training process is as follows: based on the illumination parameters l_position, l_intensity, l_pose and l_size obtained in the previous step and the corresponding differentiable rendering module, path tracing rendering is performed on the specific object scene to obtain an output image O whose illumination state is similar to that of the input real-scene picture I. The process is as follows:
renderer(l_position, l_intensity, l_pose, l_size) → O
Then, the loss is calculated from the output image O and the label data O'. It should be noted that the present invention does not limit the loss calculation method; for example, a common method may be adopted. Further, the corresponding gradient map is calculated and back-propagated through the inverse rendering process of the differentiable rendering module of step S3, the image feature extraction module of step S1, and the multi-layer perceptron module of step S2, so as to update the weights (also referred to as model parameters or neural network parameters).
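For illustration, a sketch of computing the loss between O and O' and a visualized gradient map of that loss with respect to the rendered image (cf. the earlier note that the gradient map is a visualized gradient result); the L1 loss and the normalization are assumptions, and in actual training the gradient would continue to flow back through the renderer, F and M rather than stopping at O.

```python
import torch.nn.functional as nnf

def loss_and_gradient_map(rendered, label):
    """Loss between the output image O and the label O', plus a normalized
    per-pixel gradient map of the loss w.r.t. O (visualization sketch only)."""
    rendered = rendered.detach().requires_grad_(True)   # treat O as a leaf for visualization
    loss = nnf.l1_loss(rendered, label)
    loss.backward()
    grad = rendered.grad.abs()                          # per-pixel gradient magnitude
    grad_map = grad / (grad.max() + 1e-8)               # normalize for display
    return loss.item(), grad_map
```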
It should be noted that the present invention is not limited to the specific neural network structure adopted in the foregoing steps S1 and S2; for example, the number of layers of the neural network is not limited, but may be adjusted according to an actual situation, and in practice, the number of network layers may be regarded as a hyper-parameter, where the hyper-parameter refers to a parameter whose value can be arbitrarily set before the training process starts, and the hyper-parameter is not obtained through training.
It should be noted that the training process proposed by the present invention back-propagates gradient descent as a whole: steps S1 to S3 together form one training process, and the differentiable rendering module performs gradient calculation and back propagation in the inverse rendering process, which also affects and updates the network parameters of the image feature extraction module in step S1.
After training is finished, an integral model (a complete pipeline) capable of estimating the illumination parameters and rendering a highly realistic result image is obtained, comprising the corresponding first machine learning module F, second machine learning module M and differentiable rendering module renderer.
It should be noted that, since the differentiable rendering module generally involves a huge amount of computation, the implementation process can generally utilize large computation resources such as servers and cloud computing, and the GPU can be introduced to accelerate the computation process.
In use, any real-scene RGB picture or RGB video frame is input, the attributes such as position and size of the object or scene to be rendered that is fed to the differentiable rendering module renderer are adjusted, and a realistic illumination effect similar to that in the input picture or video frame can be generated.
It should be noted that the aforementioned illumination parameters l_position, l_intensity, l_pose, and l_size are intermediate results of the overall pipeline and have specific physical meanings.
When the trained model is used, in combination with the object obj to be rendered in the scene to be rendered, the complete process is as follows:
I → F(I) → M(ft) → renderer(obj, l_position, l_intensity, l_pose, l_size) → O
It should be noted that renderer(l_position, l_intensity, l_pose, l_size) mentioned in the model training process and renderer(obj, l_position, l_intensity, l_pose, l_size) mentioned in the model usage process are the same module; although the expressions differ, the specific process is the same. The object obj to be rendered appears in the formula for the usage process because the user can replace it at any time in a specific implementation; it is omitted from the formula for the training process because the object to be rendered is generally preset during training, so the formula is simplified.
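A sketch of the complete forward use I → F(I) → M(ft) → renderer(obj, ...) → O described above; the function signature mirrors the notation in the text, and renderer stands for whatever differentiable rendering module the trained model contains.

```python
import torch

@torch.no_grad()
def render_with_transferred_lighting(I, obj, F, M, renderer):
    """Forward use of the trained model: extract illumination from the
    real-scene image I and render obj under that illumination (sketch)."""
    ft = F(I)                          # F(I) -> ft
    light_params = M(ft)               # M(ft) -> l_position, l_intensity, l_pose, l_size
    O = renderer(obj, light_params)    # renderer(obj, ...) -> O
    return O
```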
The rendering method provided by the invention can automatically extract illumination information, perform illumination fusion and obtain a rendering result; it can be applied to purely virtual scenes, augmented reality (virtual + real), or real scenes and their reconstructions, among others, and obtains realistic rendering results.
It should be noted that the conventional rendering method does not include step S3 of the present invention, and the conventional rendering method renders the object in combination with the lighting parameters only by using the conventional renderer after obtaining the lighting parameters.
With the rendering model training method and the rendering method described above, illumination parameters can be extracted accurately, the illumination of the rendered image is consistent across the whole image, and a result image with a strong sense of realism can be obtained.
The embodiment of the present invention further provides a rendering model training device, which mainly includes: the device comprises an acquisition module and a training module.
Wherein the acquisition module is configured to: one or more training data are acquired, the training data including data to be rendered for a first scene, an image of a second scene with illumination, and a label image of the first scene. The training module is used for:
inputting the image of the second scene into a first machine learning module, wherein the first machine learning module is used for extracting high-dimensional features corresponding to the image of the second scene, and the high-dimensional features are used for representing information of objects and/or environments contained in the image of the second scene;
inputting the high-dimensional features into a second machine learning module, wherein the second machine learning module is used for obtaining illumination parameters corresponding to a second scene according to the high-dimensional features;
the illumination parameters and the data to be rendered of the first scene are input into a differentiable rendering module, and the differentiable rendering module is used for rendering the first scene according to the illumination parameters by adopting a differentiable rendering mode to obtain a rendered image with illumination fusion;
calculating loss according to the rendering image and the label image fused by illumination, calculating a gradient map of the loss, and performing back propagation on the differentiable rendering module, the first machine learning module and the second machine learning module so as to update the weights of the first machine learning module and the second machine learning module by using a gradient descent mode;
after multiple rounds of training iteration, a rendering model comprising a first machine learning module, a second machine learning module and a differentiable rendering module is obtained after a target condition is reached.
In addition, the rendering model training apparatus shown in the embodiments of the present invention may further include a module and a unit for executing the rendering model training method described in each of the foregoing embodiments, and for detailed description and technical effects thereof, reference may be made to corresponding descriptions in each of the foregoing embodiments, which is not described herein again.
An embodiment of the present invention further provides a rendering apparatus, which mainly includes: and an inference module. Wherein the inference module is to: and inputting the data to be rendered of the third scene into a differentiable rendering module in the trained rendering model obtained by the rendering model training device, and inputting the image of the fourth scene into a first machine learning module in the rendering model so as to extract the illumination parameters of the image of the fourth scene and render the third scene according to the illumination parameters to obtain a rendered image with illumination fusion of the third scene.
In addition, the rendering apparatus shown in the embodiments of the present invention may further include a module and a unit for executing the rendering method described in each of the foregoing embodiments, and for detailed description and technical effects thereof, reference may be made to corresponding descriptions in each of the foregoing embodiments, which are not described herein again.
FIG. 7 is a schematic block diagram illustrating a rendering model training apparatus according to one embodiment of the present invention. As shown in fig. 7, the rendering model training apparatus 100 according to an embodiment of the present application includes a memory 101 and a processor 102.
The memory 101 is used to store non-transitory computer readable instructions. In particular, memory 101 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc.
The processor 102 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the rendering model training apparatus 100 to perform desired functions. In one embodiment of the present application, the processor 102 is configured to execute the computer readable instructions stored in the memory 101, so that the rendering model training apparatus 100 performs all or part of the steps of the rendering model training method of the embodiments of the present application.
For the detailed description and the technical effects of the present embodiment, reference may be made to the corresponding descriptions in the foregoing embodiments, which are not repeated herein.
Fig. 8 is a schematic block diagram illustrating a rendering apparatus according to an embodiment of the present invention. As shown in fig. 8, a rendering apparatus 200 according to an embodiment of the present application includes a memory 201 and a processor 202.
The memory 201 is used to store non-transitory computer readable instructions. In particular, memory 201 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc.
The processor 202 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the rendering device 200 to perform desired functions. In an embodiment of the present application, the processor 202 is configured to execute the computer readable instructions stored in the memory 201, so that the rendering apparatus 200 performs all or part of the steps of the rendering method of the embodiments of the present application.
For the detailed description and the technical effects of the present embodiment, reference may be made to the corresponding descriptions in the foregoing embodiments, which are not repeated herein.
Embodiments of the present invention also provide a computer storage medium, where computer instructions are stored, and when the computer instructions are executed on a device, the device executes the above related method steps to implement the rendering method in the above embodiments.
Embodiments of the present invention also provide a computer program product, which when run on a computer, causes the computer to execute the above related steps to implement the rendering method in the above embodiments.
In addition, the embodiment of the present invention further provides an apparatus, which may specifically be a chip, a component or a module, and the apparatus may include a processor and a memory connected to each other; the memory is used for storing computer execution instructions, and when the device runs, the processor can execute the computer execution instructions stored in the memory, so that the chip can execute the rendering method in the above-mentioned method embodiments.
The apparatus, the computer storage medium, the computer program product, or the chip provided by the present invention are all configured to execute the corresponding methods provided above, and therefore, the beneficial effects achieved by the apparatus, the computer storage medium, the computer program product, or the chip may refer to the beneficial effects in the corresponding methods provided above, and are not described herein again.
Although the present invention has been described with reference to a preferred embodiment, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (13)

1. A rendering model training method, the method comprising the steps of:
acquiring one or more training data, wherein the training data comprises data to be rendered of a first scene, an image of a second scene with illumination, and a label image of the first scene;
inputting the image of the second scene into a first machine learning module, wherein the first machine learning module is used for extracting high-dimensional features corresponding to the image of the second scene, and the high-dimensional features are used for representing information of objects and/or environments contained in the image of the second scene;
inputting the high-dimensional features into a second machine learning module, wherein the second machine learning module is used for obtaining illumination parameters corresponding to the second scene according to the high-dimensional features;
inputting the illumination parameter and the data to be rendered of the first scene into a differentiable rendering module, wherein the differentiable rendering module is used for rendering the first scene according to the illumination parameter by adopting a differentiable rendering mode to obtain an illumination-fused rendering image;
calculating loss according to the illumination fused rendering image and the label image, calculating a gradient map of the loss, and performing back propagation on the differentiable rendering module, the first machine learning module and the second machine learning module so as to update the weights of the first machine learning module and the second machine learning module by using a gradient descent mode;
and after multiple rounds of training iteration, obtaining a rendering model comprising the first machine learning module, the second machine learning module and the differentiable rendering module after a target condition is reached.
2. The rendering model training method of claim 1,
the rendering the first scene according to the illumination parameter by adopting a differentiable rendering mode comprises the following steps: when calculating the objective function, calculating an integral relative to the input parameter to obtain an output result;
the back propagating the differentiable rendering module, the first machine learning module, and the second machine learning module comprises: when the objective function is propagated in the reverse direction, the loss corresponding to the output result is differentiated with respect to the input parameter.
3. The rendering model training method of claim 1,
the illumination parameters include one or more of number of light sources, illumination type, location, illumination intensity, color, orientation, size of the respective light sources.
4. The rendering model training method of claim 1,
the first scene contains virtual objects, or contains both virtual objects and real objects from a real scene; the second scene contains real objects from a real scene.
5. The rendering model training method of claim 1,
the first machine learning module is a deep learning module and comprises k down-sampling layers and 2k cyclic convolution layers, wherein k is a natural number.
6. The rendering model training method of claim 1,
the second machine learning module is a multilayer perceptron module and comprises three neural network blocks that share parameters; each neural network block comprises, arranged in sequence, a first hidden layer, a first activation layer, a second hidden layer, a second activation layer, and a third hidden layer.
7. The rendering model training method of claim 1, wherein the rendering the first scene according to the illumination parameter comprises:
performing global illumination rendering on the first scene according to the illumination parameters.
8. A method of rendering, the method comprising the steps of:
inputting data to be rendered of a third scene into a differentiable rendering module in a trained rendering model obtained by the rendering model training method according to any one of claims 1 to 7, and inputting an image of a fourth scene into a first machine learning module in the rendering model, so as to extract an illumination parameter from the image of the fourth scene and render the third scene according to the illumination parameter, thereby obtaining an illumination-fused rendered image of the third scene.
9. A rendering model training apparatus, the apparatus comprising:
an acquisition module, configured to acquire one or more training data, where the training data includes data to be rendered for a first scene, an image of a second scene with illumination, and a label image of the first scene; and
a training module to: inputting the image of the second scene into a first machine learning module, wherein the first machine learning module is used for extracting high-dimensional features corresponding to the image of the second scene, and the high-dimensional features are used for representing information of objects and/or environments contained in the image of the second scene; inputting the high-dimensional features into a second machine learning module, wherein the second machine learning module is used for obtaining illumination parameters corresponding to the second scene according to the high-dimensional features; inputting the illumination parameter and the data to be rendered of the first scene into a differentiable rendering module, wherein the differentiable rendering module is used for rendering the first scene according to the illumination parameter by adopting a differentiable rendering mode to obtain an illumination-fused rendering image; calculating loss according to the illumination fused rendering image and the label image, calculating a gradient map of the loss, and performing back propagation on the differentiable rendering module, the first machine learning module and the second machine learning module so as to update the weights of the first machine learning module and the second machine learning module by using a gradient descent mode; and after multiple rounds of training iteration, obtaining a rendering model comprising the first machine learning module, the second machine learning module and the differentiable rendering module after a target condition is reached.
10. A rendering apparatus, characterized in that the apparatus comprises:
an inference module to: inputting data to be rendered of a third scene into a differentiable rendering module in a trained rendering model obtained by the rendering model training device according to claim 9, and inputting an image of a fourth scene into a first machine learning module in the rendering model, so as to extract an illumination parameter from the image of the fourth scene and render the third scene according to the illumination parameter, thereby obtaining an illumination-fused rendering image of the third scene.
11. A rendering model training apparatus comprising:
a memory for storing non-transitory computer readable instructions; and
a processor for executing the computer readable instructions such that the computer readable instructions, when executed by the processor, implement the rendering model training method of any of claims 1 to 7.
12. A rendering device, comprising:
a memory for storing non-transitory computer readable instructions; and
a processor for executing the computer readable instructions such that the computer readable instructions, when executed by the processor, implement the rendering method of claim 8.
13. A computer storage medium comprising computer instructions which, when run on an apparatus, cause the apparatus to perform the rendering model training method of any one of claims 1 to 7 or the rendering method of claim 8.
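To illustrate how the training flow of claim 1 could be wired together, the following minimal PyTorch-style sketch is provided by the editor and is not part of the original disclosure: the names train_step, feature_extractor, illumination_head and differentiable_render are assumptions, the L1 loss is only an example, and differentiable_render is a placeholder for any differentiable renderer rather than an API of this patent or of a specific library.

```python
import torch
import torch.nn as nn

def train_step(feature_extractor: nn.Module,      # first machine learning module
               illumination_head: nn.Module,      # second machine learning module
               differentiable_render,             # differentiable rendering module (placeholder)
               optimizer: torch.optim.Optimizer,  # built over the two learning modules' parameters
               scene_data,                        # data to be rendered of the first scene
               second_scene_image: torch.Tensor,  # image of the second scene with illumination
               label_image: torch.Tensor):        # label image of the first scene
    # Extract high-dimensional features from the second-scene image.
    features = feature_extractor(second_scene_image)
    # Obtain illumination parameters from the high-dimensional features.
    illumination_params = illumination_head(features)
    # Render the first scene with the illumination parameters; the renderer
    # must be differentiable so that gradients can flow back through it.
    rendered = differentiable_render(scene_data, illumination_params)
    # Compute the loss against the label image and back-propagate through the
    # renderer and both learning modules.
    loss = nn.functional.l1_loss(rendered, label_image)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()  # gradient-descent update of the two learning modules
    return loss.item()
```

In such a sketch the optimizer would be constructed only over the parameters of feature_extractor and illumination_head, so back-propagation passes through the renderer while gradient descent updates just the weights of the two machine learning modules, and the step would be repeated over the training data until the target condition of claim 1 is reached.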
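The relationship stated in claim 2 can be summarized by the following editor-added equations, under the assumption that the rendered output can be written as an integral whose integrand is differentiable in the input parameter and that differentiation and integration may be exchanged:

$$I(\theta)=\int_{\Omega} f(x;\theta)\,\mathrm{d}x,\qquad \mathcal{L}=\ell\big(I(\theta),\,I_{\mathrm{label}}\big),\qquad \frac{\partial\mathcal{L}}{\partial\theta}=\frac{\partial\ell}{\partial I}\int_{\Omega}\frac{\partial f(x;\theta)}{\partial\theta}\,\mathrm{d}x,$$

where $\theta$ denotes the input illumination parameter, $\Omega$ the integration domain (for example, incident light directions), $I_{\mathrm{label}}$ the label image, and $\ell$ the loss function; the forward pass evaluates the integral to obtain the output result, and the backward pass differentiates the loss corresponding to that output result with respect to $\theta$.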
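For the second machine learning module of claim 6, the sketch below shows one way the three parameter-sharing neural network blocks could be expressed; the layer widths, the ReLU activations, the sequential application of the shared block, and the final projection to an illumination-parameter vector are the editor's illustrative assumptions and are not fixed by the claim.

```python
import torch
import torch.nn as nn

class SharedBlock(nn.Module):
    """One neural network block: first hidden layer, first activation layer,
    second hidden layer, second activation layer, third hidden layer."""
    def __init__(self, in_dim: int = 512, hidden_dim: int = 256, out_dim: int = 512):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),      # first hidden layer
            nn.ReLU(),                          # first activation layer
            nn.Linear(hidden_dim, hidden_dim),  # second hidden layer
            nn.ReLU(),                          # second activation layer
            nn.Linear(hidden_dim, out_dim),     # third hidden layer
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)

class IlluminationHead(nn.Module):
    """Multilayer perceptron module with three blocks sharing parameters:
    the same SharedBlock instance is applied three times, which is one way
    to realize the parameter sharing described in the claim."""
    def __init__(self, feature_dim: int = 512, param_dim: int = 16):
        super().__init__()
        self.block = SharedBlock(feature_dim, 256, feature_dim)
        self.out = nn.Linear(feature_dim, param_dim)  # assumed projection to illumination parameters

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        x = features
        for _ in range(3):    # three blocks with shared weights
            x = self.block(x)
        return self.out(x)
```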
CN202210118001.0A 2022-02-08 2022-02-08 Rendering model training method, rendering device, rendering equipment and storage medium Pending CN114445546A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210118001.0A CN114445546A (en) 2022-02-08 2022-02-08 Rendering model training method, rendering device, rendering equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210118001.0A CN114445546A (en) 2022-02-08 2022-02-08 Rendering model training method, rendering device, rendering equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114445546A (en) 2022-05-06

Family

ID=81372641

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210118001.0A Pending CN114445546A (en) 2022-02-08 2022-02-08 Rendering model training method, rendering device, rendering equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114445546A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114972112A (en) * 2022-06-17 2022-08-30 如你所视(北京)科技有限公司 Method, apparatus, device and medium for image inverse rendering
CN114972112B (en) * 2022-06-17 2024-05-14 如你所视(北京)科技有限公司 Method, apparatus, device and medium for image inverse rendering
CN114792359A (en) * 2022-06-24 2022-07-26 北京百度网讯科技有限公司 Rendering network training and virtual object rendering method, device, equipment and medium
CN114792359B (en) * 2022-06-24 2022-10-11 北京百度网讯科技有限公司 Rendering network training and virtual object rendering method, device, equipment and medium
CN115205707A (en) * 2022-09-13 2022-10-18 阿里巴巴(中国)有限公司 Sample image generation method, storage medium, and electronic device
CN116091871A (en) * 2023-03-07 2023-05-09 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Physical countermeasure sample generation method and device for target detection model
CN116091871B (en) * 2023-03-07 2023-08-25 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Physical countermeasure sample generation method and device for target detection model

Similar Documents

Publication Publication Date Title
Philip et al. Multi-view relighting using a geometry-aware network.
Li et al. Monocular real-time volumetric performance capture
CN114445546A (en) Rendering model training method, rendering device, rendering equipment and storage medium
Moon et al. Deephandmesh: A weakly-supervised deep encoder-decoder framework for high-fidelity hand mesh modeling
Nguyen-Phuoc et al. Snerf: stylized neural implicit representations for 3d scenes
CN110084874A (en) For the image Style Transfer of threedimensional model
CN115082639B (en) Image generation method, device, electronic equipment and storage medium
WO2022205760A1 (en) Three-dimensional human body reconstruction method and apparatus, and device and storage medium
US7953275B1 (en) Image shader for digital image modification
JP2022503647A (en) Cross-domain image conversion
US8019182B1 (en) Digital image modification using pyramid vignettes
CN110246209B (en) Image processing method and device
CN116977522A (en) Rendering method and device of three-dimensional model, computer equipment and storage medium
CN115457188A (en) 3D rendering display method and system based on fixation point
US20230419600A1 (en) Volumetric performance capture with neural rendering
DE102021130031A1 (en) APPEARANCE-DRIVEN AUTOMATIC THREE-DIMENSIONAL MODELING
DE102021121109A1 (en) RECOVERY OF THREE-DIMENSIONAL MODELS FROM TWO-DIMENSIONAL IMAGES
CN113705295A (en) Object posture migration method, device, equipment and storage medium
JP2023545189A (en) Image processing methods, devices, and electronic equipment
Caliskan et al. Multi-view consistency loss for improved single-image 3d reconstruction of clothed people
Ye et al. High-fidelity 3D real-time facial animation using infrared structured light sensing system
CN115797561A (en) Three-dimensional reconstruction method, device and readable storage medium
CN116134491A (en) Multi-view neuro-human prediction using implicit differentiable renderers for facial expression, body posture morphology, and clothing performance capture
Reyes et al. Syntcities: A large synthetic remote sensing dataset for disparity estimation
Zhou et al. A superior image inpainting scheme using Transformer-based self-supervised attention GAN model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination