CN114494569B - Cloud rendering method and device based on lightweight neural network and residual streaming - Google Patents


Info

Publication number
CN114494569B
CN114494569B (application CN202210100409.5A)
Authority
CN
China
Prior art keywords
image
residual
neural network
dimensional scene
data
Prior art date
Legal status (assumed; not a legal conclusion): Active
Application number
CN202210100409.5A
Other languages
Chinese (zh)
Other versions
CN114494569A (en)
Inventor
王锐
霍宇驰
鲍虎军
林子豪
Current Assignee (listing may be inaccurate; no legal analysis performed)
Guangguangyun Hangzhou Technology Co ltd
Original Assignee
Guangguangyun Hangzhou Technology Co ltd
Priority date (assumed; not a legal conclusion)
Filing date
Publication date
Application filed by Guangguangyun Hangzhou Technology Co ltd
Priority application: CN202210100409.5A
Publication of application: CN114494569A
Application granted
Publication of grant: CN114494569B
Legal status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 15/00: 3D [Three Dimensional] image rendering
    • G06T 15/005: General purpose rendering architectures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00: General purpose image data processing
    • G06T 1/20: Processor architectures; Processor configuration, e.g. pipelining
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 15/00: 3D [Three Dimensional] image rendering
    • G06T 15/50: Lighting effects

Abstract

The invention discloses a cloud rendering method and device based on a lightweight neural network and residual streaming. The cloud performs image prediction calculation on input data using an image prediction model constructed on a lightweight neural network and outputs a predicted low-quality image; in parallel, it generates a high-quality image from the three-dimensional scene data, computes the image residual between the high-quality image and the cloud-predicted low-quality image, compresses the residual with a streaming-data compression method, and transmits it to the terminal. The terminal runs the same lightweight image prediction model on the same input data to produce its own predicted low-quality image, then superimposes the received image residual onto it to obtain the reconstruction result of the current frame, which serves as the frame's final rendering. By streaming image residuals, the method and device obtain real-time high-quality images while maintaining low delay and a high frame rate on the terminal, without requiring high transmission bandwidth.

Description

Cloud rendering method and device based on lightweight neural network and residual streaming
Technical Field
The invention belongs to the field of real-time rendering, and particularly relates to a cloud rendering method and device based on a lightweight neural network and residual streaming.
Background
With the development of deep learning, there have been attempts to replace parts of the conventional graphics rendering pipeline with deep neural networks, with some success. However, obtaining high-quality, high-resolution pictures requires a very large network, so a single prediction takes a long time and consumes a large amount of video memory, making real-time operation on mobile devices impossible. If the network is shrunk to reduce its resource consumption, the predicted picture quality suffers greatly.
Traditional rendering methods try to compensate for the weak computing power of mobile terminal devices by exploiting the strong computing power of the cloud. Patent document CN1856819A, for example, discloses performing the rendering operation on a server with high computing power and then compressing and transmitting the result to the terminal. Real-time transmission of high-image-quality results, however, suffers from transmission delay; the technical challenge is how to guarantee higher image quality while maintaining low delay and a high frame rate on the terminal.
Regarding transmission delay, patent document CN101971625A discloses a method for compressing streaming interactive video that alleviates the delay by improving video streaming based on conventional video coding. It introduces a new problem, however: real-time transmission of high-frame-rate, high-resolution pictures consumes a large amount of network bandwidth, which makes the method impractical for three-dimensional scenes. The root cause is that all of these methods transmit the final rendered picture over the network.
Disclosure of Invention
In view of the above, an object of the present invention is to provide a cloud rendering method and apparatus based on lightweight neural network and residual streaming, which can obtain real-time high-quality images while maintaining low delay and high frame rate on a terminal without requiring high transmission bandwidth by introducing streaming image residual.
In order to achieve the above object, the embodiment of the present invention provides a cloud rendering method based on lightweight neural network and residual streaming, including the following steps:
the cloud performs image prediction calculation on input data using an image prediction model constructed on a lightweight neural network and outputs a predicted low-quality image; meanwhile, it generates a high-quality image based on the three-dimensional scene data, computes the image residual between the high-quality image and the cloud-predicted low-quality image, compresses the residual with a streaming-data compression method, and transmits it to a terminal; the input data include illumination information, three-dimensional scene geometry information, three-dimensional scene motion information, and the reconstruction results of historical frames, all obtained from the three-dimensional scene data;
the terminal performs image prediction calculation on the input data using an image prediction model constructed on a lightweight neural network and outputs a predicted low-quality image; it then superimposes the received image residual on its own predicted low-quality image to obtain the reconstruction result of the current frame, which serves as the final rendering of the current frame.
In one embodiment, generating a high-quality image based on three-dimensional scene data includes: rendering the three-dimensional scene data with a rendering pipeline to obtain the high-quality image.
In another embodiment, generating a high-quality image based on three-dimensional scene data includes: performing image prediction calculation on the input data using an image generation model constructed on a complex neural network, and outputting the high-quality image.
In one embodiment, the illumination information comprises light source parameters, wherein the light source parameters comprise at least one of a light source type, a light source shape, a light source position, an illumination direction, an illumination intensity, an ambient light map.
In one embodiment, the illumination information includes a coded vector of light source parameters that have been coded.
In one embodiment, the three-dimensional scene geometry information is derived from a rendering pipeline applied to the three-dimensional scene data, and includes position maps, depth maps, normal maps, and material (texture) maps.
In one embodiment, the three-dimensional scene motion information is obtained from a rendering pipeline of three-dimensional scene data, including three-dimensional scene motion information between two adjacent frames, or three-dimensional scene motion information between a current frame and each historical frame.
In one embodiment, when the parameters of the image prediction model constructed on the lightweight neural network are optimized on their own, the input data serve as sample data and a high-quality image obtained by applying a rendering pipeline to the three-dimensional scene data serves as the sample label; the optimization target is to minimize the difference between the prediction corresponding to each sample and its label while keeping the transmission bandwidth of the image residual as small as possible.
In one embodiment, when the image generation model parameters constructed based on the complex neural network are optimized independently, input data is used as sample data, a high-quality image obtained by using a rendering pipeline for three-dimensional scene data is used as a label of the sample data, and the difference between a prediction result corresponding to the sample data and the label is minimum as an optimization target to optimize the image generation model parameters.
In one embodiment, when optimizing the image prediction model parameters, the loss function Loss corresponding to the optimization target is:

Loss = Σ_{i=1}^{k} loss_i + λ·B

where i is the image index, loss_i is the difference between the prediction corresponding to the i-th sample and its label, B is the average (or peak) bandwidth required to transmit the series of k image residuals, and λ is a weight parameter, a real number greater than 0; the larger λ is, the smaller the bandwidth required to transmit the image residuals.
In one embodiment, the image residual is converted into integers in the range 0 to 255 by a linear transformation to obtain a residual picture; a series of residual pictures is entropy-coded using a streaming-data compression method and then transmitted to the terminal. The terminal decodes the received residual picture stream and recovers the image residual by applying the inverse of the linear transformation; the residual is then used to reconstruct the image of the current frame.
To achieve the above object, the invention further provides a cloud rendering device based on a lightweight neural network and residual streaming, comprising a cloud and a terminal that exchanges data with the cloud.
The cloud is deployed with an image prediction model constructed on a lightweight neural network and is configured to perform image prediction calculation on input data and output a predicted low-quality image. It is also configured to generate a high-quality image based on three-dimensional scene data, compute the image residual between the high-quality image and the cloud-predicted low-quality image, compress the residual with a streaming-data compression method, and transmit it to the terminal; the input data include illumination information, three-dimensional scene geometry information, three-dimensional scene motion information, and the reconstruction results of historical frames, all obtained from the three-dimensional scene data.
The terminal is deployed with an image prediction model constructed on a lightweight neural network and is configured to perform image prediction calculation on input data and output a predicted low-quality image; it is further configured to superimpose the received image residual on its own predicted low-quality image to obtain the reconstruction result of the current frame as the final rendering of the current frame.
Compared with the prior art, the cloud rendering method and device based on a lightweight neural network and residual streaming provided by these embodiments work as follows. The cloud obtains a high-quality image, computes the image residual between that high-quality image and a low-quality image produced in the same way as the terminal's local image, compresses the residual with a streaming-data compression method, and transmits it to the terminal. Because the image residual carries little data, the demand on network bandwidth drops greatly, ensuring low delay and a high frame rate for data transmission. The terminal generates a low-quality image at small computational cost; this image already contains most of the information of the three-dimensional scene, so superimposing the received image residual on it yields a high-quality reconstruction. In short, the picture predicted by the local lightweight neural network is enhanced using the computing power of the cloud, reconstructing a real-time picture with high quality, high resolution, and low delay.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a lightweight neural network and residual streaming based cloud rendering method provided by an embodiment;
FIG. 2 is a flow chart of a lightweight neural network and residual streaming based cloud rendering method provided by another embodiment;
fig. 3 is a schematic structural diagram of a cloud rendering device based on lightweight neural network and residual streaming according to an embodiment.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the detailed description is presented by way of example only and is not intended to limit the scope of the invention.
To address the low picture quality predicted by an image prediction model constructed from a lightweight neural network on a user's terminal, an embodiment provides a cloud rendering method based on a lightweight neural network and residual streaming. Fig. 1 is a flowchart of the cloud rendering method provided by one embodiment; fig. 2 is a flowchart of the method provided by another embodiment. The flows in fig. 1 and fig. 2 differ in one respect: in fig. 1 the cloud obtains the high-quality image through a rendering pipeline using a rendering engine, while in fig. 2 the cloud generates the high-quality image using an image generation model constructed on a complex neural network.
As shown in fig. 1 and 2, the cloud rendering method based on lightweight neural network and residual streaming provided by the embodiment includes the following steps:
and step 1, the cloud end performs image prediction calculation on input data by using an image prediction model constructed based on a lightweight neural network, and outputs a predicted low-quality image.
And 2, generating a high-quality image based on the three-dimensional scene data, calculating image residual errors of the high-quality image and the cloud predicted low-quality image, and transmitting the image residual errors to the terminal after being compressed by a streaming data compression method.
And 3, the terminal performs image prediction calculation on the input data by using an image prediction model constructed based on the lightweight neural network, and outputs a predicted low-quality image.
And 4, the terminal superimposes the received image residual error on the low-quality image predicted by the terminal to obtain a reconstruction result of the current frame as a final rendering diagram of the current frame.
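The data flow of steps 1 through 4 can be sketched with numpy. Random arrays stand in for the model predictions and the rendered frame; because cloud and terminal run the same lightweight model on the same input, one array serves as both low-quality predictions, and the round trip recovers the high-quality image exactly (before any lossy compression of the residual).

```python
import numpy as np

rng = np.random.default_rng(0)

# Steps 1 and 3: cloud and terminal run the same lightweight model on the
# same input, so a single array stands in for both low-quality predictions.
low_quality = rng.random((4, 4, 3)).astype(np.float32)

# Cloud side: a stand-in for the high-quality render of the same frame.
high_quality = np.clip(
    low_quality + rng.normal(0.0, 0.05, low_quality.shape), 0.0, 1.0
).astype(np.float32)

# Step 2 (cloud): image residual = high-quality minus predicted low-quality.
residual = high_quality - low_quality

# Step 4 (terminal): superimpose the received residual on the local prediction.
reconstruction = low_quality + residual

assert np.allclose(reconstruction, high_quality)
```

Only `residual` crosses the network, which is why the bandwidth cost tracks the residual's information content rather than the full frame's.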
In the embodiment, the cloud and the terminal use the same image prediction model constructed on the lightweight neural network; the model performs image prediction calculation on the input data and outputs a predicted low-quality image, so the low-quality images predicted by the cloud and by the terminal are identical.
In an embodiment, the input data include illumination information, three-dimensional scene geometry information, three-dimensional scene motion information, and the reconstruction results of historical frames, all obtained from the three-dimensional scene data. The illumination information is information related to the illumination of the three-dimensional scene. In one possible implementation it comprises light source parameters: at least one of light source type, light source shape, light source position, illumination direction, illumination intensity, and ambient light map. For a point light source the parameters include position and intensity; for a parallel (directional) light source, direction and intensity; for an area light source, position, shape, direction, and intensity distribution. In this implementation the light source parameters are fed directly into the image prediction model for image prediction. In another possible implementation, the illumination information comprises an encoded vector of the light source parameters: the parameters are mapped into a latent space by an encoder, and the resulting encoding vector is input to the image prediction model for image prediction.
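The second implementation above maps light source parameters to a vector before feeding them to the network. As a minimal sketch (the parameter layout, type enumeration, and function name are illustrative assumptions, not the patent's specified encoding):

```python
import numpy as np

LIGHT_TYPES = {"point": 0, "directional": 1, "area": 2}  # hypothetical enumeration

def encode_light(params: dict) -> np.ndarray:
    """Flatten light-source parameters into a fixed-length vector suitable
    as neural-network input (one hypothetical layout among many)."""
    one_hot = np.zeros(len(LIGHT_TYPES), dtype=np.float32)
    one_hot[LIGHT_TYPES[params["type"]]] = 1.0
    position = np.asarray(params.get("position", (0.0, 0.0, 0.0)), dtype=np.float32)
    direction = np.asarray(params.get("direction", (0.0, 0.0, 0.0)), dtype=np.float32)
    intensity = np.array([params.get("intensity", 0.0)], dtype=np.float32)
    return np.concatenate([one_hot, position, direction, intensity])

vec = encode_light({"type": "point", "position": (1.0, 2.0, 3.0), "intensity": 5.0})
# 3 (one-hot type) + 3 (position) + 3 (direction) + 1 (intensity) = 10 entries
assert vec.shape == (10,)
```

A learned encoder mapping the parameters into a latent space, as the embodiment describes, would replace this fixed layout, but the fixed vector shows the shape of the input the model consumes.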
The three-dimensional scene geometry information and motion information are intermediate products of the rendering pipeline and are obtained by the rendering engines of the terminal and the cloud from the rendering pipeline applied to the three-dimensional scene data. The geometry information includes image-space position maps, depth maps, normal maps, material maps, and the like; the motion information indicates the movement of moving objects in the three-dimensional scene, either between two adjacent frames or between the current frame and each historical frame.
The reconstruction result of the historical frame is the reconstruction result of the terminal in the historical time, and can be the reconstruction result of any one historical time or the superposition result of the reconstruction results of a plurality of historical times.
A lightweight neural network can be understood as a neural network with fewer hidden layers and a simpler structure. It occupies few resources and can run in real time even on devices with weak performance. Because its structural complexity is low, the output computed from the input data is a lower-quality prediction, i.e., a low-quality image.
In an embodiment, the lightweight neural network refers to a network with a relatively simple structure, such as a U-Net with few layers (for example, at most 20 layers), MobileNet, or RepVGG.
In embodiments, the cloud may generate high quality images based on three-dimensional scene data in two ways. In one possible implementation, as shown in fig. 1, three-dimensional scene data is rendered using a rendering pipeline, resulting in a high quality image. The rendering engine of the cloud performs rendering operation on the three-dimensional scene data through a rendering pipeline, and outputs complete coloring effects, namely high-quality images, besides the three-dimensional scene geometric information and the three-dimensional scene motion information.
In another embodiment, as shown in fig. 2, an image prediction calculation is performed on input data using an image generation model constructed based on a complex neural network, and a high-quality image is output. The complex neural network is a neural network with more hidden layers, more hidden units and more complex structures, and the fitting capacity of the neural network is increased by increasing the complexity of the neural network, so that the deviation between the predicted result and the real result of the neural network is small.
In an embodiment, the complex neural network is a network with a complex structure relative to the lightweight neural network, including a U-Net network with a large number of layers. For example, U-Net networks with layers exceeding 20.
Both approaches provided by the embodiments require substantial computing resources to render or generate the high-quality picture. This overhead should not be borne by the terminal; performing the computation in the cloud relieves the computational pressure on the local terminal.
In an embodiment, both the image prediction model and the image generation model need parameter optimization before online deployment. As shown in fig. 1, when the high-quality image is obtained through a rendering pipeline, only the parameters of the image prediction model constructed on the lightweight neural network need to be optimized. The process is: take the input data as sample data and a high-quality image obtained by applying a rendering pipeline to the three-dimensional scene data as the sample label; optimize the image prediction model parameters with the target of minimizing the difference between each sample's prediction and its label while keeping the transmission bandwidth of the image residual as small as possible.
As shown in fig. 2, when the high-quality image is predicted by an image generation model constructed on a complex neural network, the image generation model must be trained separately to optimize its parameters. The process is: take the input data as sample data and a high-quality image obtained by applying a rendering pipeline to the three-dimensional scene data as the sample label; optimize the image generation model parameters with the target of minimizing the difference between each sample's prediction and its label.
In an embodiment, the optimization target of the image prediction model parameters has two parts: the first is measured by the average bandwidth after the residual stream is compressed, denoted B (bandwidth); the other is the difference between the prediction corresponding to each sample and its label image, which can be measured by functions such as L1, L2, MSE, RMSE, SSIM, or PSNR, collectively referred to as loss_i. For a video with k frames, the loss function corresponding to the optimization target is:

Loss = Σ_{i=1}^{k} loss_i + λ·B

where i is the image index, loss_i is the difference between the prediction corresponding to the i-th sample and its label, B is the average (or peak) bandwidth required to transmit the series of k image residuals, and λ is a weight parameter, a real number greater than 0; the larger λ is, the smaller the bandwidth needed to transmit the image residuals. By adjusting λ, a trade-off can be made between reconstructed picture quality and the bandwidth required to transmit the residual information.
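This two-part objective can be sketched in a few lines of Python. The function name, the choice of a squared-L2 measure for loss_i, and the way B is computed from per-frame compressed sizes are illustrative assumptions; the patent leaves the exact measures open.

```python
import numpy as np

def total_loss(predictions, labels, compressed_sizes_bits, frame_time_s, lam=0.01):
    """Loss = sum_i loss_i + lam * B, where loss_i is a squared-L2 image
    difference and B is the average bandwidth (bits per second) of the
    compressed residual stream for the k frames."""
    per_frame = [float(np.sum((c - gt) ** 2)) for c, gt in zip(predictions, labels)]
    avg_bandwidth = sum(compressed_sizes_bits) / (len(compressed_sizes_bits) * frame_time_s)
    return sum(per_frame) + lam * avg_bandwidth

# Example: one 2x2 frame, prediction all zeros vs. label all ones (loss_1 = 4),
# residual compressed to 1000 bits at one frame per second (B = 1000).
value = total_loss([np.zeros((2, 2))], [np.ones((2, 2))], [1000], 1.0, lam=0.01)
assert abs(value - (4.0 + 0.01 * 1000)) < 1e-9
```

Raising `lam` pushes the optimizer toward residuals that compress well at the cost of per-frame fidelity, which is exactly the trade-off described above.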
When loss_i is measured by the L2 norm, the corresponding loss term is

loss_i = ||C_i - GT_i||_2^2

where i is the image index, C_i is the reconstruction result, GT_i is the label image, and ||.||_2 denotes the element-wise 2-norm of the matrix.
In an embodiment, the optimization target of the image generation model parameters is the difference between the prediction corresponding to each sample and its label image, which can likewise be measured by functions such as L1, L2, MSE, RMSE, SSIM, or PSNR, collectively referred to as loss_i.
In the embodiment, the image residual between the high-quality image and the cloud-predicted low-quality image is the result of a pixel-by-pixel difference of two pictures of the same resolution. Because residual values can be positive or negative, the computed residual can be transmitted directly, corresponding to an HDR image. Alternatively, in one possible implementation, the residual is first converted into integers in the range 0 to 255 by a linear transformation before streaming, yielding a residual picture corresponding to an LDR image. A series of residual pictures is then entropy-coded with a streaming-data compression method such as H.265 and transmitted to the terminal. The terminal decodes the residual stream sent by the cloud to obtain the residual picture of the current frame, then recovers the original image residual by applying the inverse of the linear transformation. Finally, the terminal superimposes the residual on its own predicted low-quality image to obtain the reconstruction result.
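The linear transformation and its inverse can be sketched as a quantization round trip. This assumes the per-frame scale and offset are transmitted alongside the residual picture, a detail the patent does not specify; function names are illustrative.

```python
import numpy as np

def quantize_residual(residual):
    """Map a signed residual onto 0-255 integers by a linear transform,
    returning the byte image plus the (scale, offset) needed to invert it."""
    lo, hi = float(residual.min()), float(residual.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    q = np.round((residual - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize_residual(q, scale, lo):
    """Inverse of the linear transform (terminal side)."""
    return q.astype(np.float32) * scale + lo

residual = np.array([[-0.4, 0.0], [0.2, 0.6]], dtype=np.float32)
q, scale, lo = quantize_residual(residual)
restored = dequantize_residual(q, scale, lo)
assert q.dtype == np.uint8
# Rounding to 256 levels bounds the reconstruction error by half a step.
assert np.max(np.abs(restored - residual)) <= scale / 2 + 1e-6
```

The half-step error bound is the quality cost of the LDR path relative to transmitting the HDR residual directly.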
Because the residual pictures exhibit spatio-temporal continuity, a streaming-data compression method can reduce the temporal and spatial redundancy among a series of residual pictures and lower the bandwidth required for transmission. In addition, the bandwidth term included in the model-parameter optimization further reduces the bandwidth needed to transmit the residuals.
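The benefit of exploiting temporal redundancy can be demonstrated with a toy experiment, using `zlib` as a stand-in for a real stream codec such as H.265 and synthetic frames as stand-ins for residual pictures:

```python
import zlib
import numpy as np

rng = np.random.default_rng(1)

# A short sequence of residual pictures with strong frame-to-frame similarity.
base = rng.integers(0, 256, (64, 64), dtype=np.uint8)
frames = [base, (base + 1).astype(np.uint8), (base + 2).astype(np.uint8)]

# Intra coding: compress each frame independently.
intra = sum(len(zlib.compress(f.tobytes())) for f in frames)

# Exploit temporal redundancy: send the first frame, then only the
# frame-to-frame deltas (nearly constant here, so they compress well).
deltas = [frames[0]] + [(b - a).astype(np.uint8) for a, b in zip(frames, frames[1:])]
inter = sum(len(zlib.compress(d.tobytes())) for d in deltas)

assert inter < intra  # delta coding shrinks a temporally coherent stream
```

A real video codec predicts blocks with motion compensation rather than whole-frame deltas, but the principle, coding only what changed, is the same one the streaming compression step relies on.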
Fig. 3 is a schematic structural diagram of a cloud rendering device based on a lightweight neural network and residual streaming according to an embodiment. As shown in fig. 3, the device comprises a cloud and a terminal. The cloud is deployed with an image prediction model constructed on a lightweight neural network and performs image prediction calculation on the input data, outputting a predicted low-quality image. It also generates a high-quality image based on the three-dimensional scene data, computes the image residual between the high-quality image and the cloud-predicted low-quality image, compresses the residual with a streaming-data compression method, and transmits it to the terminal; the input data include illumination information, three-dimensional scene geometry information, three-dimensional scene motion information, and the reconstruction results of historical frames, all obtained from the three-dimensional scene data.
The terminal is deployed with an image prediction model constructed on a lightweight neural network and performs image prediction calculation on the input data, outputting a predicted low-quality image. It also superimposes the received image residual on its own predicted low-quality image to obtain the reconstruction result of the current frame as the final rendering of the current frame.
It should be noted that the cloud rendering device provided by this embodiment and the cloud rendering method provided by the foregoing embodiments share the same inventive concept. In the device, the way input data are acquired, the way the image prediction model is constructed, the way the high-quality image is generated, and the way the terminal and cloud cooperate in image rendering are the same as in the cloud rendering method described above, and are not repeated here.
According to the cloud rendering method and device, the cloud obtains a high-quality image, computes the image residual between that high-quality image and a low-quality image produced in the same way as the terminal's local image, compresses the residual with a streaming-data compression method, and transmits it to the terminal. Because the image residual carries little data, the demand on network bandwidth drops greatly while low delay and a high frame rate of data transmission are guaranteed. The terminal generates a low-quality image at small computational cost; this image already contains most of the information of the three-dimensional scene, so superimposing the received image residual on it yields a high-quality reconstruction. In short, the picture predicted by the local lightweight neural network is enhanced using the computing power of the cloud, reconstructing a real-time picture with high quality, high resolution, and low delay.
The foregoing describes in detail the preferred embodiments and advantages of the invention. It should be understood that the above description is merely illustrative of the presently preferred embodiments and is not intended to limit the invention; any changes, additions, substitutions, and equivalents made within the spirit and principles of the invention are intended to fall within its scope.

Claims (10)

1. A cloud rendering method based on lightweight neural network and residual streaming, characterized by comprising the steps of:
the cloud performs image prediction on input data using an image prediction model built on a lightweight neural network and outputs a predicted low-quality image; meanwhile, the cloud generates a high-quality image based on three-dimensional scene data, computes the image residual between the high-quality image and the low-quality image predicted at the cloud, compresses the image residual with a streaming data compression method, and transmits it to a terminal, wherein the input data comprise illumination information, three-dimensional scene geometry information, three-dimensional scene motion information, and reconstruction results of historical frames, all obtained from the three-dimensional scene data;
when the parameters of the image prediction model built on the lightweight neural network are optimized separately, the input data serve as sample data and a high-quality image obtained by applying a rendering pipeline to the three-dimensional scene data serves as the label of the sample data; the optimization targets are to minimize the difference between the prediction result for the sample data and its label while keeping the transmission bandwidth of the image residual as small as possible;
the terminal performs image prediction on the input data using an image prediction model built on a lightweight neural network and outputs a predicted low-quality image, then superimposes the received image residual on the low-quality image predicted at the terminal to obtain the reconstruction result of the current frame as the final rendered image of the current frame.
2. The cloud rendering method based on a lightweight neural network and residual streaming according to claim 1, wherein generating the high-quality image based on three-dimensional scene data comprises: rendering the three-dimensional scene data with a rendering pipeline to obtain the high-quality image.
3. The cloud rendering method based on a lightweight neural network and residual streaming according to claim 1, wherein generating the high-quality image based on three-dimensional scene data comprises: performing image prediction on the input data with an image generation model built on a complex neural network and outputting the high-quality image.
4. The cloud rendering method based on a lightweight neural network and residual streaming according to claim 1, wherein the illumination information comprises light source parameters, the light source parameters comprising at least one of light source type, light source shape, light source position, illumination direction, illumination intensity, and ambient light map.
5. The cloud rendering method based on a lightweight neural network and residual streaming according to claim 1, wherein the illumination information comprises encoded vectors obtained by encoding the light source parameters.
6. The cloud rendering method based on a lightweight neural network and residual streaming according to claim 1, 4 or 5, wherein the three-dimensional scene geometry information is obtained from a rendering pipeline applied to the three-dimensional scene data and comprises a position map, a depth map, a normal map, and a texture map;
the three-dimensional scene motion information is obtained from the rendering pipeline applied to the three-dimensional scene data and comprises motion information between two adjacent frames or between the current frame and each historical frame.
7. The cloud rendering method based on a lightweight neural network and residual streaming according to claim 3, wherein, when the parameters of the image generation model built on the complex neural network are optimized separately, the input data serve as sample data, a high-quality image obtained by applying a rendering pipeline to the three-dimensional scene data serves as the label of the sample data, and the optimization target is to minimize the difference between the prediction result for the sample data and its label.
8. The cloud rendering method based on a lightweight neural network and residual streaming according to claim 1, wherein, when optimizing the image prediction model parameters, the loss function Loss corresponding to the optimization target is:

Loss = Σ_{i=1}^{k} Loss_i + λ·B

wherein i is the index of an image, Loss_i is the difference between the prediction result for the i-th sample data and its label, B is the average/peak bandwidth required to transmit the series of k image residuals, and λ is a weight parameter, a real number greater than 0; the larger λ is, the smaller the bandwidth required to transmit the image residuals.
9. The cloud rendering method based on a lightweight neural network and residual streaming according to claim 1, wherein the image residual is converted by a linear transformation into integers in the range 0-255 to obtain a residual picture; a series of residual pictures is entropy-coded with a streaming data compression method and transmitted to the terminal; the terminal decodes the received residual-picture stream and applies the inverse of the linear transformation to recover the image residual, which is used to reconstruct the image of the current frame.
10. A cloud rendering device based on a lightweight neural network and residual streaming, characterized by comprising a cloud and a terminal in data communication with the cloud;
the cloud is provided with an image prediction model built on a lightweight neural network and is configured to perform image prediction on input data and output a predicted low-quality image; the cloud is further configured to generate a high-quality image based on three-dimensional scene data, compute the image residual between the high-quality image and the low-quality image predicted at the cloud, compress the image residual with a streaming data compression method, and transmit it to the terminal, wherein the input data comprise illumination information, three-dimensional scene geometry information, three-dimensional scene motion information, and reconstruction results of historical frames, all obtained from the three-dimensional scene data;
when the parameters of the image prediction model built on the lightweight neural network are optimized separately, the input data serve as sample data and a high-quality image obtained by applying a rendering pipeline to the three-dimensional scene data serves as the label of the sample data; the optimization targets are to minimize the difference between the prediction result for the sample data and its label while keeping the transmission bandwidth of the image residual as small as possible;
the terminal is provided with an image prediction model built on a lightweight neural network and is configured to perform image prediction on input data and output a predicted low-quality image; the terminal is further configured to superimpose the received image residual on the low-quality image predicted at the terminal to obtain the reconstruction result of the current frame as the final rendered image of the current frame.
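The residual transport of claim 9 (linear transform to 0-255 integers, entropy coding of the residual-picture stream, inverse transform at the terminal) might be sketched as follows. This is a hypothetical illustration, not the claimed implementation: per-frame min/max scaling is only one possible linear transform (the claim does not fix one), zlib stands in for an unspecified streaming entropy coder, and the function names are invented for the example.

```python
import zlib
import numpy as np

def encode_residual(residual: np.ndarray):
    """Cloud side: linearly map the residual to 0-255 integers and
    entropy-code the resulting residual picture (zlib as a stand-in)."""
    rmin, rmax = float(residual.min()), float(residual.max())
    scale = (rmax - rmin) or 1.0  # avoid division by zero for flat residuals
    quantized = np.round((residual - rmin) / scale * 255.0).astype(np.uint8)
    payload = zlib.compress(quantized.tobytes())
    # rmin/rmax must travel with the payload so the terminal can invert the map.
    return payload, quantized.shape, rmin, rmax

def decode_residual(payload: bytes, shape, rmin: float, rmax: float) -> np.ndarray:
    """Terminal side: decode the stream and apply the inverse linear transform."""
    quantized = np.frombuffer(zlib.decompress(payload), dtype=np.uint8).reshape(shape)
    scale = (rmax - rmin) or 1.0
    return quantized.astype(np.float32) / 255.0 * scale + rmin
```

The round trip is lossy only up to quantization: the recovered residual differs from the original by at most roughly (rmax - rmin)/510 per pixel under this scaling.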
CN202210100409.5A 2022-01-27 2022-01-27 Cloud rendering method and device based on lightweight neural network and residual streaming Active CN114494569B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210100409.5A CN114494569B (en) 2022-01-27 2022-01-27 Cloud rendering method and device based on lightweight neural network and residual streaming

Publications (2)

Publication Number Publication Date
CN114494569A CN114494569A (en) 2022-05-13
CN114494569B (en) 2023-09-19

Family

ID=81476739

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210100409.5A Active CN114494569B (en) 2022-01-27 2022-01-27 Cloud rendering method and device based on lightweight neural network and residual streaming

Country Status (1)

Country Link
CN (1) CN114494569B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115170713B (en) * 2022-06-29 2023-05-09 光线云(杭州)科技有限公司 Three-dimensional scene cloud rendering method and system based on super network

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6693964B1 (en) * 2000-03-24 2004-02-17 Microsoft Corporation Methods and arrangements for compressing image based rendering data using multiple reference frame prediction techniques that support just-in-time rendering of an image
CN107358575A (en) * 2017-06-08 2017-11-17 清华大学 A kind of single image super resolution ratio reconstruction method based on depth residual error network
CN107396124A (en) * 2017-08-29 2017-11-24 南京大学 Video-frequency compression method based on deep neural network
CN107578377A (en) * 2017-08-31 2018-01-12 北京飞搜科技有限公司 A kind of super-resolution image reconstruction method and system based on deep learning
CN108564646A (en) * 2018-03-28 2018-09-21 腾讯科技(深圳)有限公司 Rendering intent and device, storage medium, the electronic device of object
WO2018218249A1 (en) * 2017-05-26 2018-11-29 Google Llc Tiled image compression using neural networks
CN110619607A (en) * 2018-06-20 2019-12-27 浙江大学 Image denoising method and device based on neural network and image coding and decoding method and device based on neural network image denoising
CN112365554A (en) * 2020-10-26 2021-02-12 天津大学 Compressed sensing image reconstruction method based on multi-scale residual error neural network
CN112734649A (en) * 2021-01-06 2021-04-30 暨南大学 Image degradation method and system based on lightweight neural network
CN113838184A (en) * 2020-06-08 2021-12-24 华为技术有限公司 Rendering method, device and system

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8238678B2 (en) * 2006-08-30 2012-08-07 Siemens Medical Solutions Usa, Inc. Providing representative image information
EP2098994A1 (en) * 2008-03-04 2009-09-09 Agfa HealthCare NV System for real-time volume rendering on thin clients via a render server
TWI624804B (en) * 2016-11-07 2018-05-21 盾心科技股份有限公司 A method and system for providing high resolution image through super-resolution reconstrucion
US10579908B2 (en) * 2017-12-15 2020-03-03 Google Llc Machine-learning based technique for fast image enhancement
CN108510535B (en) * 2018-03-14 2020-04-24 大连理工大学 High-quality depth estimation method based on depth prediction and enhancer network

Also Published As

Publication number Publication date
CN114494569A (en) 2022-05-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant